JP3878995B2

JP3878995B2 - Redundant system, redundant system configuration system, and redundant system configuration method

Info

Publication number: JP3878995B2
Application number: JP2002340516A
Authority: JP
Inventors: 久野巧 (Takumi Kuno)
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2002-11-25
Filing date: 2002-11-25
Publication date: 2007-02-07
Anticipated expiration: 2022-11-25
Also published as: JP2004178048A

Description

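[0001]
[Technical Field of the Invention]
The present invention relates to a redundant system that continues operating with partial functionality when a fault occurs in a component of a computer system. More specifically, it relates to a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of the computer system. It also relates to a redundant system configuration system and a redundant system configuration method for constructing such a redundant system.
[0002]
[Prior Art]
Redundant systems that continue operating with partial functionality when a fault occurs in a component of a computer system have been proposed in the past. This type of computer system is outlined below.
[0003]
A fault-tolerant system is a system whose functions, as far as can be observed from outside, remain normal even when a fault occurs in one of its components. Here a fault means a failure in the broad sense, including hardware failures, software bugs, design errors, overload, and congestion. A fail-soft system, by contrast, cannot maintain all of its normal functions after a fault occurs, but it retains some of them and continues operating. Fault-tolerance techniques is the collective term for the techniques used to build fault-tolerant or fail-soft systems.
[0004]
The basic fault-tolerance technique is to build a redundant system by multiplexing. Hardware multiplexing uses either static redundancy or dynamic redundancy. Software multiplexing uses the N-version method.
[0005]
Static redundancy masks faults so that their effects do not appear at the output. A representative example is the multiplexed majority-voting system. For example, if the same input is given to three identical processing modules and a majority vote is taken over their outputs, the voted output is normally correct even if one module develops a fault.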
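A majority voter of the kind described in [0005] can be sketched in a few lines of Python. This is an illustrative sketch only: the names `majority_vote` and `tmr` are hypothetical, and each processing module is modelled as a plain Python callable.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of modules, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None

def tmr(modules: Sequence[Callable[[int], int]], x: int) -> Optional[Hashable]:
    """Feed the same input to every module (three for TMR) and vote on the outputs."""
    return majority_vote([m(x) for m in modules])

good = lambda x: 2 * x
faulty = lambda x: 2 * x + 1   # simulated fault in a single module
print(tmr([good, faulty, good], 5))  # 10: the faulty output is masked
```

Returning `None` when no strict majority exists is a design choice of this sketch; a real voter would usually also raise an alarm in that case.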
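[0006]
Dynamic redundancy either switches a faulty component over to a spare element or removes the faulty component as far as possible and degrades the function. The former is also called the system reconfiguration method, and the latter the function degeneration method.
[0007]
Switching to spare elements is common in ordinary high-reliability systems. Schemes that provide spare elements at the processor level include the dual system, in which two processors run the same processing in synchronization and a comparison check drives diagnosis, isolation, and resumption of processing, and the duplex system, in which two processors are divided into an active unit and a standby unit and operation is switched to the standby processor, automatically or manually, when an error is detected in the active one. At the peripheral-device level, external storage uses a scheme called RAID (Redundant Arrays of Independent Disks), a parallelized storage device that makes use of spare magnetic hard disk drives.
[0008]
Function degeneration is usually applied to processors or memory in redundant configurations made up of multiple components. In a multiprocessor system, the failed processor is disconnected and processing continues on the remaining processors. In memory, the failed memory block is disconnected and processing continues with a reduced total memory capacity. In either case, a drop in system performance is unavoidable.
[0009]
The N-version method runs N programs, developed independently from the same specification, at the same time, and selects from the N results the one judged to be correct, for example by a majority-vote algorithm.
[0010]
As the above shows, the basic policy of redundancy by multiplexing is to provide several components of the same specification and, while one of them is operating, to run at least one other in parallel or keep it on standby. The reason is that this makes it easy to switch over or disconnect a failed component.
[0011]
Conventional redundancy methods usually do not multiplex every component of a system. Instead, they multiplex only the parts where faults are likely or where a fault would have a fatal effect on system function. There are two reasons. First, multiplexing every component greatly raises the cost of building the system. Second, even when the parts to be multiplexed are limited on the basis of, for example, hardware failure probabilities, fault tolerance still improves to a useful degree.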
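[0012]
Next, multilevel model representation and multilevel simulation, which are abstraction techniques for reducing the amount of information, are described.
[0013]
Multilevel model representation is a technique for describing the structure and behavior of a computer system at an abstract representation level (a domain of variables representing abstract attributes) and at a concrete representation level (a domain of variables representing concrete attributes).
[0014]
Usually, the structural description at the concrete representation level is written so that it corresponds one-to-one with the components of the actual computer system. The structural description at the abstract representation level is associated with those components only indirectly, through a description of the upper-lower correspondence. The technique has been used mainly in logic device design.
[0015]
Multilevel simulation is a verification method for checking the consistency of a multilevel model description. If the description is consistent, the design of the computer system (logic device) it represents is correct (see Non-Patent Document 1).
[0016]
The verification procedure of multilevel simulation is briefly as follows. It presupposes a multilevel model description of the logic device under verification. That is, the structure and behavior of the logic device are described at two representation levels according to multilevel model representation (an abstract upper level such as the register-transfer level, and a lower level such as the gate level), and the upper-lower correspondence of the device's inputs, outputs, and states between the two levels (the correspondence between upper-level variables and lower-level variables) is described explicitly. If the model has three or more representation levels, verification is repeated for each pair of adjacent levels whose upper-lower correspondence is described.
[0017]
First, an output result R1 for a given test pattern is obtained by simulating the behavioral description of the lower representation level (obtained by converting its structural description). Next, the test pattern is converted into input data for the upper representation level using the upper-lower correspondence, and an output result R2 for that data is obtained by simulating the behavioral description of the logic device at the upper level.
[0018]
Meanwhile, R1 is converted into R1' using the upper-lower correspondence, in the same way as the test pattern. The converted result R1' is then compared with R2. When the comparison succeeds for a sufficient set of suitable test patterns, the multilevel model description of the logic device is judged to be consistent.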
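The R1/R2 comparison can be sketched with a toy two-level model that is our own illustration, not taken from the source: the lower level is a ripple-carry adder on little-endian bit vectors, the upper level is integer addition, and the upper-lower correspondence maps a bit vector to its integer value. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Bits = List[int]  # little-endian bit vector: the lower (gate-like) level

def to_upper(bits: Bits) -> int:
    """Upper-lower correspondence: map a bit vector to an integer."""
    return sum(b << i for i, b in enumerate(bits))

def lower_adder(a: Bits, b: Bits) -> Bits:
    """Lower-level behaviour: ripple-carry addition built from XOR/AND/OR."""
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out + [carry]

def upper_adder(a: int, b: int) -> int:
    """Upper-level (abstract) behaviour: integer addition."""
    return a + b

def multilevel_check(patterns: Iterable[Tuple[Bits, Bits]],
                     lower: Callable[[Bits, Bits], Bits] = lower_adder) -> bool:
    for a, b in patterns:
        r1 = lower(a, b)                            # R1: simulate the lower level
        r2 = upper_adder(to_upper(a), to_upper(b))  # R2: upper level on the converted pattern
        if to_upper(r1) != r2:                      # convert R1 to R1' and compare with R2
            return False
    return True

patterns = [(list(a), list(b)) for a in product((0, 1), repeat=2)
            for b in product((0, 1), repeat=2)]
print(multilevel_check(patterns))  # True
```

Passing a lower-level model that drops the carry bit makes the check fail, which is the automatic judgment that [0019] credits to multilevel simulation.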
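[0019]
In ordinary simulation, the designer must inspect each result to judge whether the logic device is correct. Multilevel simulation has the advantage that this inspection and judgment are automated.
[Non-Patent Document 1]
久野巧 (Takumi Kuno): "Verification of Multilevel Models by Multilevel Simulation" (マルチレベルシミュレーションによる多階層モデルの検証), IEICE Transactions D-II, Vol. J76-D-II, No. 4, pp. 908-913 (1993).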
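[0020]
[Problems to Be Solved by the Invention]
Computer systems are expected to pervade every area of society. In a society where support from computer systems is taken for granted regardless of time and place, a system failure (fault) risks halting every activity of modern life. To keep that risk from materializing, redundant systems with improved fault tolerance must be built and operated.
[0021]
In conventional redundant configurations of computer systems, multiplexing every component would raise construction costs sharply. The parts to be multiplexed were therefore usually limited on the basis of hardware failure probabilities, and only the parts where faults are likely, or where a fault would have a fatal effect on system function, were multiplexed.
[0022]
Even a partial redundant configuration does improve fault tolerance if the prediction of where faults will occur is correct. However, once failures in the broad sense are considered, covering not only hardware failures but also software bugs, design errors, overload, and congestion, the error in estimating fault probabilities grows, and a partial redundant configuration can hardly guarantee continued operation.
[0023]
A society that assumes ubiquitous computer systems needs, all the more, a redundancy scheme that gives high fault tolerance for the system as a whole while limiting the growth of the hardware and software (hereinafter, resources) needed to build it.
[0024]
A candidate for such a scheme is a fail-soft mechanism that multiplexes consistently from the part where data is generated to the part where it is finally output, to improve fault tolerance reliably, and that uses data with reduced information content, to restrain the growth in required resources.
[0025]
The present invention has been made to solve the problems above. In realizing such a mechanism, the specific problems to be overcome are Problems 1 to 3 described below.
[0026]
(Problem 1)
If the location and cause of faults are assumed to be hard to predict, the most effective redundant configuration multiplexes every component. However, providing several components of the same specification, as in the prior art, more than doubles the resources needed to build the system, so such a configuration is impractical in many situations.
[0027]
One way to limit the required resources is to provide, for each basic module, a redundant module that handles data with reduced information content, and to multiplex consistently from the data-generating part to the final data-output part. In addition, the redundant module must be able, when a fault occurs, to act as a spare for the basic module and supply data usable for function degeneration. Here a "basic module" is each part of the system before it is made redundant (the non-redundant system), and a "redundant module" is a module added to make the configuration redundant.
[0028]
(Problem 2)
Hardware failures can be detected effectively by conventional self-diagnosis circuits or by comparing the results of synchronized operation in a dual system. Software bugs, design errors, overload, congestion, and the like, however, are hard to detect that way. Faults in the broad sense must therefore be detectable, which requires a means that continuously checks whether the desired behavior defined at the design stage matches the behavior of the running system, and reports a fault when they differ.
[0029]
(Problem 3)
For the system to be a fail-soft system with improved fault tolerance, it must output the basic module's data unchanged when the fault lies in a redundant module, and restore the original data from the redundant module's data and output that when the fault lies in the basic module. When a fault is detected, a control means is needed that selects the more plausible data and degrades the function.
[0030]
Accordingly, an object of the present invention is to solve Problems 1 to 3 above by providing a redundant system that is a fail-soft system, performing system function verification during normal operation and function degeneration when a fault occurs, together with a redundant system configuration system and a redundant system configuration method. More specifically, the object is to provide a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of the computer system, and a redundant system configuration system and method for constructing it.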
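[0031]
[Means for Solving the Problems]
To achieve the above object, the redundant system according to the present invention is, in its basic configuration, a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of a computer system. It comprises: a first-level module of a system function that processes or communicates the ordinary data of the computer system; a second-level module of a system function that processes or communicates data whose information content has been reduced from the ordinary data by abstraction means performing abstraction that enables function degeneration; a function verification module that verifies the system function by comparing data obtained by reducing the information content of the first-level module's output with the second-level module's output; and a function degeneration module that, when the function verification module detects a fault, degrades the system function by selecting the first-level module's output as-is if no fault is detected in the first-level module, or by selecting data restored from the second-level module through processing inverse to the abstraction performed by the abstraction means if a fault is detected in the first-level module.
[0032]
The function degeneration module degrades the system function on the basis of the function verification, for example by comparing the magnitudes of a variable representing data plausibility, whose value changes through error detection during processing or communication in the first-level and second-level modules and through the operations that reduce information content. It selects the first-level module's output as-is if no fault is detected in the first-level module, and selects the data restored from the second-level module if a fault is detected there.
[0033]
The redundant system configuration system according to the present invention constructs a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of a computer system. It comprises: degenerate model creation means that, from a first-level module of a system function that processes or communicates the ordinary data of the computer system, creates a second-level module that processes or communicates data whose information content has been reduced from the ordinary data by abstraction means performing abstraction that enables function degeneration; and function degeneration module creation means that creates a function degeneration module which runs the first-level and second-level modules concurrently, verifies the system function when no fault is detected, and, when a fault is detected, degrades the function by selecting the first-level module's output as-is if no fault is detected in the first-level module, or data restored from the second-level module through processing inverse to the abstraction if a fault is detected in the first-level module.
[0034]
The redundant system configuration method according to the present invention constructs a redundant system of the same kind. In this method, a second-level module that processes or communicates data whose information content has been reduced from the ordinary data, by abstraction means performing abstraction that enables function degeneration, is created from a first-level module of a system function that processes or communicates the ordinary data of the computer system. A function degeneration module is then created which runs the first-level and second-level modules concurrently, verifies the system function when no fault is detected, and, when a fault is detected, degrades the function by selecting the first-level module's output as-is if no fault is detected in the first-level module, or data restored from the second-level module through processing inverse to the abstraction if a fault is detected in the first-level module.
[0035]
The redundant system according to the present invention may also be configured as a redundant system of the same kind comprising: a first-level module of a system function that processes or communicates the ordinary data of the computer system; a second-level module of a system function that processes or communicates data whose information content has been reduced from the ordinary data to enable function degeneration; a third-level module of a system function that processes or communicates data whose information content has been reduced from the second-level module's data; first verification means that verifies the system function by comparing data obtained by reducing the information content of the first-level module's output with the second-level module's output; second verification means that verifies the system function by comparing data obtained by reducing the information content of the second-level module's output with the third-level module's output; and a function degeneration module that degrades the system function when the first or second verification means detects a fault.
[0036]
The redundant system configuration system according to the present invention may also comprise: multilevel degenerate model creation means that creates, from a first-level module of a system function that processes or communicates the ordinary data of the computer system, a second-level module that processes or communicates data whose information content has been reduced from the ordinary data, and a third-level module that processes or communicates data whose information content has been reduced from the second-level module's data; and function degeneration module creation means that creates a function degeneration module which runs the first-, second-, and third-level modules concurrently, verifies the system function when no fault is detected, and degrades the function when a fault is detected.
[0037]
The redundant system configuration method according to the present invention may likewise create, from a first-level module of a system function that processes or communicates the ordinary data of the computer system, a second-level module that processes or communicates data whose information content has been reduced from the ordinary data, create a third-level module that processes or communicates data whose information content has been reduced from the second-level module's data, and create a function degeneration module which runs the first-, second-, and third-level modules concurrently, verifies the system function when no fault is detected, and degrades the function when a fault is detected.
[0038]
In this case, at least one level from the third-level module up to an N-th-level module (N > 3) is added to the first-level and second-level modules, and the first- through N-th-level modules run concurrently to perform verification and degeneration.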
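[0039]
In the following description, a "fault" means a failure in the broad sense, including hardware failures, software bugs, design errors, overload, and congestion. "Data with reduced information content" means data to which an "abstraction" operation has been applied, that is, data converted so as to carry less information, for example by extracting information about a specific attribute from the original data or by representing a specific pattern with a specific symbol. Abstracted data is called "abstract data" relative to the original data. A conversion whose data flow runs opposite to abstraction is called "inverse abstraction", and inversely abstracted data is called "restored data".
[0040]
A "first-level module" is each part of the system, before it is made redundant, in a computer system to which the redundant system of the present invention is applied. If no faults are assumed, the first-level modules alone naturally realize all of the system's functions. Below, "first-level data" means the ordinary data processed or communicated mainly by the first-level modules.
[0041]
"Second-level data" is data obtained by abstracting first-level data, and a "second-level module" is a module that mainly processes or communicates second-level data. "Third-level data" and "third-level module", and "N-th-level data" and "N-th-level module" (N > 3), are defined in the same way.
[0042]
Because the first-level module processes or communicates ordinary data, it is also called a "basic module". Any other module in a computer system to which the redundant system of the present invention is applied is a "redundant module".
[0043]
In the redundant system configuration method of the present invention, redundant modules that handle data with reduced information content are created and installed by the following procedure, multiplexing consistently from the data-generating part to the final data-output part.
[0044]
That is, to construct such a redundant system, the configuration data of the system to be made redundant is first input and set up as the first-level modules, which process or communicate the ordinary data of the computer system. From these first-level modules, second-level modules that process or communicate data with reduced information content are created. Then a function degeneration module is created that runs the first-level and second-level modules concurrently, verifies the system function when no fault is detected, and degrades the function when a fault is detected.
[0045]
Specifically, as shown in Fig. 1, the configuration of the system before it is made redundant is input first; this constitutes the first-level modules (step 100). Next, a multilevel model of that system is created, from which the second-level, third-level, …, N-th-level modules can be created. Each component appearing in the structural description at the lower representation level of the multilevel model corresponds to a first-level module (step 101).
[0046]
Next, physically realized second-level modules corresponding to the components appearing in the structural description at the upper representation level of the multilevel model are installed (step 102).
[0047]
Once the multilevel model exists, the behavioral description at the upper representation level describes the behavior of the upper-level structural description, and therefore also the behavior of the second-level modules built from that structure. Meanwhile, in system design using a multilevel model, the desired abstract behavior of the system, that is, its function, is expressed in the upper-level behavioral description as part of the specification. Hence the desired function of the system and the behavior of the second-level modules agree, as long as both are implemented and realized correctly.
[0048]
Finally, to create the modules that control degeneration of the system function, abstraction means and inverse abstraction means corresponding to the upper-lower correspondence description of the multilevel model, comparison means for function verification, and selection control means for function degeneration are created and installed (steps 103 and 104).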
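[0049]
In a redundant system built by this method, the first-level modules and the second-level modules, which realize the abstract behavior (that is, the function) of the first-level modules, can run concurrently. Function verification equivalent to multilevel simulation can therefore be performed continuously while the system is running. Furthermore, a comparison failure during verification can be used to detect a fault and trigger function degeneration.
[0050]
In this redundant system, function verification and function degeneration are realized as follows. Function verification relies on the principle that equivalent inputs fed to two equivalent sets of processing or communication means yield equivalent outputs. The outputs are converted by the abstraction means into data of reduced information content that is equivalent between the level modules, and then compared for equivalence. If equivalent outputs are not obtained, a fault is judged to have occurred in one of the two sets of processing or communication means, that is, in the first-level module or the second-level module.
[0051]
Function verification presupposes a redundant configuration. The first-level and second-level modules are functionally equivalent except for the information content of the data they handle. When an input is fed to the first-level module, it is simultaneously abstracted and fed to the second-level module. After both modules have run concurrently, the first-level module's output is abstracted and compared with the second-level module's output. If the comparison succeeds, the judgment is "no fault"; if it fails, the judgment is "fault present".
[0052]
Comparing the abstraction of the processing result of an input with the processing result of the abstracted input is the same procedure as in multilevel simulation. In multilevel simulation, checking the behavior of the lower-level structure confirms whether the upper-level behavior (that is, the function) is satisfied.
[0053]
Likewise, comparing system functions checks whether the desired behavior (function) defined at the design stage matches the behavior of the running system. The difference is that multilevel simulation requires the designer to prepare test patterns in advance, whereas function verification in the redundant system of the present invention compares the actual data flowing through the running system.
[0054]
In addition to detection by comparison, function verification can also detect faults by, for example, error detection means provided independently in the first-level and second-level modules, as described in the embodiment. Combining the comparison means with the error detection means makes it possible to detect various faults (hardware failures, software bugs, design errors, overload, congestion, and the like).
[0055]
Function degeneration selects data on the basis of the fault detection results and, for example, uses restored data to continue with degraded function. When first-level data is to be selected, the more plausible of the inversely abstracted restored data and the first-level data is chosen on the basis of the results of the comparison means and the error detection means, and is output as first-level data.
[0056]
When second-level data is to be selected, the more plausible of the abstract data and the second-level data is chosen on the basis of the same results, and is output as second-level data.
[0057]
To select the more plausible data, variables representing data plausibility (a monotonically increasing variable and a CT variable) may be attached to the data being processed or communicated. The plausibility indicator takes a value (the CT value) that decreases monotonically through abstraction, inverse abstraction, or fault occurrence, and is held with each piece of data. The more plausible data is selected by comparing these values.
[0058]
Because the system's principal output is first-level data, selection control of first-level data is particularly important in function degeneration. For the first-level output, the first-level module's output is selected as-is if no fault is detected in the first-level module, and data restored from second-level data by inverse abstraction is selected if a fault is detected there.
[0059]
Note that restored data has passed through abstraction and inverse abstraction, so it carries less information than the original first-level data. Results of subsequent processing or communication therefore contain more error. In other words, when a fault occurs in the first-level module, the system keeps operating in a degraded mode, in the sense that its output now contains more error.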
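[0060]
[Embodiments of the Invention]
Embodiments of the present invention are described below in detail with reference to the drawings. First, to aid understanding, the terms used in describing the embodiments are defined.
[0061]
"Unit data" is a set of data grouped per execution unit of data processing or communication. A continuous data stream is usually divided into unit data when it is generated, and each unit receives a monotonically increasing variable and a CT variable (described later), which are used to represent the plausibility of the data. Except where these variables are discussed, "unit data" and "data" are used interchangeably.
[0062]
The "monotonically increasing variable" is attached to every unit of data and holds a monotonically increasing value. The "monotonically increasing variable constraint" restricts the values it may take: the value for any unit must be equal to or greater than the value for the unit immediately preceding it in time. The "CT variable" is attached to every unit of data and holds the value that the selection control means uses as an indicator for selecting the more plausible data. Its value is called the "CT value", with range 0 ≤ CT value ≤ 1; CT value = 1 means "most plausible". CT stands for Certainty.
[0063]
(Basic Configuration)
The redundant system of the present invention is created by the configuration procedure outlined above (Fig. 1). As shown in Fig. 2, its basic configuration consists of a first-level module 10, a second-level module 20, a level 1-2 abstraction module 30, a level 1-2 function verification module 40, and a first-level function degeneration module 50.
[0064]
The first-level module 10 consists of first-level data processing means 11, first-level data communication means 12, and error detection means 13 and 14 attached to them respectively. Similarly, the second-level module 20 consists of second-level data processing means 21, second-level data communication means 22, and error detection means 23 and 24 attached to them respectively.
[0065]
The level 1-2 abstraction module 30, placed between the first-level module 10 and the second-level module 20, consists of abstraction means 31, which converts first-level input data into second-level input data by reducing its information content. The level 1-2 function verification module 40 consists of abstraction means 41, which abstracts the first-level module's output; comparison means 42, which compares that abstract data with the second-level module's output; and selection control means 43, which selects the second-level output result 3 from the abstract data and the second-level module's output.
[0066]
The first-level function degeneration module 50 consists of inverse abstraction means 51, which restores first-level data from second-level data, and selection control means 52, which selects the first-level output result 2 from the restored data and the first-level module's output.
[0067]
Configurations in which the first-level data processing means 11 and first-level data communication means 12 of the first-level module 10 are combined differently (in number or order) are naturally equivalent to this basic configuration. The same holds for different combinations of the second-level data processing means 21 and second-level data communication means 22 in the second-level module 20.
[0068]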
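(Processing in Each Module)
Next, the processing performed by each module of the basic configuration is described. The first-level data processing means 11 and first-level data communication means 12 of the first-level module 10 perform, respectively, the data processing and data communication of the corresponding parts of the system before it is made redundant. In other words, they form the module that processes or communicates ordinary data.
[0069]
The error detection means 13 attached to the first-level data processing means 11 detects errors in that means' output using error-detecting codes and watchdog timers. This allows faults caused by hardware failures in the data processing path to be detected.
[0070]
Likewise, the error detection means 14 attached to the first-level data communication means 12 detects errors in that means' output using error-detecting codes and watchdog timers, so that faults caused by hardware failures in the data communication path can be detected.
[0071]
The second-level data processing means 21 and second-level data communication means 22 of the second-level module 20 perform, respectively, data processing and data communication on second-level data, which is first-level data with reduced (abstracted) information content.
[0072]
The error detection means 23 attached to the second-level data processing means 21 detects errors in that means' output using error-detecting codes and watchdog timers. The error detection means 24 attached to the second-level data communication means 22 does the same for the output of that means.
[0073]
The abstraction means 31 of the level 1-2 abstraction module 30 derives the second-level module's input data by abstraction from the first-level module's input data (first-level input data 1).
[0074]
The abstraction means 31 of the level 1-2 abstraction module 30 and the abstraction means 41 of the level 1-2 function verification module 40 are processing modules that reduce the information content of data, that is, perform abstraction, by the following operations (1a) to (1d):
Operation (1a): extracting information about a specific attribute;
Operation (1b): widening the temporal sampling interval of time-varying data;
Operation (1c): widening the spatial sampling interval of spatially extended data;
Operation (1d): a combination of these.
Operation (1b) changes the monotonically increasing variables attached to the unit data: when it is applied to a series of unit data, the monotonically increasing variables of the units that are dropped are deleted along with them.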
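Operations (1a) to (1d) can be sketched on toy frames of RGB pixels. This is a hypothetical illustration: the `UnitData` record, the ITU-R BT.601 luma weights used for (1a), and the default steps (every 10th unit, every 2nd pixel) are our assumptions, the latter chosen to mirror the image-transmission example later in the document. The CT factor of 0.9 per pass through abstraction means is likewise borrowed from that example.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class UnitData:
    mono: int    # monotonically increasing variable (e.g. capture time)
    ct: float    # CT value, 0 <= ct <= 1
    frame: list  # 2-D list of pixels: RGB tuples, or luminance ints after (1a)

def extract_luminance(u: UnitData) -> UnitData:
    # (1a) keep one attribute: BT.601 luma of each RGB pixel.
    frame = [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
             for row in u.frame]
    return replace(u, frame=frame)

def widen_spatial(u: UnitData, step: int = 2) -> UnitData:
    # (1c) keep every `step`-th pixel along both X and Y.
    return replace(u, frame=[row[::step] for row in u.frame[::step]])

def widen_temporal(units: List[UnitData], step: int) -> List[UnitData]:
    # (1b) keep every `step`-th unit; dropped units take their mono values with them.
    return units[::step]

def abstract(units: List[UnitData], t_step: int = 10, s_step: int = 2,
             ct_factor: float = 0.9) -> List[UnitData]:
    # (1d) combination of (1a)-(1c); CT is scaled once per pass (rule (4a)).
    return [replace(widen_spatial(extract_luminance(u), s_step), ct=u.ct * ct_factor)
            for u in widen_temporal(units, t_step)]
```

Applying `abstract` to 20 units stamped 0 to 19 keeps only the units stamped 0 and 10, each reduced to a quarter of its pixels and carrying CT 0.9.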
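[0075]
The processing of abstraction means 41 and comparison means 42 in the level 1-2 function verification module 40 is as follows.
[0076]
Abstraction means 41 abstracts the first-level module's output data by applying to it the same operations (1a) to (1d) as abstraction means 31.
[0077]
Comparison means 42 compares the second-level module's output with the abstracted first-level module output. If the comparison succeeds, it judges "no fault"; if it fails, "fault present".
[0078]
The comparison by comparison means 42 also confirms whether the desired behavior (function) defined at the design stage matches the behavior of the running system. When comparison means 42 judges "no fault", the system can be judged to be performing the behavior defined at the design stage.
[0079]
Various faults (hardware failures, software bugs, design errors, overload, congestion, and the like) are detected by combining the results of comparison means 42, the first-level error detection means (error detection means 13 of first-level data processing means 11 and error detection means 14 of first-level data communication means 12), and the second-level error detection means (error detection means 23 of second-level data processing means 21 and error detection means 24 of second-level data communication means 22).
[0080]
Hardware failures are detected within each level module: error detection means 13 and 14 of the first-level module 10 detect "hardware failure in the first-level module", and error detection means 23 and 24 of the second-level module detect "hardware failure in the second-level module". In addition, regardless of the results of error detection means 13, 14, 23, and 24, if either the first-level module 10 or the second-level module 20 outputs abnormal data because of a hardware failure, comparison means 42 detects a fault.
[0081]
For faults due to software bugs or design errors, the first-level module 10 and the second-level module 20 (or the N-th-level module) are different implementations of the same specification, so comparison means 42 detects such faults to roughly the same extent as the N-version method does for software.
[0082]
For overload and congestion, the first-level module 10 and second-level module 20 hit their bottleneck limits at different points, because the information content of the data they process or communicate differs. Overload or congestion therefore shows up as a time delay in the output, which comparison means 42 can also detect as a fault.
[0083]
Next, the comparison procedure of comparison means 42, which accounts for the time difference between output results, is described in detail following the flowchart of Fig. 5.
[0084]
First, the time difference in the output results, and the monotonically increasing variable introduced to handle it, are explained. Apart from delays caused by overload or congestion, there is normally a steady difference between the time the first-level module takes to process or communicate data and the time the second-level module takes. This difference stems from the different information content of the data and from the different resources that perform the processing or communication.
[0085]
To absorb this steady difference during comparison and distinguish it from delays caused by overload or congestion, comparison means 42 uses the monotonically increasing variable. Its value for a unit of data is equal to or greater than that of the unit immediately preceding it in time. Here, a value satisfying this constraint is obtained by converting the real time at which the unit of data was generated into a number of a specific format.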
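Stamping unit data from real time can be sketched as below. The class name `MonotonicStamper` is hypothetical, and the 0.01 s resolution is borrowed from the image-transmission example. One deliberate choice goes beyond the source: if the clock repeats or steps back, the sketch uses the previous value plus 1, so successive units never share a value. The source only requires "equal or greater", but it also relies on equal values identifying the same original data, so equal values are better reserved for copies of one unit (such as its abstraction).

```python
import time
from typing import Optional

class MonotonicStamper:
    """Hypothetical helper that assigns monotonically increasing variable values.

    The generation time is converted to an integer count of 0.01 s. If the
    clock repeats or steps back, the previous value + 1 is used instead.
    """

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def stamp(self, now_seconds: Optional[float] = None) -> int:
        if now_seconds is None:
            now_seconds = time.time()      # real time at generation
        value = int(now_seconds * 100)     # 0.01 s resolution
        if self._last is not None and value <= self._last:
            value = self._last + 1         # keep the sequence increasing
        self._last = value
        return value
```

For example, readings of 1.00 s, 1.02 s, and then 1.01 s (a clock step back) yield 100, 102, and 103.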
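[0086]
With such a variable, whenever two units of data carry equal values, tracing the processing or communication flow back to the point of generation is guaranteed to reach the same data.
[0087]
Because function verification relies on equivalent inputs to two equivalent sets of processing or communication means producing equivalent outputs, the comparison means need only consider unit data with equal monotonically increasing variable values. Comparing only such pairs absorbs the steady time difference.
[0088]
As shown in Fig. 5, the comparison procedure first performs the following preparation in step 200:
(Premise 1) for each of the two inputs there is a buffer that holds unit data for at least a time Td;
(Premise 2) each buffer records the arrival time of each unit of data; and
(Premise 3) each buffer records the maximum monotonically increasing variable value among the units it has received.
The concrete value of Td is determined by measuring the steady time difference between the unit data.
[0089]
Next, in step 201, the unit with the earliest arrival time in the two buffers is selected. In step 202, the procedure checks whether the other buffer (the one not holding the selected unit) contains a unit with the same monotonically increasing variable value.
[0090]
If it does, the procedure goes to step 205 and compares the contents of the two units. If they match, the judgment is "no fault" (step 206); otherwise it is "fault present" (step 207). The procedure then goes to step 211.
[0091]
If step 202 finds no unit with the same value, step 203 compares the selected unit's value V with the maximum value Vmax recorded for the other buffer.
[0092]
If V > Vmax in step 203, the matching unit may still arrive late, so the procedure waits for up to Td (step 204).
[0093]
If V ≤ Vmax, the matching unit has already arrived and a "fault present" judgment has already been made, so the procedure goes to step 211, deletes the units whose monotonically increasing variable value is V from the buffers, and repeats the comparison procedure from step 201.
[0094]
If the matching unit still has not arrived after waiting Td (step 204), the procedure goes to step 208 and checks which side the selected unit came from.
[0095]
If in step 208 the selected unit came from the second-level module side, the data from the first-level module is abnormally delayed beyond Td, so step 209 judges "fault in the first-level module". The fault here is "overload or congestion in the first-level module". The procedure then goes to step 211.
[0096]
If in step 208 the selected unit came from the abstraction means side, the data from the second-level module is abnormally delayed beyond Td, so step 210 judges "fault in the second-level module". The fault here is "overload or congestion in the second-level module". The procedure then goes to step 211.
[0097]
Once the judgment on the selected unit is complete, step 211 deletes that unit and every unit with the same monotonically increasing variable value from the buffers, and the comparison procedure is repeated from step 201.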
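A simplified, single-threaded sketch of the Fig. 5 loop follows. It is our own illustration: instead of blocking for Td it takes the current time as a parameter and returns `"wait"`, and it measures the Td wait from the selected unit's arrival time, which is an assumption. The side labels, verdict strings, and the `Entry`/`Buffer`/`match_step` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

@dataclass
class Entry:
    arrival: float   # arrival time recorded by the buffer (premise 2)
    mono: int        # monotonically increasing variable value
    content: bytes   # unit data, or a digest such as its size

@dataclass
class Buffer:
    side: str                               # "abstract" (level 1 via means 41) or "level2"
    entries: List[Entry] = field(default_factory=list)
    vmax: Optional[int] = None              # largest mono value ever received (premise 3)

    def put(self, e: Entry) -> None:
        self.entries.append(e)
        self.vmax = e.mono if self.vmax is None else max(self.vmax, e.mono)

def _purge(bufs: Iterable[Buffer], mono: int) -> None:
    # Step 211: delete every unit carrying this mono value from both buffers.
    for buf in bufs:
        buf.entries = [e for e in buf.entries if e.mono != mono]

def match_step(a: Buffer, b: Buffer, now: float, td: float) -> Optional[str]:
    """One pass of the Fig. 5 loop. Returns a judgment, "wait", or None if idle."""
    pool = [(e.arrival, buf, e) for buf in (a, b) for e in buf.entries]
    if not pool:
        return None
    _, sel_buf, sel = min(pool, key=lambda t: t[0])        # step 201
    other = b if sel_buf is a else a
    partner = next((e for e in other.entries if e.mono == sel.mono), None)
    if partner is not None:                                 # steps 202, 205-207
        verdict = "no fault" if partner.content == sel.content else "fault"
    elif other.vmax is not None and sel.mono <= other.vmax:
        verdict = "already judged"                          # step 203: V <= Vmax
    elif now - sel.arrival < td:
        return "wait"                                       # step 204
    elif sel_buf.side == "level2":                          # steps 208-209
        verdict = "fault: level-1 module overload/congestion"
    else:                                                   # step 210
        verdict = "fault: level-2 module overload/congestion"
    _purge((a, b), sel.mono)
    return verdict
```

The caller feeds arriving units into the two buffers with `put` and calls `match_step` repeatedly; a unit stuck alone past Td is reported against the module whose counterpart is late.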
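[0098]
This procedure detects faults while distinguishing the steady delay of normal operation from delays caused by overload or congestion.
[0099]
Next, the processing of the first-level function degeneration module 50 is described. Its inverse abstraction means 51 restores first-level data from second-level data.
[0100]
Inverse abstraction means 51 obtains restored data from abstract data by the following operations (3a) to (3d):
Operation (3a): estimating the values of absent attributes;
Operation (3b): interpolating data between temporal sampling points;
Operation (3c): interpolating data between spatial sampling points; and
Operation (3d): a combination of these.
Operation (3b) changes the monotonically increasing variables attached to the unit data: when it is applied to abstract data, each interpolated unit is assigned a value that lies between the values of the units before and after it (as minimum and maximum) and satisfies the monotonically increasing variable constraint.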
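Operation (3b), including the assignment of monotonically increasing variable values to interpolated units, might look like the sketch below. Linear interpolation is one possible choice; the source does not specify a method. Representing a frame by a single float and giving new units the smaller of their neighbours' CT values are also our assumptions. The 0.9 CT factor per pass through inverse abstraction means is taken from the document's application example.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Unit:
    mono: int      # monotonically increasing variable value
    ct: float      # CT value
    value: float   # hypothetical stand-in for one frame's content

def interpolate_temporal(units: List[Unit], factor: int,
                         ct_factor: float = 0.9) -> List[Unit]:
    """Operation (3b) sketch: insert factor - 1 linearly interpolated units
    between each pair of neighbours, with mono values between theirs."""
    out: List[Unit] = []
    for a, b in zip(units, units[1:]):
        for k in range(factor):
            mono = a.mono + (b.mono - a.mono) * k // factor    # a.mono <= mono <= b.mono
            value = a.value + (b.value - a.value) * k / factor
            base_ct = a.ct if k == 0 else min(a.ct, b.ct)       # assumption for new units
            out.append(Unit(mono, base_ct * ct_factor, value))  # rule (4b): CT drops
    if units:
        last = units[-1]
        out.append(Unit(last.mono, last.ct * ct_factor, last.value))
    return out
```

With 1-frame-per-second data stamped in 0.01 s units, neighbouring values differ by 100. A factor of 10 therefore yields restored units stamped 10 apart, and the output stays non-decreasing as the constraint requires.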
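[0101]
Based on the fault detection results described above, selection control means 52 of the first-level function degeneration module 50 selects the more plausible of the restored data and the first-level data, and outputs it as first-level data.
[0102]
Next, the selection control procedures by which selection control means 43 of the level 1-2 function verification module 40 and selection control means 52 of the first-level function degeneration module 50 select the more plausible data are described.
[0103]
Fig. 6 is a flowchart of the selection control procedure of selection control means 43 in the level 1-2 function verification module 40. The description follows the flowchart.
[0104]
First, the CT variable, which holds the indicator used to select the more plausible data, is explained. A CT variable is attached to every unit of data. Its value, the CT value, indicates plausibility and ranges over 0 ≤ CT value ≤ 1; CT value = 1 marks the most plausible unit data.
[0105]
The CT value changes through abstraction, inverse abstraction, and fault detection as follows:
(4a) Abstraction (passing through abstraction means) decreases the CT value.
(4b) Inverse abstraction (passing through inverse abstraction means) decreases the CT value.
(4c) Detection of a fatal error by error detection means sets the CT value to 0.
(4d) Detection of a minor error, for example in audio or image data, by error detection means decreases the CT value.
Using these CT variables, the selection control means compares the CT values of the units arriving at its two inputs and outputs the unit with the larger CT value.
[0106]
As shown in Fig. 6, the selection control procedure first performs the following preparation in step 300:
(Premise 1) for each of the two inputs there is a buffer that holds unit data for at least a time Td; and
(Premise 2) the monotonically increasing variable value Vout of the unit last output by the selection control means is recorded.
Td is the same as the buffer holding time Td of the comparison means. The buffer holding data from the first-level module via the abstraction means is called the level-1-side buffer, and the buffer holding data from the second-level module is called the level-2-side buffer.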
[0107]
Next, step 301 checks whether the comparison means has judged "fault in the second-level module". If so, step 302 deletes from the level-1-side buffer the units whose monotonically increasing variable value is equal to or less than Vout, step 303 outputs the unit with the smallest value from the level-1-side buffer, and the procedure returns to step 301.
[0108]
If step 301 finds no such judgment, step 304 selects from the level-2-side buffer the unit D2 with the smallest monotonically increasing variable value.
[0109]
Next, step 305 determines whether the level-1-side buffer contains a unit D1 with the same value as D2. If it does, step 307 compares the CT values of D1 and D2.
[0110]
If step 305 finds no such D1, step 306 takes as D1 the unit in the level-1-side buffer with the largest monotonically increasing variable value, and step 307 then compares the CT values of D1 and D2.
[0111]
If the comparison in step 307 gives "CT value of D1 > CT value of D2", D1 is "more plausible" by definition of the CT value. Step 308 then deletes from the level-1-side buffer the units whose value is equal to or less than Vout, step 309 outputs from selection control means 43 the unit with the smallest value in the level-1-side buffer, and the procedure returns to step 301.
[0112]
If the comparison in step 307 gives "CT value of D1 ≤ CT value of D2", D2 is "more plausible" by definition of the CT value. Step 310 then deletes from the level-2-side buffer the units whose value is equal to or less than Vout, step 311 outputs from selection control means 43 the unit with the smallest value in the level-2-side buffer, and the procedure returns to step 301.
[0113]
If the level-1-side buffer holds no unit with the same value as D2 (step 305), the unit with the largest value in that buffer is taken as D1 (step 306); if the level-1-side buffer is empty, the CT value of D1 is taken to be 0. The CT value of this D1 is then compared with that of D2 (step 307).
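The Fig. 6 loop can be sketched as one pass of a function that the caller invokes repeatedly, recording Vout from each returned unit. This is a simplified, hypothetical sketch: buffers are plain Python lists, Td-based expiry is omitted, and returning `None` when the level-2-side buffer is empty is our assumption, since the flowchart does not cover that case.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Unit:
    mono: int      # monotonically increasing variable value
    ct: float      # CT value
    payload: str   # hypothetical stand-in for the unit data

def _drop_old(buf: List[Unit], v_out: Optional[int]) -> None:
    # Steps 302/308/310: discard units already superseded by the last output.
    if v_out is not None:
        buf[:] = [u for u in buf if u.mono > v_out]

def _pop_oldest(buf: List[Unit]) -> Optional[Unit]:
    # Steps 303/309/311: output the unit with the smallest mono value.
    if not buf:
        return None
    u = min(buf, key=lambda x: x.mono)
    buf.remove(u)
    return u

def select_43_step(lvl1: List[Unit], lvl2: List[Unit], v_out: Optional[int],
                   level2_faulty: bool) -> Optional[Unit]:
    """One pass of the Fig. 6 loop (steps 301-311)."""
    if level2_faulty:                                    # step 301
        src = lvl1
    else:
        if not lvl2:
            return None                                  # not covered by the flowchart
        d2 = min(lvl2, key=lambda x: x.mono)             # step 304
        same = [u for u in lvl1 if u.mono == d2.mono]    # step 305
        if same:
            d1_ct = same[0].ct
        elif lvl1:
            d1_ct = max(lvl1, key=lambda x: x.mono).ct   # step 306
        else:
            d1_ct = 0.0                                  # empty buffer: CT of D1 is 0
        src = lvl1 if d1_ct > d2.ct else lvl2            # step 307: ties go to level 2
    _drop_old(src, v_out)
    return _pop_oldest(src)
```

In normal operation the level-1 side arrives through abstraction means 41 with a reduced CT value, so the level-2 unit wins; once the second level is judged faulty, the level-1 side is output from just after Vout.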
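[0114]
Fig. 7 is a flowchart of the selection control procedure of selection control means 52 in the first-level function degeneration module 50. The description follows the flowchart.
[0115]
As shown in Fig. 7, the procedure first performs the following preparation in step 400:
(Premise 1) for each of the two inputs there is a buffer that holds unit data for at least a time Td; and
(Premise 2) the monotonically increasing variable value Vout of the unit last output by the selection control means is recorded.
Td is the same as the buffer holding time Td of the comparison means. The buffer holding data from the first-level module is called the level-1-side buffer, and the buffer holding data from the second-level module via the inverse abstraction means is called the level-2-side buffer.
[0116]
Next, step 401 checks whether the comparison means has judged "fault in the first-level module". If so, step 402 deletes from the level-2-side buffer the units whose monotonically increasing variable value is equal to or less than Vout, step 403 outputs the unit with the smallest value from the level-2-side buffer, and the procedure returns to step 401.
[0117]
If step 401 finds no such judgment, step 404 selects from the level-1-side buffer the unit D1 with the smallest monotonically increasing variable value.
[0118]
Next, step 405 determines whether the level-2-side buffer contains a unit D2 with the same value as D1. If it does, step 407 compares the CT values of D1 and D2.
[0119]
If step 405 finds no such D2, step 406 takes as D2 the unit in the level-2-side buffer with the largest monotonically increasing variable value, and step 407 then compares the CT values of D1 and D2.
[0120]
If the comparison in step 407 gives "CT value of D1 ≥ CT value of D2", D1 is "more plausible" by definition of the CT value. Step 408 deletes from the level-1-side buffer the units whose value is equal to or less than Vout, step 409 outputs from selection control means 52 the unit with the smallest value in the level-1-side buffer, and the procedure returns to step 401.
[0121]
If the comparison in step 407 gives "CT value of D1 < CT value of D2", D2 is "more plausible" by definition of the CT value. Step 410 deletes from the level-2-side buffer the units whose value is equal to or less than Vout, step 411 outputs from selection control means 52 the unit with the smallest value in the level-2-side buffer, and the procedure returns to step 401.
[0122]
If the level-2-side buffer holds no unit with the same value as D1 (step 405), the unit with the largest value in that buffer is taken as D2 (step 406); if the level-2-side buffer is empty, the CT value of D2 is taken to be 0, and the CT value of D1 is compared with that of this D2.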
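The decision part of Fig. 7 (steps 401 and 405 to 407) can be isolated as follows. This is a hypothetical simplification in which the level-2-side buffer is modelled as a dict from monotonically increasing variable values to CT values. It highlights the one asymmetry with Fig. 6: here a tie favours level 1 (≥), whereas selection control means 43 breaks ties toward level 2. One plausible reading is that each selector prefers, on a tie, the side whose native level matches the data it outputs.

```python
from typing import Dict

def partner_ct(buf: Dict[int, float], mono: int) -> float:
    """Steps 405-406: CT of the same-mono unit, else of the newest unit, else 0.

    `buf` maps mono values to CT values (hypothetical stand-in for the
    level-2-side buffer).
    """
    if mono in buf:
        return buf[mono]
    if buf:
        return buf[max(buf)]
    return 0.0

def choose_side_52(d1_mono: int, d1_ct: float, lvl2: Dict[int, float],
                   level1_faulty: bool) -> str:
    """Fig. 7 decision only: which buffer selection control means 52 outputs from."""
    if level1_faulty:                      # step 401
        return "level2"
    d2_ct = partner_ct(lvl2, d1_mono)      # steps 405-406
    # Step 407: ties favour level 1 (>=), unlike Fig. 6, where ties favour level 2.
    return "level1" if d1_ct >= d2_ct else "level2"
```

With the CT values of the later application example, `choose_side_52(100, 1.0, {100: 0.81}, False)` returns `"level1"`, and dropping the first-level CT to 0 switches the choice to `"level2"`.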
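[0123]
Next, extensions of the basic configuration are described. Depending on the requirements of the computer system to which it is applied, the basic configuration can be extended to an N-th-level module extended configuration or to a cascade-connection extended configuration. Fig. 3 shows an example of the N-th-level module extended configuration with an added third-level module, and Fig. 4 shows an example of the cascade-connection extended configuration in which the basic configuration is cascaded in two stages.
[0124]
As shown in Fig. 3, the N-th-level module extended configuration adds to the first-level and second-level modules of the basic configuration at least one level from the third-level module up to an N-th-level module (N > 3), together with the associated abstraction, function verification, and function degeneration modules. This applies when the multilevel model has three or more representation levels. Fig. 3 shows an example with an added third-level module.
[0125]
That is, as shown in Fig. 3, the extended redundant system includes, in addition to the components of the basic configuration of Fig. 2, a third-level module 60, a level 2-3 abstraction module 63, a level 2-3 function verification module 64, and a second-level function degeneration module 65.
[0126]
The extended redundant system thus consists of the first-level module 10, second-level module 20, level 1-2 abstraction module 30, level 1-2 function verification module 40, first-level function degeneration module 50, third-level module 60, level 2-3 abstraction module 63, level 2-3 function verification module 64, and second-level function degeneration module 65.
[0127]
The added components mirror those of the basic system. The third-level module 60 consists (not shown) of third-level data processing means, third-level data communication means, and error detection means attached to each. The level 2-3 abstraction module 63, placed between the second-level module 20 and the third-level module 60, consists of abstraction means. The level 2-3 function verification module 64 consists of abstraction means that abstracts the second-level module's output, comparison means that compares that abstract data with the third-level module's output, and selection control means that selects the third-level output result from the abstract data and the third-level module's output.
[0128]
The second-level function degeneration module 65 consists of inverse abstraction means that restores second-level data from third-level data, and selection control means that selects the second-level output result from the restored data and the second-level module's output.
[0129]
As shown in Fig. 4, the cascade-connection extended configuration connects copies of the basic configuration (or of the N-th-level module extended configuration) in series along the data flow. This applies when data is processed or communicated through several processing nodes. Fig. 4 shows an example in which the basic configuration is cascaded in two stages.
[0130]
As shown in Fig. 4, this extended redundant system adds to the components of the basic configuration of Fig. 2 a cascaded stage configured like the basic system: a first-level module 71, a second-level module 72, a level 1-2 function verification module 74, and a first-level function degeneration module 75.
[0131]
The cascade-extended redundant system thus consists of the first-level module 10, second-level module 20, level 1-2 abstraction module 30, level 1-2 function verification module 40, first-level function degeneration module 50, first-level module 71, second-level module 72, level 1-2 function verification module 74, and first-level function degeneration module 75. The added components mirror those of the basic system.
[0132]
The processing within each module of the N-th-level module extended configuration and of the cascade-connection extended configuration is the same as in the basic configuration, so the details are omitted.
[0133]
In the N-th-level module extended configuration (Fig. 3), the function verification and function degeneration of the basic configuration run concurrently at multiple levels. Function degeneration therefore remains possible even if faults occur in both the first-level module and the second-level module (more generally, in modules up to the (N-1)-th level).
[0134]
In the cascade-connection extended configuration (Fig. 4), the function verification and function degeneration of the basic configuration take place at several points (several processing nodes), so function degeneration remains possible even if multiple faults occur, each in either the first-level or the second-level module of a processing node.
[0135]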
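Next, as a more concrete example, the redundant system of the present invention is applied to an image-data transmission system.
[0136]
In this example, a video camera is installed at the entrance of a facility to check visitors coming and going, and the visitors' images are displayed on monitor televisions at a distant reception desk or waiting area. The application is configured as a redundant system, as shown in Fig. 8.
[0137]
Fig. 8 makes concrete the modules and means (components) of the basic-configuration redundant system of Fig. 2. To make the correspondence clear, the description below uses the same module and means names for Fig. 8 as for Fig. 2.
[0138]
In Fig. 8, moving images from video camera 501 are converted by image input device 502 into JPEG-compressed color still-image data of 640 × 480 pixels at 30 frames per second. Image input device 502 also attaches a monotonically increasing variable and a CT variable to this still-image data. The image data with these variables attached is the first-level data.
[0139]
The concretization at this stage is as follows. A unit of data is the JPEG-compressed image data of one frame. The monotonically increasing variable is an integer giving the time at which the image data was acquired from the video camera, in units of 0.01 s. The CT value is a real number with initial value 1.0 that changes through abstraction, inverse abstraction, and fault detection as follows:
(1) Passing through abstraction means multiplies the CT value by 0.9 (corresponding to CT change (4a)).
(2) Passing through inverse abstraction means multiplies the CT value by 0.9 (corresponding to CT change (4b)).
(3) Detecting an error sets the CT value to 0 (corresponding to CT changes (4c) and (4d)).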
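The CT rules (1) to (3) can be checked with a small helper that applies them along a data path. The stage names `"abstract"`, `"inverse"`, and `"error"` and the function `ct_after` are hypothetical labels for passing through abstraction means, passing through inverse abstraction means, and error detection.

```python
from typing import Iterable

ABSTRACTION_FACTOR = 0.9           # rule (1)
INVERSE_ABSTRACTION_FACTOR = 0.9   # rule (2)

def ct_after(path: Iterable[str], initial: float = 1.0) -> float:
    """Apply the embodiment's CT rules along a sequence of stages (sketch)."""
    ct = initial
    for stage in path:
        if stage == "abstract":
            ct *= ABSTRACTION_FACTOR
        elif stage == "inverse":
            ct *= INVERSE_ABSTRACTION_FACTOR
        elif stage == "error":
            ct = 0.0               # rule (3)
        else:
            raise ValueError(f"unknown stage: {stage}")
    return ct

print(round(ct_after(["abstract", "inverse"]), 2))  # 0.81: restored data
```

Restored data passes through abstraction means and then inverse abstraction means, so its CT value is 0.9 × 0.9 = 0.81. An error anywhere on that path forces 0, and later stages cannot raise it again.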
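[0140]
The concrete processing of each module and means is as follows. In the first-level module 510, first-level data processing means 511 is a program that drops frames from the first-level data to one third, converting 640 × 480-pixel image data at 30 frames per second into 640 × 480-pixel image data at 10 frames per second. First-level data communication means 512 is a 100 Mbit/s local area network with its control program, which transmits the 640 × 480-pixel, 10-frame-per-second image data to the monitor television side. The first-level module's error detection means (not shown) is, for example, a program that aggregates error information from the frame-dropping program and communication errors from the 100 Mbit/s local area network.
[0141]
The abstraction means (not shown) of the level 1-2 abstraction module 530 performs the following three operations:
(1) extracting the luminance signal from the color image data to convert it into monochrome image data (corresponding to operation (1a), extracting information about a specific attribute);
(2) reducing the number of frames per second to 1/10 (corresponding to operation (1b), widening the temporal sampling interval); and
(3) halving the image size along both the X and Y axes (corresponding to operation (1c), widening the spatial sampling interval).
[0142]
Consequently, the first-level data, color image data of 640 × 480 pixels at 30 frames per second, is abstracted into second-level data that is monochrome image data of 320 × 240 pixels at 3 frames per second.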
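The resource saving claimed for the redundant path can be quantified with simple arithmetic on the figures above. The sketch counts raw pixels per second only; it ignores JPEG compression, so actual bit rates will differ, and the factor of 3 colour samples per pixel is an assumption about the uncompressed colour representation.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second for a given frame size and frame rate."""
    return width * height * fps

first_level = pixel_rate(640, 480, 30)   # colour data from the image input device
second_level = pixel_rate(320, 240, 3)   # monochrome data after abstraction
print(first_level, second_level, first_level // second_level)  # 9216000 230400 40

# After frame dropping (10 fps vs 1 fps) the ratio is unchanged.
print(pixel_rate(640, 480, 10) // pixel_rate(320, 240, 1))     # 40

# Counting 3 colour samples per first-level pixel against 1 luminance sample.
print(first_level * 3 // second_level)                         # 120
```

The second-level path thus handles 1/40 of the pixels, which is consistent with the embodiment pairing a 100 Mbit/s network for the first level with only a 10 Mbit/s network for the second.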
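[0143]
Second-level data processing means 521 of the second-level module 520 is a program that drops frames from the second-level data to one third, converting 320 × 240-pixel monochrome image data at 3 frames per second into 320 × 240-pixel monochrome image data at 1 frame per second. Second-level data communication means 522 is a 10 Mbit/s local area network with its control program, which transmits the 320 × 240-pixel, 1-frame-per-second monochrome image data to the monitor television side.
[0144]
The second-level module's error detection means (not shown) is, for example, a program that aggregates error information from the frame-dropping program and communication errors from the 10 Mbit/s local area network.
[0145]
The abstraction means of the level 1-2 function verification module 540 performs the same operations as the abstraction means of the level 1-2 abstraction module 530. Its comparison means executes the comparison procedure of Fig. 5 and holds image data in a buffer for at least 1 second per input (Td = 1 s). For comparing unit data contents, it exploits the fact that the size of JPEG-compressed image data varies with content, and compares only the data sizes. Its selection control means executes the selection control procedure of Fig. 6.
[0146]
The inverse abstraction means of the first-level function degeneration module 550 performs the following two operations:
(1) increasing the number of frames per second tenfold by interpolation (corresponding to (3b), interpolating data between temporal sampling points); and
(2) doubling the number of pixels along both the X and Y axes (corresponding to (3c), interpolating data between spatial sampling points).
No operation corresponding to (3a) is performed, so the restored image data is monochrome.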
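Operation (2) above, the spatial doubling, can be sketched with pixel replication (nearest-neighbour interpolation). The source does not say which interpolation method is used, so this choice and the function name `double_spatial` are assumptions; a bilinear filter would give smoother restored images at a higher cost.

```python
from typing import List

Frame = List[List[int]]  # 2-D grid of luminance values

def double_spatial(frame: Frame) -> Frame:
    """(3c) sketch: double X and Y resolution by pixel replication."""
    out: Frame = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel along X
        out.append(wide)
        out.append(list(wide))                     # repeat each row along Y
    return out

print(double_spatial([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Applied to a 320 × 240 second-level frame, this yields the 640 × 480 monochrome frame that the monitor displays in operation example 2.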
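[0147]
The selection control means of the first-level function degeneration module 550 executes the selection control procedure of Fig. 7. Monitor television 503 displays the output of this selection control means as video.
[0148]
(Operation Examples of the Application)
Typical operation of the application is described below, focusing on how the selection control means of the first-level function degeneration module selects the more plausible data by comparing CT values.
[0149]
(Operation Example 1: Function Verification)
When no fault has been detected:
The selection control means of the first-level function degeneration module selects the first-level data. The restored data has passed through abstraction and inverse abstraction, so its CT value is 0.81, while the first-level data has the larger CT value of 1.0.
[0150]
Monitor television 503 displays the video from video camera 501 in color at 640 × 480 pixels and 10 frames per second.
[0151]
The image data comparison for function verification runs continuously, and the result "no fault" is output (for example, to a console).
[0152]
(Operation Example 2: Function Degeneration)
When a fault occurs in the first-level module:
The selection control means of the first-level function degeneration module selects the restored data. The fault sets the CT value of the first-level data to 0, so the restored data's CT value of 0.81 is the larger.
[0153]
Monitor television 503 displays the video from video camera 501 in monochrome at 640 × 480 pixels and 10 frames per second, restored from 320 × 240-pixel image data at 1 frame per second. At the same time, the result "fault in the first-level module" is output (for example, to a console).
[0154]
(Operation Example 3)
When a fault occurs in the second-level module: the selection control means of the first-level function degeneration module selects the first-level data. The CT value of the second-level data is 0 and remains 0 after inverse abstraction, so the first-level data's CT value of 1.0 is the larger.
[0155]
Monitor television 503 displays the video from video camera 501 in color at 640 × 480 pixels and 10 frames per second. At the same time, the result "fault in the second-level module" is output (for example, to a console).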
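[0156]
[Effects of the Invention]
The redundant system of the present invention described above provides the following Effects 1 to 4.
(Effect 1)
The same modules as in the pre-redundancy system are not duplicated. Instead, redundant modules that handle data with reduced information content are provided alongside the basic modules, which restrains the growth in required resources.
[0157]
(Effect 2)
Error detection means installed individually on the data processing and communication means, together with comparison means that verifies the equivalence of abstract data, detect failures in the broad sense: not only hardware failures but also software bugs, design errors, overload, congestion, and the like.
[0158]
(Effect 3)
When a fault is detected, data restored from abstract data is output according to the cause and location of the fault, so the function can degrade gracefully under failures in the broad sense.
[0159]
(Effect 4)
Depending on the configuration of the pre-redundancy system, the system can be extended to the N-th-level module extended configuration or the cascade-connection extended configuration. The extended configurations can degrade gracefully even under multiple faults.
[0160]
These effects make it easy to construct, while restraining the required resources, a redundant system that performs system function verification during normal operation and function degeneration when a fault occurs.
[Brief Description of the Drawings]
[Fig. 1] A diagram showing the basic configuration procedure of the present invention.
[Fig. 2] A diagram showing the basic configuration of the redundant system according to the embodiment.
[Fig. 3] A diagram showing a system configuration in which a third-level module is added in the N-th-level module extended configuration.
[Fig. 4] A diagram showing a system configuration in which the basic configuration is cascaded in two stages in the cascade-connection extended configuration.
[Fig. 5] A flowchart showing the comparison procedure that accounts for the time difference between output results.
[Fig. 6] A flowchart showing the selection control procedure of selection control means 43.
[Fig. 7] A flowchart showing the selection control procedure of selection control means 52.
[Fig. 8] A diagram showing the configuration of the application system given as an example.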
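[Description of Reference Numerals]
1 … First-level input data
2 … First-level output result
3 … Second-level output result
10 … First-level module
11 … First-level data processing means
12 … First-level data communication means
13 … Error detection means
14 … Error detection means
20 … Second-level module
21 … Second-level data processing means
22 … Second-level data communication means
23 … Error detection means
24 … Error detection means
30 … Level 1-2 abstraction module
31 … Abstraction means
40 … Level 1-2 function verification module
41 … Abstraction means
42 … Comparison means
43 … Selection control means
50 … First-level function degeneration module
51 … Inverse abstraction means
52 … Selection control means
60 … Third-level module
63 … Level 2-3 abstraction module
64 … Level 2-3 function verification module
65 … Second-level function degeneration module
71 … First-level module
72 … Second-level module
74 … Level 1-2 function verification module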
75 … First-level function degeneration module
BACKGROUND OF THE INVENTION
The present invention relates to a redundant system that continues operation with a partial function when a fault occurs in a component of a computer system. Specifically, even when a fault occurs in a component of a computer system, the present invention relates to a redundant system. The present invention relates to a redundant system in which functions of a processing device and a communication device are multiplexed so that system operation can be continued with a typical function, a redundant system configuration system that configures a redundant system, and a redundant system configuration method.
[0002]
[Prior art]
Conventionally, a redundant system for continuing operation with a partial function when a fault occurs in a component of a computer system has been proposed. An outline of this type of computer system will be described.
[0003]
Even if a fault occurs in a computer system component (a broad failure including hardware failure, software bug, design error, overload, congestion, etc.), the function of the system is normal as far as seen from the outside. A system that is maintained in this way is called a fault tolerant system. On the other hand, even if not all normal functions of the system can be maintained due to the occurrence of a fault, a system that maintains a part of the functions and continues to operate is called a fail soft system. A general term for a technique for constructing a fault tolerant system or a fail soft system is a fault tolerance technique.
[0004]
The fundamental approach of fault tolerance technology is to build a redundant system by multiplexing. Hardware multiplexing uses either the static redundancy method or the dynamic redundancy method; software multiplexing uses the N-version method.
[0005]
The static redundancy method masks faults so that their effects do not appear in the output. A representative example is the multiplexed majority voting system: if three identical processing modules receive the same input and a majority vote is taken over their outputs, the voted output is normally correct even if one of the modules has a fault.
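As a minimal sketch of the majority-voting idea, assuming three identical modules whose outputs can be compared for equality (the functions `healthy` and `faulty` are hypothetical stand-ins for real processing modules), the voter below masks a single faulty module:

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of the modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError(f"no majority among {list(outputs)}")
    return value


def tmr(modules: Sequence[Callable[[int], int]], x: int) -> int:
    """Give the same input to every module and vote on their outputs."""
    return majority_vote([module(x) for module in modules])


def healthy(x: int) -> int:
    return 3 * x + 1


def faulty(x: int) -> int:
    return (3 * x + 1) ^ 0b100  # hypothetical fault: one output bit is flipped


print(tmr([healthy, faulty, healthy], 5))  # 16
```

With two or more faulty modules out of three there is no guaranteed correct majority, which is why the voter raises an error when no value reaches a strict majority.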
[0006]
The dynamic redundancy method either switches a faulty component over to a spare element, or removes the faulty component to the extent possible and degrades the function. The former is also called the system reconfiguration method, and the latter the function degeneration method.
[0007]
Switching to spare elements is commonly used in ordinary high-reliability systems. Processor-level spare schemes include the dual method, in which two processors run the same processing in synchronization and a comparison check is used for diagnosis, isolation, and resumption of processing, and the duplex method, in which the two processors are divided into an active and a standby unit and, when an error is detected in the active processor, operation is switched automatically or manually to the standby processor. At the peripheral-device level, external storage can use RAID (Redundant Arrays of Independent Disks), a parallel storage scheme that employs spare magnetic hard disk drives.
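The duplex method can be sketched as follows. This is an illustration only: `Processor`, `DuplexPair`, and the placeholder error detector `self_check` (which simply recomputes the expected result) are hypothetical names, and a real system would rely on its own hardware or software error checks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Processor:
    name: str
    compute: Callable[[int], int]


def self_check(x: int, y: int) -> bool:
    """Placeholder error detector (hypothetical): expects y to be x squared."""
    return y == x * x


class DuplexPair:
    """Duplex method: one active processor and one standby processor."""

    def __init__(self, active: Processor, standby: Processor) -> None:
        self.active = active
        self.standby: Optional[Processor] = standby

    def run(self, x: int) -> int:
        y = self.active.compute(x)
        if self_check(x, y):
            return y
        if self.standby is None:
            raise RuntimeError("error detected and no standby processor left")
        # Automatic switchover: the standby becomes the active processor.
        self.active, self.standby = self.standby, None
        return self.run(x)


pair = DuplexPair(Processor("P0", lambda x: x * x + 1),  # faulty active unit
                  Processor("P1", lambda x: x * x))      # healthy standby
print(pair.run(7), pair.active.name)  # 49 P1
```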
[0008]
Function degeneration is most often applied to redundantly configured processors or memories made up of multiple components. In a multiprocessor system, the failed processor is disconnected from the system and processing continues on the remaining processors. For memory, the faulty memory block is isolated and processing continues with a reduced total memory capacity. In either case, a loss of system performance is unavoidable.
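A minimal sketch of processor-level function degeneration, assuming tasks are simply distributed round-robin (the scheduling policy and all names are hypothetical): once a processor is disconnected, the same tasks are spread over fewer processors, which is where the performance loss comes from.

```python
def assign_tasks(tasks: list[str], processors: list[str], failed: set[str]) -> dict[str, list[str]]:
    """Distribute tasks round-robin over the processors that are still connected."""
    alive = [p for p in processors if p not in failed]
    if not alive:
        raise RuntimeError("all processors have failed")
    plan: dict[str, list[str]] = {p: [] for p in alive}
    for i, task in enumerate(tasks):
        plan[alive[i % len(alive)]].append(task)
    return plan


tasks = [f"t{i}" for i in range(6)]
cpus = ["cpu0", "cpu1", "cpu2"]
print(assign_tasks(tasks, cpus, failed=set()))     # 2 tasks per processor
print(assign_tasks(tasks, cpus, failed={"cpu1"}))  # 3 tasks per surviving processor
```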
[0009]
The N-version method runs N programs, created independently from the same specification, simultaneously, and selects from the N execution results the one judged most likely to be correct, for example by a majority algorithm.
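The sketch below illustrates the N-version method with N = 3, assuming the specification is "return the floor of the square root of a non-negative integer". The three versions are hypothetical and, unlike a real N-version system, run one after another rather than simultaneously; the third contains a deliberate design error (it rounds instead of truncating), which the majority vote masks.

```python
import math
from collections import Counter


def isqrt_library(n: int) -> int:
    """Version 1: the standard library's integer square root."""
    return math.isqrt(n)


def isqrt_bisect(n: int) -> int:
    """Version 2: binary search keeping lo*lo <= n < hi*hi."""
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_rounding(n: int) -> int:
    """Version 3 (design error): rounds instead of taking the floor."""
    return round(math.sqrt(n))


VERSIONS = (isqrt_library, isqrt_bisect, isqrt_rounding)


def n_version(n: int) -> int:
    results = [version(n) for version in VERSIONS]
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError(f"no majority among {results}")
    return value


print([v(8) for v in VERSIONS], n_version(8))  # [2, 2, 3] 2
```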
[0010]
As the above description shows, the basic policy of redundant configuration by multiplexing is to prepare multiple components with the same specification and to operate one of them while keeping at least one other in parallel operation or on standby. This makes it easy to switch away from, or disconnect, a component in which a failure has occurred.
[0011]
Conventional redundant configuration methods often multiplex not every component of the system but only the parts where faults are likely to occur, or where a fault would have a fatal effect on system function. There are two reasons for this: first, multiplexing every component greatly increases the cost of building the system; second, even when the parts to be multiplexed are chosen on the basis of hardware failure probabilities, fault resistance still improves to a certain extent.
[0012]
Next, the multi-layer model representation method and multi-level simulation, two techniques related to abstraction for reducing the amount of information, are described.
[0013]
The multi-layer model representation method is a technique for describing the structure and behavior of a computer system at an abstract representation level (a variable domain representing abstract attributes) and at a concrete representation level (a variable domain representing concrete attributes).
[0014]
Normally, the structural description at the concrete representation level corresponds one-to-one to the components of the actual computer system, whereas the structural description at the abstract representation level is associated with those components only indirectly, through a description of the upper-lower correspondence. This technique has been used mainly in the field of logic device design.
[0015]
Multi-level simulation is a verification method for confirming that the description of a multi-layer model is consistent. If the description of the multi-layer model is consistent, the design of the computer system (logic device) it represents is correct (see Non-Patent Document 1).
[0016]
The verification procedure of multi-level simulation is outlined below. It presupposes a multi-layer model description of the logic device to be verified; that is, the structure and behavior of the logic device are described at two representation levels based on the multi-layer model representation method (an abstract upper representation level, e.g., the register transfer level, and a lower representation level, e.g., the gate level). It is also assumed that the upper-lower correspondences between the two representation levels for the inputs, outputs, and states of the logic device (that is, the correspondence between upper-level variables and lower-level variables) are explicitly described. When the multi-layer model has three or more representation levels, verification focusing on two adjacent levels whose upper-lower correspondence is described is repeated for each such pair.
[0017]
First, the output result R1 for a specific test pattern is obtained by simulating the behavioral description at the lower representation level (obtained by converting the structural description). Next, the test pattern is converted into input data at the upper representation level using the upper-lower correspondence, and the output result R2 for that data is obtained by simulating the behavioral description of the logic device at the upper representation level.
[0018]
Meanwhile, the output result R1 is converted into an output result R1' using the upper-lower correspondence, in the same way as the test pattern. The converted result R1' is then compared with the output result R2. If the comparison succeeds for multiple appropriate test patterns, the description of the multi-layer model (of the logic device) is judged to be consistent.
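The procedure can be illustrated with a toy example, assuming a 4-bit ripple-carry adder as the logic device: the gate level is the lower representation level, a single integer addition is the upper representation level, and the upper-lower correspondence maps an LSB-first bit vector to an unsigned integer. All names are hypothetical. The check runs every possible test pattern, converts R1 to R1', and compares it with R2; a deliberately injected design error in the full adder is caught without any manual inspection of results.

```python
from itertools import product

WIDTH = 4  # hypothetical word width of the adder being verified


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Gate-level full adder built from XOR, AND and OR."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def buggy_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Design error for illustration: the carry ignores the incoming carry."""
    return a ^ b ^ cin, a & b


def gate_level_add(a_bits, b_bits, adder=full_adder) -> list[int]:
    """Lower representation level: ripple-carry adder on LSB-first bit lists."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = adder(a, b, carry)
        out.append(s)
    return out + [carry]  # WIDTH sum bits followed by the carry-out


def rt_level_add(a: int, b: int) -> int:
    """Upper representation level: the adder's behavior as one integer operation."""
    return a + b


def to_upper(bits) -> int:
    """Upper-lower correspondence: LSB-first bit list -> unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def multilevel_check(patterns, adder=full_adder) -> list:
    """Return the test patterns for which R1' and R2 disagree."""
    mismatches = []
    for a_bits, b_bits in patterns:
        r1 = gate_level_add(a_bits, b_bits, adder)             # R1 (lower level)
        r2 = rt_level_add(to_upper(a_bits), to_upper(b_bits))  # R2 (upper level)
        if to_upper(r1) != r2:                                 # compare R1' with R2
            mismatches.append((a_bits, b_bits))
    return mismatches


vectors = [list(v) for v in product((0, 1), repeat=WIDTH)]
patterns = [(a, b) for a in vectors for b in vectors]
print(len(multilevel_check(patterns)))                        # 0
print(len(multilevel_check(patterns, buggy_full_adder)) > 0)  # True
```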
[0019]
With ordinary simulation, the designer must inspect the simulation results one by one to judge whether the logic device is correct. Multi-level simulation has the advantage that this inspection and judgment can be performed automatically.
[Non-Patent Document 1]
Takuo Kuno: Verification of multi-layer model by multi-level simulation, IEICE Transactions D-II, Vol. J76-D-II, No. 4, pp.908-913 (1993)
[0020]
[Problems to be solved by the invention]
Computer systems are expected to permeate every aspect of life in the future. In a society where computer support is taken for granted regardless of time and place, a fault in such a system risks bringing all modern activities to a halt. To keep this risk from materializing, it is desirable to build and operate redundant systems with improved fault resistance.
[0021]
In conventional redundant configurations of computer systems, multiplexing every component would raise the cost of building the system considerably. In many cases, therefore, the locations to be multiplexed are limited on the basis of hardware failure probabilities, and only fault-prone parts, or parts where a fault would have a fatal effect on system function, are multiplexed.
[0022]
Even a partially redundant configuration will certainly improve fault tolerance if fault occurrence is predicted correctly. However, once faults in the broad sense are assumed, covering not only hardware failures but also software bugs, design errors, overload, and congestion, it is extremely difficult for a partially redundant configuration to guarantee continued operation.
[0023]
In a society that presupposes ubiquitous computer systems, there is a further need for a redundant configuration method that gives the system as a whole high fault tolerance while limiting the increase in the hardware and software (hereinafter, resources) required to build the system.
[0024]
A candidate for such a method is a fail-soft mechanism that, to reliably improve fault tolerance, multiplexes consistently from the point where data is generated to the point where it is finally output, and that, to limit the increase in required resources, uses data with a reduced amount of information.
[0025]
The present invention has been made to solve the problems described above. Specifically, it addresses the following Problems 1 to 3, which must be overcome to realize the mechanism described above.
[0026]
(Problem 1)
If the location and cause of faults are assumed to be hard to predict, the most effective redundant configuration is to multiplex every component. However, if multiple components with identical specifications are prepared as in the prior art, the resources required to build the system more than double, which is impractical in many situations.
[0027]
One way to reduce the required resources is to prepare, for each basic module, a redundant module that handles data with a reduced amount of information, and to multiplex consistently from the point where data is generated to the point where it is finally output. Furthermore, the redundant module must be able to act as a spare for the basic module when a fault occurs, supplying data that can be used for function degeneration. Here, a “basic module” is each part of the system at the preceding stage (the non-redundant system) before the redundant configuration is introduced, and a “redundant module” is a module added to create the redundant configuration.
[0028]
(Problem 2)
For hardware faults, detection by a conventional self-diagnosis circuit, or by comparing the results of synchronized operation as in the dual method, is effective. However, software bugs, design errors, overload, congestion, and the like are difficult to detect by self-diagnosis or synchronized operation, so it must be possible to detect faults in the broad sense. What is needed, therefore, is a means that detects faults by continuously checking whether the desired behavior defined at the system design stage matches the behavior of the running system.
[0029]
(Problem 3)
For the system to be a fail-soft system with improved fault tolerance, it must be possible to control the output so that, if the fault is in a redundant module, the basic module's data is output as-is, and if the fault is in a basic module, the original data is restored from the redundant module's data and output. A control means is therefore required that, when a fault is detected, selects the more reliable data and degenerates the function.
[0030]
Accordingly, an object of the present invention is to solve Problems 1 to 3 above by providing a redundant system, a redundant system configuration system, and a redundant system configuration method that form a fail-soft system which verifies the system function during normal operation and degenerates the function when a fault occurs. More specifically, an object of the present invention is to provide a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of a computer system, a redundant system configuration system that configures such a redundant system, and a redundant system configuration method.
[0031]
[Means for Solving the Problems]
In order to achieve the above object, the redundant system according to the present invention has the following basic configuration. It is a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of the computer system, and it comprises: a first level module of a system function that processes or communicates normal data of the computer system; a second level module of a system function that processes or communicates data whose amount of information has been reduced from the normal data by abstraction means that abstract the data so that the function can be degenerated; a function verification module that verifies the system function by collating data obtained by reducing the amount of information of the output data of the first level module with the output data of the second level module; and a function degeneration module that, when a fault is detected by the function verification module, degenerates the system function by selecting the output data of the first level module as-is if no fault is detected in the first level module, or, if a fault is detected in the first level module, selecting data restored from the second level module by applying the inverse of the abstraction performed by the abstraction means.
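The following sketch shows the basic configuration under simplifying assumptions that are not taken from the source: the normal data is an 8-bit integer, the abstraction means keeps only the upper 4 bits, the system function is an 8-bit complement (which commutes with this abstraction), and reverse abstraction maps an abstract value to the middle of the interval it represents. All function names are hypothetical. Whether the first level module itself is faulty is taken here from the injected fault; in the text, that decision comes from the modules' own fault detection.

```python
K = 4  # hypothetical: the abstraction drops the lower K of 8 bits


def abstract(x: int) -> int:
    """Abstraction means: keep only the upper 8 - K bits."""
    return x >> K


def reverse_abstract(a: int) -> int:
    """Reverse abstraction: the midpoint of the interval that abstracts to a."""
    return (a << K) | (1 << (K - 1))


def first_level_module(x: int) -> int:
    """Basic module: 8-bit complement of the normal data."""
    return 0xFF - x


def second_level_module(a: int) -> int:
    """Redundant module: the same function on the 4-bit abstract data."""
    return 0xF - a


def run(x: int, first_level_fault: bool = False) -> int:
    y1 = first_level_module(x)
    if first_level_fault:
        y1 ^= 0xA0  # injected fault, assumed to be flagged by the module itself
    y2 = second_level_module(abstract(x))
    fault_detected = abstract(y1) != y2          # function verification module
    if fault_detected and first_level_fault:     # function degeneration module
        return reverse_abstract(y2)              # restored, lower-precision data
    return y1                                    # first level output as-is


print(run(0x30))                          # 207 (0xCF)
print(run(0x30, first_level_fault=True))  # 200 (0xC8): degraded but usable
```

The complement was chosen only because abstracting its output gives the same result as applying it to abstracted input; a real second level module would have to be designed to satisfy this equivalence for its own function.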
[0032]
Here, the function degeneration module may be configured to degenerate the system function based on the system function verification, for example by comparing the magnitudes of variables representing the certainty of the data, whose values are changed by error detection and by the information-reducing operations applied during communication or processing in the first level module and the second level module: if no fault is detected in the first level module, the output data of the first level module is selected as-is, and if a fault is detected in the first level module, the data restored from the second level module is selected.
[0033]
Further, the redundant system configuration system according to the present invention configures a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of the computer system. It comprises: degenerate model creating means that, from a first level module of a system function that processes or communicates normal data of the computer system, creates a second level module that processes or communicates data whose amount of information has been reduced from the normal data by abstraction means that abstract the data so that the function can be degenerated; and function degeneration module creating means that creates a function degeneration module which, with the first level module and the second level module operating simultaneously in parallel, verifies the system function when no fault is detected and, when a fault is detected, degenerates the function by selecting the output data of the first level module as-is if no fault is detected in the first level module, or, if a fault is detected in the first level module, selecting data restored from the second level module by applying the inverse of the abstraction performed by the abstraction means.
[0034]
The redundant system configuration method according to the present invention configures a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of a computer system. From a first level module of a system function that processes or communicates normal data of the computer system, it creates a second level module that processes or communicates data whose amount of information has been reduced from the normal data by abstraction means that abstract the data so that the function can be degenerated. It then operates the first level module and the second level module simultaneously in parallel and creates a function degeneration module that verifies the system function when no fault is detected and, when a fault is detected, degenerates the function by selecting the output data of the first level module as-is if no fault is detected in the first level module, or, if a fault is detected in the first level module, selecting data restored from the second level module by applying the inverse of the abstraction performed by the abstraction means.
[0035]
The redundant system according to the present invention may also be a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of a computer system, comprising: a first level module of a system function that processes or communicates normal data of the computer system; a second level module of a system function that processes or communicates data whose amount of information has been reduced from the normal data so that the function can be degenerated; a third level module of a system function that processes or communicates data whose amount of information has been further reduced from the data of the second level module; first verification means that verifies the system function by collating data obtained by reducing the amount of information of the output data of the first level module with the output data of the second level module; second verification means that verifies the system function by collating data obtained by reducing the amount of information of the output data of the second level module with the output data of the third level module; and a function degeneration module that degenerates the system function when a fault is detected by the first verification means or the second verification means.
[0036]
Further, the redundant system configuration system according to the present invention, which configures such a redundant system, may comprise: multi-layer degenerate model creating means that, from a first level module of a system function that processes or communicates normal data of the computer system, creates a second level module that processes or communicates data whose amount of information has been reduced from the normal data, and creates a third level module that processes or communicates data whose amount of information has been further reduced from the data of the second level module; and function degeneration module creating means that creates a function degeneration module which, with the first, second, and third level modules running simultaneously in parallel, verifies the system function when no fault is detected and degenerates the function when a fault is detected.
[0037]
Further, the redundant system configuration method according to the present invention, which configures such a redundant system, may create, from a first level module of a system function that processes or communicates normal data of the computer system, a second level module that processes or communicates data whose amount of information has been reduced from the normal data, and a third level module that processes or communicates data whose amount of information has been further reduced from the data of the second level module; run the first, second, and third level modules simultaneously in parallel; and create a function degeneration module that verifies the system function when no fault is detected and degenerates the function when a fault is detected.
[0038]
In this case, one or more levels, from a third level module up to an Nth level module (N > 3), are added to the first level module and the second level module, and the first through Nth level modules are executed in parallel while verification and degeneration are performed.
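Extending to more levels can be sketched by chaining abstractions and verifying each adjacent pair of levels. The data widths (8, 4, and 2 bits), the complement function, and the fault-injection parameter are hypothetical choices for illustration. In this example, a fault in one level makes exactly the verifications that involve that level fail.

```python
from typing import Optional

WIDTHS = [8, 4, 2]  # hypothetical data widths of levels 1..N (here N = 3)


def abstract_between(x: int, upper: int, lower: int) -> int:
    """Abstraction from an `upper`-bit level to the next, `lower`-bit level."""
    return x >> (upper - lower)


def level_module(width: int, x: int) -> int:
    """Every level applies the same function (complement) at its own width."""
    return ((1 << width) - 1) - x


def verify_all_levels(x: int, fault_at: Optional[int] = None) -> list[bool]:
    """One verification result per adjacent pair (1-2, 2-3, ...).

    fault_at is a 0-based level index (0 = level 1) whose output gets its
    top bit flipped, to simulate a fault in that level.
    """
    inputs = [x]
    for upper, lower in zip(WIDTHS, WIDTHS[1:]):
        inputs.append(abstract_between(inputs[-1], upper, lower))
    outputs = [level_module(w, v) for w, v in zip(WIDTHS, inputs)]
    if fault_at is not None:
        outputs[fault_at] ^= 1 << (WIDTHS[fault_at] - 1)
    return [
        abstract_between(outputs[i], WIDTHS[i], WIDTHS[i + 1]) == outputs[i + 1]
        for i in range(len(WIDTHS) - 1)
    ]


print(verify_all_levels(0x5C))              # [True, True]
print(verify_all_levels(0x5C, fault_at=1))  # [False, False]: level 2 is in both pairs
```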
[0039]
In the following description, “fault” means a failure in the broad sense, including hardware failures, software bugs, design errors, overload, and congestion. “Data with a reduced amount of information” means data that has undergone “abstraction”, for example data converted to carry less information by an operation that extracts information about a specific attribute from the original data, or by an operation that represents a specific pattern with a specific symbol. Abstracted data is called “abstract data” relative to the original data. The transformation whose data flow runs opposite to that of abstraction is called “reverse abstraction”, and reverse-abstracted data is called “restored data”.
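As an illustration of the "specific pattern represented by a specific symbol" form of abstraction, the sketch below maps sensor readings to band symbols and reverse-abstracts each symbol to a representative value. The band boundaries and representative values are hypothetical; the point is that restored data generally differs from the original data.

```python
# Hypothetical symbol table: (symbol, lower bound, upper bound, representative value)
BANDS = [("LOW", 0.0, 20.0, 10.0), ("MID", 20.0, 30.0, 25.0), ("HIGH", 30.0, 50.0, 40.0)]


def abstract(reading: float) -> str:
    """Abstraction: replace a reading by the symbol of the band it falls into."""
    for symbol, lower, upper, _ in BANDS:
        if lower <= reading < upper:
            return symbol
    raise ValueError(f"reading {reading} is outside the modelled range")


def reverse_abstract(symbol: str) -> float:
    """Reverse abstraction: replace a symbol by its representative value."""
    for name, _, _, representative in BANDS:
        if name == symbol:
            return representative
    raise KeyError(symbol)


readings = [12.5, 27.0, 33.1]
abstract_data = [abstract(r) for r in readings]
restored_data = [reverse_abstract(s) for s in abstract_data]
print(abstract_data)  # ['LOW', 'MID', 'HIGH']
print(restored_data)  # [10.0, 25.0, 40.0]
```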
[0040]
In a computer system to which the redundant system according to the present invention is applied, the “first level module” refers to each part of the system as it existed before the redundant configuration was introduced. If no fault is assumed, the first level module alone can of course realize all the functions of the system. “First level data” in the following description refers to the normal data that is mainly processed or communicated by the first level module.
[0041]
“Second level data” is data obtained by abstracting first level data, and the “second level module” is a module that mainly processes or communicates second level data. “Third level data”, “third level module”, “Nth level data”, and “Nth level module” (N > 3) are defined analogously.
[0042]
The first level module is also called the “basic module”, because it is the system function module that processes or communicates normal data. In a computer system to which the redundant system according to the present invention is applied, every module other than the basic module is a “redundant module”.
[0043]
In the redundant system configuration method according to the present invention, redundant modules that handle data with a reduced amount of information are created and installed by the following configuration procedure, so that the system is multiplexed consistently from the point where data is generated to the point where it is finally output.
[0044]
That is, to configure a redundant system in which the functions of processing devices and communication devices are multiplexed so that system operation can continue with partial functionality even when a fault occurs in a component of the computer system, the system configuration data for the system to be made redundant is first input and treated as a first level module of a system function that processes or communicates normal data of the computer system. From this first level module, a second level module is created that processes or communicates data whose amount of information has been reduced from the normal data. Then, with the first level module and the second level module operating simultaneously in parallel, a function degeneration module is created that verifies the system function when no fault is detected and degenerates the function when a fault is detected.
[0045]
Specifically, as shown in FIG. 1, the configuration of the system before redundancy is introduced is first input; this constitutes the first level module (step 100). Next, a multi-layer model of that system is created, from which the second level module, the third level module, …, and the Nth level module can be created. Here, each component appearing in the structural description at the lower representation level of the multi-layer model corresponds to a first level module (step 101).
[0046]
Next, an actual second level module is installed for each component appearing in the structural description at the higher representation level of the multi-layer model (step 102).
[0047]
Once the multi-layer model has been created, the behavioral description at the higher representation level represents the behavior of the structural description at that level; that is, it also represents the behavior of the second level module built from that structural description. On the other hand, in system design using a multi-layer model, the desired abstract behavior of the system, that is, its function, is expressed as part of the specification in the behavioral description at the higher representation level. Therefore, as long as the implementation is correct, the desired abstract behavior (function) of the system matches the behavior of the second level module.
[0048]
Finally, to create the modules that control the degeneration of system functions, abstraction means and reverse abstraction means corresponding to the upper-lower correspondence described in the multi-layer model are created and installed, together with collation means for function verification and selection control means for function degeneration (steps 103 and 104).
[0049]
In a redundant system created by this configuration method, the first level module and the second level module, which realizes the abstract behavior (that is, the function) of the first level module, operate simultaneously in parallel. Functional verification equivalent to multi-level simulation can therefore be performed continuously during system operation. Furthermore, a fault can be detected from a collation failure during verification, and the function can then be degenerated.
[0050]
In this redundant system, function verification and function degeneration are realized as follows. Function verification relies on the principle that two equivalent sets of processing means or communication means given equivalent input data produce equivalent output data. The outputs of the level modules are converted by the abstraction means into equivalent data with a reduced amount of information and collated to check their equivalence. If equivalent output data is not obtained, it is determined that a fault has occurred in one of the two sets of processing means or communication means, that is, in the first level module or the second level module.
[0051]
Function verification presupposes the redundant configuration, in which the first level module and the second level module are functionally equivalent apart from the amount of data they handle. When specific input data is given to the first level module, the same input data is simultaneously abstracted and given to the second level module. After the two modules have operated simultaneously in parallel, their output data are converted into data that is equivalent between the level modules and collated: the output data of the first level module is abstracted, and the resulting abstract data is collated with the output data of the second level module. If the collation succeeds, “no fault” is determined; if it fails, “fault present” is determined.
[0052]
The procedure of collating the abstracted processing result of the input data with the processing result of the abstracted input data is the same as in multi-level simulation, where checking the behavior of the structure at the lower representation level confirms whether the behavior (that is, the function) at the higher representation level is satisfied.
[0053]
Likewise, system function verification confirms whether the desired behavior (that is, the function) defined at the system design stage matches the behavior of the running system. The difference is that in multi-level simulation the designer had to prepare test patterns as input data in advance, whereas function verification in the redundant system according to the present invention checks the actual data processed during operation.
[0054]
In addition to fault detection by collation as described above, function verification here can also detect faults with error detection means provided independently in each of the first level module and the second level module, as described in the embodiment. By combining the collation means and the error detection means, function verification can detect a wide range of faults (hardware failures, software bugs, design errors, overload, congestion, and so on).
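One hypothetical way to combine the two detection mechanisms is sketched below, assuming each module attaches an even-parity bit (standing in for the error detection means) to its output when it produces it, and that abstraction drops the lower 4 bits. A bit flip after the parity bit was generated trips both the parity check and the collation, while a design error that produces a wrong value with consistent parity is caught only by the collation.

```python
K = 4  # hypothetical: the abstraction drops the lower 4 bits


def abstract(x: int) -> int:
    return x >> K


def parity(x: int) -> int:
    """Even-parity bit: 1 if x contains an odd number of 1 bits."""
    return bin(x).count("1") & 1


def diagnose(y1: int, p1: int, y2: int, p2: int) -> dict[str, bool]:
    """Combine collation with per-module parity checks (error detection means)."""
    return {
        "collation_failed": abstract(y1) != y2,
        "first_level_error": parity(y1) != p1,
        "second_level_error": parity(y2) != p2,
    }


y1, y2 = 0xCF, 0xC  # hypothetical correct outputs of the two modules
p1, p2 = parity(y1), parity(y2)

# (a) Hardware fault: bit 7 of y1 flips after its parity bit was generated.
print(diagnose(y1 ^ 0x80, p1, y2, p2))
# (b) Design error: wrong y1, but generated together with a matching parity bit.
wrong = 0x4F
print(diagnose(wrong, parity(wrong), y2, p2))
```

In case (a) both `collation_failed` and `first_level_error` are true, which points at the first level module; in case (b) only `collation_failed` is true.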
[0055]
In function degeneration, data is selected based on, for example, the fault detection results, and the function is degenerated by using restored data. When first level data is selected, the more probable of the restored data and the first level data is chosen, based on the fault detection results of the collation means and the error detection means, and output as first level data.
[0056]
When second level data is selected, the more probable of the abstract data (obtained by abstracting the first level data) and the second level data is chosen, based on the fault detection results of the collation means and the error detection means, and output as second level data.
[0057]
To select the more probable data, variables representing the certainty of the data (a monotonically increasing variable and a CT variable) may be attached to the data being processed or communicated. The variable used as the index of certainty holds a value (the CT value) that decreases monotonically with abstraction, reverse abstraction, or the occurrence of a fault, and is kept for each unit of data. The more probable data is then selected by comparing these values.
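A minimal sketch of CT-value-based selection follows. The text only requires that the CT value lie between 0 and 1 and decrease monotonically on abstraction, reverse abstraction, or a fault; the multiplicative penalty factors, the 4-bit abstraction, the complement used as the second level function, and the names `Unit` and `select` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical penalty factors; the text only requires that the CT value
# decreases monotonically on abstraction, reverse abstraction, or a fault.
ABSTRACTION_FACTOR = 0.9
REVERSE_ABSTRACTION_FACTOR = 0.9
FAULT_FACTOR = 0.1
K = 4  # hypothetical: the abstraction drops the lower 4 bits


@dataclass(frozen=True)
class Unit:
    value: int
    ct: float = 1.0  # CT value, 0 <= ct <= 1; 1.0 means "most probable"


def abstracted(u: Unit) -> Unit:
    return Unit(u.value >> K, u.ct * ABSTRACTION_FACTOR)


def restored(u: Unit) -> Unit:
    return Unit((u.value << K) | (1 << (K - 1)), u.ct * REVERSE_ABSTRACTION_FACTOR)


def fault_detected(u: Unit) -> Unit:
    return Unit(u.value, u.ct * FAULT_FACTOR)


def select(*candidates: Unit) -> Unit:
    """Selection control: pick the candidate with the largest CT value."""
    return max(candidates, key=lambda u: u.ct)


first = Unit(0xCF)                              # first level output, CT = 1.0
second_input = abstracted(Unit(0x30))           # abstracted input, CT = 0.9
second = Unit(0xF - second_input.value, second_input.ct)  # second level output
print(select(first, restored(second)).value)                  # 207: first level data wins
print(select(fault_detected(first), restored(second)).value)  # 200: restored data wins
```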
[0058]
Because the system's main output is first level data, selection control of the first level data is particularly important in function degeneration. When first level data is output, the output data of the first level module is selected as-is if no fault is detected in the first level module; if a fault is detected there, control selects the data restored by reverse abstraction from the second level data.
[0059]
Note that, because it has passed through abstraction and reverse abstraction, the restored data carries less information than the original first level module data, so results obtained from it by subsequent processing or communication contain larger errors. In other words, when a fault occurs in the first level module, the system keeps operating with degraded function, in the sense that the error carries over into the output result.
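The loss of information can be quantified for a simple case, assuming 8-bit first level data, abstraction that drops the lower 4 bits, and reverse abstraction to the interval midpoint (all hypothetical choices):

```python
K = 4  # hypothetical: the abstraction drops the lower K bits of 8-bit data


def restore(x: int) -> int:
    """Abstract x, then reverse-abstract it to the midpoint of its interval."""
    return ((x >> K) << K) | (1 << (K - 1))


errors = [abs(restore(x) - x) for x in range(256)]
print(max(errors))                # 8, i.e. 2 ** (K - 1)
print(sum(errors) / len(errors))  # 4.0
```

Under these assumptions, restored data is off by at most 8 and by 4 on average, and any subsequent processing or communication inherits that error.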
[0060]
DETAILED DESCRIPTION OF THE INVENTION
Next, a mode for carrying out the present invention will be specifically described with reference to the drawings. Prior to this description, the terms used in the description of the embodiments of the present invention will be defined for better understanding of the contents.
[0061]
Terms are defined as follows. “Unit data” refers to a set of data grouped per execution unit of data processing and communication. Continuous data is usually divided into unit data at the data generation stage, and each piece of unit data is given a monotonically increasing variable and a CT variable (described later), which are used to represent the likelihood of the data. In the description below, “unit data” and “data” are used interchangeably except when the monotonically increasing variable or the CT variable is being discussed.
[0062]
The “monotonically increasing variable” is a variable attached to every unit data to store a monotonically increasing value. The “constraint of the monotonically increasing variable” restricts the values that may be assigned to it: the value for any unit data must be equal to or greater than the value for the preceding unit data in the time series. The “CT variable” is a variable attached to every unit data to store a value that serves as an index for selecting the more likely data in the selection control means. Its value is called the “CT value” and lies in the range 0 ≦ CT value ≦ 1; a CT value of 1 means “most likely”. CT is an abbreviation for Certainty.
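To make the two variables concrete, the following minimal Python sketch models one unit data record and checks the two constraints just defined (non-decreasing monotonically increasing variable, CT value within [0, 1]). The names `UnitData`, `mono`, `ct`, and `check_constraints` are illustrative assumptions, not taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class UnitData:
    """One execution unit of data plus its two attached variables (hypothetical names)."""
    payload: Any     # the data itself, e.g. one image frame
    mono: int        # monotonically increasing variable
    ct: float = 1.0  # CT variable, must satisfy 0 <= ct <= 1


def check_constraints(series: Sequence[UnitData]) -> bool:
    """True if every CT value is in [0, 1] and mono never decreases along the series."""
    for i, u in enumerate(series):
        if not 0.0 <= u.ct <= 1.0:
            return False
        # Equal values are allowed: each value must be >= the previous one
        if i > 0 and u.mono < series[i - 1].mono:
            return False
    return True
```

Equal consecutive values pass the check, matching the "equal to or greater than" wording of the constraint.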
[0063]
(Basic configuration)
The redundant system according to the present invention is created according to the configuration procedure outlined in FIG. 1. As shown in FIG. 2, the basic configuration of the redundant system consists of a first level module 10, a second level module 20, a 1-2 level abstraction module 30, a 1-2 level function verification module 40, and a first level function degeneration module 50.
[0064]
The first level module 10 consists of first level data processing means 11, first level data communication means 12, and error detection means 13 and 14 attached to them. Similarly, the second level module 20 consists of second level data processing means 21, second level data communication means 22, and error detection means 23 and 24 attached to them.
[0065]
The 1-2 level abstraction module 30, interposed between the first level module 10 and the second level module 20, consists of abstraction means 31, which converts the first level input data into second level input data with a reduced amount of information. The 1-2 level function verification module 40 consists of abstraction means 41, which abstracts the output data of the first level module; collation means 42, which compares and collates the abstract data with the output data of the second level module; and selection control means 43, which selects the second level output result 3 from the abstract data and the output data of the second level module.
[0066]
The first level function degeneration module 50 consists of reverse abstraction means 51, which restores first level data from second level data, and selection control means 52, which selects the first level output result 2 from the restored data and the output data of the first level module.
[0067]
Of course, configurations that combine the first level data processing means 11 and the first level data communication means 12 of the first level module 10 differently (in number and order) are equivalent to this basic configuration. Likewise, configurations that combine the second level data processing means 21 and the second level data communication means 22 of the second level module 20 differently (in number and order) are equivalent to this basic configuration.
[0068]
(Processing contents of each module)
Next, the processing performed by each module of the basic configuration is described. The first level data processing means 11 and the first level data communication means 12 in the first level module 10 perform the data processing and data communication carried out by the corresponding parts of the system before it was given a redundant configuration. That is, the first level module processes or communicates the ordinary (unabstracted) data.
[0069]
The error detection means 13 added to the first level data processing means 11 detects an error in the output of the processing means based on the error detection code and the time monitoring timer. The error detection means 13 can detect a fault when a hardware failure of the data processing system occurs.
[0070]
Similarly, the error detection means 14 added to the first level data communication means 12 detects an error in the output of the communication means based on the error detection code and the time monitoring timer. The error detection means 14 can detect a fault when a hardware failure of the data communication system occurs.
[0071]
The second level data processing means 21 and the second level data communication means 22 in the second level module 20 respectively perform data processing and data communication on second level data, whose amount of information is reduced (abstracted) relative to the first level data.
[0072]
The error detection means 23 added to the second level data processing means 21 detects an error in the output of the processing means based on the error detection code and the time monitoring timer. Further, the error detection means 24 added to the second level data communication means 22 detects an error in the output of the communication means based on the error detection code and the time monitoring timer.
[0073]
The abstraction means 31 of the 1-2 level abstraction module 30 obtains the input data of the second level module by abstraction from the input data of the first level module (first level input data 1).
[0074]
The abstraction means 31 of the 1-2 level abstraction module 30 and the abstraction means 41 of the 1-2 level function verification module 40 are processing modules that reduce the amount of data, and thereby perform abstraction, using the following operations (1a) to (1d):
Operation (1a): Operation for extracting information on a specific attribute,
Operation (1b): Operation to widen the temporal sampling interval in data with temporal changes,
Operation (1c): Operation to widen the spatial sampling interval in data with spatial expansion,
Operation (1d): Operation combining these.
Operation (1b) affects the monotonically increasing variables attached to the unit data: when it is applied to a series of unit data, the monotonically increasing variables of the unit data removed by the thinning are deleted along with them.
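Operations (1b) to (1d) can be sketched in Python for unit data whose payload is a list of samples. This is an illustration under stated assumptions: thinning keeps every `step`-th element, and the single `ct_factor` applied per abstraction is a hypothetical number standing in for the CT decrease on abstraction that the document defines later; the source does not say by how much the CT value drops. All function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class UnitData:
    payload: List[int]  # samples inside one unit, e.g. one row of pixels
    mono: int           # monotonically increasing variable
    ct: float = 1.0     # CT value


def abstract_temporal(series: List[UnitData], step: int) -> List[UnitData]:
    """Operation (1b): keep every step-th unit; the dropped units take
    their monotonically increasing variables with them."""
    return series[::step]


def abstract_spatial(u: UnitData, step: int) -> UnitData:
    """Operation (1c): keep every step-th sample inside one unit."""
    return replace(u, payload=u.payload[::step])


def abstract(series: List[UnitData], t_step: int = 2, s_step: int = 2,
             ct_factor: float = 0.9) -> List[UnitData]:
    """Operation (1d): (1b) followed by (1c); ct_factor is a hypothetical CT decrease."""
    return [replace(abstract_spatial(u, s_step), ct=u.ct * ct_factor)
            for u in abstract_temporal(series, t_step)]
```

The surviving units keep their original `mono` values, so the abstracted series still satisfies the non-decreasing constraint.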
[0075]
The processing contents of the abstraction means 41 and the collation means 42 in the first-second level function verification module 40 are as follows.
[0076]
The abstraction means 41 of the first-second level function verification module 40 abstracts the output data of the first level module. That is, the abstraction means 41 applies the same operations (1a) to (1d) as the abstraction means 31 described above to the output data of the first level module.
[0077]
The collation means 42 in the first-second level function verification module 40 compares and collates the output data of the second level module with the abstracted output data of the first level module. The collation means 42 determines that “no fault” if the collation is successful, and “has fault” if the collation fails.
[0078]
The collation by the collating means 42 also confirms whether the desired operation (that is, the function) defined at the system design stage agrees with the actual operation of the running system. When the collating means 42 judges “no fault”, it can be concluded that the system is performing the desired operation (function) defined at the design stage.
[0079]
By combining the results of the collating means 42, the error detection means for the first level data (the error detection means 13 attached to the first level data processing means 11 and the error detection means 14 attached to the first level data communication means 12), and the error detection means for the second level data (the error detection means 23 attached to the second level data processing means 21 and the error detection means 24 attached to the second level data communication means 22), the system detects a wide range of faults (hardware failures, software bugs, design errors, overload, congestion, etc.).
[0080]
Hardware failures are detected within each level module: the error detection means 13 and 14 of the first level module 10 detect “a hardware failure of the first level module”, and the error detection means 23 and 24 of the second level module 20 detect “a hardware failure of the second level module”. Independently of the results of the error detection means 13, 14, 23, and 24, if either the first level module 10 or the second level module 20 outputs abnormal data because of a hardware failure, the collating means 42 also detects a fault.
[0081]
For software bugs and design errors, the first level module 10 and the second level module 20 (or the Nth level module) in this redundant system are different implementations of the same specification. Such faults are therefore detected by the collating means 42 to the same extent as with the software N-version method.
[0082]
As for overload and congestion, the first level module 10 and the second level module 20 handle data with different amounts of information, so the performance limits at their bottlenecks differ. Overload or congestion therefore appears as a time delay in the output result of one module, which the collating means 42 can detect as a fault.
[0083]
Next, the collation procedure in consideration of the time difference of the output result in the collation means 42 will be described in detail. FIG. 5 is a flowchart for explaining a collation procedure that takes into account the time difference of the output results applied to the collation means. The collating procedure will be described in accordance with the flowchart.
[0084]
First, the time difference in the output results, and the monotonically increasing variable introduced to handle it, are explained. Apart from delays caused by overload or congestion, there is normally a steady difference between the time the first level module needs for processing or communication and the time the second level module needs. This difference arises from the different amounts of information in the data and from the resources used for processing or communication.
[0085]
The collating means 42 uses the monotonically increasing variable to absorb this steady time difference during collation and to distinguish it from output delays caused by overload or congestion. The value of the monotonically increasing variable of a unit data is equal to or greater than that of the immediately preceding unit data in the time series. As a value satisfying this constraint, the real time at which the unit data is generated is converted into a numerical value in a specific format and assigned to the variable.
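One way to derive the value from real time while still honoring the constraint is to clamp each new timestamp to the previous one, so that a backward adjustment of the wall clock cannot produce a smaller value. The sketch below assumes integer nanoseconds from Python's `time.time_ns()` as the "specific format"; the class name `MonoStamper` and the clamping rule are illustrative assumptions, not the source's method.

```python
import time
from typing import Callable, Optional


class MonoStamper:
    """Turns generation times into non-decreasing values (hypothetical helper)."""

    def __init__(self, clock: Callable[[], int] = time.time_ns) -> None:
        self._clock = clock               # returns an integer real-time stamp
        self._last: Optional[int] = None  # last value handed out

    def stamp(self) -> int:
        now = self._clock()
        if self._last is not None and now < self._last:
            now = self._last              # clock stepped back: repeat the last value
        self._last = now
        return now
```

Repeating the last value rather than inventing a larger one keeps the "equal to or greater than" constraint without breaking the rule that equal values trace back to the same generation point only when the clock itself misbehaves; a deployment would likely also log such steps.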
[0086]
With such a monotonically increasing variable, whenever two unit data have equal values of the variable, it is guaranteed that tracing their processing or communication flows back to the point of generation leads to the same data.
[0087]
Function verification relies on the principle that feeding equivalent data into two equivalent sets of processing and communication means yields equivalent output data. The collating means, which judges equivalence, therefore needs to consider only unit data whose monotonically increasing variables have the same value, and collating only such unit data absorbs the steady time difference.
[0088]
As shown in FIG. 5, the collation procedure first establishes the following premises as preprocessing in step 200:
(Premise 1) One buffer per input (two in total), each holding unit data for at least the time Td;
(Premise 2) Each buffer records the arrival time of each unit data;
(Premise 3) Each buffer records the maximum value of the monotonically increasing variable among its unit data.
A specific value of Td time is determined by actually measuring a steady time difference of unit data.
[0089]
Next, in step 201, unit data with the earliest arrival time is selected from the two buffers. In step 202, it is checked whether or not unit data having the same value of the monotonically increasing variable as that of the selected unit data exists in another buffer (not the buffer of the selected unit data).
[0090]
If such unit data exists, the process proceeds to step 205, where the contents of the two unit data are compared. If they match, “no fault” is judged (step 206); if they do not match, “fault” is judged (step 207). The process then proceeds to step 211.
[0091]
If step 202 finds no unit data with the same value of the monotonically increasing variable, step 203 compares the value V of the monotonically increasing variable of the selected unit data with the maximum value Vmax recorded for the other buffer (not the buffer holding the selected unit data).
[0092]
If V> Vmax in step 203, the unit data to be verified may arrive with a delay, so a maximum Td time is waited (step 204).
[0093]
If V ≦ Vmax, the unit data to be collated has already arrived and has been judged “fault”, so the process proceeds directly to step 211. In step 211, the unit data whose monotonically increasing variable has the value V are deleted from the buffers, and the collation procedure is repeated from step 201.
[0094]
If the unit data to be collated does not arrive even after the Td time has elapsed (step 204), the process proceeds to step 208 to check from which side the selected unit data has been input.
[0095]
If step 208 finds that the selected unit data came from the second level module side, the data from the first level module has been abnormally delayed beyond the time Td, so a fault is judged to exist in the first level module. The fault in this case is “overload or congestion in the first level module”. The process then proceeds to step 211.
[0096]
If step 208 finds that the selected unit data came from the abstraction means side, the data from the second level module has been abnormally delayed beyond the time Td, so a fault is judged to exist in the second level module. The fault in this case is “overload or congestion in the second level module”. The process then proceeds to step 211.
[0097]
When the judgment on the selected unit data has been completed by the processing from step 201 onward, step 211 deletes the selected unit data, together with all unit data having the same value of the monotonically increasing variable, from the buffers, and the collation procedure is repeated from step 201.
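The FIG. 5 collation procedure can be replayed offline as a Python sketch. Assumptions not in the source: each input is a list of `Unit` records sorted by arrival time, the full arrival history is known in advance so that "wait at most Td" becomes a look-ahead bounded by a simulated clock, and contents are compared with plain equality. The names `Unit`, `collate`, `abst` (the abstraction-means side), and `second` (the second level module side) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Unit:
    arrival: float  # arrival time recorded by the buffer (premise 2)
    mono: int       # monotonically increasing variable
    content: str    # payload compared in step 205


def collate(abst: List[Unit], second: List[Unit], td: float) -> List[Tuple[int, str]]:
    """Offline replay of the FIG. 5 collation; both inputs sorted by arrival."""
    history = {"abst": abst, "second": second}           # used for Vmax (premise 3)
    bufs: Dict[str, List[Unit]] = {"abst": list(abst), "second": list(second)}
    verdicts: List[Tuple[int, str]] = []
    clock = float("-inf")                                 # simulated current time
    while bufs["abst"] or bufs["second"]:
        # Step 201: unit data with the earliest arrival time in either buffer
        side = min((s for s in bufs if bufs[s]), key=lambda s: bufs[s][0].arrival)
        other = "second" if side == "abst" else "abst"
        u = bufs[side][0]
        clock = max(clock, u.arrival)
        # Step 202: partner with the same value already in the other buffer?
        partner = next((w for w in bufs[other]
                        if w.mono == u.mono and w.arrival <= clock), None)
        if partner is None:
            # Step 203: compare V with Vmax recorded for the other buffer
            vmax = max((w.mono for w in history[other] if w.arrival <= clock),
                       default=None)
            if vmax is None or u.mono > vmax:
                # Step 204: wait at most Td for the partner
                deadline = clock + td
                partner = next((w for w in bufs[other]
                                if w.mono == u.mono and w.arrival <= deadline), None)
                clock = partner.arrival if partner is not None else deadline
                if partner is None:
                    # Step 208: the module feeding the other buffer is late
                    late = "first level" if side == "second" else "second level"
                    verdicts.append((u.mono, f"fault: overload/congestion in {late} module"))
            # V <= Vmax: partner already arrived and was judged; go to step 211
        if partner is not None:
            # Steps 205-207: compare the contents
            verdicts.append((u.mono, "no fault" if u.content == partner.content else "fault"))
        # Step 211: delete buffered units that carry this value of the variable
        for s in bufs:
            bufs[s] = [w for w in bufs[s]
                       if not (w.mono == u.mono and w.arrival <= clock)]
    return verdicts
```

A late unit whose partner has already timed out hits the V ≦ Vmax branch and produces no new verdict, mirroring the direct jump to step 211.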
[0098]
Such a collation procedure makes it possible to detect a fault by distinguishing between a steady time delay during normal operation and a time delay due to the occurrence of overload or congestion.
[0099]
Next, processing contents of the first level function degeneration module 50 will be described. The de-abstracting means 51 in the first level function degeneration module 50 performs a process of restoring the first level data from the second level data.
[0100]
The inverse abstraction means 51 obtains restored data from the abstract data by the following operations (3a) to (3d).
Operation (3a): Operation for estimating the value of an attribute that does not exist,
Operation (3b): operation for interpolating data during a temporal sample interval,
Operation (3c): operation to interpolate data between spatial sample intervals; and
Operation (3d): Operation combining these.
Operation (3b) affects the monotonically increasing variable attached to the unit data. When operation (3b) is applied to abstract data, each interpolated unit data is assigned a value that lies between the values of the monotonically increasing variables of the unit data immediately before and after it and that satisfies the constraint of the monotonically increasing variable.
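Operation (3b) can be sketched with linear interpolation between neighboring units. Assumptions: each unit carries a single scalar sample, the interpolated monotonically increasing value is the integer point `k/factor` of the way between the neighbors (which keeps the series non-decreasing), and `ct_factor` is a hypothetical stand-in for the CT decrease on de-abstraction defined later. Linear interpolation is one possible choice; the source does not prescribe an interpolation method, and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UnitData:
    value: float     # one scalar sample, for simplicity
    mono: int        # monotonically increasing variable
    ct: float = 1.0  # CT value


def de_abstract_temporal(series: List[UnitData], factor: int,
                         ct_factor: float = 0.9) -> List[UnitData]:
    """Operation (3b): insert factor-1 interpolated units between neighbors."""
    out: List[UnitData] = []
    for a, b in zip(series, series[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(UnitData(
                value=a.value + (b.value - a.value) * k / factor,
                # lies between a.mono and b.mono and never decreases with k
                mono=a.mono + (b.mono - a.mono) * k // factor,
                # hypothetical CT decrease for de-abstracted data
                ct=min(a.ct, b.ct) * ct_factor,
            ))
    if series:
        out.append(series[-1])
    return out
```

The original units pass through unchanged; only the inserted units carry the reduced CT value.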
[0101]
The selection control means 52 in the first level function degeneration module 50 selects, on the basis of the fault detection results described above, the more likely data from the restored data or the output data of the first level module, and outputs it as the first level data.
[0102]
Next, the selection control procedures by which the selection control means 43 of the 1-2 level function verification module 40 and the selection control means 52 of the first level function degeneration module 50 select the more likely data are described.
[0103]
FIG. 6 is a flowchart of the selection control procedure of the selection control means 43 of the 1-2 level function verification module 40. The procedure is described following that flowchart.
[0104]
First, a description will be given of a CT variable that stores a value that serves as an index for selecting more reliable data. All unit data are assigned CT variables. The CT value, which is the value of the CT variable, is an index representing the certainty. The range of values is 0 ≦ CT value ≦ 1. When the CT value = 1, it indicates that the unit data is most likely.
[0105]
The CT value is assumed to change as follows by abstraction, de-abstraction and fault detection. That is
(4a): CT value decreases due to abstraction (via abstraction means).
(4b): CT value decreases due to de-abstraction (via de-abstraction means).
(4c): The CT value becomes 0 when a fatal error is detected by the error detection means.
(4d): The CT value decreases when the error detection means detects a minor error in data such as audio data or image data.
Using such CT variables, the selection control means compares the CT values of the unit data arriving at its two inputs and selects the unit data with the larger CT value as the output result.
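The rules (4a) to (4d) can be sketched as a small update function. The source only states that the CT value decreases in cases (4a), (4b), and (4d); the multiplicative factors in `DECAY` are hypothetical numbers chosen for illustration, as are the names `Event` and `update_ct`.

```python
from enum import Enum


class Event(Enum):
    ABSTRACTION = "4a"
    DE_ABSTRACTION = "4b"
    FATAL_ERROR = "4c"
    MINOR_ERROR = "4d"


# Hypothetical factors: the source only says the CT value decreases
DECAY = {Event.ABSTRACTION: 0.9, Event.DE_ABSTRACTION: 0.8, Event.MINOR_ERROR: 0.5}


def update_ct(ct: float, event: Event) -> float:
    """Apply one of the rules (4a)-(4d), keeping the result within [0, 1]."""
    if event is Event.FATAL_ERROR:
        return 0.0                                # rule (4c)
    return max(0.0, min(1.0, ct * DECAY[event]))  # rules (4a), (4b), (4d)
```

With these factors, a unit that passes through abstraction and then de-abstraction ends at about 0.72, so an intact first level unit (CT value 1) wins the comparison, while a unit hit by a fatal error (CT value 0) always loses.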
[0106]
As shown in FIG. 6, the selection control procedure first establishes the following premises as preprocessing:
(Premise 1) One buffer per input (two in total), each holding unit data for at least the time Td;
(Premise 2) The value Vout of the monotonically increasing variable of the unit data output from the selection control means is recorded.
The value of Td is the same as the buffer holding time Td in the collating means. A buffer that holds data from the first level module via the abstraction means is a first level buffer, and a buffer that holds data from the second level module is a second level buffer.
[0107]
Next, in step 301, it is confirmed whether or not the collation means determines that “the second level module has a fault”. If “There is a fault in the second level module”, the process proceeds to step 302. In step 302, unit data whose monotonically increasing variable is equal to or smaller than Vout is deleted from the first level buffer. Next, in step 303, the unit data having the smallest monotonically increasing variable value is output from the first level buffer, and the process returns to step 301.
[0108]
If it is determined in step 301 that “the second level module has no fault”, the process proceeds to step 304, which selects the unit data D2 with the smallest value of the monotonically increasing variable from the second level side buffer.
[0109]
Next, in step 305, it is determined whether or not unit data D1 having the same value of the monotonically increasing variable as D2 exists in the first level side buffer. In this determination, if the unit data D1 having the same monotonically increasing variable value as the unit data D2 exists in the first level side buffer, the process proceeds to step 307. In step 307, the CT value of the unit data D1 and the unit data D2 The CT values of are compared.
[0110]
If step 305 finds no unit data D1 with the same value of the monotonically increasing variable as D2 in the first level side buffer, the process proceeds to step 306, which takes the unit data with the largest value of the monotonically increasing variable in the first level side buffer as D1, and then to step 307, which compares the CT value of D1 with the CT value of D2.
[0111]
If “CT value of unit data D1> CT value of unit data D2” in the comparison with step 307, it can be determined that the unit data D1 is “more likely” from the definition of the CT value. At 308, unit data whose monotonically increasing variable is equal to or smaller than Vout is deleted from the first level buffer. Next, in step 309, the unit data having the smallest monotonically increasing variable value is output from the selection control means 43 from the first level buffer, and the process returns to step 301.
[0112]
If the comparison in step 307 gives “CT value of unit data D1 ≦ CT value of unit data D2”, the definition of the CT value implies that D2 is “more likely”, and the process proceeds to step 310, where unit data whose monotonically increasing variable is equal to or smaller than Vout are deleted from the second level side buffer. Next, in step 311, the unit data with the smallest value of the monotonically increasing variable in the second level side buffer is output from the selection control means 43, and the process returns to step 301.
[0113]
When the first level side buffer contains no unit data with the same value of the monotonically increasing variable as D2 (step 305), the unit data with the largest value of the monotonically increasing variable in that buffer is taken as D1 (step 306); if the first level side buffer is empty, the CT value of D1 is set to 0. The CT values of D1 and D2 are then compared (step 307).
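The FIG. 6 procedure can be sketched as a Python class whose `step` method performs one pass from step 301. Assumptions beyond the flowchart: "output" removes the unit from its buffer and sets Vout to its value; `step` returns `None` when there is nothing to output; and stale units (value ≦ Vout) are purged from both buffers at the start of each pass, whereas the flowchart purges only the buffer being output from. The names `Unit`, `Selector43`, and `second_level_fault` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    mono: int      # monotonically increasing variable
    ct: float      # CT value
    tag: str = ""  # label for tracing


class Selector43:
    """Sketch of the FIG. 6 selection control means 43 (hypothetical class)."""

    def __init__(self) -> None:
        self.first: List[Unit] = []        # first level side buffer (via abstraction)
        self.second: List[Unit] = []       # second level side buffer
        self.vout: float = float("-inf")   # premise 2: Vout of the last output

    def _purge(self) -> None:
        # Assumption: drop units with mono <= Vout from BOTH buffers
        self.first = [u for u in self.first if u.mono > self.vout]
        self.second = [u for u in self.second if u.mono > self.vout]

    def _emit(self, buf: List[Unit]) -> Optional[Unit]:
        # Output (and remove) the unit with the smallest mono, then update Vout
        if not buf:
            return None
        best = min(buf, key=lambda u: u.mono)
        buf.remove(best)
        self.vout = best.mono
        return best

    def step(self, second_level_fault: bool) -> Optional[Unit]:
        self._purge()
        if second_level_fault:                       # step 301 -> 302, 303
            return self._emit(self.first)
        if not self.second:                          # assumption: nothing to compare
            return None
        d2 = min(self.second, key=lambda u: u.mono)  # step 304
        same = [u for u in self.first if u.mono == d2.mono]  # step 305
        if same:
            ct1 = same[0].ct
        elif self.first:                             # step 306: largest mono as D1
            ct1 = max(self.first, key=lambda u: u.mono).ct
        else:
            ct1 = 0.0                                # empty buffer: CT of D1 is 0
        if ct1 > d2.ct:                              # step 307: D1 strictly more likely
            return self._emit(self.first)            # steps 308, 309
        return self._emit(self.second)               # steps 310, 311
```

On a CT tie, this selector prefers the second level data, as the flowchart's "D1 > D2" test implies.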
[0114]
FIG. 7 is a flowchart of the selection control procedure of the selection control means 52 of the first level function degeneration module 50. The procedure is described following that flowchart.
[0115]
As shown in FIG. 7, the selection control procedure of the selection control means 52 first establishes the following premises as preprocessing:
(Premise 1) One buffer per input (two in total), each holding unit data for at least the time Td;
(Premise 2) The value Vout of the monotonically increasing variable of the unit data output from the selection control means is recorded.
The value of Td is the same as the buffer holding time Td in the collating means. A buffer that holds data from the first level module is referred to as a first level side buffer, and a buffer that holds data from the second level module via the reverse abstraction means is referred to as a second level side buffer.
[0116]
Next, in step 401, it is confirmed whether or not the collation means determines that “the first level module has a fault”. If it is determined that “the first level module has a fault”, in the next step 402, unit data whose monotonically increasing variable is equal to or smaller than Vout is deleted from the second level side buffer. Next, in step 403, the unit data having the smallest monotonically increasing variable value is output from the second level buffer, and the process returns to step 401.
[0117]
If it is not determined in step 401 that “the first level module has a fault”, the process proceeds to step 404, and in step 404, unit data D1 having the smallest monotonically increasing variable value is selected from the first level side buffer.
[0118]
Next, in step 405, it is determined whether or not unit data D2 having the same value of the monotonically increasing variable as D1 exists in the second level buffer. In this determination, if the unit data D2 having the same value of the monotonically increasing variable as the unit data D1 exists in the second level buffer, the CT value of the unit data D1 and the CT value of the unit data D2 are compared in step 407. To do.
[0119]
If step 405 finds no unit data D2 with the same value of the monotonically increasing variable as D1 in the second level side buffer, the process proceeds to step 406, which takes the unit data with the largest value of the monotonically increasing variable in the second level side buffer as D2, and then to step 407, which compares the CT value of D1 with the CT value of D2.
[0120]
If the comparison in step 407 gives “CT value of unit data D1 ≧ CT value of unit data D2”, the definition of the CT value implies that D1 is “more likely”. The process proceeds to step 408, where unit data whose monotonically increasing variable is equal to or smaller than Vout are deleted from the first level side buffer; then, in step 409, the unit data with the smallest value of the monotonically increasing variable in the first level side buffer is output from the selection control means 52, and the process returns to step 401.
[0121]
If the comparison in step 407 gives “CT value of unit data D1 < CT value of unit data D2”, the definition of the CT value implies that D2 is “more likely”. The process proceeds to step 410, where unit data whose monotonically increasing variable is equal to or smaller than Vout are deleted from the second level side buffer; then, in step 411, the unit data with the smallest value of the monotonically increasing variable in the second level side buffer is output from the selection control means 52, and the process returns to step 401.
[0122]
When the second level side buffer contains no unit data with the same value of the monotonically increasing variable as D1 (step 405), the unit data with the largest value of the monotonically increasing variable in that buffer is taken as D2 (step 406); if the second level side buffer is empty, the CT value of D2 is set to 0. The CT values of D1 and D2 are then compared (step 407).
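The FIG. 7 procedure mirrors FIG. 6 with the roles of the buffers swapped and ties resolved in favor of the first level data (D1 ≧ D2). The sketch below makes the same assumptions as an equivalent FIG. 6 sketch would: output removes the unit and updates Vout, `None` means nothing to output, and stale units are purged from both buffers at the start of each pass (the flowchart purges only the buffer being output from). The names `Unit`, `Selector52`, and `first_level_fault` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    mono: int      # monotonically increasing variable
    ct: float      # CT value
    tag: str = ""  # label for tracing


class Selector52:
    """Sketch of the FIG. 7 selection control means 52 (hypothetical class)."""

    def __init__(self) -> None:
        self.first: List[Unit] = []        # first level side buffer (module output)
        self.second: List[Unit] = []       # second level side buffer (restored data)
        self.vout: float = float("-inf")   # premise 2: Vout of the last output

    def _purge(self) -> None:
        # Assumption: drop units with mono <= Vout from BOTH buffers
        self.first = [u for u in self.first if u.mono > self.vout]
        self.second = [u for u in self.second if u.mono > self.vout]

    def _emit(self, buf: List[Unit]) -> Optional[Unit]:
        # Output (and remove) the unit with the smallest mono, then update Vout
        if not buf:
            return None
        best = min(buf, key=lambda u: u.mono)
        buf.remove(best)
        self.vout = best.mono
        return best

    def step(self, first_level_fault: bool) -> Optional[Unit]:
        self._purge()
        if first_level_fault:                        # step 401 -> 402, 403
            return self._emit(self.second)
        if not self.first:                           # assumption: nothing to compare
            return None
        d1 = min(self.first, key=lambda u: u.mono)   # step 404
        same = [u for u in self.second if u.mono == d1.mono]  # step 405
        if same:
            ct2 = same[0].ct
        elif self.second:                            # step 406: largest mono as D2
            ct2 = max(self.second, key=lambda u: u.mono).ct
        else:
            ct2 = 0.0                                # empty buffer: CT of D2 is 0
        if d1.ct >= ct2:                             # step 407: ties favor D1
            return self._emit(self.first)            # steps 408, 409
        return self._emit(self.second)               # steps 410, 411
```

Here a first level unit whose CT value fell to 0 after a fatal error loses to the restored unit, which is exactly the degraded-but-continuing behavior the function degeneration module is meant to provide.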
[0123]
Next, extensions of the basic configuration are described. Depending on the system requirements of the computer system to which it is applied, the basic configuration can be extended to an Nth level module extension configuration or a cascade connection extension configuration. FIG. 3 shows an example of the Nth level module extension configuration with an added third level module, and FIG. 4 shows an example of the cascade connection extension configuration in which the basic configuration is cascaded in two stages.
[0124]
As shown in FIG. 3, the Nth level module extension configuration adds at least one level, from the third level module up to the Nth level module (N ≧ 3), to the first and second level modules of the basic configuration, together with the associated abstraction, function verification, and function degeneration modules. This corresponds to a multi-hierarchy model with three or more representation levels. FIG. 3 shows the system configuration for the case in which a third level module is added.
[0125]
That is, as shown in FIG. 3, in addition to the elements of the basic configuration of FIG. 2, the extended redundant system further includes a third level module 60, a 2-3 level abstraction module 63, a 2-3 level function verification module 64, and a second level function degeneration module 65.
[0126]
The extended redundant system thus consists of the first level module 10, the second level module 20, the 1-2 level abstraction module 30, the 1-2 level function verification module 40, the first level function degeneration module 50, the third level module 60, the 2-3 level abstraction module 63, the 2-3 level function verification module 64, and the second level function degeneration module 65.
[0127]
The added components are similar to those of the basic system. That is, although not shown, the third level module 60 consists of third level data processing means, third level data communication means, and error detection means attached to each of them, and the 2-3 level abstraction module 63 interposed between the second level module 20 and the third level module 60 consists of abstraction means. The 2-3 level function verification module 64 consists of abstraction means that abstracts the output data of the second level module, collation means that compares and collates the abstract data with the output data of the third level module, and selection control means that selects the third level output result from the abstract data and the output data of the third level module.
[0128]
The second level function degeneration module 65 likewise consists of reverse abstraction means that restores second level data from third level data, and selection control means that selects the second level output result from the restored data and the output data of the second level module.
[0129]
As shown in FIG. 4, the extended configuration of the cascade connection is a configuration in which the basic configuration (or the Nth level module expansion configuration) of the redundant system described above is connected in cascade along the data flow. This applies when data is processed or communicated via several processing nodes. FIG. 4 shows an example of a system configuration of an extended configuration of cascade connection in which basic configuration systems are connected in cascade in two stages.
[0130]
In the extended redundant system, as shown in FIG. 4, in addition to the system element configuration of the basic configuration of FIG. 2, a system configuration similar to the basic configuration system is added as a system configuration to be connected in cascade. Here, a first level module 71, a second level module 72, a 1-2 level function verification module 74, and a first level function degeneration module 75 are further provided.
[0131]
The redundant system expanded by cascade connection includes a first level module 10, a second level module 20, a 1-2 level abstraction module 30, a 1-2 level function verification module 40, a 1st level function degeneration module 50, It comprises a first level module 71, a second level module 72, a 1-2 level function verification module 74, and a first level function degeneration module 75. These added system components are similar to those of the base system.
[0132]
The processing in each module of the Nth level module extension configuration and the cascade connection extension configuration is the same as in the basic configuration, so its description is omitted.
[0133]
In the Nth level module expansion configuration (FIG. 3), the function verification and function degeneration of the basic configuration are executed simultaneously at a plurality of levels. Therefore, the function can be degraded even when faults occur in both the first level module and the second level module (the (N-1)th level module).
[0134]
Further, in the cascade connection expansion configuration (FIG. 4), the function verification and function degeneration of the basic configuration are performed at a plurality of locations (a plurality of processing nodes), so the function can be degraded even when a plurality of faults occur (a fault in either the first level module or the second level module of each processing node).
[0135]
Next, as a more specific embodiment, an application system in which the redundant system according to the present invention is applied to an image data transmission system will be described.
[0136]
As an example, the application system is a redundant system in which a video camera is installed at the entrance of a facility to check visitors entering and leaving, and the visitor's appearance is displayed on a monitor TV placed at a distant reception desk or waiting area. FIG. 8 shows the configuration of this redundant application system.
[0137]
In FIG. 8, each module and each means (component) of the redundant system having the basic configuration described with reference to FIG. 2 is embodied concretely. To make this concretization clear, the following description relates the system configuration shown in FIG. 8 to the system configuration shown in FIG. 2.
[0138]
In FIG. 8, the image input device 502 converts the moving image from the video camera 501 into JPEG-compressed color still images of 640 × 480 pixels at 30 frames per second. The image input device 502 also attaches a monotonically increasing variable and a CT variable to each still image. The image data with these two variables attached becomes the first level data.
[0139]
The concrete choices made at this stage are summarized as follows. The unit data is the JPEG-compressed image data for one screen. The value of the monotonically increasing variable is an integer representing the time at which the image data was acquired from the video camera, in units of 0.01 seconds. The value of the CT variable (the CT value) is a real number with an initial value of 1.0, and it changes through abstraction, inverse abstraction, and fault detection as follows.
(1) When passing through the abstraction means, the CT value is multiplied by 0.9 (corresponding to the change of the CT value in (4a)).
(2) When passing through the inverse abstraction means, the CT value is multiplied by 0.9 (corresponding to the change of the CT value in (4b)).
(3) When an error is detected, the CT value is set to 0 (corresponding to the change in the CT value of (4c) and (4d)).
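The CT bookkeeping rules above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `UnitData` record and all function names are hypothetical, and only the initial value 1.0, the factor 0.9, the reset to 0, and the 0.01 s time unit come from the text.

```python
from dataclasses import dataclass, replace

CT_DECAY = 0.9  # factor applied per abstraction / inverse abstraction step (from the text)


@dataclass(frozen=True)
class UnitData:
    """Hypothetical record: one JPEG screen plus the two attached variables."""
    payload: bytes   # JPEG-compressed image data for one screen
    timestamp: int   # monotonically increasing variable, in 0.01 s units
    ct: float = 1.0  # CT variable, initial value 1.0


def stamp(payload: bytes, capture_time_s: float) -> UnitData:
    # Convert the acquisition time to an integer count of 0.01 s ticks.
    return UnitData(payload, round(capture_time_s * 100))


def after_abstraction(u: UnitData, new_payload: bytes) -> UnitData:
    return replace(u, payload=new_payload, ct=u.ct * CT_DECAY)  # rule (1)


def after_inverse_abstraction(u: UnitData, new_payload: bytes) -> UnitData:
    return replace(u, payload=new_payload, ct=u.ct * CT_DECAY)  # rule (2)


def on_error(u: UnitData) -> UnitData:
    return replace(u, ct=0.0)  # rule (3)
```

A unit that goes through abstraction and then inverse abstraction ends with a CT value of about 0.81, while a unit hit by an error anywhere on the path keeps a CT value of 0, because later multiplications by 0.9 leave 0 unchanged.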
[0140]
The specific processing of each module and each means is as follows. In the first level module 510, the first level data processing means 511 is a program that thins the first level data out to 1/3 of its frames, converting 640 × 480 pixel image data at 30 frames per second into 640 × 480 pixel image data at 10 frames per second. The first level data communication means 512 consists of a 100 megabits per second local area network and its control program, and transmits the 640 × 480 pixel image data at 10 frames per second to the monitor television side. The error detection means (not shown) of the first level module is, for example, a program that aggregates error information from the frame thinning program and communication errors from the 100 megabits per second local area network.
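A minimal sketch of the first level processing and its error aggregation, assuming frames arrive as an ordered stream and that thinning keeps the first frame of each group of three (the text only says the frame count drops to 1/3). `thin_frames` and `module_has_error` are hypothetical names, and representing error reports as lists of strings is invented for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def thin_frames(frames: Iterable[T], keep_every: int = 3) -> Iterator[T]:
    """Keep one frame out of every `keep_every` (30 fps -> 10 fps when 3)."""
    for i, frame in enumerate(frames):
        if i % keep_every == 0:
            yield frame


def module_has_error(processing_errors: List[str], comm_errors: List[str]) -> bool:
    """Hypothetical aggregation: any processing or LAN error marks the module faulty."""
    return bool(processing_errors or comm_errors)
```

Feeding one second of input (30 frames) through `thin_frames` yields 10 frames. The same function with the same factor also fits the second level module, which thins 3 frames per second down to 1.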
[0141]
The abstraction means (not shown) in the first-second level abstraction module 530 performs the following three operations.
(1) An operation that extracts the luminance signal from color image data and converts it into monochrome image data (corresponding to operation (1a), which extracts information on a specific attribute).
(2) An operation that reduces the number of frames per second to 1/10 (corresponding to operation (1b), which widens the temporal sampling interval).
(3) An operation that halves the screen size along each of the X and Y axes (corresponding to operation (1c), which widens the spatial sampling interval).
[0142]
Accordingly, applying these abstraction operations to the first level data (640 × 480 pixel color image data at 30 frames per second) yields the second level data: 320 × 240 pixel monochrome image data at 3 frames per second.
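The three abstraction operations can be sketched on toy frames represented as nested lists of RGB tuples (rows of pixels). The luminance weights (ITU-R BT.601) and the choice to subsample pixels rather than average them are assumptions; the text only says that a luminance signal is extracted and that the screen size is halved along each axis. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
ColorFrame = List[List[RGB]]  # rows of RGB pixels
MonoFrame = List[List[int]]   # rows of luminance values


def to_luminance(frame: ColorFrame) -> MonoFrame:
    # (1) keep only the luminance attribute; BT.601 weights are an assumption.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in frame]


def halve_resolution(frame: MonoFrame) -> MonoFrame:
    # (3) widen the spatial sampling interval: keep every 2nd row and column.
    return [row[::2] for row in frame[::2]]


def abstract(frames: Sequence[ColorFrame], frame_step: int = 10) -> List[MonoFrame]:
    # (2) widen the temporal sampling interval: keep 1 frame in `frame_step`.
    return [halve_resolution(to_luminance(f)) for f in frames[::frame_step]]
```

Applied to one second of 640 × 480 color frames at 30 frames per second, `abstract` would return three 320 × 240 monochrome frames, matching the second level data described above.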
[0143]
The second level data processing means 521 of the second level module 520 is a program that thins the second level data out to 1/3 of its frames, converting 320 × 240 pixel monochrome image data at 3 frames per second into 320 × 240 pixel monochrome image data at 1 frame per second. The second level data communication means 522 consists of a 10 megabits per second local area network and its control program, and transmits the 320 × 240 pixel monochrome image data at 1 frame per second to the monitor television side.
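To give a rough sense of the resource reduction, the uncompressed data rates of the two transmitted streams can be compared. This is an illustrative calculation only: it assumes 8 bits per channel and ignores JPEG compression, whose actual ratio depends on the image content.

```python
def raw_rate_bytes_per_s(width: int, height: int, channels: int, fps: int) -> int:
    """Uncompressed data rate, assuming 1 byte per channel per pixel."""
    return width * height * channels * fps


first_level = raw_rate_bytes_per_s(640, 480, 3, 10)  # color, 10 frames/s after thinning
second_level = raw_rate_bytes_per_s(320, 240, 1, 1)  # monochrome, 1 frame/s after thinning

print(first_level, second_level, first_level // second_level)  # 9216000 76800 120
```

The factor of 120 is simply 3 (color to luminance) × 4 (half resolution on each axis) × 10 (one tenth of the frames).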
[0144]
The error detection means (not shown) of the second level module is, for example, a program that aggregates error information from the frame thinning program and communication errors from the 10 megabits per second local area network.
[0145]
The abstraction means of the first-second level function verification module 540 performs the same operations as the abstraction means of the first-second level abstraction module 530. The verification means of the first-second level function verification module 540 executes the verification procedure described with reference to FIG. 5. The verification means holds the image data of each input in a buffer for at least 1 second (corresponding to Td = 1 sec). Furthermore, because the data amount of JPEG-compressed image data varies with the image content, the comparison of unit data contents is replaced by a comparison of data amounts only. The selection control means of the first-second level function verification module 540 executes the selection control procedure described with reference to FIG. 6.
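The verification means compares only the data amounts of JPEG images that have been buffered for up to Td = 1 second. The sketch below is a hypothetical reading of that idea, not the procedure of FIG. 5: it assumes that matching units carry the same monotonically increasing variable (in 0.01 s ticks). The 20% size tolerance, the class name `SizeVerifier`, and the unmatched-unit counter are all invented for illustration.

```python
from collections import deque
from typing import Deque, Optional, Tuple

TD_TICKS = 100        # Td = 1 s expressed in 0.01 s ticks (from the text)
SIZE_TOLERANCE = 0.2  # hypothetical: allowed relative difference in byte size


class SizeVerifier:
    """Hypothetical verifier pairing abstracted first level units with second level units."""

    def __init__(self) -> None:
        # Each buffer holds (timestamp, JPEG byte size); timestamps arrive in increasing order.
        self.first: Deque[Tuple[int, int]] = deque()
        self.second: Deque[Tuple[int, int]] = deque()
        self.unmatched = 0  # units that expired without a partner

    def _expire(self, now: int) -> None:
        # Hold each unit for at most Td; older units are dropped and counted.
        for buf in (self.first, self.second):
            while buf and now - buf[0][0] > TD_TICKS:
                buf.popleft()
                self.unmatched += 1

    def push(self, from_first: bool, timestamp: int, size: int) -> Optional[bool]:
        """Buffer one unit; return True/False when a pair is compared, else None."""
        self._expire(timestamp)
        mine, other = (self.first, self.second) if from_first else (self.second, self.first)
        for i, (ts, other_size) in enumerate(other):
            if ts == timestamp:
                del other[i]  # safe: we return immediately after mutating
                return abs(size - other_size) <= SIZE_TOLERANCE * max(size, other_size)
        mine.append((timestamp, size))
        return None
```

A `False` result or a growing `unmatched` count would be treated as a detected fault in this sketch; how the real procedure weighs the two is not stated in this section.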
[0146]
The inverse abstraction means in the first level function degeneration module 550 executes the following two operations.
(1) The number of frames per second is multiplied by 10 by interpolation (corresponding to operation (3b), which interpolates data within the temporal sampling interval).
(2) The number of pixels is doubled along each of the X and Y axes (corresponding to operation (3c), which interpolates data within the spatial sampling interval).
No operation corresponds to (3a); that is, the restored image data remains monochrome.
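A minimal sketch of the inverse abstraction on monochrome toy frames (rows of luminance values). The text does not specify the interpolation method, so this sketch assumes the simplest choices: each frame is repeated 10 times (zero-order hold) and each pixel is replicated into a 2 × 2 block. The function names are hypothetical.

```python
from typing import List, Sequence

MonoFrame = List[List[int]]  # rows of luminance values


def double_pixels(frame: MonoFrame) -> MonoFrame:
    # (2) fill the widened spatial interval by replicating each pixel 2 x 2.
    out: MonoFrame = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def inverse_abstract(frames: Sequence[MonoFrame], factor: int = 10) -> List[MonoFrame]:
    # (1) fill the widened temporal interval by repeating each frame `factor` times.
    # Nothing restores color (there is no counterpart of (3a)), so output stays monochrome.
    restored: List[MonoFrame] = []
    for f in frames:
        big = double_pixels(f)
        restored.extend([big] * factor)  # repeats share one frame object
    return restored
```

With one 320 × 240 frame per second as input, this yields ten 640 × 480 monochrome frames per second, the format the monitor TV shows in the degraded mode described below.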
[0147]
The selection control means of the first level function degeneration module 550 executes the selection control procedure described with reference to FIG. 7. The monitor TV 503 displays the output of the selection control means of the first level function degeneration module 550 as video.
[0148]
(Example of application system operation)
Typical operations of the application system are described next, focusing on how the selection control means of the first level function degeneration module selects the more probable data by comparing CT values.
[0149]
(Operation example 1 (function verification))
If no fault is detected:
The selection control means of the first level function degeneration module selects the first level data. This is because the restored data has passed through both abstraction and inverse abstraction, giving it a CT value of 0.81, so the first level data, with a CT value of 1.0, has the larger CT value.
[0150]
The monitor television 503 displays the video from the video camera 501 in color as 640 × 480 pixel images at 10 frames per second.
[0151]
Image data verification for function verification is executed continuously, and the result “no fault” is output (to a console or the like).
[0152]
(Operation example 2 (function degeneration))
If a fault occurs in the first level module:
The selection control means of the first level function degeneration module selects the restored data. This is because the fault sets the CT value of the first level data to 0, so the restored data, with a CT value of 0.81, has the larger CT value.
[0153]
The monitor TV 503 displays the video from the video camera 501 in monochrome as 640 × 480 pixel images at 10 frames per second (restored from 320 × 240 pixel image data at 1 frame per second). At the same time, the result “There is a fault in the first level module” is output (to the console or the like).
[0154]
(Operation example 3)
When a fault occurs in the second level module:
The selection control means of the first level function degeneration module selects the first level data. This is because the CT value of the second level data is 0 and remains 0 after inverse abstraction, so the first level data, with a CT value of 1.0, has the larger CT value.
[0155]
The monitor television 503 displays the video from the video camera 501 in color as 640 × 480 pixel images at 10 frames per second. At the same time, the result “There is a fault in the second level module” is output (to the console or the like).
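The selection rule used in the three operation examples reduces to picking the candidate with the larger CT value. The sketch below reproduces those examples using the CT rules given earlier; the `Candidate` type, the labels, and breaking ties in favor of the first level data are assumptions for illustration.

```python
from typing import NamedTuple

DECAY = 0.9  # CT factor per abstraction / inverse abstraction step (from the text)


class Candidate(NamedTuple):
    label: str  # hypothetical tag for the data source
    ct: float   # CT value carried by the data


def select(first_level: Candidate, restored: Candidate) -> Candidate:
    """Pick the candidate with the larger CT value (ties go to first level data)."""
    return first_level if first_level.ct >= restored.ct else restored


# Operation examples 1-3: no fault, fault in first level, fault in second level.
no_fault = select(Candidate("color", 1.0), Candidate("mono", 1.0 * DECAY * DECAY))
first_fault = select(Candidate("color", 0.0), Candidate("mono", 1.0 * DECAY * DECAY))
second_fault = select(Candidate("color", 1.0), Candidate("mono", 0.0 * DECAY))
print(no_fault.label, first_fault.label, second_fault.label)  # color mono color
```

Because a CT value of 0 can never grow again, a single comparison is enough to route around whichever module reported the fault.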
[0156]
【Effects of the Invention】
The redundant system according to the present invention described above achieves the following effects (Effect 1) to (Effect 4).
(Effect 1)
Instead of multiplexing identical copies of the modules of the existing system being made redundant, the basic modules are multiplexed with redundant modules that handle data with a reduced amount of information, so the increase in required resources is suppressed.
[0157]
(Effect 2)
By using the error detection means installed separately in the data processing means and the communication means, together with the verification means that checks the equivalence of abstracted data, faults in the broad sense can be detected, including not only hardware failures but also software bugs, design errors, overload, and congestion.
[0158]
(Effect 3)
When a fault is detected, data restored from the abstracted data is used as the output result according to the cause and location of the fault, so the function can be degraded in response to faults in the broad sense.
[0159]
(Effect 4)
The system can be expanded to the Nth level module expansion configuration or the cascade connection expansion configuration according to the configuration of the existing system being made redundant. In these expanded configurations, the function can be degraded even when multiple faults occur.
[0160]
Through the above effects, a redundant system can easily be configured that verifies system functions during normal operation and degrades functions when a fault occurs, while keeping the required resources low.
[Brief description of the drawings]
FIG. 1 is a diagram showing a basic configuration procedure of the present invention;
FIG. 2 is a diagram showing a basic configuration of a redundant system according to an embodiment of the present invention;
FIG. 3 is a diagram showing a system configuration in which a third level module is added in the Nth level module expansion configuration;
FIG. 4 is a diagram showing a system configuration in which a basic configuration is cascaded in two stages in a cascade connection extended configuration;
FIG. 5 is a flowchart showing a collation procedure considering a time difference between output results;
FIG. 6 is a flowchart showing a selection control procedure of the selection control means 43;
FIG. 7 is a flowchart showing a selection control procedure of the selection control means 52;
FIG. 8 is a diagram illustrating a configuration of an application system as an embodiment.
[Explanation of symbols]
1 ... First level input data
2 ... First level output result
3 ... Second level output result
11 ... First level data processing means
12 ... First level data communication means
13 ... Error detection means
14 ... Error detection means
20 ... Second level module
21 ... Second level data processing means
22 ... Second level data communication means
23 ... Error detection means
24 ... Error detection means
30 ... 1-2 level abstraction module
31 ... Abstraction means
40 ... 1-2 level function verification module
41 ... Abstraction means
42 ... Verification means
43 ... Selection control means
50 ... First level function degeneration module
51 ... Reverse abstraction means
52 ... Selection control means
60 ... Third level module
63 ... 2-3 level abstraction module
64 ... 2-3 level function verification module
65 ... 2-3 level function degeneration module
71 ... First level module
72 ... Second level module
74 ... 1-2 level function verification module
75 ... First level function degeneration module

Claims

A redundant system in which the functions of a processing device and a communication device are multiplexed so that system operation can be continued with partial functions even when a fault occurs in a component of a computer system, the redundant system comprising:
a first level module of a system function that processes or communicates normal data of the computer system;
a second level module of a system function that processes or communicates data whose amount of information has been reduced from the normal data by abstraction means that performs abstraction so that the function can be degraded;
a function verification module that verifies the system function by comparing data obtained by reducing the amount of information of the output data of the first level module with the output data of the second level module; and
a function degeneration module that, when a fault is detected by the function verification module, degrades the system function by selecting the output data of the first level module as it is if no fault is detected in the first level module, and by selecting data restored from the second level module by performing the inverse of the abstraction performed by the abstraction means if a fault is detected in the first level module.

The redundant system according to claim 1,
wherein the function degeneration module degrades the system function by comparing the values of variables that represent the likelihood of the data and that change according to error detection and information reduction during communication or processing in each of the first level module and the second level module, thereby selecting the output data of the first level module as it is if no fault is detected in the first level module, and selecting the data restored from the second level module if a fault is detected in the first level module.

A redundant system configuration system for configuring a redundant system in which the functions of a processing device and a communication device are multiplexed so that system operation can be continued with partial functions even when a fault occurs in a component of a computer system, the redundant system configuration system comprising:
degenerate model creating means for creating, from a first level module of a system function that processes or communicates normal data of the computer system, a second level module that processes or communicates data whose amount of information has been reduced from the normal data by abstraction means that performs abstraction so that the function can be degraded; and
function degeneration module creating means for creating a function degeneration module that operates the first level module and the second level module simultaneously, verifies the system function when no fault is detected, and, when a fault is detected, degrades the system function by selecting the output data of the first level module as it is if no fault is detected in the first level module, and by selecting data restored from the second level module by performing the inverse of the abstraction performed by the abstraction means if a fault is detected in the first level module.

A redundant system configuration method for configuring a redundant system in which the functions of a processing device and a communication device are multiplexed so that system operation can be continued with partial functions even when a fault occurs in a component of a computer system, the method comprising:
creating, from a first level module of a system function that processes or communicates normal data of the computer system, a second level module that processes or communicates data whose amount of information has been reduced from the normal data by abstraction means that performs abstraction so that the function can be degraded; and
generating a function degeneration module that operates the first level module and the second level module simultaneously, verifies the system function when no fault is detected, and, when a fault is detected, degrades the system function by selecting the output data of the first level module as it is if no fault is detected in the first level module, and by selecting data restored from the second level module by performing the inverse of the abstraction performed by the abstraction means if a fault is detected in the first level module.