JPH04241038A

JPH04241038A - Recovering method for high-reliability computer system

Info

Publication number: JPH04241038A
Application number: JP3007520A
Authority: JP
Inventors: Takeshi Miyao; 健宮尾; Manabu Araoka; 学荒岡; Tomoaki Nakamura; 智明中村; Masayuki Tanji; 雅行丹治; Shigenori Kaneko; 茂則金子; Koji Masui; 晃二桝井; Saburo Iijima; 三朗飯島; Nobuyasu Kanekawa; 信康金川; Shinichiro Yamaguchi; 伸一朗山口; Yoshiki Kobayashi; 芳樹小林
Original assignee: Hitachi Ltd; Hitachi Process Computer Engineering Inc
Current assignee: Hitachi Ltd; Hitachi Information and Control Systems Inc
Priority date: 1991-01-25
Filing date: 1991-01-25
Publication date: 1992-08-28
Anticipated expiration: 2017-01-15
Also published as: JP3246751B2

Abstract

PURPOSE: To eliminate the various problems caused by regrouping processors, by disconnecting only some of the processors when a fault occurs and switching all of the processors over to a new, separate processor group at recovery time. CONSTITUTION: The three MPUs (microprocessing units) of a BPU (basic processing unit) 2A are normally in operation. If a fault occurs in MPUC during the execution of a process B, MPUC is disconnected and operation is carried on by MPUA and MPUB. In response to the abnormality report issued by MPUA, the printed board of a new BPU 2B is inserted into a free slot, and the MPUs in the new BPU 2B then diagnose themselves. Processing is transferred from the old BPU 2A to the new BPU 2B, and a process D is carried out according to the majority decision of its three MPUs. Because the old BPU keeps operating until a convenient switchover point or a repair and maintenance period, the takeover can be performed at the point most suitable for the software.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a highly reliable computer system, and more particularly to a recovery method for a highly reliable computer system that not only continues operating when a fault occurs but also provides a well-considered means of recovery afterward.

[0002]

[Prior Art] With the spread of the information society, systems such as traffic control systems and financial and securities systems have come to form the foundation of social life. The computer systems used in them must be designed so that faults do not occur and, even if a fault does occur, must be configured to continue processing while maintaining data consistency.

[0003] To meet these demands, various fault-tolerant computer systems have been proposed. So that data processing can continue even when a fault occurs, such a system is built from multiple systems or components having the same function. The redundancy in each part allows the failed system or component to be detected and data processing to continue.

[0004] As a specific conventional example, U.S. Pat. No. 4,654,857 adopts what is commonly called the pair-and-spare method, in which two processor boards, each consisting of a memory, processors, input/output control devices, and the like with self-diagnosis functions, operate as a pair. Every processor board contains two microprocessors. Their outputs are compared, and a mismatch is treated as a board failure. In addition, a lockstep scheme is used in which the output that one processor board places on the bus is compared and synchronized with that of the other processor board on every bus clock. Even if a fault occurs on one processor board, it is detected within that bus clock, the board is disconnected, and only the output of the normal processor board is used.

[0005] Japanese Patent Application Laid-Open No. Sho 59-160899 (1984), like U.S. Pat. No. 4,654,857, has two processor boards, each connected to both of the duplicated system buses and each containing two processors. For synchronization it focuses on the cache memory: the flush operation from cache memory to main memory is performed under OS control, which avoids the performance limits imposed by lockstep operation. When comparison of the two microprocessors on a processor board detects a fault, processing is re-executed on the alternate processor board from the previous flush point.

[0006] The systems above use a total of four microprocessors, two on one processor board and two on another. Japanese Patent Application Laid-Open No. Hei 1-258057, by contrast, adopts the TMR (Triple Modular Redundancy) technique and outputs the results of three processors to the duplicated system bus through a majority circuit.

[0007]

[Problems to Be Solved by the Invention] Leaving aside how many processors are placed on one processor board, each of the conventional examples above is a system that uses three or four processors. When one of the processors fails, it is disconnected and the system is reduced to two-processor operation. Afterward, one or two new processors are installed and the system is reconfigured to its original configuration.

[0008] In these systems the set of processors before the fault differs from the set after recovery. That is, in the first two conventional examples, if the system initially ran on four processors A, B, C, and D, it runs on E, F, C, and D after recovery. In the last conventional example, an initial set of A, B, and C becomes D, B, and C. The conventional systems thus require the processors to be regrouped at recovery after a fault, and they therefore need special connection and disconnection hardware and a synchronization mechanism between the new processor and the other processors that make up the system. Moreover, processors and processor boards are usually upgraded or revised over time, so the conventional approach of replacing a processor or processor board that is part of the system requires thorough advance measures to prevent mismatches after recovery. Systems that replace a processor board also require an expensive replacement board to be kept on hand at all times. Finally, synchronization between the processors is difficult.

[0009] In view of the above, an object of the present invention is to provide a recovery method for a highly reliable computer system that does not require processors to be regrouped at recovery after a fault.

[0010]

[Means for Solving the Problems] The present invention concerns a highly reliable computer system comprising a system bus, a main storage device connected to the system bus, and a basic processing unit connected to the system bus. The basic processing unit normally has first, second, and third processors that execute the same operation. When the first processor fails, it is disconnected and the second and third processors continue to execute the same operation. Processing is then transferred to fourth, fifth, and sixth processors that execute the same operation, and the configuration is changed so that the second and third processors stop operating.
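The sequence described above can be sketched in Python as follows. This is a minimal illustration only, assuming processors are represented by plain name strings; the function `recovery_timeline` and the labels `P1` to `P6` are hypothetical and do not appear in the specification.

```python
def recovery_timeline(old_group, failed, new_group):
    """Return the sets of processors in operation at each recovery stage.

    old_group : names of the first, second and third processors
    failed    : name of the processor that failed
    new_group : names of the fourth, fifth and sixth processors
    """
    if failed not in old_group:
        raise ValueError("failed processor is not in the old group")
    if set(old_group) & set(new_group):
        # The method relies on a completely separate group taking over.
        raise ValueError("new group must not share processors with the old group")

    normal = list(old_group)                             # all three run the same operation
    degraded = [p for p in old_group if p != failed]     # failed unit disconnected
    switched = list(new_group)                           # whole new group takes over
    return [normal, degraded, switched]


stages = recovery_timeline(["P1", "P2", "P3"], "P1", ["P4", "P5", "P6"])
for stage in stages:
    print(stage)
```

At no stage does a group mix old and new processors, which is the property the method relies on to avoid regrouping.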

[0011]

[Operation] In the present invention, part of the processor group is disconnected when a fault occurs, and at recovery all of the processors are switched over to a new, separate processor group, so the various problems that accompany regrouping processors are eliminated.

[0012]

[Embodiment] The present invention is described in detail below. To make it easier to follow, the description in this specification is divided into the following sections.

[0013]
I. Schematic overall configuration of the system
II. Configuration of BPU 2
III. Abnormality detection method
IV. Configuration change control upon abnormality
V. Signal processing at internal bus connection
VI. Recovery measures after an abnormality occurs
VII. Alternatives and modifications of the circuits

I. Schematic overall configuration of the system

FIG. 1 shows the schematic overall configuration of the fault-tolerant system of the present invention. The system has two system buses 1-1 and 1-2, and one or more basic processing units (hereinafter simply BPUs) 2-1, 2-2, ..., 2-n are each connected to both system buses 1-1 and 1-2. Main storage device 3-1 is connected to system bus 1-1 and main storage device 3-2 to system bus 1-2, each individually, while input/output units (hereinafter simply IOUs) 4-1 and 4-2 are each connected to both system buses. The main storage devices 3 and the IOUs 4 are each used in sets of two. FIG. 1 shows one set of each, but the number of sets can be increased as the system is expanded. The n BPUs shown normally execute different processes, but they all have the same configuration, so unless otherwise necessary the configuration and operation are described here using BPU 2-1 as an example.

[0014] The BPU 2 has, as its main components, a plurality of microprocessing units 20 (hereinafter simply MPUs; three in the illustrated example), a plurality of MPU output check circuits 23 (three in the illustrated example), three-state buffer circuits 29 and the like, a plurality of cache memories 220 and 221, and a plurality of bus interface circuits 27 (hereinafter simply BIUs). The operation of the circuit in FIG. 1 is, in outline, as follows. The three MPUs 20 execute the computation, and their outputs are checked by the check circuits 23. The outputs of two MPUs judged normal are output through the respective bus interface circuits 27 to the two system buses 1, or to the two cache memories 220 and 221. When an abnormality is found in one of the MPUs, that MPU is excluded, and the outputs of the remaining two normal MPUs are output through the respective bus interface circuits 27 to the two system buses 1, or to the two cache memories 220 and 221. After an abnormality has been found in some of the three MPUs 20, the three MPUs 20 are switched at a suitable time to three entirely new MPUs 20, which then execute the computation.

[0015] II. Configuration of BPU 2

FIG. 2 shows the configuration of the BPU 2 in more detail. As described later, the functions shown are preferably mounted on a single printed board.

[0016] In FIG. 2, the three MPUs 20-1, 20-2, and 20-3 operate synchronously on a clock (not shown) and output their results on an address line A and a data line D. Parity generation/check circuits 10 to 15 add the appropriate parity signals to the addresses on the address lines A and the data on the data lines D of MPUs 20-1, 20-2, and 20-3, and the results are supplied to the MPU output check circuit 23. The MPU output check circuit 23 consists of the following: a first check circuit CHKAB (23-1), which compares the output of MPUA (20-1) (address and data with parity added) with the output of MPUB (20-2); a second check circuit CHKCA (23-2), which compares the output of MPUA (20-1) with the output of MPUC (20-3); a third check circuit CHKBC (23-3), which compares the output of MPUB (20-2) with the output of MPUC (20-3); and error check circuits 234 and 235, which identify which MPU has failed from the comparison results of the three check circuits CHK. The MPU output check circuit 23 is a so-called majority circuit, and the open/closed states of the three-state buffer circuits 200, 201, 203, 204, and 29 are controlled according to its decision. The relationship between the decision and the buffer states is described later. In short, an MPU judged abnormal is no longer used, and the outputs of the MPUs judged normal are supplied to the two cache memories 220 and 221 so that the unit operates as a duplex system. In the following description, the enabled state of a three-state buffer circuit is simply called the open state and the disabled state the closed state.

[0017] The addresses and data obtained through the three-state buffer circuits 200, 201, 203, and 204 are supplied to the two cache memories 220 and 221, and at that time the parity check circuit 250 checks the parity added by the parity generation/check circuits 10 to 15. The two MPU outputs are also synchronized in synchronization circuits 290 and 291 and sent to the system buses through the bus interface units BIU. At that time the parity check circuits 30 and 31 check the parity added by the parity generation/check circuits 10 to 15. The configuration described so far mainly concerns write access from the MPUs. During such write access, checks are performed by the MPU output check circuit 23 and the parity check circuits 30 and 31.

[0018] During cache read access, by contrast, signals travel along the route from each cache memory 220, 221 through the three-state buffer circuits 202, 205 to the MPUs, and in this case the parity generation/check circuits 10 to 15 check the addresses and data from the cache memories. Elements 26 and 27 are also three-state buffer circuits. Their open/closed states are controlled according to the results of the address and data checks performed by the parity generation/check circuits 10 to 15 during cache read access.

[0019] As is clear from the configuration of FIG. 2, the BPU system of the present invention has at least three MPUs, an abnormal-MPU detection circuit based on a majority circuit, duplicated cache memories, and a duplicated output circuit section.

[0020] III. Abnormality detection method

The BPU of FIG. 2 uses the MPU output check circuit 23 and a number of parity check circuits as its abnormality detection sections. This section describes these abnormality detection methods.

[0021] <<Abnormality detection by the MPU output circuit>> FIG. 3 shows the MPU output check portion. In FIG. 3, let the output of the first check circuit CHKAB be AB, the output of the second check circuit CHKCA be CA, the output of the third check circuit CHKBC be BC, and the outputs of the error check circuit 231 be Ag, Cg, and 29g. The relationship between the outputs of the three check circuits and the resulting open/closed states of the three-state buffer circuits is described below. In this figure, C is a control line that is not shown in FIG. 2.

[0022] Each of the first to third check circuits CHK receives its two sets of inputs (address, data, and control signals). The first check circuit CHKAB outputs the result AB of comparing the output of MPUA with the output of MPUB, the second check circuit CHKCA outputs the result CA of comparing the output of MPUA with the output of MPUC, and the third check circuit CHKBC outputs the result BC of comparing the output of MPUB with the output of MPUC. Each comparison result is a status signal indicating either a match or a mismatch.

[0023] From the outputs AB, BC, and CA of the three check circuits CHK, the error check circuit 231 obtains outputs Ag, Bg, and Cg, which indicate that MPUA, MPUB, and MPUC respectively are normal, according to equations (1), (2), and (3). In FIGS. 2 and 3 the error check circuit is duplicated.

[0024]

$$\mathrm{Ag} = \overline{\mathrm{AB}}\cdot\overline{\mathrm{CA}} + \overline{\mathrm{AB}}\cdot\mathrm{BC}\cdot\mathrm{CA} + \mathrm{AB}\cdot\mathrm{BC}\cdot\overline{\mathrm{CA}} \quad (1)$$

$$\mathrm{Bg} = \overline{\mathrm{AB}}\cdot\overline{\mathrm{BC}} + \overline{\mathrm{AB}}\cdot\mathrm{BC}\cdot\mathrm{CA} + \mathrm{AB}\cdot\overline{\mathrm{BC}}\cdot\mathrm{CA} \quad (2)$$

$$\mathrm{Cg} = \overline{\mathrm{BC}}\cdot\overline{\mathrm{CA}} + \mathrm{AB}\cdot\overline{\mathrm{BC}}\cdot\mathrm{CA} + \mathrm{AB}\cdot\mathrm{BC}\cdot\overline{\mathrm{CA}} \quad (3)$$

where
AB: the event that the outputs of MPUA and MPUB do not match (checked by 23-1)
BC: the event that the outputs of MPUB and MPUC do not match (checked by 23-3)
CA: the event that the outputs of MPUA and MPUC do not match (checked by 23-2)
·: logical AND
+: logical OR
overline: negation (NOT)

The open/closed states of the three-state buffer circuits 200, 201, 204, 205, and 29 are controlled according to the results of equations (1), (2), and (3), as explained in the next section. Table 1 summarizes the outputs (match or mismatch) of the three check circuits CHKAB, CHKBC, and CHKCA, the resulting MPU decisions Ag, Bg, and Cg, and the resulting open/closed states of the three-state buffer circuits. In the decision columns of Table 1, 1 means the MPU is normal and 0 means it is abnormal or indeterminate.
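Equations (1) to (3) can be checked with a short Python sketch. It assumes each check-circuit output is modeled as a Boolean that is true on a mismatch, as in the definitions of AB, BC, and CA above; the function name `mpu_health` is hypothetical and does not appear in the specification.

```python
from itertools import product


def mpu_health(ab, bc, ca):
    """Evaluate equations (1)-(3).

    ab, bc, ca are True when the corresponding pair of MPU outputs MISMATCH.
    Returns (Ag, Bg, Cg); True means the MPU is judged normal.
    """
    ag = (not ab and not ca) or (not ab and bc and ca) or (ab and bc and not ca)
    bg = (not ab and not bc) or (not ab and bc and ca) or (ab and not bc and ca)
    cg = (not bc and not ca) or (ab and not bc and ca) or (ab and bc and not ca)
    return ag, bg, cg


# Enumerate all eight combinations of comparison results.
for ab, bc, ca in product([False, True], repeat=3):
    print(f"AB={ab!s:5} BC={bc!s:5} CA={ca!s:5} ->", mpu_health(ab, bc, ca))
```

A single mismatch (for example only CA) leaves exactly one MPU judged normal, while two mismatches isolate the one MPU common to both mismatching pairs and leave the other two judged normal.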

[0025] Table 2 lists some of the cases assumed to cause the match and mismatch check-circuit outputs of Table 1. Because the focus of the present invention is on how to change the circuit configuration inside the BPU so that operation continues when an abnormality occurs, and identifying the cause of the abnormality is not its purpose, a detailed explanation is omitted here.

[0026]

[Table 1]

[0027]

[Table 2]

[0028] As described with reference to FIGS. 3 and 2 and Tables 1 and 2, in the present invention the MPU output check circuit 23 judges whether each MPU is normal or abnormal using the logic above.

[0029] Next, abnormality detection by the parity check circuits, which are used in each part of the BPU as a further detection method, is described. Parity check circuits themselves are well known and any type may be used, so a detailed description of the circuit is omitted. What is described here is how the abnormal location is identified when a parity error is detected.

[0030] As shown in FIG. 2, during write access the parity generation/check circuits 10 to 15 add the appropriate parity signals, the information is sent out on address line A and data line D, and any abnormality is detected by the parity check circuits 250, 30, and 31. During read access, abnormalities in the information are detected by the parity generation/check circuits 10 to 15 and the parity check circuits 250, 30, and 31. These parity checks are basically performed separately for addresses and for data. For addresses, when a parity error is detected in the address information, the abnormal location is the bus master that is sending out the address signal. The device acting as bus master (MPU, cache memory, or BIU) can be identified by monitoring the bus grant signal from the bus arbiter (not shown), which grants the right to use the internal bus of FIG. 2. For data, when a data parity error is detected during write access, the abnormal location is the bus master that is sending out the data signal. The bus master is again identified by monitoring the bus grant signal of the bus arbiter. Finally, when a data parity error is detected during read access, the abnormal location is the source of the data signal, which can be identified by decoding the address that accompanies the data to find the device it points to.
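The localization rule in the paragraph above can be sketched in Python as follows. This is a minimal illustration, assuming the bus master (from the bus grant signal) and the addressed device (from address decoding) are already known and passed in as strings; the function `locate_fault` and its parameter names are hypothetical.

```python
def locate_fault(error, access, bus_master, addressed_device):
    """Return the device regarded as the abnormal location.

    error            : "address" or "data" (which parity check failed)
    access           : "write" or "read"
    bus_master       : device currently granted the internal bus
    addressed_device : device selected by decoding the accompanying address
    """
    if error == "address":
        # The bus master drives the address in either direction.
        return bus_master
    if error == "data" and access == "write":
        # On a write the bus master also drives the data.
        return bus_master
    if error == "data" and access == "read":
        # On a read the data comes from the addressed device.
        return addressed_device
    raise ValueError("unknown error or access type")


print(locate_fault("data", "read", bus_master="MPU", addressed_device="cache memory"))
```

Only the read-data case points away from the bus master, which is why the address decode is needed in addition to the bus grant signal.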

[0031] This approach to identifying the abnormal location can be expressed in logical equations as follows.

[0032] <<Abnormality detection by parity check>>

$$\mathrm{PTYGEN/NG} = \mathrm{APE}\cdot\mathrm{MPU/MST} + \mathrm{DPE}\cdot(\mathrm{WT}\cdot\mathrm{MPU/MST} + \mathrm{RD}\cdot\mathrm{MPU/SND}) \quad (4)$$

$$\mathrm{Cach/NG} = \mathrm{APE}\cdot\mathrm{Cach/MST} + \mathrm{DPE}\cdot(\mathrm{WT}\cdot\mathrm{Cach/MST} + \mathrm{RD}\cdot\mathrm{Cach/SND}) \quad (5)$$

$$\mathrm{BIU/NG} = \mathrm{APE}\cdot\mathrm{BIU/MST} + \mathrm{DPE}\cdot(\mathrm{WT}\cdot\mathrm{BIU/MST} + \mathrm{RD}\cdot\mathrm{BIU/SND}) \quad (6)$$

$$\mathrm{SYSBUS/NG} = \mathrm{BIU/NG} \quad (7)$$

where, in equations (4) to (7),
PTYGEN: parity generation/check circuits 10 to 15
Cach: cache memory
/NG: parity abnormality
/MST: bus master
/SND: data output source
APE: address parity error
DPE: data parity error
WT: the bus master outputs data
RD: the bus master inputs data
·: logical AND
+: logical OR

IV. Configuration change control upon abnormality

Abnormalities within the BPU are of two kinds: those detected by the MPU output check circuit during write access from the MPUs, and those found by the parity check circuits during write access or cache read access.

[0033] [Configuration change when the MPU output check circuit detects an abnormality] The open/closed states of the three-state buffer circuits are controlled as shown in Table 1: circuits 200 and 201 according to output Ag of the error check circuit 231 of the MPU output check circuit 23, circuits 203 and 204 according to Cg, and circuit 29 according to 29g. In Table 1, an MPU decision of Ag = 1 basically corresponds to 200 and 201 open and Ag = 0 to 200 and 201 closed, and Cg = 1 basically corresponds to 203 and 204 open and Cg = 0 to 203 and 204 closed. Bg and 29g, however, have no such correspondence. The state of 29g, and hence of buffer 29, is as follows: it is closed when Ag = 1 and Cg = 1, and when only one of Ag and Cg is 1, only the three-state buffer circuit 29 in the direction toward the three-state buffer circuits whose signal has become 0 is opened. Each case in Table 1 is described in more detail below with reference to the system configurations of FIG. 4.
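The correspondence just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function `buffer_states` and its string labels are hypothetical, cache memory 221 is taken to be on the MPUC side and 220 on the MPUA side (as in Case 1 below), and because the text does not state the setting of buffer 29 when both Ag and Cg are 0, the sketch reports that state as unspecified.

```python
def buffer_states(ag, cg):
    """Map the decisions Ag and Cg to three-state buffer settings.

    ag, cg : True when MPUA / MPUC is judged normal.
    Returns a dict keyed by buffer circuit numbers.
    """
    states = {
        "200,201": "open" if ag else "closed",   # follows Ag
        "203,204": "open" if cg else "closed",   # follows Cg
    }
    if ag and cg:
        states["29"] = "closed"                  # both paths feed their own cache
    elif ag:
        states["29"] = "open toward 221"         # copy A-side output to the C-side cache
    elif cg:
        states["29"] = "open toward 220"         # copy C-side output to the A-side cache
    else:
        states["29"] = "unspecified"             # assumption: not stated in the text
    return states


print(buffer_states(True, False))
```

Opening buffer 29 toward the side whose own buffers are closed keeps both cache memories supplied with the same data from the surviving path.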

[0034] Case 1: All MPU outputs match and all MPUs are normal. The three-state buffer circuits 200, 201, 203, and 204 are opened and 29 is closed, and, as shown in FIG. 4(a), the path formed by MPUA and cache memory 220 and the path formed by MPUC and cache memory 221 operate independently as a duplex system.

[0035] Case 2: Only check circuit CHKCA gives a mismatch output, and only MPUB is judged normal. As shown in FIG. 2, MPUB is used as a reference for the other MPUs and is not configured to supply output to the cache memories, so operation cannot be continued by changing the configuration. In this case the system goes down.

[0036] Case 3: Only check circuit CHKBC gives a mismatch output, and only MPUA is judged normal. In this case the three-state buffer circuits 200 and 201 are opened, 203 and 204 are closed, and for 29 only the three-state buffer circuit in the direction of cache memory 221 is opened. MPUB and MPUC are stopped, and the system runs on a single path with MPUA alone, as shown in FIG. 4(b). Only the three-state buffer circuit 29 toward cache memory 221 is opened in order to keep the contents of the cache memories identical.

[0037] Case 4: Only check circuit CHKAB gives a match output, and MPUA and MPUB are judged normal. In this case the three-state buffer circuits 200 and 201 are opened, 203 and 204 are closed, and for 29 only the three-state buffer circuit in the direction of cache memory 221 is opened. MPUC is stopped, MPUA and MPUB form a duplex system as shown in FIG. 4(c), and the unit runs in duplex operation with MPUB monitoring the output of MPUA. Only the three-state buffer circuit 29 toward cache memory 221 is opened in order to keep the contents of the cache memories identical.

[0038] Case 5: Only check circuit CHKAB gives a mismatch output. MPUA and MPUB are judged abnormal, and only MPUC is judged normal. In this case the three-state buffer circuits 200 and 201 are closed, 203 and 204 are opened, and for 29 only the three-state buffer circuit in the direction of cache memory 220 is opened. MPUA and MPUB are stopped, and MPUC runs alone as shown in FIG. 4(d). Only the three-state buffer circuit 29 toward cache memory 220 is opened in order to keep the contents of the cache memories identical.

[0039] Case 6: Only the check circuit CHKBC gives a match output, so MPUC and MPUB are judged to be normal. In this case, the 3-state buffer circuits 200 and 201 are closed, 203 and 204 are open, and of the 3-state buffer circuit 29 only the direction toward the cache memory 220 is open. Operation is basically the same as in Case 4.

[0040] Case 7: Only the check circuit CHKCA gives a match output, so MPUC and MPUA are judged to be normal. Because the abnormality is in the reference MPU, only MPUB is disconnected, as shown for Case 7 in FIG. 4(e), and duplex operation by MPUC and MPUA continues without any change to the 3-state buffer circuits.

[0041] Case 8: All of the check circuits CHK detect a mismatch, so all MPUs are abnormal and operation cannot continue.
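The single-match cases described above can be sketched as a small decision function. This is only an illustrative model: the function name and return type are hypothetical, the all-match combination is assumed to be the normal triple-redundant case, and combinations that this section does not spell out are deliberately left to Table 1 rather than guessed.

```python
from typing import FrozenSet


def select_configuration(chk_ab: bool, chk_bc: bool, chk_ca: bool) -> FrozenSet[str]:
    """Return the MPUs kept in operation. True means that pair's outputs matched."""
    matches = (chk_ab, chk_bc, chk_ca)
    if all(matches):
        # Assumed normal case: all three MPUs keep running.
        return frozenset({"MPUA", "MPUB", "MPUC"})
    if matches == (True, False, False):   # Case 4: MPUC is stopped
        return frozenset({"MPUA", "MPUB"})
    if matches == (False, True, False):   # Case 6: MPUA is stopped
        return frozenset({"MPUB", "MPUC"})
    if matches == (False, False, True):   # Case 7: MPUB is disconnected
        return frozenset({"MPUC", "MPUA"})
    if not any(matches):                  # Case 8: operation cannot continue
        return frozenset()
    # Remaining combinations depend on Table 1 and are not modelled here.
    raise ValueError("combination not covered by this sketch")
```

An empty set stands for the stop condition of Case 8, so a caller can treat "no MPU left" as the signal to halt.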

[0042] As described above, the normality of the three MPUs and their peripheral circuits (for example, the parity generation/check circuits) is confirmed, and configuration change control is performed as appropriate. Table 1, however, merely lists the conceivable combinations of comparison results; in practice, the seven abnormal events of Cases 2 to 8 do not occur with equal probability. Of these, the single-failure cases are the three Cases 4, 6, and 7, the double-failure cases are the three Cases 2, 3, and 5, and the triple failure is Case 8. As is well known, the probability of multiple failures occurring simultaneously, including Cases 2 and 8 in which operation cannot continue, is extremely low compared with that of a single failure. Moreover, in practice a multiple failure almost always develops from a single failure, so taking some recovery measure at the time of the single failure yields a system configuration in which continued operation is practically unimpeded. In the present invention, even if a double failure occurs, operation can in many cases continue without trouble, and in this sense the system can be said to be highly reliable.
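The claim that simultaneous multiple failures are far rarer than single failures can be illustrated numerically. The sketch below assumes that the three MPUs fail independently with the same probability p during some interval; both the independence assumption and the value of p are hypothetical and do not come from the patent.

```python
def failure_probabilities(p: float) -> dict:
    """Probability that exactly k of 3 independent MPUs have failed (k = 0..3)."""
    q = 1.0 - p
    return {
        0: q ** 3,
        1: 3 * p * q ** 2,   # single failure
        2: 3 * p ** 2 * q,   # double failure
        3: p ** 3,           # triple failure
    }


# Hypothetical per-MPU failure probability for the interval considered.
probs = failure_probabilities(1e-4)
# Ratio of double to single failures simplifies to p / (1 - p).
ratio = probs[2] / probs[1]
```

Under these assumptions a double failure is roughly p times as likely as a single one, which is why acting at the single-failure stage covers nearly all practical situations.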

[0043] Although not shown in FIG. 2, when any of the above abnormal events occurs, the MPU output check circuit 23 naturally generates a signal that stops the abnormal MPU, or outputs a signal externally to notify the operator of the abnormality and of the need for subsequent countermeasures.

[0044] [Configuration change when an abnormality is detected by parity check] As described in Section III above, an abnormal location among the cache memories 220 and 221 and the BIUs 27-1 and 27-2 can be identified during write access or cache read access. Next, configuration change control inside the BPU for each abnormality is described. Table 3 lists how the cache memories 220 and 221, the BIUs 27-1 and 27-2, and the 3-state buffer circuits 29, 26, and 27 are controlled when an abnormality occurs in each part during cache read access.

[0045]

[Table 3]

[0046] FIG. 5 illustrates the circuit configuration in each case; the description below refers to Table 3 and FIG. 5. FIG. 5(a) shows the signal flow during normal operation. In this case, the 3-state buffer circuits 29 and 26 are closed and 27 is open, so information from the BIU 27-1 or the cache memory 220 is supplied to MPUA 20-1 and MPUB 20-2, and information from the BIU 27-2 or the cache memory 221 is supplied to MPUC 20-3. Thus, normally the BIU 27-1, the cache memory 220, MPUA 20-1, and MPUB 20-2 form one set, and the BIU 27-2, the cache memory 221, and MPUC 20-3 form another set.

[0047] Case 1: The cache memory 220 is abnormal. As shown in FIG. 5(b), the output of the cache memory 220 is stopped, the 3-state buffer circuit 29 is controlled so that only signals toward MPUA 20-1 pass, the 3-state buffer circuit 26 is opened, and 27 is closed. As a result, all MPUs receive common information from the cache memory 221, and operation continues after the abnormality is detected. The reason for switching from the normal state by opening the 3-state buffer circuit 26 and closing 27 is that, even when the abnormality has logically been attributed to the cache memory 220, an abnormality of the internal bus to which the cache memory 220 is connected cannot be ruled out, so the connection is switched to the cache memory 221 side as a precaution. If the abnormality is in the internal bus to which the cache memory 220 is connected, its effect does not reach the MPUC side, because the 3-state buffer circuit 29 passes signals in one direction only.

[0048] Case 2: The cache memory 221 is abnormal. As shown in FIG. 5(c), the output of the cache memory 221 is stopped and the 3-state buffer circuit 29 is controlled so that only signals toward MPUC 20-3 pass. As a result, all MPUs receive common information from the cache memory 220, and operation continues after the abnormality is detected.

[0049] Cases 3 and 5: The BIU 270 or the system bus 1-1 connected to it is abnormal. As shown in FIGS. 5(d) and 5(e), the BIU 270 or the system bus 1-1 side connected to it is stopped, and operation proceeds as in Case 1.
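The cases above amount to choosing, for each MPU, which of the two cache/BIU sets supplies its information. A minimal sketch of that selection follows. The labels "set1" (BIU 27-1 and cache memory 220) and "set2" (BIU 27-2 and cache memory 221) and the fault names are hypothetical, and the actual 3-state buffer settings remain those listed in Table 3.

```python
from typing import Dict, Optional


def data_sources(fault: Optional[str]) -> Dict[str, str]:
    """Map each MPU to the set that feeds it, given the faulty part (or None)."""
    if fault is None:
        # Normal operation, FIG. 5(a): two separate sets.
        return {"MPUA": "set1", "MPUB": "set1", "MPUC": "set2"}
    if fault in ("cache220", "biu_or_bus_set1"):
        # Case 1, and Cases 3 and 5 which are operated like Case 1.
        return {"MPUA": "set2", "MPUB": "set2", "MPUC": "set2"}
    if fault == "cache221":
        # Case 2: every MPU reads from cache memory 220.
        return {"MPUA": "set1", "MPUB": "set1", "MPUC": "set1"}
    raise ValueError(f"fault not covered by this sketch: {fault}")
```

In every faulty case all three MPUs end up reading common information from a single healthy set, which is what keeps their outputs comparable after reconfiguration.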

[0050] As described above, when an abnormality is detected through a parity error, the configuration is changed and the abnormality is reported externally.

[0051] As described in detail above, according to the present invention, even if an abnormality occurs inside the BPU, operation can continue as in the normal state by disconnecting part of the circuit configuration or changing the flow of information. Therefore, if an abnormality occurs during data processing, it suffices to (1) continue operation on that BPU until a convenient point or until the repair/maintenance period, and (2) at that convenient point or repair/maintenance period, hand over the processing being executed on that BPU to another normal BPU.

[0052] As a result, backup operations in preparation for a checkpoint restart upon an abnormality become unnecessary, and processing performance can be improved.

[0053] V. Signal processing when connecting the internal buses. As described above, the internal buses are switched by the 3-state buffer 29 when an abnormality occurs in any part. However, opening or closing the 3-state buffer 29 takes longer than a write access over the normal path, and routing a signal between the buses takes additional time. As a remedy, extending the bus cycle by a retry only when an abnormality occurs, as shown in FIG. 6, is effective because it does not delay normal bus cycles.

[0054] That is, when an abnormality is found (steps S1, S2), a signal that causes a retry is asserted in step S4; in step S5 the abnormal output is stopped (for example, by disconnecting the abnormal MPU) and the normal output is rerouted; and in step S6 a signal that ends the bus cycle is asserted, completing the sequence. When everything is normal, it suffices to assert the signal that ends the bus cycle in step S3. The names of the signal lines that make an MPU end a bus cycle or perform a retry differ by MPU type, but in many MPUs the retry is executed automatically when a retry signal is input to the MPU. Table 4 shows the signal names of representative MPUs.
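The retry flow of steps S1 to S6 can be sketched as follows. The function and the signal labels "RETRY" and "END" are hypothetical stand-ins; actual signal names depend on the MPU type, as Table 4 indicates.

```python
from typing import Callable, List


def run_bus_cycle(check_ok: bool, reconfigure: Callable[[], None]) -> List[str]:
    """Return the signals asserted toward the MPUs during one bus cycle."""
    signals: List[str] = []
    if check_ok:                 # S1, S2: no abnormality found
        signals.append("END")    # S3: end the bus cycle immediately
        return signals
    signals.append("RETRY")      # S4: request a retry, extending the cycle
    reconfigure()                # S5: stop abnormal output, reroute normal output
    signals.append("END")        # S6: end the bus cycle after rerouting
    return signals


# Usage: record that reconfiguration happens only on the abnormal path.
calls: List[str] = []
normal = run_bus_cycle(True, lambda: calls.append("reconfigured"))
abnormal = run_bus_cycle(False, lambda: calls.append("reconfigured"))
```

The point of the design is visible in the normal path: it asserts only the end signal, so the time needed to switch the 3-state buffers is paid only in cycles where an abnormality was detected.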

[0055]

[Table 4]

[0056] FIGS. 7 and 8 show the signal flow when the retry method of FIG. 6 is applied to write access; FIG. 7 shows the normal case and FIG. 8 the abnormal case. In these figures, the vertical axis represents the passage of time and the horizontal axis represents the circuits along the path from the MPU output to the cache memory. Normally, the MPU outputs an address signal before the data signal. In FIG. 7, both the address signal and the data signal are normal, so the MPU output check circuit 23 and the parity check circuit 250 judge them normal, an end signal is returned to the MPU, the cache memory 220 stores the data, and the bus cycle ends.

[0057] In FIG. 8, MPUA is abnormal, so the MPU output check circuit 23 judges both the address signal and the data signal abnormal, and a retry signal is returned to each MPU together with the end signal, starting a retry operation. During the retry, the 3-state buffers 200 and 201 are closed to block signal transmission from MPUA to the internal bus, and the 3-state buffer 29 is opened in one direction only so that the output signal of MPUC is also supplied to the cache memory 250. An end signal is then returned to each MPU, and the operation ends.

[0058] FIGS. 9, 10, and 11 show the signal flow when the retry method of FIG. 6 is applied to cache read access; FIG. 9 shows the normal case, FIG. 10 the case of an abnormal address signal, and FIG. 11 the case of an abnormal data signal. In FIG. 9, both the address signal and the data signal are normal and no abnormality is found, so an end signal is returned to the MPU, and the MPU stores the data from the cache memory 250 and ends the bus cycle. In FIG. 10, the address signal from MPUA does not match the others and is judged abnormal, and a retry signal is returned to each MPU together with the end signal, starting a retry operation. During the retry, the 3-state buffer 201 is closed to block signal transmission from MPUA to the internal bus, the 3-state buffer 29 is opened in one direction only so that the address output signal of MPUC is supplied to the cache memory 220, and the cache memory 220 supplies the data stored at the given address to MPUA and MPUB. An end signal is then returned to each MPU, and the retry operation ends.

[0059] In FIG. 11, the data from the cache memory 220 is abnormal. The parity checks in the parity generation/verification circuits 10 and 12 and in the parity check circuit 250 judge the data abnormal, and a retry signal is returned to each MPU together with the end signal, starting a retry operation. During the retry, the output of the cache memory 220 is blocked, and the 3-state buffer 29 is opened in one direction only so that the output of the cache memory 221 is supplied to MPUA and MPUB. In this case, the 3-state buffer circuit 26 is closed and 27 is opened, switching from the normal state, and the output of the cache memory 221 is supplied to MPUB through the 3-state buffer circuit 27. This prevents erroneous data from reaching MPUB because of an abnormality in the data signal path from the cache memory 220 to MPUB.
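The read path with parity checking and fallback to the other cache memory can be modelled roughly as below. The parity scheme (one parity bit per word, equal to the low bit of the count of 1 bits) and the function names are assumptions chosen for illustration; the section does not state which parity convention the circuits use.

```python
from typing import Tuple


def parity_bit(word: int) -> int:
    """Parity of a non-negative word: 1 if the number of 1 bits is odd, else 0."""
    return bin(word).count("1") & 1


def read_with_fallback(primary: Tuple[int, int], secondary: Tuple[int, int]) -> Tuple[int, str]:
    """Each argument is (data, stored_parity). Return (data, name of source used)."""
    data, stored = primary
    if parity_bit(data) == stored:
        return data, "primary"
    # Parity error: block the primary output and retry from the other cache.
    data, stored = secondary
    if parity_bit(data) == stored:
        return data, "secondary"
    raise RuntimeError("parity error on both cache memories")


# Usage: the primary copy has one flipped bit, the secondary copy is intact.
value, source = read_with_fallback((0b1011, 0), (0b1010, 0))
```

Because both cache memories are kept identical during normal operation, the secondary copy can stand in for the primary without any software-visible difference.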

[0060] VI. Recovery measures after an abnormality occurs. As described above, the device of the present invention can continue operating after an abnormality occurs. However, considering the possibility of secondary failures, it should not be run permanently in this degraded configuration but should be restored to its initial state promptly. The following describes recovery measures for restoring the functions of the BPU in which the abnormality occurred. Recovery is achieved by forming the BPU of FIG. 1 on a single printed board and replacing the abnormal BPU printed board with a normal BPU printed board.

[0061] FIG. 12 shows the configuration of the computer cabinet. Opening its door reveals a slot section that houses printed boards. The printed boards constituting the main storage device 3, the BPU 2, and the input/output control unit IOU 4 of FIG. 1 are inserted into the slots, and once inserted they are connected to a system bus not shown in FIG. 12. In the illustrated example there are 12 slots SL, of which SL1 and SL3 to SL6 hold printed boards and SL2 and SL7 to SL12 are empty. The printed board PL inserted into a slot SL may be of a conventional type, but in the present invention it has a lever 282 for fixing the board in the slot SL and an indicator lamp 280 that shows whether the board is stopped, and a printed board removal request button 281 is provided where needed. The procedure for replacing a BPU printed board is described below.

[0062] <<Replacement when there is one BPU printed board>> FIG. 13 shows an example in which, of the n slots SL that can connect printed boards PL to the system bus 1 (shown as a single bus for convenience of explanation), SL1 holds the BPU in which an abnormality has occurred, SL2 holds the main storage device 3, SLn holds the printed board of the IOU 4, and SL3 is an empty slot. The new BPU that is to take over from the abnormal BPU has not yet been inserted into a slot. The indicator lamp 280 on the printed board is off because the board is in operation.

[0063] In this state, to hand over the functions of the old BPU 2A to a normal new BPU 2B, an empty slot is first prepared. In the example of FIG. 13, slot SL3 is empty, so the new BPU 2B is then inserted into the empty slot SL3.

[0064] The BPU 2A detects the insertion of the BPU 2B and, through processing by its operating system (hereinafter OS), transfers the tasks running on the old BPU 2A to the new BPU 2B and lights the indicator lamp 280 on the printed board of the old BPU 2A. From then on, online work is executed by the new BPU 2B. The transfer of work from the old BPU 2A to the new BPU 2B is instantaneous. The old BPU 2A is then removed after confirming that the indicator lamp 280 on its printed board is lit and that the BPU is stopped. With this procedure, online work has already been transferred to the new BPU 2B before the old BPU 2A is pulled out, so the BPU can be replaced without stopping the system and without degrading system performance.

[0065] FIG. 14 is a processing flow of the BPU replacement procedure for the example of FIG. 13, divided into human actions and processing inside the computer. To replace a BPU, an empty slot is first prepared (St1). If an unused slot already exists, it can be used. If there is none, a hardware board that can be removed temporarily may be pulled out to create a temporary empty slot, and that board is returned after the BPU has been replaced. Next, the new BPU is inserted into the empty slot (St2). The old BPU 2A recognizes the insertion by means such as an interrupt (St3). The old BPU 2A then saves the currently running task to the main storage device (St4) so that the new BPU 2B can continue processing it. The new BPU 2B in turn executes the task (St5) and starts online work. The old BPU 2A lights its own board stop lamp (St6) and stops processing (St7). A person then confirms that the board stop lamp on the old BPU is lit (St8) and removes the old BPU (St9). This completes the BPU replacement.

[0066] FIG. 15 illustrates in detail the means by which a task running on the old BPU 2A is handed over to the new BPU 2B in the above embodiment. The printed boards of the old BPU 2A, the new BPU 2B, and the main storage device 3 are mounted on the system bus. A task 920-1 is running on the old BPU 2A. When the old BPU 2A is notified that the new BPU 2B has been inserted, it suspends processing and saves the running task 920-1 to the main storage device 3. The new BPU 2B then restores the task 920-2 that follows the task 920-1 saved in the main storage device 3 and continues processing from the point of interruption. Work is handed over between the exchanged BPUs in this way.
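The handover through main storage can be sketched as saving a task context and resuming it elsewhere. Everything named here (TaskContext, STEPS, the "saved_task" key) is hypothetical; the sketch only shows the idea that the new BPU continues from the interruption point rather than restarting.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskContext:
    task_id: str
    next_step: int  # index of the first step that has not been executed yet


STEPS = ["read input", "compute", "write output", "log"]  # hypothetical workload


def execute(ctx: TaskContext, max_steps: Optional[int] = None) -> List[str]:
    """Run steps from ctx.next_step; stop early after max_steps (an interruption)."""
    done: List[str] = []
    while ctx.next_step < len(STEPS) and (max_steps is None or len(done) < max_steps):
        done.append(STEPS[ctx.next_step])
        ctx.next_step += 1
    return done


main_memory: Dict[str, TaskContext] = {}

# Old BPU: runs two steps, is notified of the insertion, and saves the task.
ctx = TaskContext("920", 0)
old_done = execute(ctx, max_steps=2)
main_memory["saved_task"] = ctx

# New BPU: restores the saved context and resumes at the interruption point.
new_done = execute(main_memory["saved_task"])
```

No step is repeated and none is skipped, which is the property that lets online work continue across the exchange.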

[0067] The above is an example of BPU replacement when there is one BPU. In the above embodiment, the BPU can be replaced without stopping the system even when there is only one BPU.

[0068] <<Replacement when there are multiple BPU printed boards>> Next, the handling of systems with multiple BPUs, and of an inserted BPU that does not operate correctly, is described. In the embodiment of FIG. 16, multiple BPUs are mounted. As means for designating the BPU to be replaced, each BPU has a board removal request button 281 and a printed board number 282.

[0069] BPUs 2A, 2B, and 2C are mounted in slots SL1 to SL3, which connect printed boards to the system bus 1. The main storage device is connected to slot SL4, and slot SL5 is empty. Each BPU has an indicator lamp 280 that lights when the BPU has stopped, a printed board removal request button 281 used to designate the BPU to be removed, and a printed board number 282. Here the printed board numbers are assigned as 1 for BPU 2A, 2 for BPU 2B, and 3 for BPU 2C. To replace the old BPU 2B mounted in slot SL2 with a new BPU 2D, the new BPU 2D is first inserted into the empty slot SL5. Then, among the BPUs mounted in slots SL1 to SL3, the removal request button 281 of the BPU 2B in slot SL2, the one to be replaced, is pressed. The old BPU 2B saves its running task and its own printed board number to the main storage device 3, and the new BPU 2D takes in the saved printed board number and executes the saved task. The old BPU 2B lights its indicator lamp 280 and stops itself. After confirming that the indicator lamp 280 of the old BPU 2B is lit, the BPU 2B is removed.

[0070] FIG. 17 is a processing flow of the BPU replacement procedure for the example of FIG. 16, divided into human actions and processing inside the computer.

[0071] To replace a BPU, an empty slot is first prepared (St1). If an unused slot already exists, it can be used. If there is none, a hardware board that can be removed temporarily may be pulled out to create a temporary empty slot, and that board is returned after the BPU has been replaced.

[0072] Next, the new BPU 2D is inserted into the empty slot (St2). The printed board removal request button of the old BPU 2B to be removed is then pressed (St3). The old BPU 2B saves its currently running task and its own printed board number to the main storage device 3 (St4) so that the new BPU 2D can continue processing the task. The new BPU 2D in turn executes the task (St5) and starts online work. The old BPU 2B lights its own indicator lamp (St6) and stops processing (St7). After confirming that the indicator lamp on the old BPU 2B is lit (St8), the old BPU 2B is removed (St9). This completes the BPU replacement.

[0073] FIG. 18 illustrates in detail the means by which the task running on an old BPU and its printed board number are handed over to the new BPU in the above embodiment. Three old BPUs (2A, 2B, 2C), the new BPU 2D, and the main storage device are mounted on the system bus. Tasks 1, 2, and 3 are running on the old BPUs 2A, 2B, and 2C, respectively, and the printed board numbers 282 of the old BPUs 2A, 2B, and 2C are 1, 2, and 3, respectively. When the printed board removal request button of the old BPU 2B is pressed to designate it for removal, the old BPU 2B suspends processing and saves the running task 2 and its own printed board number 2 to the main storage device 3. The new BPU 2D then restores the printed board number 2 and task 2 saved in the main storage device 3 and continues processing the task from the point of interruption. Work is handed over between the exchanged BPUs in this way.

[0074] According to this embodiment, providing the printed board removal request button as a means of designating the BPU to be replaced has the advantage that a BPU can be replaced without stopping the system, and without degrading system performance, even when multiple BPUs are mounted.

[0075] In addition, because the printed board number assigned to the BPU being replaced is handed over between the exchanged BPUs, there is the advantage that the BPU can be replaced without changing the user program even when the user program specifies the printed board number on which it operates.
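Why inheriting the printed board number leaves user programs untouched can be shown with a small lookup model. The table board_to_slot and both functions are hypothetical; the text states only that the number is saved to main storage and taken in by the new BPU, not how requests are routed.

```python
from typing import Dict

# Logical printed board number -> physical slot currently serving it.
board_to_slot: Dict[int, str] = {1: "SL1", 2: "SL2", 3: "SL3"}


def hand_over(board_number: int, new_slot: str) -> None:
    """The new BPU in new_slot takes over the logical board number."""
    board_to_slot[board_number] = new_slot


def dispatch(board_number: int) -> str:
    """A user program names only the board number; the slot is resolved here."""
    return board_to_slot[board_number]


before = dispatch(2)   # served by old BPU 2B in slot SL2
hand_over(2, "SL5")    # new BPU 2D in slot SL5 inherits board number 2
after = dispatch(2)    # same request, now served from slot SL5
```

The user program issues the same request before and after the exchange, so only the binding between number and board changes.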

[0076] <<When the inserted BPU does not operate correctly>> The embodiments above have the drawback that, should the replacement BPU fail to operate normally, the system is seriously affected. The embodiment of FIGS. 19 and 20 has means for checking the operation of the inserted BPU, so that the system is unaffected even if the newly inserted BPU does not operate normally.

[0077] FIG. 19 shows the state in which the new BPU 2B has been inserted; at this time a task is running on the old BPU 2A. When the new BPU 2B is inserted, it runs the BPU self-diagnosis program 925 to check its own operation. The old BPU 2A is not notified of the board insertion until the diagnosis program finishes normally. If the diagnosis program 925 finds a fault in the new BPU, the new BPU does not notify the old BPU; it lights its own indicator lamp 280 and stops processing. The old BPU does not suspend task 1 when the new BPU is inserted and continues processing as if nothing had happened.

[0078] FIG. 20 is a processing flow of the BPU replacement procedure in the above embodiment, divided into human actions and processing inside the computer. Steps St1, St2, St4 to St8, and St11 to St13 are exactly the same as in FIG. 21, so their description is omitted here and only the processing specific to this embodiment is described.

[0079] When the new BPU is inserted, it first runs a diagnosis program to check its own operation (St3). If the result is normal, processing proceeds to St4 as in the previous embodiment. If a fault is found, the indicator lamp on the inserted new BPU is lit (St9) and the new BPU stops processing (St10). The operator then confirms that the indicator lamp on the new BPU is lit (St14) and removes the new BPU again (St15). In this case the BPU replacement has failed, but the old BPU continues processing, so the online system is unaffected. Whether the replacement succeeded is judged by which indicator lamp, on the old or the new BPU, lights after the BPU is inserted.
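The self-diagnosis gate can be sketched as below. The function name and event strings are hypothetical; the step labels follow the text (St3, St4, St9, St10). The key property is that the old BPU is notified only after the new BPU has passed its own check.

```python
from typing import Callable, List


def insert_new_bpu(self_test: Callable[[], bool]) -> List[str]:
    """Return the events that follow insertion of a new BPU."""
    events = ["St3: run self-diagnosis"]
    if self_test():
        # Only a healthy board announces itself; handover then proceeds.
        events.append("St4: notify old BPU of insertion")
    else:
        # A faulty board never contacts the old BPU, which keeps running.
        events.append("St9: light lamp on new BPU")
        events.append("St10: stop new BPU")
    return events


passed = insert_new_bpu(lambda: True)
failed = insert_new_bpu(lambda: False)
```

On the failing path no notification event appears at all, which models why the running task on the old BPU is never interrupted by a bad replacement board.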

[0080] As described above, the method of this embodiment makes it possible to avoid any effect on the online system even when the inserted BPU does not operate normally.

[0081] <<Configuration and processing before and after the occurrence of an abnormality>> FIG. 21 shows, in chronological order, the processing and configuration of the MPUs in the old BPU 2A and the new BPU 2B described above. During normal operation, the three MPUs of BPU 2A are running and their majority decision result is output. If a failure occurs in MPUC during execution of process B, MPUC is disconnected and operation continues normally with the multiplexed configuration of MPUA and MPUB. Meanwhile, when the printed board of the new BPU 2B is inserted into an empty slot in response to the abnormality notification from MPUA, each MPU in the new BPU 2B performs self-diagnosis. At an appropriate time, processing is transferred from the old BPU 2A to the new BPU 2B, and process D is executed based on the majority decision result of the three MPUs of BPU 2B (MPUD, MPUE, MPUF).

For this handover, the BPU concerned is kept operating until a convenient breakpoint or a repair and maintenance period, at which point the processing it was executing is handed over to another normal BPU. In practice, the handover can be performed at whatever point is most desirable for performance from the standpoint of the software. Task switching is clearly a suitable timing in general, because the BPU can then be switched by exactly the same procedure as switching processors in a multiprocessor system, and the extra performance overhead associated with the handover can be reduced to zero. Therefore, according to the present invention, backup operations in preparation for a checkpoint restart when a fault occurs become unnecessary, and processing performance can be improved.
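The majority decision and the degradation from three MPUs to two can be illustrated with a small sketch. It assumes that each MPU output is modelled as an integer and that `vote` is a hypothetical helper; the specification describes a hardware circuit, not this function.

```python
from collections import Counter


def vote(outputs: dict, active: set) -> tuple:
    """Majority-vote the outputs of the active MPUs.

    Returns (voted value, set of MPUs that agreed with it).
    Raises RuntimeError when no strict majority exists.
    """
    live = {name: outputs[name] for name in active}
    if not live:
        raise RuntimeError("no active MPU")
    value, count = Counter(live.values()).most_common(1)[0]
    if count * 2 <= len(live):
        raise RuntimeError("no majority among active MPUs")
    healthy = {name for name, out in live.items() if out == value}
    return value, healthy


# Normal operation: three MPUs, one of which (MPUC) produces a bad output.
value, healthy = vote({"MPUA": 7, "MPUB": 7, "MPUC": 9},
                      {"MPUA", "MPUB", "MPUC"})
# MPUC is disconnected; later cycles run on the remaining pair only.
value, healthy = vote({"MPUA": 5, "MPUB": 5, "MPUC": 0}, healthy)
```

With only two MPUs left, a disagreement can be detected but not resolved, which is one reason the text goes on to hand processing over to a complete new group.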

[0082] When a fault occurs, the hardware records the fault status in a register, and the operating system refers to that register at context switches and during interrupt processing for repair and maintenance. If the processing needs to be handed over, the operating system notifies the BPU that is to take over, by an interrupt or similar means, and ends processing on its own BPU. If a failure occurs in some of the elements that make up BPU 2 (MPU, cache memory, etc.), then in this method, once the processing has been handed over, use of the entire BPU 2 is discontinued, including the elements that are still normal.
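The register check at a context switch might look like the following sketch. All names (`BpuState`, `fault_register`, `on_context_switch`) are assumptions made for illustration, and appending to the successor's run queue stands in for the interrupt notification mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BpuState:
    name: str
    fault_register: int = 0      # nonzero: hardware has logged a fault
    stopped: bool = False
    run_queue: List[str] = field(default_factory=list)


def on_context_switch(own: BpuState,
                      successor: Optional[BpuState]) -> bool:
    """Hand processing over if a fault is logged and a successor exists.

    Returns True when the handover took place.
    """
    if own.fault_register == 0 or successor is None:
        return False                      # keep running on this BPU
    # Stand-in for notifying the successor BPU by interrupt.
    successor.run_queue.extend(own.run_queue)
    own.run_queue.clear()
    own.stopped = True                    # the whole BPU is retired
    return True
```

Because the check runs only at a context switch, no task state has to be saved beyond what an ordinary task switch already saves.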

[0083] FIG. 22 schematically shows the difference in configuration between the method of the present invention and a known example at the time of handover, when MPUA, MPUB, and MPUC, made redundant for fault tolerance, suffer a fault due to a failure or other cause. In the conventional method, only the faulty MPUA is replaced with a normal MPUD. In the method according to the present invention, by contrast, not only the faulty MPUA but also the normal MPUB and MPUC are replaced with the new MPUD, MPUE, and MPUF. In this way, the combination of MPUs made redundant for fault tolerance, that is, the combination of MPUA, MPUB, and MPUC, can be fixed. Accordingly, if a combination of MPUs is made the unit of replacement, the MPUs forming each combination can be coupled by a high-speed clock, and a high-speed fault-tolerant computer can be realized. In addition, the various hardware and software required in the conventional approach for recombining MPUs become unnecessary.
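The contrast between the two replacement policies can be stated in terms of sets. The sketch below is illustrative only; the function names are hypothetical, and the MPU labels follow FIG. 22 as described in the text.

```python
def conventional_replace(group: frozenset, failed: str,
                         spare: str) -> frozenset:
    """Known example: swap only the failed MPU for a spare one."""
    return (group - {failed}) | {spare}


def is_fixed_combination(before: frozenset, after: frozenset) -> bool:
    """True when no MPU is shared between the old and new groups."""
    return before.isdisjoint(after)


old_group = frozenset({"MPUA", "MPUB", "MPUC"})
mixed = conventional_replace(old_group, "MPUA", "MPUD")
new_group = frozenset({"MPUD", "MPUE", "MPUF"})
```

Here `mixed` still shares MPUB and MPUC with the old group, so the members must be recombined, whereas `new_group` shares none, so each group can be built and clocked as a fixed unit.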

[0084] Since a BPU can continue operating after a single failure, this handover need not be performed immediately after the failure occurs; it may be performed at a convenient breakpoint in processing or during repair and maintenance.

[0085] According to this embodiment, the wiring board of the failed BPU 20-1 can be pulled out and replaced with a normal wiring board while processing continues.

[0086] VII. Alternatives and Modifications of Each Circuit Section. The present invention has been described above, but its circuit sections and the like can be modified as appropriate. These alternatives and modifications are described below.

[0087] <<Majority logic section>> FIG. 23 shows, in a simplified and easily understood form with the other components omitted, how the majority logic circuit section of FIG. 2 is configured and switched. MPUA and MPUC are fixed as output-only, and MPUB is used only as a reference for checking the soundness of MPUA and MPUC. When MPUA or MPUC is abnormal, the single output of whichever one has been confirmed sound is used in common and supplied to both sets of cache memory. In this scheme, the MPU output is input directly to the cache memory without passing through a majority circuit, so the cache memory access time can be shortened by the delay of the majority circuit.
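The selection rule of FIG. 23 can be sketched as a comparison against the reference MPU. This is a software illustration of a hardware path; `select_cache_inputs` is a hypothetical name, outputs are modelled as integers, and the handling of a faulty reference MPUB is an assumption not stated in the text.

```python
def select_cache_inputs(a: int, b: int, c: int) -> tuple:
    """Return (input to cache memory 220, input to cache memory 221).

    a and c are the outputs of the output-only MPUA and MPUC;
    b is the output of the reference MPUB.
    """
    a_ok = (a == b)
    c_ok = (c == b)
    if a_ok and c_ok:
        return a, c      # normal: each MPU feeds its own cache
    if c_ok:
        return c, c      # MPUA abnormal: MPUC feeds both caches
    if a_ok:
        return a, a      # MPUC abnormal: MPUA feeds both caches
    if a == c:
        return a, c      # assumption: only the reference MPUB is abnormal
    raise RuntimeError("no two MPU outputs agree")
```

In the normal case the comparison runs beside the data path rather than in it, which is the source of the access-time saving the paragraph describes.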

[0088] In the present invention, as described above, majority logic is used to switch from a triple system to a double system and continue operation, and various modifications other than this scheme are possible. For example, in FIG. 25 the outputs of the three MPUs are supplied to each of the majority selection circuits 210 and 211, each of which selects one output confirmed to be sound from among the three MPUs. In this case, if one majority selection circuit fails, the data in the cache memory connected to it is destroyed, but operation can continue using the data in the cache memory connected to the normal majority selection circuit.

[0089] Alternatively, as shown in FIG. 24, the MPU outputs may be input directly to the cache memories without passing through gate circuits, switching circuits, or the like, and the cache memory that receives signals from an MPU that has become abnormal may be stopped so that its data is not used thereafter. This further shortens the cache memory access time by the delay of the gate circuits, switching circuits, and so on. Moreover, since switching means for the address bus and data bus, which consist of many signal lines, become unnecessary, the amount of hardware can be reduced.

[0090] FIG. 26 shows a configuration with four MPUs, in which MPUA and MPUC are fixed as output-only and MPUB and MPUD are used as their references; the output of each output-only MPU is supplied when the outputs of its pair match. When an MPU becomes abnormal, this can be handled by switching to the sound side, or by stopping the cache memory that receives signals from the abnormal MPU so that its data is not used thereafter.

[0091] <<Cache data read access section>> As for the cache memories, whether the output (data) of cache memories 220 and 221 is normal or abnormal can be determined by a parity check. Therefore, as shown in FIG. 27, the output of the cache memory judged normal by the parity check 250 is input to MPUA, MPUB, and MPUC through the switching means 260. If both cache memories are normal, a main and a subordinate cache memory are decided in advance and the output of the main one is selected.
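The read-side selection can be sketched as below. The specification says only that a parity check is used; even parity, the `(word, parity_bit)` representation, and the function names are assumptions made for this illustration.

```python
def parity_ok(word: int, parity_bit: int) -> bool:
    """Even parity (assumed): the stored bit equals the count of 1 bits mod 2."""
    return bin(word).count("1") % 2 == parity_bit


def select_read(main: tuple, sub: tuple) -> int:
    """Pick the word to feed to the MPUs.

    main and sub are (word, parity_bit) pairs read from the main and
    subordinate cache memories. The main cache wins when both are normal.
    """
    for word, parity_bit in (main, sub):
        if parity_ok(word, parity_bit):
            return word
    raise RuntimeError("both cache outputs failed the parity check")
```

Checking the main cache first implements the fixed main/subordinate priority; the subordinate copy is used only when the main copy fails its check.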

[0092] Alternatively, as shown in FIG. 28, MPUA and MPUB may have the cache memories they connect to fixed to cache memories 220 and 221, respectively, with the output of the selected cache memory input only to MPUB. In this case, even if either cache memory fails, two of the three MPUs can continue to operate normally, and the amount of hardware can be reduced.

[0093]

[Effects of the Invention] In the present invention, some of the processors are disconnected when a failure occurs, and at recovery all of the processors are switched to a new, separate processor group, so the various problems associated with recombining processors are eliminated.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the overall system configuration of the present invention.

FIG. 2 is a diagram showing the configuration of a BPU of the present invention.

FIG. 3 is a diagram of an embodiment of an MPU output check circuit.

FIG. 4 is a diagram showing the configuration of a BPU when an abnormality occurs in write access.

FIG. 5 is a diagram showing the configuration of a BPU when an abnormality occurs in read access.

FIG. 6 is a bus cycle control flow diagram.

FIG. 7 is a diagram showing the flow of signals within the BPU when the MPU is normal.

FIG. 8 is a diagram showing the flow of signals within the BPU when the MPU is abnormal.

FIG. 9 is a diagram showing the flow of signals within the BPU when the MPU is normal.

FIG. 10 is a diagram showing the flow of signals within the BPU when an address signal is abnormal.

FIG. 11 is a diagram showing the flow of signals within the BPU when a data signal is abnormal.

FIG. 12 is a diagram showing a computer board configuration.

FIG. 13 is a diagram explaining the principle of BPU replacement.

FIG. 14 is a diagram showing a BPU replacement procedure.

FIG. 15 is a diagram showing processing takeover between old and new BPUs.

FIG. 16 is a diagram explaining the principle of BPU replacement in a multiprocessor configuration.

FIG. 17 is a diagram showing a BPU replacement procedure in a multiprocessor configuration.

FIG. 18 is a diagram showing processing takeover between old and new BPUs in a multiprocessor configuration.

FIG. 19 is a diagram showing BPU replacement processing when the inserted BPU is faulty.

FIG. 20 is a flow diagram of BPU replacement processing when the inserted BPU is faulty.

FIG. 21 is a diagram showing the handover of processing when a BPU fails.

FIG. 22 is a diagram showing the handover of processing when a BPU fails.

FIG. 23 is a diagram of an embodiment of comparison and verification using three MPUs.

FIG. 24 is a diagram of another embodiment of comparison and verification using three MPUs.

FIG. 25 is a diagram of another embodiment of the majority decision scheme.

FIG. 26 is a diagram of an embodiment of comparison and verification using four MPUs.

FIG. 27 is a diagram showing read access to cache data.

FIG. 28 is a diagram showing another embodiment of read access to cache data.

[Explanation of symbols]

1... system bus, 2... BPU, 10, 11, 12, 13, 14, 15... parity generation/verification circuits, 20... MPU, 23... MPU output check circuit, 27... BIU (bus interface unit), 30, 31... parity check circuits, 200 to 205, 26, 27, 29... three-state buffers, 220, 221... cache memories, 234, 235... error check circuits.

Claims

[Claims]

1. A method for recovering a highly reliable computer system comprising a system bus, a main storage device connected to the system bus, and a basic processing unit connected to the system bus, wherein the basic processing unit comprises first, second, and third processors that execute the same operation; when the first processor fails, the same operation is executed by the second and third processors; processing is then shifted to the same operation executed by fourth, fifth, and sixth processors; and the operations by the second and third processors are stopped.

2. A method for recovering a highly reliable computer system comprising a system bus, a main storage device connected to the system bus, and a basic processing unit connected to the system bus and comprising a plurality of processors that execute the same operation, characterized in that the set of the plurality of processors before the occurrence of a failure is completely different from the set after recovery from the failure.

3. A method for recovering a basic processing unit, the basic processing unit comprising a plurality of processors that execute the same operation, a selection circuit that selects a plurality of outputs whose soundness has been confirmed from among the processors, a plurality of interface units that output the selected outputs to the outside and take in external input, a plurality of cache memories that store information necessary for operations in the processors, and an internal bus provided between them, all provided on a single processor board, wherein, when part of the basic processing unit fails, operation continues with the remaining configuration excluding the failed part, and the processor board is replaced as a unit when the normal configuration is restored.

4. A method for recovering a highly reliable computer system comprising a plurality of system buses, a main storage device connected to the plurality of system buses, and a plurality of basic processing units connected to the plurality of system buses, wherein each basic processing unit comprises a plurality of processors that execute the same operation, a selection circuit that selects a plurality of outputs whose soundness has been confirmed from among the processors, a plurality of interface units that output each of the selected outputs to the outside and take in external input, a plurality of cache memories that store information necessary for operations in the processors, and an internal bus provided between them, all mounted on a single processor board; the plurality of processors normally execute the operations; when an internal circuit of a basic processing unit fails, operation continues with the failed part excluded; and, when the normal configuration is restored, processing is transferred to another basic processing unit on the system bus and the failed basic processing unit is stopped.

5. A method for recovering a basic processor, the basic processor comprising a plurality of processors that execute the same operation, a selection circuit that selects a plurality of outputs whose soundness has been confirmed from among the processors, a plurality of interface units that output each of the selected outputs to the outside and take in external input, a plurality of cache memories that store information necessary for operations in the processors, and an internal bus provided between them, and being provided with a plurality of independent calculation units each consisting of a processor, an interface unit, a cache memory, and an internal bus, wherein operations are normally executed by the plurality of calculation units; when the basic processor fails, the failed calculation unit is stopped and operation continues with the normal calculation units; and, at recovery, the basic processor that has continued operating with only the normal calculation units is replaced as a unit with another basic processor.

6. A method for recovering a highly reliable computer system comprising a plurality of system buses, a main storage device connected to the plurality of system buses, and a plurality of basic processors connected to the plurality of system buses, wherein each basic processor comprises a plurality of processors that execute the same operation, a selection circuit that selects a plurality of outputs whose soundness has been confirmed from among the processors, a plurality of interface units that output the selected outputs to the outside and take in external input, a plurality of cache memories that store information necessary for operations in the processors, and an internal bus provided between them, and is provided with a plurality of independent calculation units each consisting of a processor, an interface unit, a cache memory, and an internal bus; in the event of a failure, the failed calculation unit is stopped and operation continues with the normal calculation units; and, at recovery, processing is handed over from the normal calculation units to a new basic processor, after which the old basic processor is stopped.

7. A method for recovering a highly reliable computer system comprising a plurality of system buses, a main storage device connected to the plurality of system buses, and a basic processing unit connected to the plurality of system buses and constituted by one processor board, wherein the basic processing unit is a multiplexed system of processors that execute the same operation, interface units that output the processor output to the outside and take in external input, and cache memories that store information necessary for operations in the processors; and a basic processing unit in which a failure has occurred continues to operate with the remaining configuration excluding the failed part, transfers processing to a new basic processing unit connected to the system bus, and then stops.

8. A method for recovering a highly reliable computer system in which a basic processor composed of a plurality of processors that execute the same operation is connected to a system bus and operated, characterized in that the intersection (product set) of the sets of basic processors that execute predetermined processing before the occurrence of a failure and after the occurrence of the failure is empty.

9. A method for recovering a highly reliable computer system in which a system bus is provided with a plurality of slots for inserting boards, and a main storage board and one basic processor board composed of a plurality of processors that execute the same operation are inserted into the slots and operated, the basic processor board having, in a part thereof, display means indicating an inoperable state, wherein recovery from a degraded operation state caused by a malfunction of some of the processors is performed as follows: a. The old basic processor board detects that a new basic processor board has been inserted into an empty slot and is ready for operation, and saves the task being executed to the main memory. b. The new basic processor board executes the task saved in the main memory, and the display means of the old basic processor board displays the stopped state. c. The old basic processor board is stopped.

10. A method for recovering a highly reliable computer system in which a system bus is provided with a plurality of slots for inserting boards, and a main storage board and a basic processor board composed of a plurality of processors that execute the same operation are inserted into the slots and operated, wherein recovery from a degraded operation state caused by failure of some of the processors is performed as follows: a. The old basic processor board detects that a new basic processor board has been inserted into an empty slot and is ready for operation, and saves the task being executed to the main memory. b. The new basic processor board performs self-diagnosis and executes the task saved in the main memory only when the board is normal. c. The old basic processor board is stopped.

11. A method for recovering a highly reliable computer system in which a system bus is provided with a plurality of slots for inserting boards, and a main storage board and a plurality of basic processor boards, each composed of a plurality of processors that execute the same operation, are inserted into the slots and operated, wherein recovery is performed as follows: a. The old basic processor board saves the task being executed and the identification number indicating its own board to the main memory in response to a signal from the board removal request means provided on it. b. The new basic processor board reads the task and identification number saved in the main memory and continues executing the processing that the old basic processor board was to execute. c. The old basic processor board is stopped.

12. A method for recovering a highly reliable computer system in which a system bus is provided with a plurality of slots for inserting boards, and a main storage board and a plurality of basic processor boards, each composed of a plurality of processors that execute the same operation, are inserted into the slots and operated, wherein recovery is performed as follows: a. The old basic processor board saves the task being executed and the identification number indicating its own board to the main memory in response to a signal from the board removal request means provided on it. b. The new basic processor board performs self-diagnosis and, only if it is normal, reads the task and identification number saved in the main memory and continues executing the processing that the old basic processor board was to execute. c. The old basic processor board is stopped.

13. A method for recovering a highly reliable computer system in which a system bus is provided with a plurality of slots for inserting boards, and a main storage board and a plurality of basic processor boards are inserted into the slots and operated, each basic processor board being composed of a plurality of processors that execute the same operation and continuing to operate with the remaining configuration excluding the failed part when some of its circuits fail, wherein, when operating boards are inserted in all of the plurality of slots, an arbitrary board is removed; a new basic processor board is inserted in its place; the processing of the old basic processor board, which is operating with only part of its circuits because of a failure, is transferred to the new basic processor board; the old basic processor board is stopped and removed from its slot; and the arbitrary board removed earlier is inserted into the vacated slot and operated.

14. A highly reliable computer system in which processor boards, each equipped with a plurality of processors that perform the same calculation and continuing to operate with the remaining processors excluding a faulty processor when a fault occurs, and main memory boards are installed in slots of a storage rack, and the boards installed in the slots are connected via a system bus, characterized in that all of the processor boards and main memory boards installed in the slots are in operation, and a spare processor board for replacement in the event of a failure is not installed in a slot.

15. A highly reliable computer system comprising boards installed in a plurality of slots provided in a board storage rack, means for supplying power to a board when the board is installed, and a system bus inside the panel that is electrically connected to a board when the board is installed in a slot, the boards including a processor board equipped with processors, a main memory board that stores data used for processing by the processors, and an input/output interface board that performs input and output of data with the outside, characterized in that the processor board has three or more processors that take in data from the input/output interface board and perform the same calculation based on this data, means for outputting to the system bus the output of a processor whose soundness has been confirmed, and means for preventing failure of some of the processors; the processors start operating when the board is installed; and the processing tasks on the plurality of processor boards are different from each other.

16. A computer board housing a storage rack having a plurality of slots, boards installed in the slots of the storage rack, and a system bus connected to the boards, characterized in that a processor board that performs data processing among the boards includes a plurality of processors that perform the same operation, an output section for externally outputting a selected processor output, and an abnormality detection section that detects a processor abnormality and excludes the abnormal processor.

17. A computer board housing a storage rack having a plurality of slots, boards installed in the slots of the storage rack, including a processor board that processes data and a main memory board that stores data necessary for data processing, and a system bus connected to the boards, characterized in that the processor board comprises removal request means for requesting removal from the storage rack, first storage means for storing tasks executed by the processor, second storage means for storing the board number of the processor board, means for transferring the task and the board number to the main memory board in response to the removal request and stopping calculations on the processor board, and means for receiving the task and board number from the main memory board and starting an operation on the processor board.

18. A computer system in which processor boards, each comprising a plurality of processors and outputting to the outside, when one of the processors fails, the output of a processor that is paired with that processor and whose soundness has been confirmed, are installed in slots of a storage rack, and the plurality of boards installed in the slots are connected by a system bus, characterized in that another processor board can be installed in a slot, the other processor board receiving the necessary data transfer from the processor board that includes the failed processor and being capable of starting calculations by its plurality of processors once the transfer is complete.

19. A computer system in which a processor board including a plurality of processors and display means for indicating the operating status of the processors is provided on a system bus, the processor board comprising: means for detecting that another processor board has been connected to the system bus; means for transmitting data held in the own processor board to the system bus when that means detects that another processor board has been connected; means for changing the display state of the display means of the own processor board after data transmission is completed; and means for stopping the processors of the own processor board after data transmission is completed.

20. A computer system in which a system bus is provided with a plurality of processor boards, each having a plurality of processors and display means for indicating the operating status of the processors, each processor board comprising: storage means for storing an identification number unique to the processor board; stop request means for instructing, from the outside, that the processor board be stopped; means for detecting a processor board stop request command from the stop request means; means for transmitting, when the stop request command is detected, the data held in the own processor board and the unique identification number stored in the storage means to the system bus; means for changing the display state of the display means of the own processor board after data transmission is completed; and means for stopping the processors of the own processor board after data transmission is completed.