JPH08190494A

JPH08190494A - High-reliability computer with dual processors

Info

Publication number: JPH08190494A
Application number: JP7002390A
Authority: JP
Inventors: Shinichiro Yamaguchi; 伸一朗山口; Tetsuaki Nakamigawa; 哲明中三川; Naoto Miyazaki; 直人宮崎; Yoshihiro Miyazaki; 義弘宮崎; Kazuhiro Hiuga; 一弘日向; Suketaka Ishikawa; 佐孝石川; Hiroshi Oguro; 浩大黒
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-01-11
Filing date: 1995-01-11
Publication date: 1996-07-23

Abstract

PURPOSE: To provide a high-reliability computer that has much in common with general-purpose computers and a high performance-to-cost ratio. CONSTITUTION: A computer consisting of two CPUs 1A and 1B of identical configuration and an input/output device has clock supply means that provides both CPUs with clocks of the same frequency and phase, a duplex controller DSBA 2 that connects both CPUs 1A and 1B to the input/output device, and communication means that exchanges CPU state and other information between the two CPUs. The DSBA 2 selects the output instruction from one CPU and sends it to the input/output device, sends the response from the input/output device to both CPUs 1A and 1B, passes memory accesses from the input/output device to the memories in both CPUs, and selects the memory access response from one CPU and sends it to the input/output device. If a fault occurs in CPU 1A or 1B, the DSBA 2 automatically disconnects the faulty CPU and program execution continues on the healthy CPU. Consequently, high reliability is obtained easily and at low cost, and a faulty part can be replaced while online transactions continue to execute.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to the configuration of a high-reliability (fault-tolerant) computer, and in particular to a fault-tolerant computer in which duplicated processors and memories are connected to a single input/output bus.

[0002]

[Prior Art] Computers have come to perform functions that form the backbone of society, such as traffic control systems and financial systems. If a computer performing such a function fails and stops operating, the result is major disruption to society. Accordingly, ever greater reliability is demanded of computers.

[0003] The demand for high reliability in such computers has long been studied in the field of electronic control (controllers), and a multiple-computer system such as that shown in JP-A-57-20847 has been proposed. JP-A-57-20847 describes, as a method of achieving high reliability, having a plurality of computers perform the same computation, comparing their results at the time of data output, and outputting the correct one. This method presupposes that software aligns the output timing before comparing, and it is applicable to relatively small control systems. However, for recent application software, which is large and behaves in complex ways, the data comparison would require an enormous amount of effort, so the method cannot be applied. To address this problem, the following high-reliability techniques, in which data comparison is performed mainly by hardware, have been proposed.

[0004] One technique for computer fault tolerance is described in JP-A-2-202638. When, as in that scheme, the processors taking part in a majority vote run on independent clocks, some means is needed to synchronize the processors. JP-A-2-202638 is an invention concerning means for achieving synchronization between processors.

[0005] In response to demand for computers with higher processing performance, multiprocessor computers have long been widely used to improve performance. A fault-tolerance technique for multiprocessor computers is described in Nikkei Electronics, May 9, 1983 issue, pp. 197-202.

[0006] This article describes a technique called the pair-and-spare method, in which two circuit boards, each comprising memory, a processor, and other components with self-diagnosis capability, operate as one set. If a fault occurs in the circuitry on one board, processing continues on the circuitry of the other board. Because operation continues even when a fault occurs, this method eliminates the need for a checkpoint restart, in which processing is redone from a checkpoint preceding the fault.

[0007] Other high-reliability techniques are described in US Pat. No. 4,907,228 (JP-A-1-154240) and US Pat. No. 5,255,367 (JP-A-1-154241). In these, a shared resource such as memory is connected to the data paths (dual rails) extending from two processors, and the signals from the two data buses are compared at the entrance of the shared resource so that errors can be detected. One pair (two units) of such basic data processing devices is provided. The references describe a highly reliable computer configuration in which the input/output devices shared by the pair of data processing devices have error detection means based on comparison at their entrance.

[0008] Yet another high-reliability technique is described in JP-A-4-241039. In it, the BPU built on each circuit board, which is the unit of replacement on failure, is itself given a fault-tolerance function. Even if a failure occurs inside a BPU, the fault-tolerance function lets normal operation continue until the next convenient stopping point (hereinafter called a checkpoint for convenience), and processing is handed over to another BPU at the checkpoint. It is appropriate to set the checkpoint at, for example, a task switch.

[0009] So that normal operation can continue until the next checkpoint even if a failure occurs inside the BPU, each element of the BPU is made redundant, and operation continues using a combination of healthy elements. Elements such as cache memory, whose failures can be detected by parity checking or the like, are duplicated, and the healthy cache memory is selected and used. When a general-purpose MPU is used, the MPU cannot be given a checking function, so the MPUs are triplicated or quadruplicated so that their output signals can be compared to determine which are normal and a normal one can be selected for use.

[0010] Because the BPU built on each circuit board, the unit of replacement, can itself continue processing until the next checkpoint even if an internal failure occurs, the loss of processing performance caused by saving state at each checkpoint in preparation for a checkpoint restart can be reduced. Moreover, since no paired BPU is needed, signal lines for clock synchronization between separate BPUs are unnecessary, which allows a faster clock.

[0011] In addition, since the MPUs making up a replacement unit run on the same clock, no special operation for synchronizing the MPUs is needed, and there is no resulting loss of processing performance.

[0012] All of the above prior art multiplexes the processor and memory, the minimum environment needed to run software, and, when a failure occurs in these parts, isolates the failed part in hardware so as to guarantee that the program continues. In other words, failures in the processor and memory are completely invisible (transparent) to the program. This is an important technique for reducing the special programming needed to build a highly reliable system.

[0013]

[Problems to Be Solved by the Invention] These prior-art techniques realize a multiplexed CPU by using commercially available general-purpose processors and applying special measures to their peripheral circuits. Compared with an ordinary data processing device, workstation, or personal computer using the same general-purpose processor, increases in cost and in hardware and software overhead are unavoidable.

[0014] In recent years in particular, the performance of general-purpose processors has improved rapidly, and ordinary data processing devices, workstations, and personal computers using these fast processors are being developed at an ever faster pace. This means that the performance/price gap between a high-reliability computer, which requires special peripheral circuits even when it uses the same processor, and ordinary data processing devices, workstations, and personal computers will inherently keep widening.

[0015] The present invention was made in view of these problems. Its object is to provide a high-reliability computer that has much in common with general-purpose computers, can be developed jointly with them, and offers a high performance-to-price ratio.

[0016]

[Means for Solving the Problems] To achieve the above object, in a computer having a first data processing block, for example a CPU composed of data processing devices such as a memory and a processor, and an input/output unit composed of an input/output bus and input/output devices, the invention provides: a second CPU having the same configuration as the first CPU; clock means that supplies a clock or reset signal of the same frequency and phase to the first and second CPUs; duplex control means, the DSBA (Dual System Bus Adapter, hereinafter DSBA), that connects the two CPUs to the input/output unit; and inter-block communication means that passes information such as CPU state between the two CPUs.

[0017] The DSBA selects the output instruction from either the first or the second CPU and passes it to the input/output unit, and passes the response from the input/output unit to both the first and second CPUs. The DSBA also passes memory accesses from the input/output unit to the memories in both the first and second CPUs, and selects the memory access response from either the first or the second CPU and passes it to the input/output unit.
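The four DSBA functions just described can be pictured with a minimal behavioral sketch in Python. It is only an illustration under stated assumptions: the class and method names (`Cpu`, `Dsba`, `cpu_output`, `io_response`, `dma_write`, `dma_read`, `disconnect`) are hypothetical and do not appear in the patent, memory is modeled as a dictionary, and the `disconnect` method reflects the automatic disconnection of a faulty CPU mentioned in the abstract rather than any circuit detail.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical stand-in for one CPU block (processor plus memory)."""
    name: str
    memory: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)   # responses delivered by the DSBA


class Dsba:
    """Toy model of the duplex controller; not the patented circuit itself."""

    def __init__(self, cpu_a: Cpu, cpu_b: Cpu):
        self.cpus = {"A": cpu_a, "B": cpu_b}
        self.connected = {"A", "B"}
        self.selected = "A"      # CPU whose outputs are forwarded to I/O
        self.io_log = []         # what the I/O side actually receives

    def cpu_output(self, outputs: dict):
        """Both CPUs issue an output; only the selected one is forwarded."""
        self.io_log.append(outputs[self.selected])

    def io_response(self, response):
        """A response from the I/O side goes to every connected CPU."""
        for key in sorted(self.connected):
            self.cpus[key].inbox.append(response)

    def dma_write(self, addr: int, value: int):
        """A memory write from the I/O side is applied to every connected CPU."""
        for key in sorted(self.connected):
            self.cpus[key].memory[addr] = value

    def dma_read(self, addr: int):
        """Only the selected CPU's memory response is returned to the I/O side."""
        return self.cpus[self.selected].memory.get(addr, 0)

    def disconnect(self, key: str):
        """Drop a faulty CPU; if it was the selected one, select the survivor."""
        self.connected.discard(key)
        if not self.connected:
            raise RuntimeError("no healthy CPU left")
        if self.selected == key:
            self.selected = next(iter(self.connected))
```

In this model the I/O side never sees two copies of an output, and both memories stay identical because every write from the I/O side reaches both CPUs.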

[0018]

[Operation] In the above configuration, at initial start-up after power-on, the clock means supplies the same clock to the two CPUs, and each CPU independently performs initialization such as clearing memory. After completion of both initializations has been confirmed through the communication means, the clock means resets the two CPUs at the same timing. From then on the two CPUs perform exactly the same operations, that is, they execute the same program in the same order. On input/output accesses, the DSBA controls access by the duplicated CPUs as described above.
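The start-up sequence can be sketched as a small cycle-level simulation in Python. This is an illustrative model only: `CpuBlock`, `init_cycles`, and `power_on` are hypothetical names, initialization time is reduced to a cycle count, and the check that both CPUs have finished stands in for the confirmation made through the communication means.

```python
from dataclasses import dataclass


@dataclass
class CpuBlock:
    name: str
    init_cycles: int          # hypothetical: cycles this CPU needs to initialize
    init_done: bool = False
    pc: int = -1              # program counter; -1 means "not yet reset"

    def tick_init(self, cycle: int):
        """Advance independent initialization (memory clear and so on)."""
        if cycle >= self.init_cycles:
            self.init_done = True

    def reset(self):
        self.pc = 0

    def step(self):
        self.pc += 1


def power_on(cpu_a: CpuBlock, cpu_b: CpuBlock, max_cycles: int = 1000) -> int:
    """Return the clock cycle at which both CPUs were reset together."""
    for cycle in range(1, max_cycles + 1):
        # Both CPUs see the same clock but initialize independently.
        cpu_a.tick_init(cycle)
        cpu_b.tick_init(cycle)
        # Completion of both initializations is confirmed before the reset.
        if cpu_a.init_done and cpu_b.init_done:
            cpu_a.reset()     # reset applied on the same clock cycle
            cpu_b.reset()
            return cycle
    raise TimeoutError("initialization did not complete")
```

After `power_on` returns, stepping both objects the same number of times keeps their program counters equal, which is the lockstep property the paragraph describes.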

[0019] When a failure occurs in a CPU, the DSBA automatically disconnects the failed CPU, and program execution continues on the healthy CPU.

[0020] As described above, when a failure occurs in a CPU, the DSBA automatically disconnects the failed CPU so that program execution can continue on the healthy CPU, and a highly reliable computer can therefore be realized.

[0021] Furthermore, the duplex controller is not placed in a part such as the CPU, which runs at very high speed and demands advanced packaging technology. Instead, the DSBA is placed at the slower interface to the input/output unit. High reliability can therefore be achieved easily and at low cost, and because the CPU and the input/output unit can be shared with ordinary data processing devices, workstations, and personal computers, cost can be reduced and development accelerated.

[0022]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0023] FIG. 2 shows the external appearance of a high-reliability computer (hereinafter, the computer) in which the present invention is implemented. 200A and 200B are CPU boxes containing processors and memory. 201A and 201B are disk units, which can be configured as mirrored disks where needed to improve disk reliability. 202A and 202B are power supplies, which power system A and system B respectively. 203A and 203B are cooling fans, which cool system A and system B respectively. Because all hardware needed for system operation is duplicated in this way, continuous operation can be maintained even if a single hardware failure occurs, by disconnecting the failed module.

[0024] In addition, with the physical arrangement of FIG. 2, systems A and B can be physically separated, so the duplex structure is clear. This reduces human error during maintenance and also permits major inspections such as cleaning the backboard.

[0025] FIG. 3 shows the conceptual hardware configuration of the computer shown in FIG. 2.

[0026] 1A and 1B are central processing units (CPUs), each having processors, memory, and their peripheral control circuits. In normal operation, CPUs 1A and 1B execute exactly the same program in the same instruction order, synchronized to clocks of the same frequency and phase. 2 is the DSBA, which connects the two CPUs 1A and 1B to input/output bus 30A or 30B. It passes accesses from a healthy CPU to the input/output bus and passes accesses from the input/output bus 30 (30A, 30B) to both CPUs. Input/output buses 30A and 30B carry various input/output adapters (IOAs), for example an IOA for disk units, an IOA for line controllers, and a LANC for networks.

[0027] As the figure shows, all hardware is duplicated to achieve high reliability against single hardware failures. Here, a unit that has error detection capability and serves as the unit of disconnection is called a block. This embodiment has three types of block, each duplicated in systems A and B, for six blocks in total. The main components of each block are listed in Table 1 below.

[0028]

[Table 1]

[0029] A distinctive feature of fault handling in this embodiment is that disconnection of the multiplexed blocks is divided between hardware and software: duplex control of the CPU block is performed by hardware, while duplex control of the IO bus block and the device block is performed by software. For errors detected in the CPU block, hardware disconnects the block. For errors detected in the IO bus block or the device block, software disconnects the block. In either case, processing continues on the remaining healthy system.

[0030] Accordingly, from the viewpoint of the software that controls the input/output buses, there are two input/output buses, 30A and 30B, with independent addresses. The software keeps in memory a flag indicating which input/output bus is healthy and, following this flag, uses the two input/output units in combination. The input/output units are thereby duplicated, and continuous operation is achieved despite a single failure in them.
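As a rough illustration of this software-side duplexing, the following Python sketch keeps one health flag per input/output bus and routes accesses accordingly. The names `IoBus` and `DuplexIo`, the use of `OSError` to signal a bus failure, and the block-addressed storage are all assumptions made for the example. The patent states only that a flag in memory records which bus is healthy.

```python
class IoBus:
    """Hypothetical model of one I/O bus with an attached storage device."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}
        self.broken = False      # set to True to simulate a failure

    def write(self, lba: int, data: bytes):
        if self.broken:
            raise OSError(f"{self.name} failed")
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        if self.broken:
            raise OSError(f"{self.name} failed")
        return self.blocks[lba]


class DuplexIo:
    """Software view: two independently addressed buses plus health flags."""

    def __init__(self, bus_a: IoBus, bus_b: IoBus):
        self.buses = {"30A": bus_a, "30B": bus_b}
        self.healthy = {"30A": True, "30B": True}   # flags kept in memory

    def write(self, lba: int, data: bytes) -> int:
        """Write to every healthy bus; return how many copies were written."""
        written = 0
        for name, bus in self.buses.items():
            if not self.healthy[name]:
                continue
            try:
                bus.write(lba, data)
                written += 1
            except OSError:
                self.healthy[name] = False   # software disconnects the block
        if written == 0:
            raise OSError("no healthy I/O bus")
        return written

    def read(self, lba: int) -> bytes:
        """Read from the first healthy bus, skipping any that fail."""
        for name, bus in self.buses.items():
            if not self.healthy[name]:
                continue
            try:
                return bus.read(lba)
            except OSError:
                self.healthy[name] = False
        raise OSError("no healthy I/O bus")
```

A failure on one bus only clears its flag, and later accesses are served by the other bus without the caller noticing.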

[0031] FIG. 1 is a more detailed block diagram of the hardware configuration shown in FIG. 3. Since the right and left halves of the diagram are identical in configuration, only the left half, system A, is described in detail here.

[0032] 3A and 4A are processors of identical configuration with built-in cache memory, and normally perform exactly the same operations. 5A is a memory that stores instructions and data. Its capacity and organization may be any of various kinds well known to those skilled in the art, none of which poses a problem in practicing the invention, so it is not described in detail here.

[0033] 6A is a processor memory control unit (PMCU) that connects processors 3A and 4A, memory 5A, and system bus 9A. Its main roles are to pass accesses from processor 3A or 4A to the memory or to system bus 9A, and to pass accesses from the system bus to the memory. It also compares the output signals 500 and 501 of processors 3A and 4A and checks for a mismatch between the two processors' outputs in order to detect a failure within a processor.

[0034] 7A is the inter-processor interface controller (PXI), which controls the exchange of information such as CPU state between CPUs 1A and 1B over signal line 57 (the PXI bus).

[0035] 8A is a clock circuit (CLK) containing an oscillator. CLK 8A cooperates with CLK 8B of system B over signal line 55 to supply clock timing signals of the same frequency and phase to the whole of system A. CLK 8A also has an oscillator stop detection circuit.

[0036] 11, 12, 13, and 14 are DSBAs that control the DS (Dual System) bus connecting systems A and B. This embodiment shows four DSBA pairs: 11A and 11B (11), 12A and 12B (12), 13A and 13B (13), and 14A and 14B (14). In addition, pairs 11 and 12 on the one hand and pairs 13 and 14 on the other form the duplex system of input/output buses and input/output devices. All pairs operate in the same way, so only pair 11 (11A and 11B) is described here.

[0037] 16 is a bus switch that connects or separates the system A side 15A and the system B side 15B of the DS bus. A MOS switch fabricated in a CMOS process, which has little delay, is preferable. Turning the bus switch off separates systems A and B both logically and electrically, which makes it easier to replace parts during online operation.

[0038] DSBA 11A is defined as the primary DSBA and the other, DSBA 11B, as the secondary DSBA. The primary and secondary DSBAs receive DS bus accesses simultaneously from their respective CPUs 1A and 1B, but only the primary DSBA actually passes its CPU's access onto the DS bus. Together they thus form a kind of selector. Accesses from the DS bus are received simultaneously by the primary and secondary DSBAs, which pass them to their respective CPUs 1A and 1B at the same timing.

[0039] 20A and 21A are input/output bus adapters (IOBAs) that connect the DS bus to input/output bus 30A. 31A and 35A are input/output adapters (IOAs) that connect input/output bus 30A to the standard input/output device buses 32A, 33A, 36A, and 37A, typified by the SCSI (Small Computer System Interface) bus. 39A is a local area network controller (LANC) that connects a local area network typified by Ethernet.

[0040] 34 (34A, 34B) are disk units. In this embodiment 34A and 34B are configured as mirrored disks by software, but other highly reliable disk configurations not shown here may also be used. 38 is a public line connection device for connecting to the public network.

[0041] A point to note in this embodiment is that software sees these input/output buses as distinct: 20A and 20B are recognized as IOBCs with different addresses, and the input/output devices are likewise recognized by software as separate input/output devices.

[0042] Consequently, high-reliability computers of various grades can be realized through software settings alone. For example, although the example here duplicates all input/output buses and input/output devices, flexible arrangements are possible, such as lowering system cost by leaving less critical input/output devices unduplicated, or quadruplicating disks that hold extremely important data.

[0043] (a) Processor memory control unit (PMCU): FIG. 4 shows a block diagram of the PMCU. 6A and 6B are identical, so 6A is described here. The PMCU is divided broadly into a processor interface unit (PIU) 40, a memory interface unit (MIU) 41, a system bus interface unit (SBIU) 42, and a processor output comparator 44. PIU 40 is the interface unit to processors 3A and 4A. When a processor's external access is a memory access, the memory address and data from master processor 3A are taken into receive buffer 47 via signal line 500. When the external access is an access to the input/output bus (a PIO access), the memory address and data are stored in receive buffer 46. The memory address and data from slave processor 4A are taken into the PMCU via signal line 501 but are not stored in a receive buffer.

[0044] The addresses, data, and control signals from the master and checker processors are compared by processor output comparator 44 when the master processor issues a write access. If the values differ, master/checker error signal 400 is asserted. Although not shown, parity errors, control circuit errors, and the like detected during PIU operation are reported to OR element 43 via signal line 401.
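The master/checker comparison can be illustrated with a few lines of Python. This is a sketch under assumptions, not the comparator circuit: a bus cycle is reduced to a hypothetical `BusCycle` tuple of kind, address, and data, and the helper `first_mismatch` is added only to show how a divergence would be located in two recorded traces.

```python
from typing import NamedTuple, Optional, Sequence


class BusCycle(NamedTuple):
    kind: str       # "read" or "write"
    address: int
    data: int


def master_checker_error(master: BusCycle, checker: BusCycle) -> bool:
    """Model of error signal 400: compare only when the master issues a write."""
    if master.kind != "write":
        return False
    return master != checker


def first_mismatch(master_trace: Sequence[BusCycle],
                   checker_trace: Sequence[BusCycle]) -> Optional[int]:
    """Return the index of the first cycle that would assert the error, if any."""
    for index, (m, c) in enumerate(zip(master_trace, checker_trace)):
        if master_checker_error(m, c):
            return index
    return None
```

Checking only on writes, as the paragraph states, means a divergence is caught at the point where it would otherwise leave the processor pair and reach memory or the system bus.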

[0045] MIU 41 receives memory accesses from the PIU and DMA accesses from the SBIU at selector 506, accesses memory 5A, and returns the response to the PIU or the SBIU respectively. The accesses the MIU receives from the PIU are memory reads and memory writes. On a memory read, the read address stored in receive buffer 47 is passed to memory 5A via selector 506. The data read is stored in transmit buffer 48 via selector 505 and returned to master and checker processors 3A and 4A. On a memory write, the address and write data stored in receive buffer 47 are written to memory 5A via selector 506.

[0046] The accesses the MIU receives from the SBIU are DMA reads and DMA writes. On a DMA read, the read address stored in receive buffer 50 is passed to memory 5A via selector 506. The data read is stored in transmit buffer 49 and returned over system bus 9A to the input/output bus or input/output device. On a DMA write, the address and write data stored in receive buffer 50 are written to memory 5A via selector 506. The control method for selector 506 is not shown here, but it is desirable to give priority to the SBIU.
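Since the control method of selector 506 is not specified, the following Python sketch shows just one possible reading of "give priority to the SBIU": a fixed-priority arbiter that always serves a pending SBIU (DMA) request before a PIU request. The class name `MemorySelector`, the queue-based interface, and the fixed-priority policy are assumptions for illustration.

```python
from collections import deque


class MemorySelector:
    """Fixed-priority arbiter between SBIU (DMA) and PIU memory requests."""

    def __init__(self):
        self.queues = {"SBIU": deque(), "PIU": deque()}

    def request(self, source: str, access: str):
        """Queue a memory access from "SBIU" or "PIU"."""
        self.queues[source].append(access)

    def grant(self):
        """Return (source, access) for the next access, or None if idle."""
        for source in ("SBIU", "PIU"):       # SBIU is checked first
            if self.queues[source]:
                return source, self.queues[source].popleft()
        return None
```

For example, if a PIU read is queued before a DMA write, the DMA write is still granted first, and requests from the same source keep their original order.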

[0047] Parity errors, control circuit errors, and the like detected during MIU operation are reported to OR element 43 via signal line 402. During a memory copy (described in detail later), 45 monitors the read and write accesses to memory made by the MIU of the healthy system and captures addresses and data as needed. These are output to the system bus via signal line 504 as memory copy accesses and written to the other system's memory by way of the paired DSBAs.

[0048] SBIU 42 handles DMA accesses from system bus 9A and PIO accesses from the PIU. On a PIO read from the PIU, the address stored in receive buffer 46 is output to system bus 9A after the system bus has been acquired. The read data is stored in transmit buffer 48 via selector 505 and returned to master and checker processors 3A and 4A. On a PIO write, the address and write data stored in receive buffer 46 are output to system bus 9A via selector 507 after the system bus has been acquired. DMA read and write accesses from system bus 9A are handled as described above. Parity errors, control circuit errors, and the like detected during SBIU operation are reported to OR element 43 via signal line 403.

【００４９】Reference numeral 43 denotes a logical sum (OR) element. When any error is detected during operation of the PMCU, it asserts signal line (PMCU-ERR) 95A to notify PXI 7A.

(b) Clock circuit (CLK): FIG. 5 shows the internal configuration and connections of clock circuits 8A and 8B. Because the A and B clock circuits are identical, only 8A is described. Reference numeral 50A denotes an oscillator (OSC) of a kind well known to those skilled in the art, built around a crystal oscillator, which outputs clock 501A at a relatively low frequency of 10 MHz. Keeping the OSC frequency low allows a stable clock to be supplied to both systems even when 8A and 8B are mounted several tens of centimeters apart, as in FIG. 2. Reference numeral 51A denotes a selector that chooses between the clock from the OSC of its own system and the clock from the OSC of the other system. Reference numeral 52A denotes a phase-locked loop (PLL) circuit, which generates clock 54A at n times the frequency of, and in phase with, the clock selected by 51A, thereby supplying the high-frequency clock needed by the processors and peripheral circuits. Reference numeral 53A denotes an OSC stop detection circuit; when it detects that oscillation of 501A or 501B has stopped, it uses selector control signal 56A to select the output of the OSC that is still running.

【００５０】OSCs 50A and 50B operate in a master/slave arrangement in which whichever is powered on first becomes the clock master. For example, if the A system starts up first, 50A becomes the clock master and selectors 51A and 51B both select 501A. If 50A then stops, stop detection circuits 53A and 53B detect this and the selectors switch to 501B. Because this switchover completes in about 300 ns, which is shorter than the PLL pull-in time, the outputs of both PLLs continue without interruption and keep supplying clocks to the processors and peripheral circuits even while the selectors switch.
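
As a rough illustration of the master/slave switchover described above, the following Python sketch models one selector together with its stop detection circuit. The names `ClockSelector`, `select`, and `select_both` are hypothetical, and the rule that a selector stays on the surviving oscillator after a failover is an assumption of this sketch; the specification does not say what happens if the stopped oscillator resumes.

```python
from dataclasses import dataclass


@dataclass
class ClockSelector:
    """One selector (51A or 51B) plus its stop detection circuit (53A or 53B)."""
    master: str = "A"  # oscillator currently treated as clock master

    def select(self, osc_running: dict) -> str:
        """Return which oscillator feeds the PLL, given which OSCs are running."""
        if osc_running[self.master]:
            return self.master
        other = "B" if self.master == "A" else "A"
        if osc_running[other]:
            # Assumption: after a failover the selector stays on the healthy OSC.
            self.master = other
            return other
        raise RuntimeError("both oscillators have stopped")


def select_both(sel_a: ClockSelector, sel_b: ClockSelector, osc_running: dict):
    """Both systems must end up on the same source to stay in phase."""
    return sel_a.select(osc_running), sel_b.select(osc_running)
```

With both oscillators running and the A system as master, `select_both` yields the A oscillator for both systems; once the A oscillator stops, both selectors move to the B oscillator.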

【００５１】(c) Multiple system bus adapter (DSBA): The multiple system bus adapter (DSBA), which connects two CPUs to one input/output bus, is described with reference to FIGS. 6, 7, and 8. FIG. 6 shows the configuration with the focus on duplexed system bus control by the DSBAs. DSBA 11A is defined as the primary DSBA and the other, DSBA 11B, as the secondary DSBA. The two DSBAs simultaneously receive DS bus accesses from their respective CPUs 1A and 1B, but only the primary DSBA passes its CPU's access on to the DS bus. Accesses from the DS bus are received by the primary and secondary DSBAs simultaneously and delivered to their respective CPUs 1A and 1B with the same timing. In other words, the DSBAs act as a selector for accesses from the CPUs and as a distributor for accesses from the DS bus.
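
The selector and distributor roles can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: the class `Dsba`, its methods, and the string encoding of accesses are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Dsba:
    name: str
    primary: bool
    delivered: list = field(default_factory=list)  # accesses passed to this DSBA's CPU

    def from_cpu(self, access):
        """Selector role: only the primary forwards its CPU's access to the DS bus."""
        return access if self.primary else None

    def from_ds_bus(self, access):
        """Distributor role: each DSBA passes the bus access to its own CPU."""
        self.delivered.append(access)


def cpu_access(dsbas, accesses):
    """Both CPUs issue an access; return what actually reaches the DS bus."""
    driven = [d.from_cpu(a) for d, a in zip(dsbas, accesses)]
    return [a for a in driven if a is not None]


def bus_access(dsbas, access):
    """An access from the DS bus is delivered to every DSBA in the same step."""
    for d in dsbas:
        d.from_ds_bus(access)
```

Two synchronized CPUs issuing the same PIO access therefore produce a single access on the DS bus, while a DMA access arriving from the DS bus is seen by both CPUs.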

【００５２】CPUs 1A and 1B perform identical operations under normal conditions, but when a fault occurs their synchronous operation begins to diverge. This loss of synchronization becomes visible when the two CPUs access the single DS bus 15A. The inter-DSBA interface 60 detects the loss of synchronization and generates the timing for disconnecting the faulty CPU block when one system fails. As shown in FIG. 6, the inter-DSBA interface 60 has at least eight signal lines. 61A and 61B are the system bus requests from CPUs 1A and 1B, respectively, and 62A and 62B are the system bus use permission signals generated by CPUs 1A and 1B, respectively. 63A and 63B are error signals indicating that an error has been detected inside the respective DSBA. In this embodiment no error detection code such as a parity bit is added to the inter-DSBA interface 60, but one can obviously be added if required.

【００５３】FIG. 7 shows the internal configuration of DSBA 11A. A DSBA behaves differently depending on whether it is the primary or the secondary, but both roles are implemented with identical hardware, so only DSBA 11A is described here.

【００５４】The signals of system bus 9A comprise data/address signal 751 and control signal 752. Likewise, the signals of DS bus 30A comprise data/address signal 753 and control signal 754. Reference numerals 73 and 74 denote buffers that hold addresses and data. Reference numeral 71 denotes a transmission/reception control unit that manages buffers 73 and 74 and controls access to system bus 9A or DS bus 30A. The DSBA handles two kinds of access: accesses from a CPU to an input/output device (called PIO read accesses and PIO write accesses) and accesses from an input/output device to memory (called DMA read accesses and DMA write accesses).

【００５５】On a PIO read access, the read address on signal line 751 is stored in reception buffer 73 and used to access the DS bus. The data read from the input/output adapter or input/output device is held temporarily in transmission buffer 74 toward the system bus and then delivered to the CPU via signal line 751. On a PIO write access, the write address and write data on signal line 751 are stored in reception buffer 73, the DS bus is accessed using this address, and the data is written to a register in the input/output adapter or input/output device.

【００５６】On a DMA read access, the read address on signal line 753 is stored in transmission buffer 74 and used to access the memory in the CPU. The data read from memory is held temporarily in reception buffer 73 toward the DS bus and then returned to the DMA requester via signal line 753. On a DMA write access, the write address and write data on signal line 753 are stored in transmission buffer 74, and the data is written to memory using this address.

【００５７】Reference numeral 76 denotes an IO space convolution (folding) circuit that converts the address of a PIO access stored in reception buffer 73 into an address within the input/output adapter or input/output device under this DSBA. Reference numeral 771 denotes a flag that selects whether the output of IO space convolution circuit 76 is enabled; it is one of the control registers in the DSBA and is set by software when the DSBA paired with this DSBA fails. This embodiment is described mainly for the case in which control of the duplexed input/output devices is performed entirely by software. Using IO space convolution circuit 76, however, means that the addresses seen by software (device drivers, for example) need not change when the input/output device is switched, which reduces the burden of duplex control.

【００５８】Reference numerals 755 and 756 denote parity checkers for signal lines 751 and 753, respectively; when either detects an error, it is conveyed to logical sum element 76 as a parity error. When any error is detected inside the DSBA, it is conveyed as DSBAERROR 63A to the disconnection request generation units of both the own system and the other system. When an error is detected inside a DSBA or between the DSBAs, disconnection request generation unit 72 generates disconnection request signal (DISCONREQ) 64A for the faulty CPU block and conveys it to PXI 7B (the PXI of the other system). Reference numeral 75 denotes an output gate control circuit that controls the output gate to the DS bus.

【００５９】FIG. 8 shows the output gate control circuit in detail. Ordinary output gate control need only open the output gate when the transmission/reception control circuit issues a transmission signal (send) toward the DS bus. In this embodiment, however, the configuration of FIG. 8 is adopted so that the DSBA can operate as a multiple system bus adapter. Reference numerals 81 and 82 denote logical product (AND) elements, and 830, 831, and 832 denote negation (NOT) elements. Signal line 84 carries the transmission signal (send) from the transmission/reception control circuit toward the DS bus. Signal line 65 carries the DISCON signal indicating that the own-system CPU block has been disconnected, and signal line 66 carries the DISCON signal indicating that the other-system CPU block has been disconnected. Signal line 67 indicates that this DSBA is the secondary. While both CPUs are operating normally, the primary and secondary DSBAs assert send signal 84 identically, but only the output gate of the primary DSBA (the one whose signal line 67 is negated) opens and drives addresses, data, and so on. When the own-system CPU block fails and is disconnected (signal line 65 asserted), signal line 65 negates output gate control signal (OE) 85, so the own-system output gate never opens; this is how the own-system CPU block is disconnected from the DS bus. When the other system fails and is disconnected (signal line 66 asserted), even the secondary DSBA can drive addresses, data, and so on in accordance with send signal 84.
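
The behavior described for the output gate can be condensed into one Boolean expression. The sketch below is inferred from the three cases in the text (normal operation, own system disconnected, other system disconnected); the exact wiring of elements 81, 82, and 830 to 832 in FIG. 8 is not reproduced, and the function and parameter names are hypothetical.

```python
def output_enable(send: bool, discon_self: bool,
                  discon_other: bool, secondary: bool) -> bool:
    """Output gate control signal OE (85), modeled from the described behavior.

    send         -- signal 84: transmit request toward the DS bus
    discon_self  -- signal 65: own-system CPU block is disconnected
    discon_other -- signal 66: other-system CPU block is disconnected
    secondary    -- signal 67: this DSBA is the secondary
    """
    # A DSBA may drive the bus if it is the primary, or if it is the
    # secondary and the other system has been disconnected.
    may_drive = (not secondary) or discon_other
    # A disconnected system never drives the bus, whatever its role.
    return send and (not discon_self) and may_drive
```

Under this model the secondary DSBA stays silent during normal operation and takes over the bus only after the primary side has been disconnected.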

【００６０】FIG. 9 shows the configuration of the disconnection request control unit. Reference numerals 900 (900-1 to 900-4), 901, 902, and 903 (903-1 to 903-3) denote logical product (AND) elements, a logical sum (OR) element, an exclusive-OR element, and negation (NOT) elements, respectively. Reference numeral 90 denotes a 10-microsecond timer. Timer 90 starts counting when exclusive-OR element 902 detects a mismatch between system bus use permission signals 91A (61A) and 91B (61B) from the PMCUs and asserts signal 910, or when an error is detected in the PMCU and PMCUERR signal 95A is asserted. It stops counting and is cleared when the XDISCON signal, which disconnects the other-system CPU block, is asserted; if 10 microseconds elapse first, it asserts timeout signal 96. AND element 900-1 detects a one-sided error: when the own system has no error and the other system does, it asserts other-system error signal 99, which in turn asserts other-system CPU block disconnection request signal 94A. AND elements 900-2 and 900-3 each detect a loss of synchronization. The system that has not asserted its system bus request signal or system bus use permission signal is regarded as faulty, and other-system synchronization error signals 911 and 98 are asserted. In the case of synchronization error signal 98, the CPU may already have lost synchronization for some other reason, so issuing a disconnection request immediately is dangerous. Synchronization error signal 98 is therefore masked by output 96 of timer 90 for a short period, as described above, until the cause is determined.
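
The deferral performed by timer 90 can be illustrated with a tick-based model. This is a simplified sketch, not the circuit of FIG. 9: the function name, the per-tick sample lists, and expressing the 10-microsecond limit as a tick count are all assumptions made for illustration.

```python
def deferred_request_tick(mismatch, xdiscon, timeout_ticks):
    """Return the tick at which the deferred disconnection request fires.

    mismatch      -- per-tick samples of the grant mismatch (signal 910)
    xdiscon       -- per-tick samples of XDISCON (other system already disconnected)
    timeout_ticks -- timer length in ticks (stands in for 10 microseconds)
    Returns None if the timer never expires.
    """
    count = None  # None means the timer is stopped and cleared
    for tick, (mis, xd) in enumerate(zip(mismatch, xdiscon)):
        if xd:
            # Disconnection already happened elsewhere: stop and clear the timer.
            count = None
            continue
        if count is None:
            if mis:
                count = 0  # a mismatch starts the timer
        else:
            count += 1
        if count is not None and count >= timeout_ticks:
            return tick  # timeout: the masked request is released
    return None
```

If another unit disconnects the faulty block before the timer expires, the function returns `None`, which corresponds to the masked request never being released.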

【００６１】FIGS. 10 and 11 are time charts covering the interval from detection of a synchronization error to assertion of the request to disconnect one CPU block. FIG. 10 shows the case in which the bus use permission signal from the PMCU was output normally by one system but not by the other. The bus use permission signal from the PMCU (PBGRTB-N) is first latched in the DSBA (62B) and then passed between the two DSBAs over the inter-DSBA interface 60, taking one cycle (91B). Inside the own-system DSBA, the local signal is latched once more so that its phase matches that of the bus use permission signal from the other system (91A). The DSBA compares 91A with 91B and asserts the mismatch signal (CMPERR-N).

【００６２】The case of FIG. 10 is a loss of synchronization between PMCU signals; the CPU may have lost synchronization for some other reason, and the PMCU may already have issued a disconnection request. The DSBA therefore waits for a while, and if no disconnection takes place, it issues the disconnection request itself (94A).

【００６３】FIG. 11 shows the case in which the bus use request signal from the DSBA was output normally by one system but not by the other. The bus use request signal from the DSBA (PBREQB-N) is passed between the two DSBAs over the inter-DSBA interface 60, taking one cycle (92B). Inside the own-system DSBA, the local signal is latched once more so that its phase matches that of the signal from the other system (92A). The DSBA compares 92A with 92B, asserts the mismatch signal (910), and issues a disconnection request (94A). This disconnection request is conveyed to PXI 7A, which generates the final disconnection signal.

【００６４】(d) Interprocessor interface control device (PXI): FIG. 12(a) shows the configuration of PXI 7A, which generates the disconnection signal. Each system has a PXI, but they are identical, so only the A-system PXI is described here. 94A is the other-system CPU block disconnection request signal issued by the own-system DSBA. Reference numeral 57 denotes the interface signals to the PXI of the other system: LXDISCONREQA-N is the request from the A system to disconnect the B system, LXDISCONREQB-N is the request from the B system to disconnect the A system, LXDISCONA-N is the instruction from the A system to disconnect the B system, and LXDISCONB-N is the instruction from the B system to disconnect the A system. 65 is the own-system CPU block disconnection instruction signal, obtained by latching LXDISCONB-N to adjust its timing; it disconnects the own system from the DS bus by closing the output gate of the DSBA. Reference numeral 121 denotes a logical sum (OR) element.

【００６５】Reference numeral 122 denotes a status register that holds the state of the own-system CPU. There are six states, shown in FIG. 13: NONE, INIT, READY, COPY, ONLN, and DISCON. Reference numeral 120 denotes a disconnection determination circuit that decides which system is to be disconnected. Errors can occur in two places at once, or in the surviving system after the other has already been disconnected, so issuing a disconnection instruction as soon as a request arrives could lead to the fatal condition in which both systems are disconnected. Disconnection determination circuit 120 therefore negotiates whether disconnection is permissible before asserting LXDISCONA 124, the instruction from the A system to disconnect the B system.

【００６６】FIG. 12(b) shows the decision logic of disconnection determination circuit 120. LXDISCONA 124 is asserted only when the own system is in the online state, there is no request to disconnect the own system, and the own-system CPU has issued a request to disconnect the other system.
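
The three conditions of the decision logic map directly onto a single predicate. The sketch below is written from the A system's point of view; the function and parameter names are hypothetical, and the state is passed as the string name used in FIG. 13.

```python
def lxdiscon_a(own_state: str, req_to_disconnect_a: bool,
               req_to_disconnect_b: bool) -> bool:
    """Decide whether the A system may instruct disconnection of the B system.

    own_state           -- state held in status register 122 of the A system
    req_to_disconnect_a -- a request to disconnect the A system is pending
    req_to_disconnect_b -- the A system has requested disconnection of the B system
    """
    return (own_state == "ONLN"
            and not req_to_disconnect_a
            and req_to_disconnect_b)
```

Because a pending request against the own system blocks the instruction, two simultaneous and opposing requests lead to neither side being disconnected under this predicate, which avoids the both-systems-disconnected condition.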

【００６７】(e) CPU operation modes: FIG. 13 shows the CPU states. There are six: NONE, INIT, READY, COPY, ONLN, and DISCON. NONE means the CPU is not installed or its clock is stopped, so it is not operating at all. INIT means the CPU is being initialized and is executing its own initialization processing asynchronously with the other system. READY means the CPU is waiting for a memory copy to start (memory copy is described later). COPY means a memory copy is in progress: the CPU is receiving a memory copy from the other system and performing memory matching. ONLN means the CPU has been incorporated into the system and is operating normally. DISCON means the other system has issued a disconnection instruction.

【００６８】At initial power-on the state is NONE, and after CPU initialization processing the state becomes ONLN. By contrast, when a CPU has been disconnected from duplexed synchronous operation because of an error and is then reinserted, it performs CPU initialization processing and then waits for the memory copy from the normal CPU to start (READY). It next enters the COPY state, in which it actually receives the memory copy from the normal CPU, and when the copy finishes it enters the ONLN state and duplexed synchronous operation is restored.
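
The state sequence described above can be written as a small transition table. The six state names come from the text; the event names, and the `NONE` to `INIT` and `DISCON` to `NONE` transitions, are assumptions added so that the sketch forms a closed cycle, since FIG. 13 itself is not reproduced here.

```python
from enum import Enum


class CpuState(Enum):
    NONE = "NONE"      # not installed or clock stopped
    INIT = "INIT"      # initializing, asynchronous with the other system
    READY = "READY"    # waiting for the memory copy to start
    COPY = "COPY"      # receiving the memory copy
    ONLN = "ONLN"      # online, operating normally
    DISCON = "DISCON"  # disconnected by the other system


# Event names are hypothetical labels for the transitions in the text.
TRANSITIONS = {
    (CpuState.NONE, "power_on"): CpuState.INIT,
    (CpuState.INIT, "init_done_first_boot"): CpuState.ONLN,
    (CpuState.INIT, "init_done_rejoin"): CpuState.READY,
    (CpuState.READY, "copy_start"): CpuState.COPY,
    (CpuState.COPY, "copy_done"): CpuState.ONLN,
    (CpuState.ONLN, "disconnect"): CpuState.DISCON,
    (CpuState.DISCON, "power_off"): CpuState.NONE,
}


def run(state: CpuState, events) -> CpuState:
    """Apply events in order; an undefined transition raises KeyError."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state
```

A first boot follows `power_on` then `init_done_first_boot`, while a reinserted CPU passes through READY and COPY before reaching ONLN.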

【００６９】(f) Description of operation: Representative operations of the highly reliable computer of this embodiment are described next.

【００７０】(A) Normal operation: (1) Without IO bus access. When a program runs using only the processor and memory, with no IO bus access, the two CPUs execute the same program in the same order, in synchronization with each other yet independently.

【００７１】(2) With IO access. As shown in FIG. 14, when an IO access is initiated, the two CPUs simultaneously output accesses to the same IO. The DSBAs (the primary and secondary DSBAs shown in FIG. 7) receive them and pass only the access on the primary DSBA side to the input/output bus or input/output device (140, 141). The response from the IO is received by both the primary and secondary DSBAs, which return the same response to the two CPUs simultaneously (142, 143). The CPU is the access source in this description; for DMA (direct memory access), in which the input/output bus or input/output device is the access source, the initiation and response in FIG. 14 are simply reversed. In short, two CPUs running in synchronization are connected to one input/output bus or input/output device through the DSBAs, the DSBAs select among accesses from the CPUs, and they distribute accesses from the input/output bus or input/output device to both CPUs. This allows duplexed synchronous operation of the CPUs to continue even when IO is involved.

【００７２】(B) Operation on CPU failure: Handling a failure requires four steps: (1) error detection, (2) disconnection of the faulty block, (3) replacement of the faulty block, and (4) reinsertion of the replacement block. The description follows these steps.

【００７３】(1) Error detection. Various error detection means, such as parity checking, are conceivable. For the present invention what matters is that errors can be detected; which means is used is immaterial. The case considered here is therefore a parity error in the memory of one system.

【００７４】(2) Disconnection of the faulty block. When the occurrence of an error is reported to the PXI, the PXI issues disconnection instruction signal 65 to the CPU block in which the error occurred within a very short time of about 300 ns. The DSBA closes its output gate in accordance with disconnection instruction signal 65 and disconnects the CPU block from the input/output bus. As a result, as shown in FIG. 15, when an IO access is initiated, only the access from the DSBA of the normal system (whether primary or secondary) is passed to the input/output bus or input/output device (144). The response from the IO is received by the DSBA of the normal system (whether primary or secondary) and returned only to the normal CPU (145). The CPU is the access source in this description; for DMA (direct memory access), in which the input/output bus or input/output device is the access source, the initiation and response in FIG. 15 are simply reversed.

【００７５】(3) Replacement of the faulty CPU. If the error is caused by an unrecoverable permanent fault, the CPU must be replaced. Because this replacement is carried out while the normal system continues to run online work, it is referred to here as online insertion and removal of the CPU. FIG. 16 shows the procedure. The CPU-BOX of FIG. 2 has the online insertion/removal panel shown in FIG. 16(a). Reference numeral 160 denotes a removal request switch, which conveys to the normal CPU the operator's intention to remove the CPU in question from the system. Reference numeral 161 denotes a removal permission lamp, which lights red when the normal CPU determines that the CPU for which removal was requested may be removed. Reference numeral 162 denotes the power switch of the CPU-BOX, which is effective only in the removal-permitted state. Reference numeral 163 denotes a mechanical key that secures the CPU-BOX and prevents an operator without the key from removing it by mistake.

【００７６】The removal procedure is shown in the flowchart of FIG. 16(b). The operator first turns on the removal request switch and waits until the removal permission lamp lights red. Once it is lit, the operator switches off the power and removes the CPU-BOX. To insert the replacement CPU-BOX, the operator follows the flow of FIG. 16(c): confirm that the removal request switch is off, insert the CPU-BOX, and then switch on the power.

【００７７】(4) Reinsertion of the replacement CPU. After power is restored, the replacement CPU-BOX performs CPU initialization processing as shown in FIG. 13 and then waits for the memory copy from the normal CPU to start. It subsequently enters the COPY state, in which it actually receives the memory copy from the normal CPU. When the replacement CPU enters the COPY state, the normal CPU starts a memory copy program that scans all of memory.

【００７８】FIG. 17 shows the data flow during a memory copy. When the processor accesses memory (170), memory access monitor 45 in the PMCU captures the address and data of the access and writes them to the memory of the replacement CPU-BOX via the DS bus (171). Processor accesses to memory at this time include both those generated as the memory copy program scans memory and those generated by other programs performing online work; the memory copy operation treats both in the same way.

【００７９】DMA write accesses (172) that occur during the memory copy are handled likewise: memory access monitor 45 captures the address and data of the access and writes them to the memory of the replacement CPU-BOX via the DS bus (173). In other words, the memory copy program scans all of memory and copies the memory contents of the normal CPU to the memory of the replacement CPU, while every memory update made in the meantime by ordinary programs or by DMA is also reflected in the memory of the replacement CPU. Consequently, when the memory copy program finishes its scan, the contents of the two memories match exactly.

【００８０】At this point, however, the internal states of the processors still differ, so the two CPUs are reset simultaneously and made to start exactly the same operation. This restores duplexed synchronous operation.

【００８１】[0081]

【発明の効果】本発明によれば、ＣＰＵのような非常に
高速で高度な実装技術が要求される部分に二重化制御装
置を設けるのでなく、入出力装置との接続部分にＤＳＢ
Ａを設けることで、容易にかつ安価に高信頼性を実現で
きる。つまり、ハードウエアの１点障害が発生しても当
該障害部分を切り離して、処理を続行することによりノ
ンストップ運転を実現できる。またオンライン業務実行
中に障害部位を交換することができるため、ノーダウン
運転を実現できる。[Effects of the Invention] According to the present invention, the duplexing control device is not placed in a part such as the CPU, which is very fast and requires advanced packaging technology. Instead, the DSBA is provided at the connection to the input/output device, so that high reliability is achieved easily and at low cost. That is, even if a single hardware fault occurs, non-stop operation is achieved by disconnecting the faulty part and continuing processing. In addition, because the faulty part can be replaced while online business is running, no-down operation is achieved.

[Brief description of drawings]

【図１】高信頼計算機の全体構成図である。FIG. 1 is an overall configuration diagram of a highly reliable computer.

【図２】装置外観図である。FIG. 2 is an external view of the device.

【図３】概略構成図である。FIG. 3 is a schematic configuration diagram.

【図４】ＰＭＣＵの構成図である。FIG. 4 is a configuration diagram of a PMCU.

【図５】クロック給電図である。FIG. 5 is a clock supply diagram.

【図６】ＤＳＢＡ間インタフェイスの図である。FIG. 6 is a diagram of an interface between DSBAs.

【図７】ＤＳＢＡの構成図である。FIG. 7 is a block diagram of DSBA.

【図８】ＤＳＢＡ出力ゲート制御回路の図である。FIG. 8 is a diagram of a DSBA output gate control circuit.

【図９】切り離し要求生成回路の図である。FIG. 9 is a diagram of a disconnection request generation circuit.

【図１０】同期ずれ検出時のタイムチャートの図であ
る。FIG. 10 is a time chart for the case where a loss of synchronization is detected.

【図１１】同期ずれ検出時のタイムチャートの図であ
る。FIG. 11 is a time chart for the case where a loss of synchronization is detected.

【図１２】切り離し判定手段の図である。FIG. 12 is a diagram of the disconnection determination means.

【図１３】ＣＰＵブロックの動作モードの図である。FIG. 13 is a diagram of an operation mode of a CPU block.

【図１４】正常時のＩＯアクセス動作の図である。FIG. 14 is a diagram of an IO access operation under normal conditions.

【図１５】ＣＰＵ障害時のＩＯアクセス動作の図であ
る。FIG. 15 is a diagram of an IO access operation when a CPU fails.

【図１６】オンライン挿抜手順の図である。FIG. 16 is a diagram of the online insertion/removal procedure.

【図１７】メモリコピー時のデータの流れの図である。FIG. 17 is a diagram showing the flow of data during memory copy.

[Explanation of symbols]

３Ａ，３Ｂ，４Ａ，４Ｂ…プロセッサ、５Ａ，５Ｂ…メ
モリ、６Ａ，６Ｂ…プロセッサメモリ制御ユニット、９
Ａ…システムバス、１１Ａ，１１Ｂ，１２Ａ，１２Ｂ，
１３Ａ，１３Ｂ，１４Ａ，１４Ｂ…多重システムバスア
ダプタ、１６…バススイッチ、２０Ａ，２０Ｂ，２１
Ａ，２１Ｂ…入出力バスアダプタ。3A, 3B, 4A, 4B ... processor; 5A, 5B ... memory; 6A, 6B ... processor memory control unit; 9A ... system bus; 11A, 11B, 12A, 12B, 13A, 13B, 14A, 14B ... multiplexed system bus adapter; 16 ... bus switch; 20A, 20B, 21A, 21B ... input/output bus adapter.

フロントページの続き (72)発明者宮崎義弘茨城県日立市大みか町五丁目２番１号株式会社日立製作所大みか工場内 (72)発明者日向一弘茨城県日立市大みか町五丁目２番１号株式会社日立製作所大みか工場内 (72)発明者石川佐孝神奈川県海老名市下今泉810番地株式会社日立製作所オフィスシステム事業部内 (72)発明者大黒浩神奈川県海老名市下今泉810番地株式会社日立製作所オフィスシステム事業部内
Continuation of front page
(72) Inventor Yoshihiro Miyazaki, Hitachi, Ltd., Omika Works, 5-2-1 Omika-cho, Hitachi City, Ibaraki Prefecture
(72) Inventor Kazuhiro Hinata, Hitachi, Ltd., Omika Works, 5-2-1 Omika-cho, Hitachi City, Ibaraki Prefecture
(72) Inventor Sataka Ishikawa, Hitachi, Ltd., Office Systems Division, 810 Shimoimaizumi, Ebina City, Kanagawa Prefecture
(72) Inventor Hiroshi Oguro, Hitachi, Ltd., Office Systems Division, 810 Shimoimaizumi, Ebina City, Kanagawa Prefecture

Claims

[Claims]

1. A high-reliability computer having duplexed processing devices, comprising: a first data processing block having a first memory that stores programs and data and a first data processing device that fetches the programs and data from the memory and processes them; a second data processing block having a second memory that stores programs and data and a second data processing device that fetches the programs and data from the memory and processes them; clock means for supplying a clock and a reset signal to the first and second data processing blocks; an input/output device that stores processing results or sends them to the outside as designated by the first and second data processing blocks; duplexing control means connected to the first and second data processing blocks and to the input/output device; and first inter-block communication means and second inter-block communication means connected to the first and second data processing blocks.

2. A high-reliability computer having duplexed processing devices, comprising: a first data processing block having a first memory that stores programs and data and a first data processing device that fetches the programs and data from the memory and processes them; a second data processing block having a second memory that stores programs and data and a second data processing device that fetches the programs and data from the memory and processes them; clock means for supplying a clock and a reset signal to the first and second data processing blocks; a first input/output device that stores processing results or sends them to the outside as designated by the first and second data processing blocks; first duplexing control means connected to the first and second data processing blocks and to the input/output device; a second input/output device that has the same configuration as the first input/output device and stores processing results or sends them to the outside as designated by the first and second data processing blocks; second duplexing control means connected to the first and second data processing blocks and to the input/output device; and first inter-block communication means and second inter-block communication means connected to the first and second data processing blocks.

3. The high-reliability computer according to claim 1 or 2, wherein the duplexing control means selects an output instruction from the first or second data processing block and outputs it to the input/output device, and transmits a response from the input/output device to both the first and second data processing blocks.

4. The high-reliability computer according to claim 1 or 2, wherein the duplexing control means conveys a memory access from the input/output device to the memories in the first and second data processing blocks, and selects a memory access response from the first or second data processing block and transmits it to the input/output device.

5. The high-reliability computer according to claim 1 or 2, wherein the clock supply means supplies a clock of the same frequency and the same phase to the first and second data processing blocks and to the duplexing control means.

6. The high-reliability computer according to claim 1 or 2, wherein the duplexing control means comprises main control means connected to the first data processing block and to an input/output bus, and slave control means connected to the second data processing block and to the input/output bus, the main control means sends a start signal to the input/output bus, and the main and slave control means both receive the start signal from the input/output bus.

7. The high-reliability computer according to claim 1, wherein the first data processing device has: a main processor that fetches programs and data from the memory and processes them; a slave processor that uses the same clock as the main processor and executes the same program in synchronization with it, but does not write data to the memory; and comparison means that is connected to the main and slave processors and compares the output data of the main and slave processors when the main processor makes an external access.

8. The high-reliability computer according to claim 1 or 2, wherein: the first data processing device has a first error detection device that detects an error occurring in the data processing device and, when the error detection device detects an error, issues to the first and second inter-block communication means a first disconnection request for disconnecting the data processing device from the duplexing control means; the second data processing device has a second error detection device that detects an error occurring in the data processing device and, when the error detection device detects an error, issues to the second inter-block communication means a second disconnection request for disconnecting the data processing device from the duplexing control means; the first inter-block communication means, when there is a second disconnection request and no first disconnection request, issues to the duplexing control means a second disconnection instruction for disconnecting the second data processing device from the duplexing control means; and the second inter-block communication means, when there is a first disconnection request and no second disconnection request, issues to the duplexing control means a first disconnection instruction for disconnecting the first data processing device from the duplexing control means.

9. The high-reliability computer according to claim 5, wherein the main control means, in response to a disconnection instruction, inhibits sending of the start signal to the input/output bus, and the slave control means, in response to the disconnection instruction, starts sending the start signal to the input/output bus in place of the main control means.

10. The high-reliability computer according to claim 2, wherein the first and second duplexing control means and the first and second input/output devices have independent addresses, and status information is provided that indicates which of the first and second duplexing control means and input/output devices is normal.

11. The high-reliability computer according to claim 10, wherein, if the status information indicates that the first duplexing control means and the first input/output device are normal, the first duplexing control means and the first input/output device are used, and if the status information indicates that the second duplexing control means and the second input/output device are normal, the second duplexing control means and the second input/output device are used.

12. The high-reliability computer according to claim 2, wherein the first and second duplexing control means and the first and second input/output devices have independent addresses, the second duplexing control means also has a function of receiving an address directed to the first input/output device, converting it into an address of the second input/output device, and transmitting it to the second input/output device, and setting means is provided for determining whether or not this function is enabled.

13. The high-reliability computer according to claim 12, wherein the setting means is set to enabled when the first input/output device fails.
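The disconnection logic of claims 8 and 9 can be sketched as two small functions. This is a non-authoritative illustration: the function names and the boolean encoding are assumptions, and the case where both disconnection requests are asserted at once is not specified by the claims, so the sketch simply issues no instruction in that case.

```python
def disconnect_instructions(first_request: bool, second_request: bool):
    """Return (disconnect_first, disconnect_second) following claim 8.

    The first inter-block communication means issues the second disconnection
    instruction when only the second request is present; the second inter-block
    communication means issues the first disconnection instruction when only
    the first request is present.
    """
    disconnect_second = second_request and not first_request
    disconnect_first = first_request and not second_request
    return disconnect_first, disconnect_second


def start_signal_driver(disconnect_first: bool) -> str:
    """Return which control means drives the start signal (claims 6 and 9).

    Assumption: the main control means is the one attached to the first data
    processing block, so disconnecting the first block hands the start signal
    over to the slave control means.
    """
    return "slave" if disconnect_first else "main"
```

For example, an error detected only in the first data processing device yields `(True, False)`, after which the slave control means takes over sending the start signal to the input/output bus.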