JPS5932050A

JPS5932050A - Computer system

Info

Publication number: JPS5932050A
Application number: JP57141674A
Authority: JP
Inventors: Yoshihiro Matsumoto; 吉弘松本
Original assignee: Toshiba Corp; Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1982-08-17
Filing date: 1982-08-17
Publication date: 1984-02-21
Also published as: JPS6248255B2

Abstract

PURPOSE: To improve the reliability of the fault-handling function of a composite computer system built from multiple computers, by giving each computer its own collation, diagnosis, and majority-decision functions. CONSTITUTION: A computer system link (CSL) 6 comprises a device number register 7, which stores the device numbers of the CSLs of other computers; an address register 9, which exchanges physical addresses with an instruction register 5 over a memory bus 2; a data register 10, which stores data to be exchanged with other computers; and a coincidence detector 8, which detects coincidence between the value in register 7 and the value in register 9. The CSL 6 is connected to the CSL of another computer via an interface bus 11. An instruction (OP) in register 5 can thus freely access the local memory of another computer through the CSL 6 by means of a physical address (PA).

Description

[Detailed Description of the Invention]

a. Technical Field

The present invention relates to a composite computer system using a plurality of computers, and more particularly to a fault-tolerant computer system in which, when a failure occurs in any one computer, the other healthy computers back it up so that the effects of the failure do not spread to the system as a whole.

b. Prior Art

Fault-tolerant computers (fault tolerant systems) using a plurality of computers have long been devised in various configurations by many people. (They are known from the Microcomputer Basic Course published by Ohmsha, among other sources.) In most of them, however, one specific component carries the collation, diagnosis, and fault-location functions, so a failure in that component impairs the function of the entire system.

c. Object of the Invention

The object of the present invention is to obtain a fault-tolerant computer system whose fault-handling function is more reliable, by providing each individual computer in a composite computer system using a plurality of computers with collation, diagnosis, and majority-decision functions.

d. Summary of the Invention

The present invention is a computer system characterized as follows. A plurality of computers are interconnected by intercomputer communication devices. When a first computer executes an instruction, such a device places an address signal indicating the address of the instruction's operand on the memory bus, identifies a second computer from that address signal, and reads or rewrites, by the action of the instruction, the contents of the storage location at that address in the main memory of the second computer. Each of the computers stores a program for the same purpose, and a collation section that is executed exclusively is provided partway through each program. By the action of the collation section, each computer collates its computed values and state values with those of the other computers and then collates the collation results themselves. When an abnormality is found, each computer determines the fault location, the system reconfiguration, and the restart policy, and the determinations are in turn collated before fault-tolerant processing is carried out.

e. Configuration of the Invention

Fig. 1 is a configuration diagram showing just one of the computers used in the present invention.

Like an ordinary computer, it consists of a local memory (hereinafter LM) 1, a processor unit (hereinafter PU) 3, an arithmetic unit (hereinafter AU) 4, a memory bus 2, and so on. Inside the AU is an instruction register 5, which is divided into an OP field that specifies the operation of the instruction and a PA field that specifies the physical address where the data involved in that operation is stored.

Fig. 1 takes a 32-bit computer as an example: bits 0 to 7 of instruction register 5 form the OP field and bits 8 to 31 form the PA field.
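The field layout of instruction register 5 can be sketched in a few lines of Python. This is only an illustration: it assumes that bit 0 is the most significant bit of the 32-bit word (the text does not state the numbering direction), and the names `decode_instruction`, `OP_BITS`, and `PA_BITS` are hypothetical, not taken from the patent.

```python
OP_BITS = 8    # bits 0-7: operation code (OP field)
PA_BITS = 24   # bits 8-31: physical address (PA field)


def decode_instruction(word: int) -> tuple:
    """Split a 32-bit instruction word into (op, pa).

    Assumption: bit 0 is the most significant bit, so the OP field
    occupies the top byte and the PA field the low 24 bits.
    """
    if not 0 <= word < (1 << (OP_BITS + PA_BITS)):
        raise ValueError("instruction word must fit in 32 bits")
    op = word >> PA_BITS                 # top 8 bits
    pa = word & ((1 << PA_BITS) - 1)     # low 24 bits
    return op, pa
```

If the PA field holds byte addresses, its 24 bits span 16 MB, which leaves room above a computer's own local memory for address ranges that select other computers.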

The computer system link (hereinafter CSL) 6 is a device provided specifically for the configuration of the present invention. It is an intercomputer communication device whose details are already known from Japanese Patent Applications No. 56-101227, No. 56-101228, and No. 56-101229, and from the TOSBAC Series 7 equipment manual 770KL43A, among others. The CSL 6 contains a device number register 7, which stores the device numbers of the CSLs of other computers; an address register 9, which exchanges physical addresses with instruction register 5 over memory bus 2; a data register 10, which holds data exchanged with other computers; and a coincidence detector 8, which detects coincidence between the value in device number register 7 and the value (upper address bits) in address register 9.

The CSL 6 is connected to the CSL of another computer via interface bus 11, so an instruction (OP) in instruction register 5 can freely access the local memory of another computer through the CSL 6 by means of a physical address (PA).

Fig. 2 shows an example in which three computers are coupled via CSLs 6-12, 6-13, 6-21, 6-23, 6-31, and 6-33 and interface buses 11-12, 11-13, and 11-23; this is called the complete coupling method. Coupling n computers by this method requires n(n-1) CSLs.

Fig. 3 shows another example, in which the three computers are coupled via CSLs 6-11 to 6-13, 6-21 to 6-23, and 6-31 to 6-33 and interface buses 11-1 to 11-3; this is called the n-bus coupling method.

Coupling n computers by this method requires n^2 CSLs. For reasons described later, this method offers greater fault tolerance.
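The hardware cost of the two coupling methods can be compared with a small sketch. The n(n-1) figure is stated in the text; the n^2 figure for n-bus coupling is read here from the Fig. 3 numbering (CSLs 6-11 to 6-33, nine CSLs for three computers), and the bus counts are likewise inferred from the bus numerals, so treat them as assumptions. All function names are hypothetical.

```python
def csl_count_complete(n: int) -> int:
    """Complete coupling: one CSL at each end of every computer-to-computer link."""
    return n * (n - 1)


def csl_count_n_bus(n: int) -> int:
    """n-bus coupling: each of the n computers has one CSL on each of the n buses."""
    return n * n


def bus_count_complete(n: int) -> int:
    """Complete coupling: one interface bus per pair of computers (inferred)."""
    return n * (n - 1) // 2


def bus_count_n_bus(n: int) -> int:
    """n-bus coupling: n shared interface buses (inferred from buses 11-1 to 11-3)."""
    return n
```

For three computers both methods use three interface buses, but n-bus coupling spends three extra CSLs; the redundancy those extra CSLs buy is what the later fault-tolerance comparison relies on.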

Fig. 4 illustrates how each computer's physical addresses are assigned from the instruction operand when the three computers 21 to 23 are coupled as in Fig. 2.

Specifically, the figure shows the case where the first computer 21 references the local memories in the second computer 22 and the third computer 23.

When the physical address in the PA field 24 of the instruction register inside the first computer 21 is above 1 MB (megabyte) and below 2 MB, the coincidence detector (not shown) in CSL 6-12 operates and the local memory of the second computer 22 is referenced. When the physical address in the PA field 24 is above 2 MB and below 3 MB, the coincidence detector (not shown) in CSL 6-13 operates and the local memory of the third computer 23 is referenced. Fig. 4 draws a single CSL per link, but as described above there are actually two CSLs, one on each side of the interface bus. The CSL that controls the right to communicate on the interface bus is called the master CSL, and the other is called the slave CSL.
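The address-based selection of a remote computer can be sketched as follows. This is a minimal model under stated assumptions: each CSL compares its device number register with the address bits above a 1 MB boundary, the lower bound of each range is treated as inclusive, and the low 20 bits are taken as the offset inside the remote local memory. The class `Csl`, the function `route`, and the table `CSLS_OF_COMPUTER_21` are hypothetical names.

```python
WINDOW_SHIFT = 20  # 1 MB address ranges, as in the Fig. 4 example


class Csl:
    """Toy model of one CSL on the memory bus of computer 21."""

    def __init__(self, name, device_number, target):
        self.name = name
        self.device_number = device_number  # content of device number register 7
        self.target = target                # computer reached through this CSL

    def matches(self, pa):
        # Coincidence detector 8: compare register 7 with the upper address bits.
        return (pa >> WINDOW_SHIFT) == self.device_number


# Assumed register contents: 1 selects the 1-2 MB range, 2 the 2-3 MB range.
CSLS_OF_COMPUTER_21 = [
    Csl("CSL 6-12", 1, 22),
    Csl("CSL 6-13", 2, 23),
]


def route(pa, csls=CSLS_OF_COMPUTER_21):
    """Return (csl_name, target_computer, offset), or None for a local access."""
    for csl in csls:
        if csl.matches(pa):
            return csl.name, csl.target, pa & ((1 << WINDOW_SHIFT) - 1)
    return None
```

An address that no CSL claims simply falls through to the computer's own local memory, which is why an ordinary instruction needs no special form to reach another computer.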

Fig. 5 is a signal flow diagram for the case where the first computer 21 writes data into the local memory of the second computer 22, and Fig. 6 is the corresponding timing diagram. First, the AU inside the first computer 21 issues CBBSYO, CMREFO, and CWRTHO to CSL 6-12 as a write request. CSL 6-12 latches the address and mode information and returns the wait request CWAITO to the AU, holding off the AU's access. At the same time it issues the interface bus acquisition request CIREQO to the master-side CSL 6-21. When the request is accepted, the bus grant signal CIENLO comes back from CSL 6-21. Having received this signal, CSL 6-12 waits until the AU reissues the write request CBBSYO, CMREFO, CWRTHO after a predetermined time, then returns CACPTO to the AU and accepts the write request. The AU next outputs the CDATAO signal and the write data to CSL 6-12, and CSL 6-12 takes the data and returns CSYNHO to the AU.

Having received the address and data from the AU, CSL 6-12 outputs the address information onto the interface bus toward CSL 6-21 at the CADRGO gate timing, and CSL 6-21 latches it on the strobe signal CADRSO. Likewise, CSL 6-12 outputs the data onto the interface bus at the CDATGO gate timing, and CSL 6-21 latches it on the strobe signal CDATSO.

Once it has taken in the address and data, CSL 6-21 issues the memory bus acquisition request CATNBO in order to write to the local memory of the second computer 22. The second computer 22 responds with CRAKBO as the bus grant signal. On receiving this signal, CSL 6-21 issues CBBSYO, CMREFO, and CWRTHO to request a write to the local memory. The second computer 22 takes the address and returns CACPTO. CSL 6-21 then outputs CDATAO and the data; when the second computer 22 has taken the data it returns CSYNHO to CSL 6-21, completing the cycle of writing data from the first computer 21 to the second computer 22. The timing of these signals is shown in Fig. 6.
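The three phases of the remote write can be replayed as an ordered trace. The sketch below is not a timing-accurate model: it only records who sends which signals to whom, in the order the description gives, and then stores the data. The signal spellings follow one consistent reading of the names in this text, and the function name `remote_write` and the dictionary standing in for the local memory of computer 22 are assumptions.

```python
def remote_write(memory_22: dict, address: int, data: int) -> list:
    """Replay the write sequence of Fig. 5 and return the ordered signal trace."""
    trace = []

    def send(src, dst, *signals):
        trace.append((src, dst, signals))

    request = ("CBBSYO", "CMREFO", "CWRTHO")

    # Phase 1: the AU hands the request to its own CSL and is held off once.
    send("AU 21", "CSL 6-12", *request)
    send("CSL 6-12", "AU 21", "CWAITO")
    send("CSL 6-12", "CSL 6-21", "CIREQO")    # interface bus acquisition request
    send("CSL 6-21", "CSL 6-12", "CIENLO")    # interface bus grant
    send("AU 21", "CSL 6-12", *request)       # AU retries after a fixed delay
    send("CSL 6-12", "AU 21", "CACPTO")
    send("AU 21", "CSL 6-12", "CDATAO")
    send("CSL 6-12", "AU 21", "CSYNHO")

    # Phase 2: address, then data, cross the interface bus (gate + strobe).
    send("CSL 6-12", "CSL 6-21", "CADRGO", "CADRSO")
    send("CSL 6-12", "CSL 6-21", "CDATGO", "CDATSO")

    # Phase 3: the remote CSL writes into the local memory of computer 22.
    send("CSL 6-21", "computer 22", "CATNBO")  # memory bus acquisition request
    send("computer 22", "CSL 6-21", "CRAKBO")  # memory bus grant
    send("CSL 6-21", "computer 22", *request)
    send("computer 22", "CSL 6-21", "CACPTO")
    send("CSL 6-21", "computer 22", "CDATAO")
    memory_22[address] = data
    send("computer 22", "CSL 6-21", "CSYNHO")
    return trace
```

Seen this way, phase 3 is the same request, accept, data, and sync exchange as phase 1, with CSL 6-21 playing the part that the AU plays on the originating side.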

Fig. 7 is a signal flow diagram for the case where the first computer 21 reads data from the local memory of the second computer 22, and Fig. 8 is the corresponding timing diagram.

First, the AU inside the first computer 21 issues the read request CBBSYO, CMREFO to CSL 6-12. CSL 6-12 latches the address and mode information and returns the wait request signal CWAITO to the AU, holding off the AU's access. At the same time it issues the interface bus acquisition request CIREQO to the master CSL 6-21. When the request is accepted, CIENLO comes back from the master CSL 6-21.

Having received the grant signal CIENLO, CSL 6-12 sends the address and mode information to CSL 6-21 at the CADRGO timing; CADRSO is the accompanying strobe signal. CSL 6-12 again returns the wait request CWAITO to the AU.

Having received the address and mode information, CSL 6-21 issues the memory bus acquisition request CATNBO in order to read the local memory of the second computer 22. On receiving CATNBO, the second computer issues the memory bus grant signal CRAKBO.

On receiving this signal, CSL 6-21 issues CBBSYO and CMREFO to request a read from the local memory. The second computer 22 returns CACPTO and at the same time reads the addressed location in its local memory and outputs the data together with CDATAO.

When CSL 6-21 has taken this data, it returns CSYNHO to the second computer 22, completing the memory bus read cycle. CSL 6-21 then outputs the data to CSL 6-12, which made the read request, at the CDATGO timing, together with the strobe signal CDATSO. When the read request CBBSYO, CMREFO arrives again from the AU after a fixed time, CSL 6-12, now holding the data, returns CACPTO and outputs CDATAO and the data simultaneously. The AU takes the data and returns CSYNHO to CSL 6-12, completing the whole read cycle. The actual timing of these signals is shown in Fig. 8.
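From the AU's point of view, a remote read is a request that is answered with a wait until the data has come back, after which a repeated request is accepted. The sketch below models only that wait-and-retry behavior. Time is counted in abstract ticks, the `latency` parameter and the names `LocalCsl` and `au_read` are hypothetical, and the remote fetch itself is reduced to a dictionary lookup.

```python
class LocalCsl:
    """Toy model of CSL 6-12 during a remote read (abstract ticks, not real timing)."""

    def __init__(self, remote_memory, latency):
        self.remote_memory = remote_memory
        self.latency = latency   # assumed ticks until the remote data has arrived
        self.pending = None      # (address, ticks_left) while a fetch is in flight

    def request_read(self, address):
        """AU issues CBBSYO/CMREFO; reply is ("CWAITO", None) or ("CACPTO", data)."""
        if self.pending is None:
            # First request: start the remote fetch and hold the AU off.
            self.pending = (address, self.latency)
            return "CWAITO", None
        pending_address, ticks_left = self.pending
        if pending_address != address:
            raise ValueError("a different read is already in flight")
        if ticks_left > 0:
            return "CWAITO", None
        self.pending = None
        return "CACPTO", self.remote_memory[address]

    def tick(self):
        if self.pending is not None and self.pending[1] > 0:
            self.pending = (self.pending[0], self.pending[1] - 1)


def au_read(csl, address, max_attempts=100):
    """Retry the read until it is accepted; return (data, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        reply, data = csl.request_read(address)
        if reply == "CACPTO":
            return data, attempt
        csl.tick()
    raise TimeoutError("remote read was never accepted")
```

Even with zero latency the model makes the AU wait once, which matches the description: the first request always draws a CWAITO, and the data is only delivered on a later request.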

Table 1 summarizes the memory bus control signals described above, and Table 2 summarizes the interface bus control signals.

(Table 1)

(Table 2)

The present invention is a computer system in which a plurality of computers are coupled, by the complete coupling method or the n-bus coupling method described above, using intercomputer communication devices such as the CSL, so as to improve fault tolerance.

In the local memory of each computer, a specific storage area is set aside for fault-tolerant processing. The locations in this area are referred to by the following variable names.

A: The variable to be collated.

X: A variable that receives the names of the variables that did not match in the collation. When everything matches, "0" is entered.

Y: A variable that receives the collation results that did not match when the collation results are themselves collated. When they match, "0" is entered.

F: A variable that stores the result of the fault determination, that is, the fault information.

Z: A variable that takes the value "0" when the values of F, checked against one another, are found to be consistent, and a value other than "0" when they are not.

P: A variable whose value is the processing policy.

Fig. 9 is a memory map for a computer system to which the present invention is applied, built from three computers; it shows the storage areas of these variables in the local memories 1-1, 1-2, and 1-3 of the computers.

In addition to its ordinary instructions, each computer has the instructions for fault-tolerant processing described below (Instructions for Fault Tolerant Computing, hereinafter abbreviated as FTC instructions).

(1) COL(P: in ARRAY of specified type, Q: out ARRAY of string)

Explanation: P is a sequence of variables of a user-specified type (the number of elements equals the number of computers). The instruction collates the variables in this sequence with one another, takes a majority decision, and places the names of the variables that differ in Q. The names of the element variables may also be written directly in place of P and Q. When no variable differs, "NONE" is placed in Q.
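The majority decision performed by COL can be sketched in Python. This is an illustration, not the instruction's actual implementation: the function name `col`, the use of a dictionary from variable name to value, and the behavior when no strict majority exists (every name is reported) are all assumptions that the text does not specify.

```python
from collections import Counter


def col(values: dict) -> list:
    """Collate named values by majority vote.

    Returns the sorted names whose value differs from the majority value,
    or ["NONE"] when every value agrees.
    """
    if not values:
        raise ValueError("COL needs at least one variable")
    counts = Counter(values.values())
    majority_value, majority_count = counts.most_common(1)[0]
    if majority_count * 2 <= len(values):
        # Assumption: without a strict majority, no value can be trusted.
        return sorted(values)
    differing = sorted(name for name, v in values.items() if v != majority_value)
    return differing or ["NONE"]
```

With three computers a single deviating value is always outvoted, for example `col({"A1": 5, "A2": 5, "A3": 7})` reports `A3`, whereas two computers alone could detect a disagreement but not say which side is wrong.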

(2) FLOC(P: in ARRAY of string, Q: out ARRAY of string)

Explanation: Uses P to determine where the faulty part is. Executing this instruction determines which computer or which CSL has failed; it cannot analyze faults inside a computer. The names of the element variables may also be written directly in place of P and Q.

When one CSL fails, the complete coupling method can collate among only (n-1) computers, whereas the n-bus coupling method can readily collate among all n computers by using an alternative path. Even when many CSLs have failed and only one interface bus remains alive, the n-bus coupling method still carries on the intended fault-tolerant processing; with the complete coupling method, once more than a certain number of interface buses are out of service, the intended fault-tolerant processing can no longer be performed.

(3) DIAG(Q: out ARRAY of string)

Explanation: An instruction that identifies the fault location by diagnosing each component when even the FLOC instruction cannot pin down the faulty part.

(4) RECOV(P: in ARRAY of string)

Explanation: An instruction that removes the fault location identified by the FLOC or DIAG instruction, reconfigures the system, and restarts it.

f. Operation of the Invention

Each computer making up the computer system of the present invention has, in addition to its ordinary instruction functions, the FTC instruction group described above. When an FTC instruction operates, it acts on the other computers through the CSL and so performs the functions described above.

The computer system of the present invention built from such computers carries out fault-tolerant measures inside the computer system when a computer, a CSL, or an interface bus fails, so that the effect of the failure does not reach the outside.

As stated above, the computer system of the present invention consists of two or more computers, CSLs, and interface buses, coupled by the methods already described. For convenience, the explanation below uses the three-computer configuration of Fig. 2 or Fig. 3 as the example.

Exactly the same program is stored in all three computers. A storage area for fault-tolerant processing is provided in each of their local memories 1-1, 1-2, and 1-3, as shown in Fig. 9, and the three computers execute with mutually synchronized timing.

Fig. 10 shows, for explanatory purposes, part of the program in the first of the three computers.

A collation section can be inserted anywhere in the program. The collation section is the part shown as statements 1 to 17; it begins with the instruction enter collation section and ends with the instruction leave collation section. Executing the former disables interrupts on the first computer, and the latter re-enables them. These instructions also ensure that, while the section is executing, not even the operating system can intervene in its execution.

The three computers are meant to execute the same program in synchronism, so they should enter the collation section at almost the same time. Their timing can be expected to drift, however, when each computer communicates with rotating auxiliary storage, input/output devices, external interrupts, and the like.

The instructions in statements 2 and 3 therefore synchronize the collation timing to the slowest computer.

Next, the instruction store(A1) in statement 4 writes the milestone name (mile-stone point identification), which indicates at which point in the ordinary program the collation takes place, together with the value to be collated, A1 (A2 and A3 in the second and third computers).

The instruction COL((A1, A2, A3): in, X1: out) in statement 5 collates the contents of A1, A2, and A3. If any value does not match, a majority decision is taken and the name of the variable holding the minority value is placed in X1.

The instruction COL((X1, X2, X3): in, Y1: out) in statement 6 checks whether the variable name returned in X1 matches the values of X2 and X3, and places the name of the minority variable in Y1.

If the collation passes in statements 5 and 6, the conditional instruction in statement 7 (a test of whether every Y is "NONE", followed by then 14) transfers control to statement 14; statements 14, 15, and 16 synchronize on completion of the collation, and control returns to the original program.

If the collation in statements 5 and 6 fails, the instruction FLOC((Y1, Y2, Y3): in, F1: out) in statement 8 investigates the cause of the fault, F1, and the instruction COL((F1, F2, F3): in, Z1: out) in statement 9 collates that cause F1 across the computers. If the causes agree, the instruction if (Z1 = "NONE") then 13 in statement 10 transfers control to statement 13, where the instruction RECOV(Z1: in) removes the fault Z1, reconfigures the computer system, and restarts it.

If the fault causes do not agree in statement 9, the instruction DIAG(P1: out) in statement 11 runs a diagnostic program to find the cause P1, and the instruction RECOV(P1: in) in statement 12 isolates the cause P1, reconfigures the computer system, and restarts it.

The instructions in statements 12 and 13 transfer control from this collation section to another program prepared in advance.
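The three possible paths through the collation section after statements 5 and 6 can be summarized in a short sketch. It deliberately takes the two decisions (whether the collation passed, and whether the computers agree on the fault cause) as inputs rather than computing them, because the exact conditions tested in statements 7 and 10 are not fully legible in this text. The function name is hypothetical, and placing the leave instruction at statement 17 follows the description of the section as running from statement 1 to statement 17.

```python
def statements_after_collation(collation_passed: bool, causes_agree: bool) -> list:
    """Return the statement numbers executed after statements 5 and 6."""
    if collation_passed:
        # 7: branch to 14; 14-16: completion sync; 17: leave collation section.
        return [7, 14, 15, 16, 17]
    if causes_agree:
        # 8: FLOC; 9: COL over the causes; 10: branch to 13; 13: RECOV.
        return [7, 8, 9, 10, 13]
    # 11: DIAG finds the cause; 12: RECOV isolates it.
    return [7, 8, 9, 10, 11, 12]
```

Only the passing path returns to the original program; both failure paths end in a RECOV instruction, which hands control to a separately prepared program.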

Table 3 is a matrix that explains the collation procedure of the collation section in Fig. 10, for one embodiment applied to the n-bus coupling configuration of Fig. 3. The vertical axis of Table 3 lists the failing elements, and the horizontal axis lists the names of the storage locations that receive the collation results when the instruction in statement 5 is executed. Table 3 shows the behavior of the first computer only, and the storage locations it names are all provided within X1 in the local memory of the first computer.

An X mark in Table 3 indicates that, as a result of executing the instruction in statement 5, no value can be read into the corresponding storage location of X1. For example, if CSL 6-11 fails, the locations marked in its row cannot be loaded with values.
If a failure occurs, it means that no value can be entered into X, , , X, , .

第２．第３の計算機に対しても同様の表を作成し、３種
類の表を合わせ−Ｃ分析すると障害個所の判定をするこ
とができる。文番号８の命令ではこの様な方法で故障個
所を判定している。Second. A similar table is created for the third computer, and the failure location can be determined by combining the three types of tables and performing a -C analysis. The instruction with statement number 8 uses this method to determine the location of the failure.

(Table 3)

When the fault location cannot be determined even by the method described above, the instruction in statement 11 actively sends test data to perform self-diagnosis and determine the fault location.

The collation section described above is placed in the main memory of the first computer and is executed exclusively, on the same clock timing as the collation sections of the other computers, so the instructions in the collation section execute in step, one by one. When contention occurs on the memory bus inside a computer, however, the number of execution cycles each instruction consumes may vary. NOP (no-operation) instructions are therefore placed between instructions, as many as are judged necessary, to idle the computer so that the instructions of the collation sections in all computers begin executing at the same time.

g. Effects of the Invention

According to the computer system of the present invention, each computer in the system can read out and collate the stored contents of several other computers with a single instruction. The computers run the same program; at any chosen point in the program they collate their intermediate processing values with one another at the same instant, and they then collate the collation results with one another as well. Each computer judges faults by majority decision, and the computers also cross-check one another on the result of the fault judgment, the system reconfiguration, and the system restart policy before acting. The result is a fault-tolerant computer system of extremely high reliability.

[Brief explanation of drawings]

Fig. 1 is a configuration diagram of just one of the computers that are constituent elements of the present invention. Figs. 2 and 3 show the coupling methods for three computers in an embodiment of the present invention. Fig. 4 shows how an instruction operand is converted to a physical address of each computer. Fig. 5 is a signal exchange diagram for writing data from the first computer into the local memory of the second computer, and Fig. 6 is its timing diagram. Fig. 7 is a signal exchange diagram for the first computer reading data from the local memory of the second computer, and Fig. 8 is its timing diagram. Fig. 9 is a memory map showing the storage areas provided in the local memories of the three computers for fault-tolerant processing. Fig. 10 shows the program for fault-tolerant processing that is inserted in the present invention.

1, 1-1 to 1-3: local memory (LM)
2: memory bus
3, 3-1 to 3-3: processor unit
4: arithmetic unit
5: instruction register
6, 6-11 to 6-33: computer system link (CSL)
11, 11-1 to 11-3: interface bus
21 to 23: computer

Agent: Patent Attorney Kensuke Norichika (and one other)

Claims

[Claims]

A computer system characterized in that: a plurality of computers are interconnected by intercomputer communication devices which, when a first computer executes an instruction, place an address signal indicating the address of the operand of the instruction on a memory bus, identify a second computer from the address signal, and read or rewrite, by the action of the instruction, the contents of the storage location at that address in the main memory of the second computer; each of the computers stores a program for the same purpose; a collation section that is executed exclusively is provided partway through each program; by the action of the collation section, each of the plurality of computers collates its values with the computed values and state values of the other computers and further collates the collation results; and, when an abnormality is determined, the fault location, system reconfiguration, and restart policy are determined and the determination results are collated to perform fault-tolerant processing.