JPH0368034A - Checkpoint retesting system

Info

Publication number: JPH0368034A
Application number: JP1204113A
Authority: JP
Inventors: Satoshi Kobayashi; 智小林
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1989-08-07
Filing date: 1989-08-07
Publication date: 1991-03-25

Abstract

PURPOSE: To make checkpoint retry easy in a multiprocessor system by providing hardware that guarantees the store results of the other processor against the store results of the own processor. CONSTITUTION: An instruction is fetched into an instruction register 5 and decoded. Decoding yields a virtual operand address and the instruction length. The virtual operand address is translated into a physical address by an address translation buffer 6, and the instruction length is stored in an instruction length latch 1 (IL1) 14. Operand data is read from a cache 7 using the translated physical address, and at the same time the contents of IL1 14 are transferred to IL2 15. The operand data is sent to an execution unit 8 for instruction execution and, when the instruction stores to that address of the cache 7, is also sent to a cache buffer 11. At the same time, the contents of IL2 15 are transferred to IL3 16 and held for the duration of instruction execution.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] This invention relates to a checkpoint retry method for a multiprocessor system.

[Prior Art] FIG. 4 shows an example configuration of a multiprocessor system. In the figure, (1) and (2) are processors P0 and P1, which have identical hardware configurations and communicate with each other via a system bus (4). (3) is a main storage (abbreviated MS), which stores the processing results of processor P0 (1) and processor P1 (2).

FIG. 5 shows the main components of processor P0 (1). In the figure, (5) is an instruction register (abbreviated IR) that stores instructions fetched from the MS (3); (6) is an address translation buffer (abbreviated TLB) that translates the operand address obtained by decoding the output of the IR (5); (7) is a cache (abbreviated CAC) from which a copy of the contents of the MS (3) is read using the translation result of the TLB (6); (8) is an execution unit (abbreviated EXU) that processes the output of the CAC (7); (9) is a local storage (abbreviated LS) that constitutes work registers such as general-purpose registers; (10) is a local storage buffer (abbreviated LSB) that saves the pre-modification values of the LS (9); (11) is a cache buffer (abbreviated CACB) that saves the pre-modification values of the CAC (7); (12) is an instruction counter (abbreviated IC) that holds the address from which instructions are fetched sequentially from the MS (3); (13) is an instruction counter buffer (abbreviated ICB) that saves the value of the IC (12) each time a checkpoint is established; (19) is a lock array (abbreviated LK) that, when the EXU (8) executes a store, locks the contents of the store operand until the next checkpoint is established; and (20) is a checkpoint establishment request signal line from the EXU (8) (abbreviated CKPT line).

Next, the operation is described. An instruction is fetched into the IR (5) from the MS (3) address indicated by the IC (12) and decoded. The virtual operand address obtained by decoding is translated into a physical address by the TLB (6). Operand data is read from the CAC (7) using the translated physical address and sent to the EXU (8) for instruction execution; if executing the instruction will store to that CAC (7) address, the data is also sent to the CACB (11). Operands held in general-purpose registers, on the other hand, are read from the LS (9) and sent to the EXU (8); if a load will be performed to that LS (9) address, they are also sent to the LSB (10). The execution result of the EXU (8) is written to the CAC (7) and the LS (9). When writing to the CAC (7), a lock is set in the LK (19) entry corresponding to the store address so that the written data cannot be changed by the other processor until the next checkpoint is established.

This locking method is disclosed in Japanese Examined Patent Publication No. Sho 61-2975.

This processor has a checkpoint retry function: when a failure occurs, it restores the hardware state to the most recently established checkpoint and retries. On a failure, it restores the changes to the LS (9) from the LSB (10), restores the changes to the CAC (7) from the CACB (11), and returns the IC (12) to the instruction address held in the ICB (13) at the time the checkpoint was established, and thereby retries from the immediately preceding checkpoint.
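The restore step described above can be sketched in software as an undo buffer. This is only an illustrative model, not the patented hardware: the class name `UndoBuffer`, its methods, and the use of Python dictionaries for the LS and CAC are assumptions introduced here.

```python
_MISSING = object()  # marks an address that did not exist before the write


class UndoBuffer:
    """Hypothetical stand-in for LSB or CACB: keeps pre-modification values."""

    def __init__(self, storage):
        self.storage = storage  # dict standing in for LS or CAC
        self.saved = {}         # address -> value at the last checkpoint

    def write(self, addr, value):
        # Save the old value only on the first write in this checkpoint frame.
        if addr not in self.saved:
            self.saved[addr] = self.storage.get(addr, _MISSING)
        self.storage[addr] = value

    def invalidate(self):
        # A new checkpoint makes the saved values obsolete.
        self.saved.clear()

    def restore(self):
        # Undo every change made since the last checkpoint.
        for addr, old in self.saved.items():
            if old is _MISSING:
                del self.storage[addr]
            else:
                self.storage[addr] = old
        self.saved.clear()


ls, cac = {"r1": 10}, {0x100: 7}
lsb, cacb = UndoBuffer(ls), UndoBuffer(cac)
icb = ic = 0x40                 # checkpoint established at address 0x40

lsb.write("r1", 11)             # work done after the checkpoint
cacb.write(0x100, 8)
ic = 0x48

lsb.restore()                   # failure: roll LS, CAC and IC back
cacb.restore()
ic = icb
```

After the three restore steps, the modeled LS, CAC and IC hold the values they had when the checkpoint was established, so execution can resume from that point.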

Next, the function of the LK (19) is described with reference to FIG. 6. On a checkpoint retry, store instructions that were executed before the failure are executed again. Consequently, if the other processor P1 stores to a location that the own processor P0 has already stored to, the checkpoint retry may cause the store result of the other processor P1 to be lost. The LK (19) is provided to avoid this problem; as shown in FIG. 6, an entry is set when a store is executed and reset when a checkpoint is established. A store by the other processor P1 is made to wait while the corresponding LK (19) entry is set, and is controlled so that it executes only after the own processor P0 has established its checkpoint. As a result, even if the own processor P0 fails between executing a store and establishing the next checkpoint, and re-executes the same store by retrying from the immediately preceding checkpoint, the store result of the other processor P1 is guaranteed. The interval from the establishment of one checkpoint to the establishment of the next is called a checkpoint frame.
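The conventional lock array behaviour can be illustrated with a small model. It is a sketch under stated assumptions: the class `LockArray`, its method names, and the choice of one lock entry per store address held in a Python set are hypothetical and are not taken from the cited publication.

```python
class LockArray:
    """Hypothetical model of LK: one lock entry per stored-to address."""

    def __init__(self):
        self.locked = set()

    def on_own_store(self, addr):
        # Set when the own processor executes a store.
        self.locked.add(addr)

    def on_checkpoint(self):
        # All entries are reset when a checkpoint is established.
        self.locked.clear()

    def other_store_may_proceed(self, addr):
        # The other processor's store waits while the entry is set.
        return addr not in self.locked


lk = LockArray()
lk.on_own_store(0x100)
blocked_in_frame = not lk.other_store_may_proceed(0x100)
unrelated_ok = lk.other_store_may_proceed(0x200)
lk.on_checkpoint()
allowed_after_checkpoint = lk.other_store_may_proceed(0x100)
```

The model makes the cost visible: every store by the other processor has to be checked against the array, and a matching store stalls until the end of the checkpoint frame.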

Next, the operation when a checkpoint is established is described with reference to FIG. 5. To establish a checkpoint, the processor first waits for all preceding stores to complete (this is called synchronization). The EXU (8) then sends a checkpoint establishment request over the CKPT line (20). In response, the contents of the LSB (10) and the CACB (11) are invalidated, all entries of the LK (19) are reset, and the contents of the IC (12) are saved in the ICB (13).

[Problem to be Solved by the Invention] In the conventional checkpoint retry method, a store by the other processor to a location that the own processor has stored to within its current checkpoint frame must be made to wait until the own processor establishes its next checkpoint, and the detection and exclusion circuits required for this are large and complex. In addition, because requests from the other processor are made to wait, system performance suffers.

The present invention was made to solve the problems described above, and its object is to provide a multiprocessor system in which retry can be performed easily with only a small amount of additional circuitry.

[Means for Solving the Problems] The checkpoint retry method according to the present invention comprises a counter that counts the number of pending store requests; issues an asynchronous checkpoint establishment request, which establishes a checkpoint without waiting for pending stores to complete; and comprises instruction length holding means that holds the instruction length of the instruction that issued the checkpoint establishment request. If a failure occurs after the store by the store instruction that established the asynchronous checkpoint has completed, the retry is performed from the instruction following that store instruction.

[Operation] In the checkpoint retry method of the present invention, each time a store instruction is executed, an asynchronous checkpoint is established, the pending store request counter is incremented by 1, and the instruction length of the store instruction is saved in the instruction length holding means (a buffer or register). When the store completes, the pending store request counter is decremented by 1. On a retry from the checkpoint, if the pending store request counter is "0", the retry starts from the instruction following the store instruction; if it is not "0", the retry starts from the store instruction itself.

That is, when the store has completed, the retry starts from the instruction following the store instruction, which guarantees the other processor's store result against the own processor's store result. When the store has not completed, the retry starts from the store instruction, which guarantees the program's execution sequence.
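The retry rule stated above reduces to a single selection on the counter value. The following is a minimal sketch of that rule only; the function name `retry_address` and its parameter names are introduced here for illustration, and the addresses in the example are arbitrary.

```python
def retry_address(icb: int, ilcb: int, spc: int) -> int:
    """Pick the restart address after a failure.

    icb  -- instruction address saved at the asynchronous checkpoint
    ilcb -- saved instruction length of the store instruction
    spc  -- pending store request counter at the time of the failure
    """
    if spc == 0:
        # Store already completed: resume after the store instruction.
        return icb + ilcb
    # Store still pending: re-execute the store instruction itself.
    return icb


after_completed_store = retry_address(icb=0x1000, ilcb=4, spc=0)
after_pending_store = retry_address(icb=0x1000, ilcb=4, spc=1)
```

With a 4-byte store instruction at address 0x1000, the first call selects 0x1004 and the second selects 0x1000.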

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the main components, according to this embodiment, of processor P0 (1) shown in FIG. 4. In the figure, (5) is an instruction register (abbreviated IR) that stores instructions fetched from the MS (3); (6) is an address translation buffer (abbreviated TLB) that translates the operand address obtained by decoding the output of the IR (5); (7) is a cache (abbreviated CAC) from which a copy of the contents of the MS (3) is read using the translation result of the TLB (6); (8) is an execution unit (abbreviated EXU) that processes the output of the CAC (7); (9) is a local storage (abbreviated LS) that constitutes work registers such as general-purpose registers; (10) is a local storage buffer (abbreviated LSB) that saves the pre-modification values of the LS (9); (11) is a cache buffer (abbreviated CACB) that saves the pre-modification values of the CAC (7); (12) is an instruction counter (abbreviated IC) that holds the address from which instructions are fetched sequentially from the MS (3); (13) is an instruction counter buffer (abbreviated ICB) that saves the value of the IC (12) each time a checkpoint is established; (14) is instruction length latch 1 (abbreviated IL1), which holds the instruction length of the decoded instruction during address translation by the TLB (6); (15) is instruction length latch 2 (abbreviated IL2), which holds the instruction length during access to the CAC (7); (16) is instruction length latch 3 (abbreviated IL3), which holds the instruction length during instruction execution; (17) is an instruction length buffer (abbreviated ILCB) that stores the contents of IL3 (16) when an asynchronous checkpoint is established; (18) is a pending store request counter (abbreviated SPC) that counts up on a store request from the EXU (8) and counts down when the store completes in the CAC (7); (20) is a synchronous checkpoint establishment request signal line from the EXU (8); (21) is an asynchronous checkpoint establishment request signal line from the EXU (8); (22) is an OR gate that takes the logical OR of signal lines (20) and (21); and (23) is its output, a checkpoint establishment request signal line that is active whenever either the synchronous or the asynchronous checkpoint establishment request is active.

Next, the operation is described. An instruction is fetched into the IR (5) from the MS (3) address indicated by the IC (12) and decoded. Decoding yields the virtual operand address and the instruction length; the virtual operand address is translated into a physical address by the TLB (6), and the instruction length is stored in IL1 (14).

Operand data is read from the CAC (7) using the translated physical address, and at the same time the contents of IL1 (14) are transferred to IL2 (15). The operand data read from the CAC (7) is sent to the EXU (8) for instruction execution and, when executing the instruction will store to that CAC (7) address, is also sent to the CACB (11). At the same time, the contents of IL2 (15) are transferred to IL3 (16) and held for the duration of instruction execution. Operands held in general-purpose registers, on the other hand, are read from the LS (9) and sent to the EXU (8); if a load will be performed to that LS (9) address, they are also sent to the LSB (10). The execution result of the EXU (8) is written to the CAC (7) and the LS (9).
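The hand-off of the instruction length through IL1, IL2 and IL3 behaves like a three-stage shift register that travels alongside the instruction. The sketch below assumes, for simplicity, that every stage advances in lockstep once per step; the class `InstructionLengthLatches` and the method `step` are hypothetical names, and the text does not specify stage timing.

```python
class InstructionLengthLatches:
    """Hypothetical three-stage model: IL1 (translate), IL2 (cache), IL3 (execute)."""

    def __init__(self):
        self.il1 = self.il2 = self.il3 = None

    def step(self, decoded_length=None):
        # Shift oldest stage first so that no value is overwritten early.
        self.il3 = self.il2
        self.il2 = self.il1
        self.il1 = decoded_length


latches = InstructionLengthLatches()
for length in (2, 4, 6):        # lengths of three consecutive instructions
    latches.step(length)
```

After three steps the first instruction's length sits in IL3, which is the value an asynchronous checkpoint would copy into the ILCB while that instruction executes.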

This processor also has a checkpoint retry function. When a failure occurs, it restores the changes to the LS (9) from the LSB (10), restores the changes to the CAC (7) from the CACB (11), and returns the IC (12) to the instruction address held in the ICB (13) at the time the checkpoint was established, and thereby retries from the immediately preceding checkpoint.

Here, the case of a failure after execution of a store instruction is described. Each time a store instruction is executed, the EXU (8) first establishes a checkpoint via the asynchronous checkpoint establishment request signal line (21), stores the instruction address of the store instruction in the ICB (13) and its instruction length in the ILCB (17), and invalidates the contents of the LSB (10) and the CACB (11), so that a retry can start from the store instruction if a failure occurs later. It then executes the store to the operand and counts up the SPC (18). When the store completes normally, the SPC (18) counts down in response to the store completion report from the CAC (7).

FIG. 2 shows an example of the processing when a failure occurs after the store has completed. The store request from the EXU (8) is temporarily held pending because of contention with other requests; once the contention is resolved, the data is actually written to the CAC (7), whereupon the SPC (18) counts down and returns to "0". The SPC (18) therefore holds "1" from the time the store request is issued until the write to the CAC (7) completes. FIG. 2 shows the case where the failure occurs after the SPC (18) has counted down to "0". In this case, the EXU (8) learns within its machine check interrupt handling that the store has completed, because the SPC (18) contains "0", and therefore knows that the store instruction at the immediately preceding checkpoint must not be re-executed and need not be. The EXU (8) therefore obtains the instruction length of the store instruction from the ILCB (17), adds it to the value of the ICB (13), and stores the result in the IC (12), so that the retry starts from the instruction following the store instruction.

Because of this processing, the pre-store data of the CAC (7) is not saved in the CACB (11) when the store instruction is executed. This prevents the EXU (8) from erroneously writing back the store result and thereby losing the store result of the other processor. In the case of FIG. 2, the own processor's store has completed by the time of the failure and its result is already visible to the other processor, so if the own processor re-executed the store on retry, program-level consistency between the two processors could be lost.
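The hazard that this rule avoids can be shown with a toy shared location. This is an illustrative sketch only: the function `run`, the scheme names, and the single memory cell "X" are hypothetical, and the interleaving is fixed by hand rather than produced by real concurrent processors.

```python
def run(scheme: str) -> int:
    """Replay one fixed interleaving and return the final value of X."""
    memory = {"X": 0}
    saved_old = memory["X"]      # pre-store value (used only by the naive scheme)

    memory["X"] = 1              # own processor's store completes (SPC back to 0)
    memory["X"] = 2              # other processor then stores to the same location

    # --- failure on the own processor is detected here ---
    if scheme == "rollback_and_redo":
        memory["X"] = saved_old  # write back the old value ...
        memory["X"] = 1          # ... and re-execute the store
    elif scheme == "skip_completed_store":
        pass                     # resume at ICB + ILCB; the store is not redone
    else:
        raise ValueError(scheme)
    return memory["X"]


naive_result = run("rollback_and_redo")
proposed_result = run("skip_completed_store")
```

In the naive replay the other processor's value 2 is overwritten by the repeated store, whereas skipping the completed store leaves it intact, which is the guarantee the embodiment aims for without a lock array.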

FIG. 3 shows the case where a failure occurs while the store is pending. In this case, the EXU (8) learns within the machine check interrupt that the store is pending, because the SPC (18) contains "1", and further that the store has been erased by recovery from the machine check; it therefore knows that execution must restart from the store instruction. The EXU (8) therefore returns the value of the ICB (13) unchanged to the IC (12), so that the retry starts from the store instruction. Normally, when a pending store cannot be isolated from the failed part, the state is frozen as soon as the failure occurs, the pending store is erased by recovery processing, and the CAC (7) is placed in the request wait state so that the retry is not adversely affected.

Conversely, when the pending store can be isolated from the failure, no freeze takes place, so the store completes during recovery processing of the failed part, and the SPC (18) is already "0" by the time the EXU (8) handles the machine check interrupt.

When a synchronous checkpoint is established, on the other hand, the checkpoint is established only after all pending stores have completed. At this point, therefore, the SPC (18) is set to "0" and the contents of the ILCB (17) are also set to "0". Consequently, if a failure occurs after a synchronous checkpoint is established and before an asynchronous checkpoint is established, the EXU (8) adds the ILCB (17) value "0" to the value of the ICB (13) and stores the result in the IC (12), so the retry starts from the point at which the checkpoint was established.
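The three failure cases of the embodiment (FIG. 2, FIG. 3, and a failure following a synchronous checkpoint) can be replayed with one small state model. It is a sketch under assumptions: the class `Processor`, its method names, the instruction lengths, and the clearing of the counter when recovery erases a pending store are all introduced here for illustration and are not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    ic: int = 0    # instruction counter (IC)
    icb: int = 0   # instruction counter buffer (ICB)
    ilcb: int = 0  # instruction length buffer (ILCB)
    spc: int = 0   # pending store request counter (SPC)

    def sync_checkpoint(self) -> None:
        # Established after all pending stores have completed.
        self.icb, self.ilcb, self.spc = self.ic, 0, 0

    def store(self, length: int) -> None:
        # Asynchronous checkpoint, then issue the store request.
        self.icb, self.ilcb = self.ic, length
        self.spc += 1
        self.ic += length

    def store_completed(self) -> None:
        self.spc -= 1

    def execute(self, length: int) -> None:
        self.ic += length  # any non-store instruction

    def machine_check(self) -> None:
        if self.spc == 0:
            self.ic = self.icb + self.ilcb
        else:
            self.spc = 0   # assumption: recovery erased the pending store
            self.ic = self.icb


p = Processor(ic=0x100)            # FIG. 2: failure after store completion
p.store(4); p.store_completed(); p.execute(2); p.machine_check()
fig2_ic = p.ic

p = Processor(ic=0x100)            # FIG. 3: failure while store is pending
p.store(4); p.machine_check()
fig3_ic = p.ic

p = Processor(ic=0x100)            # failure after a synchronous checkpoint
p.sync_checkpoint(); p.execute(2); p.machine_check()
sync_ic = p.ic
```

Setting the ILCB to "0" at a synchronous checkpoint lets the same addition serve both checkpoint types, so the recovery path needs no separate case for synchronous checkpoints.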

[Effects of the Invention] As described above, according to the present invention, the hardware that guarantees the other processor's store results against the own processor's store results consists of a pending store request counter and an instruction length buffer, and asynchronous checkpoint establishment requests are issued. The apparatus can therefore be built inexpensively, and the adverse effect on system performance is reduced.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the main configuration of a processor according to an embodiment of the present invention; FIG. 2 is a conceptual diagram showing the processing in this embodiment when a failure occurs after a store has completed; FIG. 3 is a conceptual diagram showing the processing in this embodiment when a failure occurs while a store is pending; FIG. 4 is a block diagram showing a multiprocessor system; FIG. 5 is a block diagram showing the main configuration of a conventional processor; and FIG. 6 is a conceptual diagram showing the lock method and retry method of a conventional processor.

(1) and (2) are processors (P0, P1); (3) is the main storage (MS); (7) is the cache (CAC); (8) is the execution unit (EXU); (9) is the local storage (LS); (10) is the local storage buffer (LSB); (11) is the cache buffer (CACB); (12) is the instruction counter (IC); (13) is the instruction counter buffer (ICB); (17) is the instruction length buffer (ILCB; instruction length holding means); (18) is the pending store request counter (SPC); (20) is the synchronous checkpoint establishment request signal line; and (21) is the asynchronous checkpoint establishment request signal line.

In the figures, the same reference numerals denote the same or corresponding parts.

Claims

[Claims] In a multiprocessor system comprising a plurality of processors, each having means for holding intermediate results of instruction execution and a function of establishing a checkpoint for each predetermined instruction and, on a failure, returning from the instruction address at which the failure occurred to the immediately preceding checkpoint and retrying, a checkpoint retry method characterized in that: a counter that counts the number of pending store requests is provided; an asynchronous checkpoint establishment request, which establishes a checkpoint without waiting for pending stores to complete, is issued; instruction length holding means is provided that holds the instruction length of the instruction that issued the checkpoint establishment request; and, if a failure occurs after the store by the store instruction that established the asynchronous checkpoint has completed, the retry is performed from the instruction following that store instruction.