JPH0962580A

JPH0962580A - Multi-processor device

Info

Publication number: JPH0962580A
Application number: JP7221638A
Authority: JP
Inventors: Shuichi Nakamura; 秀一中村; Toshiyuki Fukui; 俊之福井; Kazumasa Hamaguchi; 一正濱口
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-08-30
Filing date: 1995-08-30
Publication date: 1997-03-07

Abstract

PROBLEM TO BE SOLVED: To avoid degrading the utilization efficiency of the interconnection network by maintaining the consistency of the data blocks held in each cache memory when the process running on a processor issues a prescribed instruction. SOLUTION: Cache units 11 and 21 are connected to a main storage unit 12 and a bus arbiter 16. On request from their processors, cache units 11 and 21 update the data entries of the data blocks they hold or reflect them to the main storage unit 12; they also snoop the address information and other signals on a local bus 15 to perform cache coherency maintenance and related operations. The bus arbiter 16 arbitrates the right to use the local bus 15. When the process running on processor 10 or 20 issues the prescribed instruction, the consistency of the data blocks held in each cache memory is maintained on the basis of that instruction.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to a multiprocessor device, and more particularly to a multiprocessor device in which each processor has its own cache.

[0002]

[Prior art] In a parallel computer system, a cache memory is often attached to each processor, both to respond quickly to the main memory access requests issued by the processor and to reduce traffic on the interconnection network. Memory accesses issued by each processor go through the cache memory (and the cache memory controller), and copies of the accessed data blocks are placed in the cache memory.

[0003] In a parallel computer system, copies of the same data block can exist in several cache memories at once, and various methods have been devised and implemented to guarantee consistency among those copies.

[0004] In parallel computer systems whose interconnection network, linking the processors to one another and to main memory, is something like a bus on which all transactions can be monitored, the snoop method is common. In the snoop method, each cache memory monitors every transaction issued on the interconnection network and, when a copy of the data block targeted by a transaction exists in that cache memory, performs the necessary coherency maintenance operation.

[0005] In parallel computer systems whose interconnection network makes it difficult to monitor all transactions, the directory method is used. In the directory method, caching information indicating which cache memories hold a copy is stored and managed, per data block or per similar unit, in a storage device called a directory. When a processor issues a transaction, the caching information obtained from the directory is used to notify the cache memories holding a copy of the target data block that the transaction has occurred, thereby maintaining consistency among the copies.

[0006]

[Problem to be solved by the invention] Conventionally, the operation for keeping copies in the multiple cache memories of a parallel computer system consistent has been performed for every transaction, as described above. This, however, is ill suited to the relaxed memory consistency models that have been devised and implemented in various forms to reduce memory access latency.

[0007] In general, a relaxed memory consistency model defines synchronization points in the course of processing and requires that, when processing reaches a synchronization point, the memory transactions issued up to that point be reflected throughout the system. This means that the result of an individual memory transaction need not be reflected before the synchronization point. Consequently, when a conventional cache coherency scheme is used in a parallel computer system that adopts a relaxed memory consistency model, coherency maintenance operations that are unnecessary at that moment are performed for every transaction. The resulting overhead needlessly increases memory access latency, contrary to the purpose of the relaxed memory consistency model.

[0008] However, in a system that reduces the overhead of unnecessary coherency maintenance operations by delaying them until the synchronization point, where the relaxed memory consistency model requires memory transactions to be reflected, the coherency maintenance operations are all performed at the synchronization point. Traffic is therefore concentrated on the interconnection network at the synchronization point, and the utilization efficiency of the interconnection network may drop sharply.

[0009]

[Means for solving the problem] The present invention has been made in view of this problem. By delaying the cache coherency maintenance operation until a convenient point, it aims to provide a higher-performance multiprocessor device that avoids, for example in a system with a relaxed memory consistency model, the drop in interconnection network utilization caused by traffic concentrating on the network at the synchronization point.

[0010] To solve this problem, the present invention has, for example, the following configuration. In a parallel computer system comprising a plurality of processors, cache memories attached to them, a main storage device, and an interconnection network connecting the cache memories and the storage device, when the process running on a processor issues a prescribed instruction, the consistency of the data blocks held in each cache memory is maintained on the basis of that instruction.

[0011] According to a preferred embodiment of the present invention, the coherency maintenance desirably comprises: identification means for identifying that a memory transaction issued by a processor targets a data block that may be shared in another cache; holding means for recording that a data block placed in the cache by a memory transaction identified as targeting a shared data block is a shared data block; and selection means for using the information held by the holding means to select the data blocks on which the coherency maintenance operation is performed. In this way, at a stage predetermined by the processor, the system-wide coherency maintenance operation is performed only on shared data blocks.

[0012]

[Embodiments of the invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a first embodiment of a system implementing the present invention. In the figure, 10 and 20 are processors, connected to cache units 11 and 21 via processor buses 14 and 24, respectively.

[0013] Processor 10 can issue multiple memory accesses concurrently and has a special instruction (the synchronization instruction) for completing those outstanding memory accesses under the data consistency guarantee of a relaxed memory consistency model.

[0014] Processor 10 also has a special instruction declaring that it is starting memory accesses to shared data blocks, for example data blocks read and written inside a critical section (hereinafter the shared data block access start instruction), and a special instruction declaring that it is ending memory accesses to shared data blocks (hereinafter the shared data block access end instruction). These instructions are made identifiable to external modules through the processor control bus 143 (described later), although the mechanism for marking the start and end of memory accesses to shared and non-shared data blocks is not limited to the one in this embodiment. The shared data block access start instruction acquires an exclusive right to use the shared data blocks, and the shared data block access end instruction releases it. Completion of both instructions must be notified all the way to the main storage unit.

[0015] Cache units 11 and 21 are each connected to the main storage unit 12 and the bus arbiter 16 via the local bus 15. On request from their processors, cache units 11 and 21 update the data entries of the data blocks they hold and reflect them to the main storage unit 12; they also snoop the address information and other signals on the local bus 15 and perform cache coherency maintenance and related operations. The bus arbiter 16 arbitrates the right to use the local bus 15. This embodiment takes a multiprocessor system connected by a shared bus as its example, but the configuration is not limited to the one in this embodiment.

[0016] In this embodiment, an information processing system configured as in FIG. 1 performs coherency control under a relaxed memory consistency model in which the consistency of data blocks is guaranteed at the time the processor explicitly issues a synchronization instruction. In this system, the cache control sequencer that controls the cache coherency maintenance operation performs the following two operations.

(1) When the processor issues the instruction that starts memory accesses to shared data blocks (the shared data block access start instruction), the sequencer keeps the shared data block access flag set for subsequent memory accesses until the processor issues the instruction that ends memory accesses to shared data blocks (the shared data block access end instruction).

[0017] (2) For each cached data block whose value has not yet been reflected to main memory (hereinafter the DIRTY state), the sequencer determines whether it is a shared data block and, when a synchronization instruction is issued, writes back only the shared data blocks to main memory and performs a coherency maintenance operation if necessary.

[0018] With these operations, of the write-back traffic to main memory and the coherency maintenance traffic that would otherwise be concentrated at the time a synchronization instruction is issued, the traffic for non-shared data blocks is never generated.
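The effect on traffic at a synchronization point can be illustrated with a small count. This is only a sketch under assumptions: the `Block` record and the `writebacks_at_sync` helper are hypothetical names introduced here, and each write-back is treated as one unit of bus traffic.

```python
from dataclasses import dataclass


@dataclass
class Block:
    dirty: bool   # value not yet reflected to main memory (DIRTY state)
    shared: bool  # shared data block flag (114 in the text)


def writebacks_at_sync(blocks, shared_only=True):
    """Count the blocks written back when a synchronization instruction is issued."""
    return sum(1 for b in blocks if b.dirty and (b.shared or not shared_only))


blocks = [Block(True, True), Block(True, False), Block(True, False), Block(False, True)]
print(writebacks_at_sync(blocks, shared_only=False))  # every DIRTY block
print(writebacks_at_sync(blocks, shared_only=True))   # only DIRTY shared blocks
```

With the sample data, filtering on the shared flag reduces the write-backs at the synchronization point from three to one.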

[0019] FIG. 2 shows the configuration of a cache unit, which forms part of this embodiment. FIG. 2 uses cache unit 11 as the example, but cache unit 21 has the same configuration.

[0020] In the figure, 144 is the processor address bus interface for connecting to the processor address bus 141, 145 is the processor data bus interface for connecting to the processor data bus 142, and 146 is the processor control bus interface for connecting to the processor control bus 143.

[0021] Likewise, 154 is the local address bus interface for connecting to the local address bus 151, 155 is the local data bus interface for connecting to the local data bus 152, and 156 is the local control bus interface for connecting to the local control bus 153.

[0022] 115 is a data entry that holds data, 112 is an address tag that holds the address of the data entry 115, 113 is a state flag that holds the state of the data entry 115, and 114 is a shared data block flag that records whether the data entry 115 is a shared data block. These are assumed to be arrays of storage elements such as SRAM, but are not limited to this.

[0023] 116 is a comparator that compares the contents of the address tag 112 with the address on the processor address bus 141 or the local address bus 151. 117 is a selector that selects data within the data entry according to the comparison result of the comparator 116.

[0024] In the figure, 118 is the shared data block access flag, which indicates that processor 10 is currently accessing shared data blocks. It is set when a shared data block access start instruction issued by processor 10 completes and reset when a shared data block access end instruction issued by processor 10 completes.

[0025] 111 is the cache control sequencer, which controls each module in the cache unit.

[0026] In this embodiment, the cache unit is 2-way set associative with 256 bytes per cache line and 16 cache lines per set, but the configuration is not limited to this.

[0027] FIG. 3 shows the relationship among the address tag 112, state flag 113, shared data block flag 114, and data entry 115 in cache unit 11, together with the address fields used in address comparison.

[0028] The following explains how address comparison is performed when the 32-bit address 0xf8000701 (0x denotes hexadecimal) is passed from the processor address bus 141 or the local address bus 151.

[0029] Because the cache unit in this embodiment is 2-way set associative with 256 bytes per cache line and 16 cache lines per set, the address tag 112 is 20 bits wide, the index used to select a cache line is 4 bits wide, and the offset that locates a data block within the cache line is 8 bits wide. The lower 8 bits of the passed address form the offset field, which specifies the position of the data block within the cache line. The upper 20 bits form the address tag field, which is compared with the address tag 112. The remaining bits, between the offset field and the address tag field, form the index field, which specifies the position of the cache line within the set.
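The field split described above can be sketched in a few lines. The constant and function names (`OFFSET_BITS`, `INDEX_BITS`, `TAG_BITS`, `split_address`) are illustrative assumptions, not names from the source; the bit widths and the sample address are taken from the text.

```python
OFFSET_BITS = 8   # 256-byte cache line
INDEX_BITS = 4    # 16 cache lines per set
TAG_BITS = 20     # remaining upper bits of a 32-bit address


def split_address(addr):
    """Split a 32-bit address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


tag, index, offset = split_address(0xF8000701)
print(hex(tag), hex(index), hex(offset))  # 0xf8000 0x7 0x1
```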

[0030] The comparator 116 compares the value stored in the address tag of the cache line selected by the index field with the upper 20 bits of the address passed from the processor address bus 141 or the local address bus 151. When the two match and the state flag of the cache line selected by the index field (the states are described later) indicates a valid state (CLEAN or DIRTY), the access is a cache hit: a cache read hit when the transaction is a READ and a cache write hit when it is a WRITE. Anything else is a cache miss: a cache read miss when the transaction is a READ and a cache write miss when it is a WRITE.
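The hit and miss classification can be written as a small predicate. This is a sketch under assumptions: it models a single cache line already selected by the index field (the two ways are not modeled), and `classify` and `VALID_STATES` are hypothetical names.

```python
VALID_STATES = {"CLEAN", "DIRTY"}


def classify(line_tag, line_state, addr_tag, transaction):
    """Classify an access to the cache line selected by the index field.

    transaction is "READ" or "WRITE"; the result is one of
    "read hit", "read miss", "write hit", "write miss".
    """
    hit = line_tag == addr_tag and line_state in VALID_STATES
    kind = "read" if transaction == "READ" else "write"
    return f"{kind} {'hit' if hit else 'miss'}"


print(classify(0xF8000, "CLEAN", 0xF8000, "READ"))     # tag matches, valid line
print(classify(0xF8000, "INVALID", 0xF8000, "WRITE"))  # tag matches, invalid line
```

A matching tag on an INVALID line still counts as a miss, because both conditions must hold for a hit.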

[0031] In FIG. 3, the offset field of address 0xf8000701 is 0x01, so the offset within the cache line is 1. The index field is 0x7, so the cache line index within the set is 7. The address tag field is 0xf8000, which is compared with the address tag stored at cache line index 7.

[0032] FIG. 4 is the state transition diagram of the state flag as each memory transaction executes (described below for the state flag of cache unit 11). In FIG. 4, the INVALID state indicates that the data entry managed by the state flag is invalid. The CLEAN state indicates that the data entry has not been rewritten since it was LOADed from the main storage unit. The data entry then holds the same value as the main storage unit, although a data entry in another cache unit may hold a newer value. The DIRTY state indicates that the data entry has been rewritten with a newer value at least once since it was LOADed from the main storage unit and that the value has not been reflected to main memory. The data entry holds the latest value. The transitions out of each state are as follows.

[0033] (1) When processor 10 issues a LOAD instruction to a data block in the INVALID state, the state flag transitions to CLEAN.

[0034] (2) When processor 10 issues a STORE instruction to a data block in the INVALID state, cache read miss processing is executed first and the state flag transitions to CLEAN; cache write hit processing is then executed and the state flag transitions to DIRTY.

[0035] (3) When processor 10 issues a LOAD instruction to a data block in the CLEAN state, the state flag remains CLEAN.

[0036] (4) When processor 10 issues a STORE instruction to a data block in the CLEAN state, the state flag transitions to DIRTY.

[0037] (5) When a coherency maintenance operation is performed on a data block in the CLEAN state, the state flag transitions to INVALID.

[0038] (6) When processor 10 issues a LOAD instruction to a data block in the DIRTY state, the state flag remains DIRTY.

[0039] (7) When processor 10 issues a STORE instruction to a data block in the DIRTY state, the state flag remains DIRTY.

[0040] (8) When processor 10 issues a synchronization instruction for a data block in the DIRTY state, the state flag transitions to CLEAN.

[0041] (9) When write-back processing to main memory is executed for a data block in the DIRTY state, the state flag transitions to CLEAN.

[0042] (10) When a coherency maintenance operation is performed on a data block in the DIRTY state, the state flag transitions to INVALID.
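The ten transitions listed above fit in a lookup table. The event labels (`LOAD`, `STORE`, `SYNC`, `WRITEBACK`, `COHERENCY`) and the `next_state` helper are names chosen here for illustration; leaving the state unchanged for unlisted pairs is an assumption, since the text does not define those cases.

```python
# (current state, event) -> next state, following transitions (1) to (10)
TRANSITIONS = {
    ("INVALID", "LOAD"): "CLEAN",         # (1)
    ("INVALID", "STORE"): "DIRTY",        # (2) read miss to CLEAN, then write hit
    ("CLEAN", "LOAD"): "CLEAN",           # (3)
    ("CLEAN", "STORE"): "DIRTY",          # (4)
    ("CLEAN", "COHERENCY"): "INVALID",    # (5)
    ("DIRTY", "LOAD"): "DIRTY",           # (6)
    ("DIRTY", "STORE"): "DIRTY",          # (7)
    ("DIRTY", "SYNC"): "CLEAN",           # (8)
    ("DIRTY", "WRITEBACK"): "CLEAN",      # (9)
    ("DIRTY", "COHERENCY"): "INVALID",    # (10)
}


def next_state(state, event):
    """Return the next state; unlisted pairs keep the current state (assumption)."""
    return TRANSITIONS.get((state, event), state)


state = "INVALID"
for event in ["LOAD", "STORE", "SYNC", "COHERENCY"]:
    state = next_state(state, event)
    print(event, "->", state)
```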

[0043] FIG. 5 shows the control procedure for executing a LOAD instruction (described below for a LOAD instruction issued by processor 10).

[0044] In FIG. 5, the address of the LOAD instruction issued by processor 10 is compared (step S1). On a cache read hit, cache unit 11 supplies the data block it holds to processor 10 (step S2).

[0045] When the address comparison for the LOAD instruction issued by processor 10 results in a cache read miss, cache unit 11 issues a read request on the local bus 15 (step S3). In this case cache unit 11 does not supply the data block to processor 10 until the missed data block has been supplied to cache unit 11. Cache unit 11 transfers the address of the read access on the local address bus 151, issues a read request on the local control bus 153, and suspends both the servicing of coherency maintenance operations and the servicing of access requests from the processor until the data block is supplied on the local data bus 152, although the procedure is not limited to this. The main storage unit 12 accepts the read request and the address of the read access and supplies the data block on the local data bus 152 (step S4). Cache unit 11 places the data block supplied on the local data bus 152 into the corresponding entry in cache unit 11 and sets the state flag 113 of the data block to CLEAN (step S5). If the shared data block access flag 118 is set, the shared data block flag 114 of the data block in cache unit 11 is set; if the shared data block access flag 118 is reset, the shared data block flag 114 is reset. Cache unit 11 then supplies the data block to processor 10 (step S6).
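The LOAD procedure of steps S1 to S6 can be sketched as follows. This is a simplified model under assumptions: the `CacheUnit` class is hypothetical, cache lines are keyed directly by block address instead of by index and tag, and the bus transfer is reduced to a dictionary lookup in `main_memory`.

```python
class CacheUnit:
    """Toy model of one cache unit; not the structure shown in FIG. 2."""

    def __init__(self, main_memory):
        self.main_memory = main_memory    # dict: block address -> data
        self.lines = {}                   # block address -> {"data", "state", "shared"}
        self.shared_access_flag = False   # shared data block access flag (118)

    def load(self, addr):
        line = self.lines.get(addr)       # S1: address comparison
        if line is not None and line["state"] in ("CLEAN", "DIRTY"):
            return line["data"]           # S2: cache read hit
        data = self.main_memory[addr]     # S3, S4: read request served by main memory
        self.lines[addr] = {
            "data": data,
            "state": "CLEAN",                   # S5: state flag (113) becomes CLEAN
            "shared": self.shared_access_flag,  # shared data block flag (114) copies flag 118
        }
        return data                       # S6: supply the block to the processor


memory = {0x100: 7, 0x200: 9}
cache = CacheUnit(memory)
cache.load(0x100)                 # loaded outside a shared access section
cache.shared_access_flag = True   # after a shared data block access start instruction
cache.load(0x200)                 # loaded inside a shared access section
print(cache.lines[0x100]["shared"], cache.lines[0x200]["shared"])
```

The shared data block flag is assigned only on the miss path, which matches the point in the text where flag 114 is set or reset.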

[0046] Next, the control procedure for executing a STORE instruction is described with reference to FIG. 6 (for a STORE instruction issued by processor 10).

[0047] In FIG. 6, the address of the STORE instruction issued by processor 10 is compared (step S10). On a cache write hit, the state flag of the data block is set to DIRTY (steps S12 and S13). On a cache write miss, cache read miss processing is performed first, followed by cache write hit processing (step S11).
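A minimal sketch of the STORE procedure, assuming the same simplified line layout (a dictionary keyed by block address) and a hypothetical `store` function. Main memory is deliberately left untouched, since the write is only reflected later by write-back processing.

```python
def store(lines, main_memory, shared_access_flag, addr, value):
    """Handle a STORE: on a write miss, do read-miss processing first (S11)."""
    line = lines.get(addr)                                 # S10: address comparison
    if line is None or line["state"] == "INVALID":
        # S11: cache write miss, so fetch the block as in read-miss processing
        line = {"data": main_memory[addr], "state": "CLEAN",
                "shared": shared_access_flag}
        lines[addr] = line
    # S12, S13: cache write hit processing
    line["data"] = value
    line["state"] = "DIRTY"
    return line


lines = {}
memory = {0x200: 1}
store(lines, memory, True, 0x200, 9)
print(lines[0x200]["state"], memory[0x200])  # the cached copy is DIRTY, main memory is stale
```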

[0048] FIG. 7 shows the control procedure for executing a synchronization instruction (described below for a synchronization instruction issued by processor 10).

[0049] In FIG. 7, when cache unit 11 holds one or more data blocks that have the shared data block flag 114 set and are in the DIRTY state (step S20), and processor 10 issues a synchronization instruction, write-back processing of those DIRTY blocks to main memory is executed and the state flag of each such data block is set to CLEAN (steps S22 and S23).

[0050] In this embodiment, write-back of DIRTY blocks to main memory is repeated until no such DIRTY blocks remain, but the procedure is not limited to this.
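The synchronization step can be sketched as a loop over the cached lines. The `synchronize` function and the dictionary layout are assumptions made for illustration, and the coherency maintenance that accompanies each write-back is omitted here.

```python
def synchronize(lines, main_memory):
    """Write back every DIRTY line whose shared data block flag is set.

    Returns the addresses written back, in iteration order.
    """
    written = []
    for addr, line in lines.items():
        if line["state"] == "DIRTY" and line["shared"]:   # S20: select candidates
            main_memory[addr] = line["data"]              # S22: write-back
            line["state"] = "CLEAN"                       # S23: state flag to CLEAN
            written.append(addr)
    return written


memory = {0x100: 0, 0x200: 0}
lines = {
    0x100: {"data": 5, "state": "DIRTY", "shared": False},  # non-shared block
    0x200: {"data": 6, "state": "DIRTY", "shared": True},   # shared block
}
print([hex(a) for a in synchronize(lines, memory)])
```

In this sketch the non-shared DIRTY line stays DIRTY after the synchronization instruction and generates no bus traffic.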

[0051] FIG. 8 shows the control procedure for write-back processing to main memory (described below for write-back processing issued by cache unit 11).

[0052] In FIG. 8, cache unit 11 transfers the write-back requester information on the local control bus 153 and the address of the write-back access on the local address bus 151, issues a write-back request on the local control bus 153, and supplies the data block to be written back on the local data bus 152 (step S30). It then suspends execution until the written-back data block has been delivered to the corresponding entry in the main storage unit, the necessary coherency maintenance operations have completed, and the write-back processing to main memory has completed.

[0053] The main storage unit 12 accepts the request, the address of the write-back access, and the data block supplied on the local data bus 152, and reads the data block into the corresponding entry. At the same time, if a cache unit other than cache unit 11 also holds a copy of the data in a valid state (CLEAN or DIRTY), that cache unit snoops the address transferred on the local bus 15 and performs the coherency maintenance operation on the data block it holds in a valid state (steps S31 to S34).

[0054] In this embodiment the coherency maintenance operation is an invalidation-type transaction, but it is not limited to this.
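Write-back with invalidation-type coherency maintenance can be sketched as below. The `write_back` function is hypothetical; snooping on the local bus is modeled simply as a loop over the other caches, each represented as a dictionary from block address to a line record.

```python
def write_back(addr, source, others, main_memory):
    """Write the block at addr back from `source` and invalidate other valid copies."""
    main_memory[addr] = source[addr]["data"]        # S30: data reaches main memory
    for cache in others:                            # each other cache snoops the address
        line = cache.get(addr)
        if line is not None and line["state"] in ("CLEAN", "DIRTY"):
            line["state"] = "INVALID"               # invalidation-type coherency operation
    source[addr]["state"] = "CLEAN"                 # the written-back block becomes CLEAN


memory = {0x200: 0}
cache11 = {0x200: {"data": 6, "state": "DIRTY"}}
cache21 = {0x200: {"data": 0, "state": "CLEAN"}}
write_back(0x200, cache11, [cache21], memory)
print(memory[0x200], cache11[0x200]["state"], cache21[0x200]["state"])
```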

[0055] FIG. 9 shows the control procedure for data block replacement (in the following, cache unit 11 is assumed to perform the replacement).

[0056] In FIG. 9, when a data block replacement request is issued, the cache unit 11 identifies the data block to be replaced using a data block replacement algorithm such as LRU (step S40).

[0057] If the data block to be replaced is in the DIRTY state (step S41), the DIRTY block is written back to main memory (step S42). If the data block to be replaced is in the INVALID state or the CLEAN state, or is in the DIRTY state and its write-back to main memory has completed, the new data block is loaded into the data entry of that data block (steps S43, S44).
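The replacement steps S40 to S44 might be sketched as below. The class `ReplacingCache` and its methods are hypothetical; LRU is modelled with an ordered dictionary (least recently used entry first), and the newly loaded block is assumed to arrive in the CLEAN state.

```python
from collections import OrderedDict


class ReplacingCache:
    """Hypothetical cache whose entries are ordered from least to most recently used."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = OrderedDict()  # address -> (state, data)
        self.write_backs = []       # addresses written back so far

    def touch(self, addr):
        """Record a use of addr so that it becomes the most recently used entry."""
        self.lines.move_to_end(addr)

    def replace(self, new_addr):
        """Replace the LRU entry with the block at new_addr; return the victim address."""
        # Step S40: pick the victim with the replacement algorithm (LRU here).
        victim_addr = next(iter(self.lines))
        state, data = self.lines.pop(victim_addr)
        # Steps S41-S42: a DIRTY victim is written back to main memory first.
        if state == "DIRTY":
            self.main_memory[victim_addr] = data
            self.write_backs.append(victim_addr)
        # Steps S43-S44: load the new block into the freed data entry.
        # Assumption: a freshly loaded block is CLEAN.
        self.lines[new_addr] = ("CLEAN", self.main_memory[new_addr])
        return victim_addr
```

An INVALID or CLEAN victim skips the write-back branch entirely, which matches the three cases listed in paragraph [0057].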

[0058] To aid understanding of this technique, we first describe an example of its characteristic feature: the consistency-maintaining operation for memory accesses to shared data blocks is deferred until a synchronization instruction is issued.

[0059] Specifically, the following describes, with reference to FIG. 10, how two behaviors are realized. First, suppose processors 10 and 20 each issue a LOAD instruction to address f8000000, a non-shared data block, and, after both LOAD instructions complete, processor 10 issues a STORE instruction to the same address f8000000. In that case no consistency-maintaining operation is issued to cache unit 21, either when the STORE instruction is issued or when a subsequent synchronization instruction or shared data block access start instruction is issued. Second, suppose processor 10 issues a shared data block access start instruction in order to access shared data blocks and then issues a STORE instruction to address f8010000, a shared data block. In that case no consistency-maintaining operation is issued to cache unit 21 when the STORE instruction is issued; the operation is executed only later, when processor 10 issues a synchronization instruction.

[0060] FIG. 10 is a timing chart showing an example of the consistency-maintaining operation of this embodiment.

[0061] The description assumes that addresses f8000000 and f8010000 are both allocated to the main memory unit 12.

[0062] It is further assumed that, before time 1, either a shared data block access end instruction has completed or no shared data block access start instruction has been executed, so that the shared data block access flag 118 is reset, and that no copy of either address f8000000 or address f8010000 has been loaded into the cache units.

[0063] At time 1, processors 10 and 20 issue LOAD instructions to address f8000000, a non-shared data block, and each completes its LOAD from the main memory unit 12. At this point the status flags 113 and 213 in cache units 11 and 21 for address f8000000 are both CLEAN, and the shared data block flag 114 is reset.

[0064] At time 2, processor 10 issues a STORE instruction to address f8000000, a non-shared data block, and the STORE completes. No access to the local bus and no consistency-maintaining operation occurs at this point. The status flag 113 of the data block in cache 11 becomes DIRTY as a result of the STORE instruction issued by processor 10, and because the shared data block access flag 118 is reset, the shared data block flag 114 of the data block is reset.

[0065] At time 3, processor 10 issues a shared data block access start instruction in order to begin memory accesses to shared data blocks, and the instruction completes.

[0066] At time 4, processor 10 issues a STORE instruction to address f8010000, a shared data block, and the STORE instruction completes. No access to the local bus and no consistency-maintaining operation occurs at this point. The status flag 113 of the data block in cache 11 becomes DIRTY as a result of the STORE instruction issued by processor 10, and because the shared data block access flag 118 was set by the shared data block access start instruction completed at time 3, the shared data block flag 114 is set.

[0067] At time 5, processor 10 finishes its memory accesses to shared data blocks and issues a shared data block access end instruction, and the instruction completes.

[0068] At time 6, processor 10 issues a synchronization instruction.

[0069] At time 7, as a result of the synchronization instruction issued by processor 10 at time 6, the data block at address f8010000 is written back to main memory: among the data blocks held in the DIRTY state in cache 11, it is a shared data block whose shared data block flag 114 is set. The data block at address f8000000, which is likewise held in the DIRTY state in cache 11, is a non-shared data block whose shared data block flag 114 is reset, so no write-back to main memory occurs for it.
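The sequence from time 1 to time 7 can be replayed with a small model of cache unit 11. The sketch below is illustrative only: `CacheUnit` and `run_example` are hypothetical names, a STORE to an uncached address simply allocates the line without bus traffic, a loaded block is assumed to inherit the access flag in the same way a stored block does, and a block is assumed to become CLEAN once written back.

```python
class CacheUnit:
    """Hypothetical model of cache unit 11 with flags 113, 114 and 118."""

    def __init__(self):
        self.access_flag = False  # shared data block access flag (118)
        self.lines = {}           # address -> {"state": status flag 113, "shared": flag 114}

    def load(self, addr):
        # Assumption: the shared flag follows the access flag on LOAD as well.
        self.lines[addr] = {"state": "CLEAN", "shared": self.access_flag}

    def store(self, addr):
        # The STORE completes locally: no bus access, no coherence operation.
        line = self.lines.setdefault(addr, {"state": "INVALID", "shared": False})
        line["state"] = "DIRTY"
        line["shared"] = self.access_flag

    def begin_shared_access(self):
        self.access_flag = True

    def end_shared_access(self):
        self.access_flag = False

    def sync(self):
        """Write back only DIRTY blocks whose shared flag is set; return their addresses."""
        written = []
        for addr, line in self.lines.items():
            if line["state"] == "DIRTY" and line["shared"]:
                written.append(addr)
                line["state"] = "CLEAN"  # assumption: CLEAN after write-back
        return written


def run_example():
    cache11 = CacheUnit()
    cache11.load(0xF8000000)       # time 1: LOAD of the non-shared block
    cache11.store(0xF8000000)      # time 2: STORE, flag 114 stays reset
    cache11.begin_shared_access()  # time 3: access start instruction
    cache11.store(0xF8010000)      # time 4: STORE, flag 114 is set
    cache11.end_shared_access()    # time 5: access end instruction
    written = cache11.sync()       # times 6-7: synchronization instruction
    return cache11, written
```

Calling `run_example()` returns the cache and the list of addresses written back at the synchronization point, which in this model contains only the shared block.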

[0070] <Second Embodiment> Next, a second embodiment is described. The system configuration and the configuration and control of the cache units are almost the same as in the first embodiment, so their description is omitted and only the differences are described.

[0071] In FIG. 1, processor 10 can issue multiple memory accesses and has a special instruction (the synchronization instruction) for completing those multiply issued memory accesses under the data consistency guarantee of a relaxed memory consistency model. However, processor 10 has neither a special instruction declaring the start of memory accesses to shared data blocks (the shared data block access start instruction) nor a special instruction declaring the end of memory accesses to shared data blocks (the shared data block access end instruction).

[0072] In this second embodiment, in place of the shared data block access start and end instructions of the first embodiment, the processor declares the start of memory accesses to shared data blocks by storing, to the shared data block access start address, a value indicating that such accesses have started, and declares their end by storing to the same address a value indicating that such accesses have ended. The cache unit 11 determines the start and end of memory accesses to shared data blocks from the value stored to the shared data block access start address: it sets the shared data block access flag 118 when processor 10 stores the value indicating the start, and resets the flag when processor 10 stores the value indicating the end.
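One way to picture the second embodiment is a sequencer that watches STOREs for a reserved address. The sketch below is hypothetical throughout: the address `SHARED_ACCESS_ADDR` and the values `START_VALUE` and `END_VALUE` are placeholders chosen for illustration, since the text does not give the actual address or encodings.

```python
SHARED_ACCESS_ADDR = 0xFFFF0000  # hypothetical shared data block access start address
START_VALUE = 1                  # hypothetical value meaning "shared accesses started"
END_VALUE = 0                    # hypothetical value meaning "shared accesses ended"


class AccessFlagSequencer:
    """Hypothetical model of how the cache unit derives flag 118 from STOREs."""

    def __init__(self):
        self.access_flag = False  # shared data block access flag (118)

    def on_store(self, addr, value):
        """Update the flag for a control STORE.

        Returns True if the STORE targeted the reserved address,
        False if it was an ordinary data STORE.
        """
        if addr != SHARED_ACCESS_ADDR:
            return False
        if value == START_VALUE:
            self.access_flag = True
        elif value == END_VALUE:
            self.access_flag = False
        return True
```

Ordinary STOREs pass through unchanged and would be tagged with whatever value `access_flag` has at that moment, as in the first embodiment.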

[0073] This embodiment applies to an information processing system configured as shown in FIG. 1, in which consistency control follows a relaxed memory consistency model that guarantees data block consistency at the point when a synchronization instruction explicitly given by the processor is issued. The cache control sequencer, which controls the cache's consistency-maintaining operation, sets the shared data block access flag when a value indicating the start of memory accesses to shared data blocks is stored to the shared data block access start address, and keeps the flag set for subsequent memory accesses until a value indicating the end of such accesses is stored to that address. Using this flag, it determines whether each cache data block whose value has not yet been reflected in main memory (hereinafter, the DIRTY state) is a shared data block. When the synchronization instruction is issued, it performs the write-back to main memory, and any necessary consistency-maintaining operation, for shared data blocks only. As a result, of the write-back traffic and consistency-maintenance traffic that would otherwise concentrate at the point the synchronization instruction is issued, the traffic for non-shared data blocks is not generated.
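The traffic saving described here can be illustrated by comparing which blocks would be written back at a synchronization point with and without the shared-block filter. The function name `blocks_written_back_at_sync` and the tuple layout are hypothetical, and the unfiltered case is only a baseline for comparison, not something the text specifies.

```python
def blocks_written_back_at_sync(lines, shared_only):
    """Return the sorted addresses written back when a synchronization instruction is issued.

    lines maps address -> (state, shared_flag).
    With shared_only=True, only DIRTY blocks marked shared are written back,
    as in this embodiment; with shared_only=False, every DIRTY block is.
    """
    return sorted(
        addr
        for addr, (state, shared) in lines.items()
        if state == "DIRTY" and (shared or not shared_only)
    )
```

For a cache holding one DIRTY non-shared block and one DIRTY shared block, the filtered call returns a single address while the unfiltered baseline returns both, so the bus carries one write-back instead of two.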

[0074] As described above, this embodiment provides, for a parallel computer system that adopts a relaxed memory consistency model and whose components are connected by an interconnection network, a cache memory and a cache consistency-maintaining mechanism that further improve performance. By limiting the interconnection network traffic that concentrates at synchronization points to traffic for shared data blocks only, it reduces both the loss of interconnection network utilization efficiency and the processing overhead of synchronization, and thereby improves the processing capability of the whole system.

[0075] The present invention is characterized by its applicability to a plurality of processors (two in the above embodiments, though more may of course be used). The system or device may be used for any purpose, and may of course be a general-purpose computer.

【００７６】[0076]

[Effects of the Invention] As described above, according to the present invention, the cache consistency-maintaining operation is deferred to a convenient point and consistency is maintained only for shared data blocks. In a system with a relaxed memory consistency model, for example, this avoids the loss of interconnection network utilization efficiency caused by traffic concentrating on the interconnection network at synchronization points, so that a higher-performance multiprocessor device can be provided.

【００７７】[0077]

[Brief description of drawings]

FIG. 1 is a diagram showing the circuit configuration around the processors of an information processing system according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of a cache unit in the embodiment.

FIG. 3 is a diagram illustrating address comparison in the cache unit of the embodiment.

FIG. 4 is a state transition diagram of the status flag of the embodiment.

FIG. 5 is a chart of the processing performed by the cache unit when executing a LOAD instruction in the embodiment.

FIG. 6 is a chart of the processing performed by the cache unit when executing a STORE instruction in the embodiment.

FIG. 7 is a chart of the processing performed by the cache unit when executing a synchronization instruction in the embodiment.

FIG. 8 is a chart of the write-back to main memory in the embodiment.

FIG. 9 is a chart of the processing performed when replacing a data block in the embodiment.

FIG. 10 is a diagram showing an example in which the consistency-maintaining operation for memory accesses to shared data blocks is deferred until a synchronization instruction is issued in the embodiment.

[Explanation of symbols]

10, 20: processors
11, 21: cache units
12: main memory unit
14, 24: processor buses
16: bus arbiter
15: local bus

Claims

[Claims]

1. A multiprocessor device in a parallel computer system including a plurality of processors, cache memories associated with the processors, a main memory, and an interconnection network connecting the cache memories and the main memory, characterized in that, when each processor issues a predetermined instruction, the data blocks existing in the respective cache memories are kept consistent based on the issuance of that instruction.

2. The multiprocessor device according to claim 1, wherein the consistency-maintaining means comprises: means for identifying that a memory transaction issued by each processor is a memory transaction for a data block that may be shared in another cache; holding means for holding the fact that the data block stored in the cache by a memory transaction identified as targeting a shared data block is a shared data block; and selection means for selecting, using the information held by the holding means, the data blocks on which the consistency-maintaining operation is to be executed, and wherein the processor causes the consistency-maintaining operation in the system to be performed only on shared data blocks at a predetermined stage.

3. The multiprocessor system according to claim 1, wherein the system adopts a relaxed memory consistency model.

4. The multiprocessor device according to claim 2, wherein the shared data block is a data block read and written in a critical section.