JPH0210462A

JPH0210462A - Method for enforcing cache coherency in a computer apparatus, and computer apparatus

Info

Publication number: JPH0210462A
Application number: JP1049761A
Authority: JP
Inventors: Jon Rubinstein; ジヨン・ルビンシユテイン; S Milancar Glen; グレン・エス・ミランカー
Original assignee: Ardent Computer Corp
Current assignee: Ardent Computer Corp
Priority date: 1988-03-01
Filing date: 1989-03-01
Publication date: 1990-01-16
Also published as: GB2216308A; GB8903963D0; IT1229126B; IT8919608A0

Abstract

PURPOSE: To realize an efficient caching operation by invalidating a cache entry when that entry is inconsistent with the corresponding entry in main memory. CONSTITUTION: Processor units 11-13 are interconnected by a bus 10. Each of the processor units 11-13 has a bus interface unit 14, a bus watcher 15, a data cache memory 18, a processor 23 and a floating-point processor 30. The bus watcher 15 has an even address tag array 16 and an odd address tag array 17, monitors the bus 10 for write transactions, and invalidates the corresponding cache entry when a write transaction is executed by a memory mutator.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to methods and apparatus for enforcing cache coherency among multiple devices in a computer system.

[Prior Art and Problems to Be Solved by the Invention] In a computer system, a cache memory can be used to improve the access speed of frequently used data and programs.

A cache memory is typically a high-speed memory whose speed is comparable to that of the CPU. It acts as a buffer between the CPU and the slower main memory. Typically, frequently used data is stored in the cache memory and accessed by the CPU during read operations. A write operation writes to the cache memory and, at some point, to main memory. There are two methods of updating main memory from the cache memory: write-through and write-back.

In a write-through scheme, a write to the cache memory causes a corresponding write to main memory at approximately the same time.

In a write-back scheme, main memory is updated with the new data at a time that is not necessarily correlated with the write to the cache memory. An example of a write-back cache is described below in connection with the smart cache protocol.
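The difference between the two update policies can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class names, the dictionary-based memory and the explicit `flush` method are hypothetical and do not come from the patent.

```python
class WriteThroughCache:
    """Every cache write is forwarded to main memory at once."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value  # main memory is updated with the cache


class WriteBackCache:
    """Cache writes are marked dirty and reach main memory only on flush."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)  # main memory is now stale for this address

    def flush(self):
        # Copy every dirty line back to main memory at some later time.
        for address in sorted(self.dirty):
            self.memory[address] = self.lines[address]
        self.dirty.clear()
```

With the write-back variant, main memory holds stale data between `write` and `flush`, which is the window in which other devices can observe inconsistent values.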

Use of a cache memory requires that, at some point, the cache memory and main memory correspond exactly. In the case of a single CPU system with no coprocessors, this is not a major problem. However, when several memory mutators (devices, such as CPUs, that can modify memory) share one memory, cache coherency becomes a major problem. For example, suppose data is read by processor 1 into its cache memory, then read by processor 2 into its cache memory, and the data is then updated by processor 1. Processor 2 must update or invalidate its version of the data; otherwise the data will be inconsistent between processors 1 and 2.

Several methods are well known for enforcing cache coherency when multiple processors share one main memory and have separate cache memories. One method is to have software enforce cache coherency. Typically, software algorithms that enforce cache coherency use time stamps on the data in main memory and in the cache memories. Software algorithms have several drawbacks, including the clock cycles consumed in enforcing cache coherency, the added complexity of the system software, and the overhead of adding a time stamp or similar mark to each block of data in memory.

The second method uses a bus protocol, or smart cache protocol, that keeps all caches in the system coherent by keeping track of every cache entry that is dirty (i.e., updated). Only one dirty version of a given datum may exist in the system. If a processor that does not hold the dirty version attempts to read the data, the system must force the dirty copy to be written to main memory and only then allow the second processor to read the data.

This introduces several complications, such as the system having to know where each copy of the data is located, and the enforcement of rules such as disallowing reads of data from a cache once a dirty version of that data exists. The use of multiple data buses, and of bus masters without caches, adds further complexity.

A third method of enforcing cache coherency is to use bus watchers with write-through caches. Typically, a bus watcher is associated with each CPU that has a cache memory. The bus watcher is responsible for watching write transactions on the system bus and for determining whether a write transaction requires data in its processor's cache memory to be updated or invalidated.

Furthermore, in a typical computer system that uses cache memory, it is advantageous to fetch multiple words from main memory so that multiple words are placed in the cache memory with a single fetch command. Such an arrangement is usually advantageous for bus and memory management.

For example, instructions are usually accessed sequentially during program execution, and branch instructions are relatively uncommon compared with sequentially executed instructions. Fetching blocks of instructions therefore makes the system more efficient.

Designs of systems that allow blocks of data to be fetched into a cache are well known. However, with advances in microprocessors, control of the cache memory has moved inside the microprocessor. In systems using such microprocessors, it is common for the microprocessor to fetch one word at a time. It is desirable to develop a method and apparatus that perform efficient caching operations while allowing standard microprocessors to be used in a computer system.

It is also known for a computer system to use a bus of a first bandwidth, for example 64 bits, for communication between modules, and a bus of a second bandwidth, for example 32 bits, for communication within a module.

In such computer systems, funneling data from the wider bus onto the narrower bus often creates a bottleneck.

[Means to solve problems]

The present invention discloses a method and apparatus for enforcing cache coherency in a computer system that uses multiple memory mutators and at least one cache memory. In the preferred embodiment, each of multiple microprocessor units is coupled to a separate cache memory. The system also has other memory mutators without cache memories, such as I/O devices, graphics processors, and floating-point processors.

The preferred embodiment uses write-through caches and a bus watcher mechanism in which a bus watcher is associated with each cache memory. The bus watcher has even and odd address tag arrays. The even address tag array holds the addresses of data in the cache memory that correspond to even addresses in main memory. Similarly, the odd address tag array holds the addresses of data in the cache memory that correspond to odd addresses in main memory.

In general, it will be apparent to those skilled in the art that the invention can be used in any computer system in which the number of tag accesses per bus clock cycle equals the width of the bus (in bits) divided by the width of an access from the processor (in bits).

In the preferred embodiment, the processor can be configured for up to two tag accesses per bus clock cycle. The bus is 64 bits wide, and the processor accesses 32 bits per access.
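The ratio of bus width to processor access width can be checked with a one-line calculation. The sketch below is illustrative only; the function name is hypothetical, and the check that the bus width is a whole multiple of the access width is an added assumption.

```python
def tag_accesses_per_bus_cycle(bus_width_bits, access_width_bits):
    """Bus width divided by the width of one processor access."""
    if bus_width_bits % access_width_bits != 0:
        # Assumption of this sketch: only whole multiples are meaningful.
        raise ValueError("bus width must be a whole multiple of the access width")
    return bus_width_bits // access_width_bits


# Preferred embodiment: 64-bit bus, 32-bit processor accesses.
print(tag_accesses_per_bus_cycle(64, 32))  # 2
```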

Each bus watcher monitors write transactions from its own processor to determine when a cache write has occurred and hence when its address tags must be updated. Whenever the microprocessor coupled to the cache memory performs a read operation that involves a main memory fetch (i.e., there was no cache hit), the bus watcher's address tag is updated. The bus watcher also monitors the system bus for write transactions from other processors. The address to be written is examined to determine whether it matches an address tag in this bus watcher. If it does, the corresponding entry in the bus watcher's address tag array is invalidated, and the entry in the cache is invalidated.

The bus watching mechanism is designed to enforce coherency in computer systems in which the processor does not notify the bus watcher when it performs a cache write.

This specification describes a method and apparatus for enforcing cache coherency. Numerous specific details, such as bandwidths and processor types, are set forth below in order to provide a thorough understanding of the invention.

However, it will be apparent to those skilled in the art that the invention may be practiced without these specific details. In other instances, well-known circuits and techniques have not been described in detail in order not to obscure the invention unnecessarily.

The present invention provides a method and apparatus for enforcing cache coherency in a computer system that has multiple processors with separate write-through cache memories. Such a system may have any number of devices that can access the system bus and update memory (memory mutators). Each memory mutator may optionally be associated with a cache memory.

In general, cache memory is used in a computer system as a high-speed memory coupled directly to the central processing unit in order to buffer read requests for frequently accessed data and instructions.

The processor unit of the preferred system has an integer processor unit (IPU) with instruction and data caches, and an interface to the system bus. The processor unit may also have a floating-point processor unit (FPU). The IPU's instruction cache is read-only, and its data cache is a write-through cache with write buffering.

In the computer system of the preferred embodiment there are two or more IPUs, each with its own data cache. In addition, the FPU associated with each IPU can modify memory and does not reference the cache memory. A cache coherency problem therefore exists among the various data caches in the system. The preferred embodiment maintains cache coherency among the data caches by means of bus watching on the system bus. Bus watching ensures that any data in a cache matches the data stored in main memory.

〔Example〕

The invention is described in detail below with reference to the drawings.

FIG. 1 shows a system bus 10 and a processor unit 11 that can be used with the invention. In the preferred embodiment, system bus 10 is a 64-bit bus and is coupled to processor unit 11 by bus interface unit 14. Bus interface unit 14 is coupled to bus watcher 15 via address lines 24. Bus watcher 15 has an even address tag array 16 and an odd address tag array 17. Each entry in tag arrays 16 and 17 represents the address associated with 32 bits of data. Data in the system is transferred over the 64-bit bus. Therefore, the data in the cache and the corresponding entries in the address tag arrays can be updated at 64 bits (two words) per clock cycle. When a double-word bus transfer occurs, both the odd and the even address tag arrays 17 and 16 are checked for a match. When a single-word or sub-word transfer occurs, only one of address tag arrays 16 and 17 is checked. Which address tag array is checked is determined by whether the address starts on an even or an odd word boundary.
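The choice of tag array for a given transfer can be sketched as follows. This is a minimal illustration, assuming byte addresses and 4-byte (32-bit) words; the function name and the string labels are hypothetical.

```python
WORD_BYTES = 4         # one word = 32 bits
DOUBLE_WORD_BYTES = 8  # one full bus transfer = 64 bits


def tag_arrays_to_check(byte_address, transfer_bytes):
    """Return which bus watcher tag arrays must be checked for a transfer."""
    if transfer_bytes == DOUBLE_WORD_BYTES:
        return {"even", "odd"}  # double-word transfer: check both arrays
    if transfer_bytes > WORD_BYTES:
        raise ValueError("sketch handles sub-word, word and double-word only")
    # Single-word or sub-word transfer: parity of the word index decides.
    word_index = byte_address // WORD_BYTES
    return {"even"} if word_index % 2 == 0 else {"odd"}
```

For example, a 4-byte write at byte address 0x104 falls in word 65, so only the odd tag array would be consulted.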

Bus watcher 15 is coupled to data cache memory 18 via line 26. In this embodiment the data cache memory is 16 Kbytes in size. Cache memory 18 consists of an even address tag array 19, an odd address tag array 20, an even data array 21, and an odd data array 22. The data cache is direct mapped (i.e., entries in address tag array 19 or 20 correspond one-to-one with entries in data array 21 or 22).
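A direct-mapped lookup over split even and odd arrays can be sketched as below. The field layout is an assumption made for illustration: it takes a 16 Kbyte cache, 4-byte words, and an even split of entries between the two arrays. The patent does not give the actual bit assignment, and the function name is hypothetical.

```python
CACHE_BYTES = 16 * 1024                             # assumed cache size
WORD_BYTES = 4                                      # one word = 32 bits
ENTRIES_PER_ARRAY = CACHE_BYTES // WORD_BYTES // 2  # 2048 entries per array


def locate(byte_address):
    """Map a byte address to (array, entry index, tag) in a direct-mapped cache."""
    word_index = byte_address // WORD_BYTES
    array = "even" if word_index % 2 == 0 else "odd"
    entry = (word_index // 2) % ENTRIES_PER_ARRAY   # exactly one possible slot
    tag = byte_address // CACHE_BYTES               # distinguishes aliases
    return array, entry, tag
```

Because each address maps to exactly one entry, a tag array entry and the data array entry at the same index always describe the same word.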

Data cache 18 is coupled to processor 23 via line 27. This embodiment uses a processor manufactured by MIPS Computers of Sunnyvale, California, USA. Processor 23 is coupled to bus interface unit 14 via line 25. Line 25 is a bidirectional line that allows two-way communication between processor 23 and bus interface unit 14.

In addition to processor 23, a floating-point processor unit (FPU) 30 can be coupled to bus interface unit 14. System bus 10 can also be coupled to other units 12 and 13. FPU 30 and units 12 and 13 can update main memory, although in this embodiment they do not have separate cache memories.

These devices, whether or not they have a cache memory, can be called memory mutators.

FIG. 2 shows a typical read operation in detail. During a read operation, the processor may first attempt a cache read by supplying a physical address to the data cache memory (block 40). The data cache memory compares the supplied physical address with the physical addresses in its even or odd address tag array.

Which address tag array is used for the comparison is determined by whether the physical address starts on an even or an odd word boundary. If a cache hit occurs, i.e., the cache read succeeds (block 41), the branch (42) is taken and the data is supplied from the data cache array to the microprocessor.

If the data is not present in the data cache memory, the microprocessor performs a memory fetch. The MIPS Computers microprocessor used in this embodiment does not indicate whether the data requested by the memory fetch will be stored in the cache memory after a successful memory access. Therefore, whenever the processor performs a memory fetch, the bus watcher always updates its address tag with the address of the data being fetched. This may cause some other entry in the bus watcher's address tag array to be overwritten. If so, the corresponding data in the data cache array is invalidated (block 45). The data can then be received from memory (block 46).

The processor then determines whether the cache should be updated. If so, branch 48 is taken and the cache is updated (block 49). Otherwise the cache is not updated (block 50). If the cache is updated (block 49), the data in the cache continues to match the address tags in the bus watcher. If the cache is not updated (branch 50), the address tags in the bus watcher are a superset of the address tags in the cache. In this way the bus watcher is able to determine whether data is present in the cache memory.
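The read path of FIG. 2 can be modelled with a small sketch in which the bus watcher always records a fetched address, while the cache may or may not keep the data. The class, its fields and the `(tag, entry)` memory keys are hypothetical, and a single direct-mapped array stands in for the even and odd arrays.

```python
class WatchedCache:
    def __init__(self):
        self.watcher_tags = {}  # entry index -> tag held by the bus watcher
        self.cache_tags = {}    # entry index -> tag held by the data cache
        self.cache_data = {}    # entry index -> cached word

    def read(self, entry, tag, memory, keep_in_cache):
        if self.cache_tags.get(entry) == tag:
            return self.cache_data[entry], "hit"
        # Miss: the watcher always records the address being fetched.
        displaced = self.watcher_tags.get(entry)
        self.watcher_tags[entry] = tag
        if displaced is not None and displaced != tag:
            # A different watcher entry was overwritten, so the cached
            # copy at this index is invalidated (block 45).
            self.cache_tags.pop(entry, None)
            self.cache_data.pop(entry, None)
        value = memory[(tag, entry)]  # data received from memory (block 46)
        if keep_in_cache:             # cache updated (block 49)
            self.cache_tags[entry] = tag
            self.cache_data[entry] = value
        return value, "miss"

    def watcher_covers_cache(self):
        """True when the watcher tags are a superset of the cache tags."""
        return all(self.watcher_tags.get(e) == t
                   for e, t in self.cache_tags.items())
```

Whatever the processor decides about caching, `watcher_covers_cache()` stays true, which is the property that lets the watcher safely ignore bus writes whose addresses it does not hold.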

Referring briefly to FIGS. 5 and 6, the method used in this embodiment to load information from main memory into the cache memory will now be described in detail.

In this embodiment the bandwidth of the system bus is two words (64 bits; 32 bits = 1 word). Data is organized on system bus 10 as shown in FIG. 6.

In a given clock cycle, two words are communicated over system bus 10, and the high-order word 82 can be delivered during clock cycle 1. This approach offers several advantages. For example, in some computer systems it is advantageous to request information from main memory in blocks of words rather than one word at a time. It is desirable to supply information in blocks of words to the cache memory associated with such a microprocessor.

As described with reference to FIG. 1, this embodiment uses the MIPS Computers microprocessor 23. The microprocessor has a data cache memory 18. The microprocessor may make a request for data, and a cache miss 90 may occur that requires an access to main memory. If the request was for a word aligned on a double-word boundary (i.e., an even-addressed word) (branch 91), a double-word read command is initiated on the system bus (block 92). The double word consists of high-order word 82 and low-order word 81.

The system then selects the odd data cache array 22 as the cache to be written (block 94), and during clock cycle 0 the low-order word 81 is received and written to the cache memory (block 95). The processor had originally requested the word aligned on the double-word boundary, i.e., high-order word 82. The processor is told that it has not received the correct data and is instructed to retry the memory access (block 96).

The even data cache array 21 is then selected, and the high-order word 82 is received during clock cycle 1. The high-order word 82 is then written to the cache memory (block 98). The processor can then use the requested information, high-order word 82. It will be apparent to those skilled in the art that in other embodiments of the invention the specific order in which the words are returned may be varied. In particular, in the computer system of this embodiment, bytes are ordered so that byte 0 is the leftmost byte. This may be called a big-endian system, and it conforms to the Motorola 68000 processor convention. The method and apparatus of the invention apply equally to computer systems in which byte 0 is the rightmost byte. This is called a little-endian system. In a little-endian system, the words can be returned in the reverse of the order described in this embodiment.
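The double-word fill with a retry can be sketched as a two-cycle sequence. This is only an illustration of the flow described above: addresses here are word indices rather than byte addresses, the cache is a plain dictionary assumed not to hold the requested words beforehand, and the function names are hypothetical.

```python
def double_word_fill(memory, even_word_address):
    """Per-cycle cache writes for a miss on an even-addressed word."""
    if even_word_address % 2 != 0:
        raise ValueError("sketch covers only double-word-aligned requests")
    high = memory[even_word_address]     # high-order word 82 (requested)
    low = memory[even_word_address + 1]  # low-order word 81
    return [
        (0, "odd", even_word_address + 1, low),  # cycle 0: blocks 94-95
        (1, "even", even_word_address, high),    # cycle 1: block 98
    ]


def fetch_with_retry(cache, memory, even_word_address):
    """Fill the cache and count how often the processor must retry."""
    retries = 0
    for _cycle, array, address, value in double_word_fill(memory, even_word_address):
        cache[(array, address)] = value
        if ("even", even_word_address) not in cache:
            retries += 1  # requested word not delivered yet (block 96)
    return cache[("even", even_word_address)], retries
```

After the fill, the neighbouring odd word is already in the cache, so a later sequential access to it would hit.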

Because computer systems often access information sequentially, a cache hit occurs when a request is subsequently made for low-order word 81.

In this embodiment a main memory access requires 10 clock cycles. With the method of the invention, a double-word access requires 11 clock cycles; the additional clock cycle is needed for the retry.

However, assuming that the processor goes on to request access to low-order word 81, 9 clock cycles are saved.
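The saving can be reproduced with simple arithmetic. The sketch assumes that, without the double-word fill, the two words would each cost one full 10-cycle main memory access; the text states only the 10-cycle, 11-cycle and 9-cycle figures, so the 20-cycle baseline is an inference. The constant and function names are hypothetical.

```python
MAIN_MEMORY_ACCESS_CYCLES = 10  # stated for this embodiment
RETRY_CYCLES = 1                # extra cycle needed for the retry


def cycles_two_single_word_accesses():
    # Assumed baseline: each word fetched by its own main memory access.
    return 2 * MAIN_MEMORY_ACCESS_CYCLES


def cycles_double_word_access():
    return MAIN_MEMORY_ACCESS_CYCLES + RETRY_CYCLES


saved = cycles_two_single_word_accesses() - cycles_double_word_access()
print(saved)  # 9
```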

Furthermore, the invention can use a 32-bit (one-word) bus bandwidth on a board within the system, such as processor board 11, together with a system bus whose bandwidth is some multiple of a word. A bottleneck can therefore arise as information is moved from the system bus onto the board. The retry method described above allows the sequential words of a data block to be delivered to a board in the system in a time-staggered manner. This resolves the bottleneck that arises when the data path narrows, without degrading the performance of the system bus.

It will be apparent to those skilled in the art that the invention can be used in computer systems with system bus bandwidths other than two words.

For example, a computer system may use a system bus bandwidth of 128 bits (4 words). In such a system, one to four data cache memory arrays can be used, and data can be requested in blocks of four words. The data can be placed in the cache memory staggered over four clock cycles, at a rate of one word per clock cycle.
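The time-staggered delivery of a four-word block can be sketched as a simple schedule. The function name, the round-robin assignment of words to arrays and the delivery order are assumptions made for illustration.

```python
def stagger_block(words, num_arrays=4):
    """Yield (clock_cycle, cache_array_index, word), one word per cycle."""
    if not 1 <= num_arrays <= 4:
        raise ValueError("the text mentions one to four data cache arrays")
    for cycle, word in enumerate(words):
        # Assumed policy: words are spread round-robin over the arrays.
        yield cycle, cycle % num_arrays, word


schedule = list(stagger_block(["w0", "w1", "w2", "w3"]))
```

With four arrays each word lands in its own array on its own cycle; with fewer arrays the same one-word-per-cycle rate is kept and the arrays are reused.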

Alternatively, the block of data can be received from the bus and placed in a buffer. The data is then transferred from the buffer into the cache memory.

FIG. 3 shows how a bus watcher usable with the invention handles write transactions from its own processor. Because the path from the bus watcher to the cache is a one-way path, the bus watcher is not notified when a write to the cache occurs. Instead, when the processor sends a write transaction to the write buffer (block 70), the write buffer stores information stating that a cache write has occurred (block 75) and is responsible for forwarding the write transaction to the bus interface (block 71).

In parallel with sending the write transaction to the bus interface (block 71), if the transaction is a write to the cache, the cache can be updated with the new data (block 72).

The write buffer of this embodiment has a small queue, to avoid stalling the processor when the system bus is unavailable. The bus interface then informs the bus watcher that a cache write has occurred (block 73), and places the transaction on the system bus when the system bus and memory are available. The bus watcher is then responsible for updating its address tag array, and the write to the addressed memory location takes place (block 74).
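The write path of FIG. 3 can be sketched as a small bounded queue between the processor and the bus interface. The class, its method names, the default capacity and the set-based watcher tags are all hypothetical; the sketch only shows why a queue lets the processor continue while the bus is busy.

```python
from collections import deque


class WriteBuffer:
    def __init__(self, capacity=4):  # capacity is an assumed value
        self.capacity = capacity
        self.queue = deque()

    def post(self, address, value, is_cache_write):
        """Processor side (block 70). Returns False if the processor must stall."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append((address, value, is_cache_write))
        return True

    def drain_one(self, bus_available, watcher_tags, memory):
        """Bus interface side (blocks 71, 73 and 74)."""
        if not bus_available or not self.queue:
            return False
        address, value, is_cache_write = self.queue.popleft()
        if is_cache_write:
            watcher_tags.add(address)  # watcher told of the cache write (block 73)
        memory[address] = value        # write reaches main memory (block 74)
        return True
```

The processor stalls only when `post` returns False, that is, when the queue is full because the bus has been unavailable for several writes in a row.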

FIG. 4 shows how a cache entry is invalidated when another processor, an FPU, an I/O device, or a graphics device writes information to main memory. In this embodiment the processor uses a write-through cache mechanism, arranged so that data is written to main memory in direct correlation with the data written to the cache memory. In other words, data is not held in the cache memory as dirty data until some future time when the cache memory is flushed to main memory; rather, once a write transaction to the cache occurs, the data is written to main memory as soon as possible. In this embodiment the computer system has several processor units with separate caches. The caches must be kept consistent with one another. Therefore, when a write transaction is performed by a memory mutator, the corresponding cache entry must be invalidated.

In this embodiment the bus watcher monitors the system bus for write transactions and examines the address to be written (block 61). It does so by comparing the address with its even or odd address tag array, according to whether the address starts on an even or an odd word boundary.

A transaction may also be a double-word transaction, in which case the bus watcher must check both the even and the odd tag arrays. Because the bus watcher's address tag arrays contain a superset of the data held in the processor's data cache, if the address is not found in the bus watcher's address tag arrays, the address is not present in the processor's data cache. In that case branch 62 is taken and the write transaction can be ignored.

Otherwise branch 63 is taken, and the corresponding entry in the bus watcher is invalidated (block 64).

The entry in the data cache memory is also invalidated (block 65). The next time the processor requests a read of this data, there will be no cache hit, and the data must be fetched from memory as described with reference to FIG. 2.
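The invalidation flow of FIG. 4 can be sketched as a watcher that snoops writes from other memory mutators. The class and method names are hypothetical, a set of addresses stands in for the even and odd tag arrays, and for brevity this sketch always caches fetched data.

```python
class BusWatcher:
    def __init__(self):
        self.tags = set()  # addresses the watcher believes may be cached
        self.cache = {}    # address -> value in the local data cache

    def snoop_write(self, address):
        """Handle a write seen on the bus from another memory mutator."""
        if address not in self.tags:
            return "ignored"            # branch 62: address cannot be cached
        self.tags.discard(address)      # watcher entry invalidated (block 64)
        self.cache.pop(address, None)   # cache entry invalidated (block 65)
        return "invalidated"

    def read(self, address, memory):
        if address in self.cache:
            return self.cache[address], "hit"
        self.tags.add(address)          # tag updated on every memory fetch
        self.cache[address] = memory[address]
        return self.cache[address], "miss"
```

After an invalidation, the next read of that address misses and picks up the value the other device wrote, which is how the stale copy is eliminated.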

[Brief explanation of the drawing]

第１図はキャッシュメモリと、中央処理バスへ結合さ
れたバスウオッチャーとを有する多数の中央処理装置を
備え、本発明に使用できるコンピュータ装置のブロック
図、第２図は読出しがキャッシュ読出しであるかどうか
をプロセッサ装置に注意しない、本発明に利用できる、
データ読出しおよびキャッシュメモリ更新方法を示す流
れ図、第３図は、書込みトランザクションがキャッシュ
書込みであるかどうかをプロセッサが装置に注意しない
、装置への書込みトランザクション中にキャッシュの一
貫性を確保する、本発明に利用できる方法を示す流れ図
、第４図は、キャッシュの一貫性を確保するために１外
部プロセッサからの書込みトランザクションをモニタす
る本発明に利用できる方法の流れ図、第５図は、１つの
語がプロセッサにより求められた時にキャッシュメモリ
へデータのブロックを供給する、本発明の方法を示す流
れ図、第６図は本発明で利用できるバスにおけるデータ
構成を示す略図である。１０・・・・装置バス、１１・・・・プロセッサ装置、
１４・・・・バスインターフェイス装置、１５・・・・
バスウオッチャー、１６，１９・・・・偶数アドレスタ
グアレイ、１７，２０・・・・奇数アドレスタグアレイ
、１８・・・・データキャッシュメモリ、２１・・・・
偶数データアレイ、２２・・・・奇数データアレイ、２
３・・・・プロセッサ、３０・・・・浮動小数点プロセ
ッサ装置。FIG. 1 is a block diagram of a computer device usable with the present invention, comprising a number of central processing units each having a cache memory and a bus watcher coupled to a central processing bus; FIG. 2 is a flowchart showing a method, usable with the present invention, for reading data and updating the cache memory without notifying the processor device whether the read is a cache read; FIG. 3 is a flowchart showing a method, usable with the present invention, for ensuring cache coherency during a write transaction to the device, in which the processor does not notify the device whether the write transaction is a cache write; FIG. 4 is a flowchart of a method, usable with the present invention, for monitoring write transactions from an external processor to ensure cache coherency; FIG. 5 is a flowchart showing the method of the present invention for supplying a block of data to the cache memory when one word is requested by the processor; FIG. 6 is a schematic diagram showing the data organization on a bus usable with the present invention.
10: device bus; 11: processor device; 14: bus interface device; 15: bus watcher; 16, 19: even address tag array; 17, 20: odd address tag array; 18: data cache memory; 21: even data array; 22: odd data array; 23: processor; 30: floating point processor device.

Claims

[Claims]

(1) A method for enforcing cache coherency in a computing device, comprising: maintaining a list of identifying information for all entries in a first cache memory associated with a first processor; monitoring write transactions to main memory by other processors in the computing device; and, when a write transaction addresses a location in the main memory that corresponds to an address in the list of identifying information, invalidating the corresponding entry in the first cache memory; whereby a cache entry is invalidated in the first cache memory when the cache entry is inconsistent with an entry in the main memory.

(2) In a computing device having a plurality of processor units sharing one main memory, each said processor unit having a separate cache memory, said processors being coupled to said main memory by a common bus, and each said processor unit further comprising circuitry for monitoring transactions on the bus between the processors and the main memory, a method for enforcing cache coherency in the computing device, comprising: monitoring, by the circuitry, transactions in the computing device; in the case of a read transaction from the main memory by a first one of the cache memories, updating a tag in the circuitry corresponding to an entry in a first cache memory associated with the first processor; in the case of a write transaction to the main memory by a first of the processors, determining from the tag whether information to be written is present in the first cache memory; invalidating the information if the information to be written is present in the first cache memory; and, in the case of a write transaction from the first processor, updating the tag based on the write transaction; whereby a cache entry is invalidated in the first cache memory when the cache entry is inconsistent with an entry in the main memory.

(3) A computer device comprising: processor means for processing instructions; first means, coupled to said processor means, for storing information; second means, coupled to said processor means, for storing information, said second means having an access time shorter than the access time of said first means; third means, coupled to said processor means and said first means, for monitoring information sent to said first means; and fourth means, coupled to said third means and said second means, for storing identification information about the contents of said second means; whereby the contents of the second means are maintained consistent with the first means.

(4) A computer device having a number of processing units, one main memory shared by the processing units, and a plurality of separate high-speed memories each coupled to one of said processing units, the computer device comprising: a plurality of first means, coupled to each said processing unit and said main memory, for monitoring transactions between each said processing unit and said main memory; and a plurality of second means, coupled to each said high-speed memory and said first means, for storing addresses of data held in said separate high-speed memories; whereby a list of the data in the separate high-speed memories can be maintained consistent with the main memory.

(5) In a computing device comprising an integer processing unit, a cache memory coupled to the integer processing unit, a device bus coupled to the cache memory, a main memory coupled to the device bus, and a floating point processing unit coupled to the integer processing unit and the device bus, a method of enforcing cache coherency in the computing device, comprising: maintaining a list of identification information for all entries in the cache memory; monitoring write transactions to the main memory; and, if a write transaction addresses a location in the main memory corresponding to an address in the list of identification information, invalidating the corresponding entry in the cache memory;