JPH06103477B2

JPH06103477B2 - Parallel cache memory

Info

Publication number: JPH06103477B2
Application number: JP3348383A
Authority: JP
Inventors: 浩酒井
Original assignee: Director-General, Agency of Industrial Science and Technology (工業技術院長)
Priority date: 1991-12-05
Filing date: 1991-12-05
Publication date: 1994-12-14
Anticipated expiration: 2009-12-14
Also published as: JPH05158793A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel cache memory used in a data processing system having a plurality of processors.

[0002]

[Prior Art] Conventionally, as a means of achieving high-speed data processing, some systems adopt a tightly coupled multiprocessor architecture in which, as shown in FIG. 8, a plurality of processors P are connected to a memory bus B and share a memory M.

[0003] However, when the processors P are simply coupled to the memory bus B in this way, memory access requests such as reads and writes from the processors P to the shared memory M can concentrate on the memory bus B. It is therefore known that once the number of processors P reaches about 4 to 16, no further improvement in overall system performance can be expected.

[0004] One means of solving this is to connect cache memories CM1, CM2, ... to the respective processors P, as shown in FIG. 9. The following are known as references on cache memories.

[0005] Paul Sweazey and Alan Jay Smith: A Class of Compatible Cache Consistency Protocols and their Support by the IEEE Futurebus, Proceedings of the 13th Annual International Symposium on Computer Architecture, June 1986.

James Archibald and Jean-Loup Baer: Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model, ACM Transactions on Computer Systems, Vol. 4, No. 4, November 1986.

Here, in FIG. 9, each of the cache memories CM1, CM2, ... connected to the respective processors P stores, in one of its plurality of cache lines, a copy of the contents of the shared memory M most recently referenced by the corresponding processor P. Each cache line stores the following information.

(1) Data of several words (typically about 4 to 16 words); this is called cache line data.
(2) The address in the shared memory device M at which the data originally resided.
(3) The state of the cache line, for example, as tag information:
- whether valid data is stored in the cache line,
- whether the cache line data also exists in another cache memory,
- whether it is identical to the data in the shared memory, and so on.

[0006] In cache memories CM1, CM2, ... configured in this way, read/write requests from each processor P to the shared memory M are handled as follows. Suppose a read request is issued by a certain processor P. If the cache memory CM1 connected to that processor P holds the requested cache line data, then, as shown in the operation sequence of FIG. 10(a), the cache memory CM1 returns the requested portion of the cache line data to the processor P and ends the processing.

[0007] If, on the other hand, the cache memory CM1 connected to the processor P does not hold the corresponding cache line data, then, as shown in the operation sequence of FIG. 10(b), the cache memory CM1 first requests the other cache memory CM2 and the shared memory M to transfer the cache line data corresponding to that address. The cache memory CM2 that receives the data transfer request determines whether it holds the requested cache line data in its cache lines and whether it should transfer that data to the cache memory CM1. A cache memory CM2 that determines it should perform the transfer sends the cache line data to the cache memory CM1. If no cache memory transfers the data, the shared memory M transfers the corresponding cache line data. The cache memory CM1 that receives the transferred cache line contents stores them in an appropriate cache line, returns the requested portion of the cache line data to the processor P, and ends the processing.

[0008] When, on the other hand, a write request is issued by the processor P, if the cache memory CM1 connected to the processor P holds the corresponding cache line data and that cache line data does not exist in the other cache memory CM2 (this is checked by referring to the tag information), then, as shown in the operation sequence of FIG. 11(a), the cache memory CM1 rewrites the portion of the cache line data targeted by the write request and updates the tag information as necessary (for example, recording that the contents now differ from the shared memory M).

[0009] If the cache memory CM1 connected to the processor P holds the corresponding cache line data and that cache line data also exists in the other cache memory CM2, then, as shown in the operation sequence of FIG. 11(b), the cache memory CM1 requests the other cache memory CM2 to invalidate the corresponding cache line data. Once it is certain that the invalidation request will be fulfilled, the cache memory CM1 rewrites the portion of the cache line data targeted by the write request and updates the tag information as necessary (for example, recording that the contents differ from the shared memory and that the cache line data is not present in any other cache memory). The cache memory CM2 that receives the invalidation request checks whether its cache lines contain the corresponding cache line data and, if so, stores in the tag of that line information indicating that the held data is invalid.

[0010] If the cache memory CM1 connected to the processor P does not hold the corresponding cache line data, then, as shown in the operation sequence of FIG. 11(c), the cache memory CM1 requests the other cache memory CM2 and the shared memory M to transfer and invalidate the cache line data corresponding to that address. The cache memory CM2 that receives the transfer and invalidation request determines whether the corresponding cache line data is present in its own cache lines and whether it should transfer that data to the requesting cache memory CM1. A cache memory CM2 that determines it should perform the transfer sends the corresponding cache line data to the cache memory CM1. If no cache memory transfers the data, the shared memory M transfers the corresponding cache line data. At the same time, every cache memory CM2 that has the corresponding cache line data in its cache lines invalidates it (that is, stores in the tag information indicating that it is invalid). The cache memory CM1 that receives the transferred cache line data stores it in an appropriate cache line, rewrites the portion targeted by the write request, and updates the tag information as necessary (for example, recording that the contents differ from the shared memory and that the cache line data is not present in any other cache memory).
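The conventional read and write handling described in paragraphs [0006] to [0010] can be illustrated with a small software model. This is only a sketch under stated assumptions: the class and function names (`Line`, `Cache`, `Bus`, `read`, `write`), the byte-granular access, the 16-byte line size, and the absence of line eviction are all illustrative choices and do not come from the patent. The model counts one bus command per transfer or invalidation request.

```python
from dataclasses import dataclass, field

LINE_BYTES = 16  # assumed line size for this sketch


@dataclass
class Line:
    data: bytearray
    shared: bool = False   # a copy may exist in another cache
    dirty: bool = False    # contents differ from shared memory


@dataclass
class Cache:
    lines: dict = field(default_factory=dict)  # line base address -> Line


class Bus:
    """Hypothetical memory bus that counts the commands placed on it."""

    def __init__(self, caches, memory):
        self.caches, self.memory, self.commands = caches, memory, 0

    def fetch(self, requester, base, invalidate):
        """Transfer request, optionally combined with invalidation."""
        self.commands += 1
        data = None
        for c in self.caches:
            if c is requester or base not in c.lines:
                continue
            data = bytearray(c.lines[base].data)   # a peer cache supplies the line
            if invalidate:
                del c.lines[base]
            else:
                c.lines[base].shared = True
        from_peer = data is not None
        if data is None:                           # no cache supplied it
            data = bytearray(self.memory[base:base + LINE_BYTES])
        return data, from_peer and not invalidate

    def invalidate(self, requester, base):
        self.commands += 1
        for c in self.caches:
            if c is not requester:
                c.lines.pop(base, None)


def read(cache, bus, addr):
    base = addr - addr % LINE_BYTES
    if base not in cache.lines:                        # miss, as in FIG. 10(b)
        data, shared = bus.fetch(cache, base, invalidate=False)
        cache.lines[base] = Line(data, shared=shared)
    return cache.lines[base].data[addr % LINE_BYTES]   # hit, as in FIG. 10(a)


def write(cache, bus, addr, value):
    base = addr - addr % LINE_BYTES
    line = cache.lines.get(base)
    if line is None:                                   # miss, as in FIG. 11(c)
        data, _ = bus.fetch(cache, base, invalidate=True)
        line = cache.lines[base] = Line(data)
    elif line.shared:                                  # shared hit, as in FIG. 11(b)
        bus.invalidate(cache, base)
    line.data[addr % LINE_BYTES] = value               # exclusive hit, as in FIG. 11(a)
    line.shared, line.dirty = False, True
```

In this model a read miss followed by a write to the same line costs two bus commands whenever another cache held the line, which is the overhead the invention targets.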

[0011] Thus, when there is a high probability that read/write requests from each processor P can be handled by the operation sequences of FIG. 10(a) and FIG. 11(a), respectively, adopting such parallel caches can dramatically improve the performance of the data processing apparatus as a whole. In practice, however, the operation sequences of FIG. 10(b) and FIGS. 11(b) and 11(c) also occur, so when the number of processors P reaches about 16 to 30, traffic on the memory bus increases and adding more processors no longer improves overall system performance.

[0012] Incidentally, when parallel cache memories are used, commands are issued on the memory bus B in broadly the following three cases.

[0013] The first case is when the programs and data used by a processor do not fit in its cache memory, so that cache line data is transferred between the cache memory and the shared memory. This corresponds to the operation sequences of FIG. 10(b) and FIG. 11(c). Mitigating it requires a larger cache memory capacity, which may become feasible with the recent increase in LSI integration density.

[0014] The second case is when different processors access separate data items within the same cache line, resulting in access contention for that cache line data. This corresponds to the operation sequences of FIG. 10(b) and FIGS. 11(b) and 11(c). Mitigating it requires a smaller cache line size, which may also become feasible with the recent increase in LSI integration density.

[0015] The third case is when accesses from a plurality of processors contend for the same data, causing the operation sequences of FIG. 10(b) and FIGS. 11(b) and 11(c). This is inherent in having a plurality of processors work on a single job in parallel and cannot be solved by higher LSI integration.

[0016]

[Problems to Be Solved by the Invention] As described above, conventional parallel cache memories have the problem that, when accesses from a plurality of processors contend for the same data, the load on the memory bus increases, so that increasing the number of processors beyond a certain limit does not improve the performance of the overall data processing system.

[0017] The present invention has been made in view of the above circumstances, and its object is to provide a parallel cache memory that reduces the load on the memory bus and thereby allows the performance of the overall data processing system to be improved.

[0018]

[Means for Solving the Problems] The parallel cache memory of the present invention comprises cache memories each connected to one of a plurality of processors and each connected to a shared memory via a memory bus. In response to a read request from a processor, the cache memory connected to that processor determines whether the cache line data targeted by the read request is present in its own cache lines and, if not, obtains that cache line data from another cache memory or from the shared memory. Each cache memory has means for predicting, when the cache line data targeted by a read request is present in that cache memory, whether the processor that issued the read request will issue a write request to the same cache line data. Another cache memory that receives a cache line data transfer request from the cache memory connected to the processor that issued the read request predicts whether the processor will issue a write request to its own cache line data and, if it predicts a write request, invalidates that cache line data. The cache memory connected to the processor that issued the read request predicts whether the processor will issue a write request to the cache line data transferred from another cache memory or from the shared memory and, if it predicts a write request, records that the cache line data is not present in any other cache memory.

[0019]

[Operation] As a result, according to the present invention, when a processor reads from an address and then writes to the same address, the write-request prediction function in each cache memory and the cache line data invalidation based on it speed up write processing even when accesses from a plurality of processors contend for the same data, and reduce the load on the memory bus.

[0020]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0021] FIG. 1 shows the schematic configuration of the embodiment. In the figure, reference numerals 1 to 3 denote processors. The processors 1 to 3 are connected to cache memories 10 to 12 via 32-bit address lines 4 to 6 and 32-bit data lines 7 to 9, respectively. The cache memories 10 to 12 are connected to a shared memory 14 by a memory bus 13.

[0022] To simplify the description, only the cache memory 10 is described here. The cache memory 10 has a register 101, a comparison circuit 102, a selector 103, a cache line 104, control circuits 105 and 106, a comparison circuit 107, a selector 108, registers 109 and 110, an arithmetic circuit 111, and a register 112.

[0023] The register 101 stores the address when a read request is received from the corresponding processor 1. The cache line 104 has an address storage unit 1041, a tag storage unit 1042, and a cache line data storage unit 1043. The cache line 104 has 1024 entries, and each item of cache line data is 16 bytes. Each entry holds bits A31 to A14 of the shared memory address at which the stored cache line data originally resided (where A31 denotes the most significant bit and A0 the least significant bit), together with a V bit and an X bit as tag information. When the V bit is 1, the cache line data is valid; when the X bit is 1, the cache line data does not exist in any other cache memory.
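The entry layout in paragraph [0023] implies a fixed split of the 32-bit address: 16-byte lines use A3 to A0 as the byte offset, 1024 entries use A13 to A4 as the index, and the remaining A31 to A14 are stored for comparison. The following sketch derives those widths; the names `CacheLine` and `split` are illustrative and not taken from the patent.

```python
from dataclasses import dataclass, field

NUM_LINES = 1024   # entries in cache line 104
LINE_BYTES = 16    # bytes of cache line data per entry
ADDR_BITS = 32     # width of the address lines

OFFSET_BITS = LINE_BYTES.bit_length() - 1         # 4 bits:  A3..A0
INDEX_BITS = NUM_LINES.bit_length() - 1           # 10 bits: A13..A4
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS   # 18 bits: A31..A14


@dataclass
class CacheLine:
    tag: int = 0      # A31..A14 of the original shared-memory address
    v: bool = False   # V bit: the cache line data is valid
    x: bool = False   # X bit: no other cache memory holds this data
    data: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))


def split(addr):
    """Split a 32-bit address into (stored address bits, index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

With these parameters the data storage alone amounts to 1024 x 16 bytes = 16 KiB per cache memory.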

[0024] Whether the cache line data corresponding to a read request is present in the cache memory 10 is determined as follows. The control circuit 105 selects the cache line corresponding to bits A13 to A4 of the address stored in the register 101. The comparison circuit 102 checks whether the address stored in that cache line equals bits A31 to A14 of the address stored in the register 101. The data is present if this comparison result and the V bit of the tag information are both 1. For example, for a read request to address 0x34564 (hexadecimal), bits A13 to A4 are 0x56, so cache line number 0x56 (86 in decimal) is selected.
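The presence check in paragraph [0024] amounts to a direct-mapped lookup: index by A13 to A4, then require both an address match on A31 to A14 and V = 1. A minimal sketch follows, assuming a software model in which the class name `DirectMappedLookup` and its methods are hypothetical stand-ins for the control circuit 105 and the comparison circuit 102.

```python
class DirectMappedLookup:
    """Software stand-in for the index-then-compare presence check."""

    def __init__(self):
        self.stored_addr = [0] * 1024   # A31..A14 held by each cache line
        self.v = [False] * 1024         # V bit of each cache line

    @staticmethod
    def index_of(addr):
        return (addr >> 4) & 0x3FF      # A13..A4 selects the cache line

    def fill(self, addr):
        i = self.index_of(addr)
        self.stored_addr[i] = addr >> 14
        self.v[i] = True

    def hit(self, addr):
        i = self.index_of(addr)
        same_address = self.stored_addr[i] == (addr >> 14)  # comparison result
        return same_address and self.v[i]                   # both must be 1


cache = DirectMappedLookup()
cache.fill(0x34564)
print(hex(cache.index_of(0x34564)))   # 0x56, i.e. cache line 86
```

An address such as 0x38564 selects the same cache line 0x56 but differs in A31 to A14, so it is reported as absent.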

[0025] When the cache line data corresponding to the read request is present in the cache memory 10, the selector 103 selects the portion of the data corresponding to bits A3 to A2 of the address stored in the register 101 and outputs it on the data line 7.
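Because the data line is 32 bits wide and a cache line holds 16 bytes, bits A3 to A2 pick one of four 4-byte words. A minimal sketch of that selection, where the function name `select_word` is illustrative only:

```python
def select_word(line_data: bytes, addr: int) -> bytes:
    """Return the 4-byte word of a 16-byte line chosen by address bits A3..A2."""
    word = (addr >> 2) & 0b11
    return line_data[4 * word:4 * word + 4]
```

For address 0x34564 the low four bits are 0100, so A3 to A2 equal 01 and the second word (bytes 4 to 7 of the line) is selected.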

[0026] If, on the other hand, the cache line data corresponding to the read request is not present in the cache memory 10, the cache memory 10 requests the other cache memories 11 and 12 or the shared memory 14 to transfer the cache line data via the memory bus 13.

[0027] The other cache memories 11 and 12 are configured in the same way as the cache memory 10 described above.

[0028] The case in which the cache memory 10 requests a transfer of cache line data from this state is described next. The memory bus 13 consists of 32 address lines, 64 data lines, and control lines (not shown). The control lines carry the type of command (cache line data transfer, cache line data invalidation, and so on).

[0029] When the other cache memories 11 and 12 receive the cache line data transfer request from the cache memory 10, the address information is stored in their respective registers 112.

[0030] Each then determines, using its control circuit 106 and comparison circuit 107, whether the corresponding cache line data is present in its cache lines. This determination is exactly the same as that made by the control circuit 105 and the comparison circuit 102 for a read request from the corresponding processor. The same circuit is duplicated so that a memory access request from the processor directly connected to the cache memory and a request arriving over the bus from another cache memory, such as a cache line data transfer, can be processed simultaneously.

[0031] If the cache memories 11 and 12 have the corresponding cache line data in their cache lines, they send that cache line data and also use the selector 108 to select the data corresponding to bits A3 to A2 of the address stored in the register 112. Using the arithmetic circuit 111, they take the bitwise AND of that data and the value stored in the register 109, and then compare the result with the value stored in the register 110. If the two match, the V bit is rewritten from 1 to 0.

[0032] The cache memory 10, for its part, stores the transferred cache line data in an appropriate cache line and uses the selector 108 to select the data corresponding to bits A3 to A2 of the address stored in the register 112. Using the arithmetic circuit 111, it takes the bitwise AND of that data and the value stored in the register 109, and then compares the result with the value stored in the register 110. If the two match, it writes 1 to the X bit.
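The prediction step in paragraphs [0031] and [0032] is the same test on both sides of a transfer: AND the addressed word with the value of register 109, compare with the value of register 110, then clear V (supplying cache) or set X (receiving cache) on a match. The sketch below models this in software. The function names, the big-endian byte order, and the sample mask and pattern values are assumptions made for illustration; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Line:
    data: bytes       # 16 bytes of cache line data
    v: bool = True    # V bit: valid
    x: bool = False   # X bit: not present in any other cache memory


def addressed_word(line, addr):
    """Word chosen by A3..A2 (selector 108); byte order is an assumption."""
    w = (addr >> 2) & 0b11
    return int.from_bytes(line.data[4 * w:4 * w + 4], "big")


def predicts_write(word, mask, pattern):
    """Bitwise AND with the register 109 value, compared with register 110."""
    return (word & mask) == pattern


def on_transfer_out(line, addr, mask, pattern):
    """Supplying cache: drop its own copy if a write is predicted."""
    if predicts_write(addressed_word(line, addr), mask, pattern):
        line.v = False


def on_transfer_in(line, addr, mask, pattern):
    """Receiving cache: record exclusivity if a write is predicted."""
    if predicts_write(addressed_word(line, addr), mask, pattern):
        line.x = True


# Hypothetical register contents: the top four bits of a word act as a type code.
MASK, PATTERN = 0xF0000000, 0x10000000
payload = bytes(4) + (0x10000005).to_bytes(4, "big") + bytes(8)
supplier, receiver = Line(payload), Line(payload)
on_transfer_out(supplier, 0x34564, MASK, PATTERN)
on_transfer_in(receiver, 0x34564, MASK, PATTERN)
print(supplier.v, receiver.x)   # False True
```

Because both caches apply the same test to the same word, they reach the same prediction without any extra bus command.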

[0033] Next, the invention configured as above is described in terms of the arrangement shown in FIG. 2, in which cache memories CM1 and CM2 are connected to respective processors P and are connected to a memory bus B to share a memory M.

[0034] A typical pattern in which accesses from a plurality of processors P contend for the same data is one in which each processor P reads from the address and then writes to that data. Suppose that, in this series of operations, when the processor P first issues the read request, the cache memory CM1 connected to the processor P does not have the corresponding cache line and must receive the cache line data from another cache memory CM2. Conventionally, as shown in the operation sequence of FIG. 3, the read request causes the cache line data to be transferred from the cache memory CM2 to the cache memory CM1 (the same operation sequence as FIG. 10(a) described above), and the subsequent write request then causes invalidation of that cache line in the other cache memory CM2, which originally held it. In contrast, with the cache memory CM1 of the present invention, as shown in the operation sequence of FIG. 4, when the read request causes the cache line data to be transferred from the cache memory CM2 to the cache memory CM1, the cache memory CM2 predicts a write request by the processor P2 and, as a result, invalidates that cache line data at the same time. The cache memory CM1 that receives the cache line data likewise predicts a write request by the processor P and, as a result, records that no other cache memory holds that cache line data. On the subsequent write request, only the cache memory CM1 holds the cache line, so the same operation sequence as FIG. 11(a) described above occurs.
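The saving described in paragraph [0034] can be made concrete by counting bus commands for a single contended cache line. The sketch below is an idealized model, not the patent's circuit: it assumes every prediction is correct, that a cache always knows whether it is the sole holder of the line, and that each transfer or invalidation costs exactly one bus command. The function name `bus_commands` is illustrative.

```python
def bus_commands(accesses, predict_write):
    """Count memory-bus commands for accesses to one cache line.

    accesses: sequence of (cache_id, op) with op "R" (read) or "W" (write).
    predict_write: if True, a read miss also invalidates the other copies.
    """
    holders = set()   # caches currently holding a valid copy
    commands = 0
    for cid, op in accesses:
        if op == "R":
            if cid not in holders:
                commands += 1            # transfer request on the bus
                if predict_write:
                    holders = {cid}      # suppliers clear V, requester sets X
                else:
                    holders.add(cid)
        else:
            if cid not in holders:
                commands += 1            # transfer plus invalidation
            elif holders != {cid}:
                commands += 1            # invalidation only
            holders = {cid}
    return commands


# Two processors alternately read and then write the same word, four rounds.
pattern = [(i % 2, op) for i in range(4) for op in "RW"]
print(bus_commands(pattern, predict_write=False))   # 7
print(bus_commands(pattern, predict_write=True))    # 4
```

In this access pattern the conventional scheme pays for a transfer and a later invalidation on every hand-over after the first, while the predicted scheme pays only for the transfer.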

[0035] Consequently, the cache memories CM1 and CM2 of the present invention can omit the cache line data invalidation that conventional cache memories perform on a write request, which reduces the load on the bus.

[0036] Note that when a processor P reads from an address, a write to that address does not necessarily follow. The key feature of the present invention is that each of the cache memories CM1 and CM2 predicts whether a write will follow, and only when a write is predicted is the corresponding cache line data held solely by the cache memory CM1 or CM2 directly connected to that processor P.

[0037] Next, cases are described in which it is possible to predict whether the processor that performed a read will actually write to the same address.

[0038] In parallel logic programs, processing is generally sped up by creating many processes and executing them on a large number of processors. When a plurality of processes perform parallel processing in this way, the processes must synchronize with one another, for example to hand over results. Parallel logic programs achieve this by assigning values to variables.

[0039] For example, to pass a result from process P1 to process P2, a predetermined word A is used as shown in FIG. 5. In its initial state, word A has an undefined value and can accept some value (in parallel logic programs this is called a variable). When the value to be passed to process P2 has been determined, process P1 assigns it to word A. Process P2 reads word A; if the result is not a variable, it processes it as the value passed from process P1. Conversely, if the result is still a variable, the execution of P2 must wait until a non-variable value is stored in word A.

[0040] Parallel logic programs implement the above operations as follows.

[0041] First, the receiving process P2 reads word A; if the result is not a variable, it does not access word A again. If it is a variable, however, P2 must wait until another process writes a result to word A, so it writes a "variable with pointer" such as that shown in FIG. 6 into word A. This "variable with pointer" means "when some process writes a value to this word, make the process P2 indicated by the pointer executable again."

[0042] Next, when process P1 writes its result to word A, there are three possibilities: word A is a variable, it is a variable with pointer (that is, the data structure shown in FIG. 4 has been created), or another process has already stored a non-variable value in it. Process P1 must therefore read word A first. If word A turns out to be a variable or a variable with pointer, P1 writes the value to word A; if word A holds a non-variable value, P1 does not access word A further.

[0043] Thus, in parallel logic programs, a read of a variable or a variable with pointer is usually followed by a write to that word, and otherwise a write usually does not follow. "The data read is a variable or a variable with pointer" can therefore serve as the prediction of a write request. This can be determined by whether the data matches a specific pattern (for example, whether the bitwise AND of the data and one constant equals another constant).
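To show how a single AND-and-compare can recognize both a variable and a variable with pointer, the sketch below uses an invented word format: a 3-bit type code in the top bits of a 32-bit word, with the two variable kinds given codes that differ only in the lowest code bit. The codes, the field widths, and all names here are hypothetical and are not the encoding of any particular parallel logic language implementation.

```python
TAG_SHIFT = 29   # hypothetical: type code in bits 31..29 of a 32-bit word

# Hypothetical type codes; the two variable kinds share the upper two code bits.
VAR, VAR_WITH_POINTER, INTEGER, ATOM = 0b000, 0b001, 0b010, 0b100

MASK = 0b110 << TAG_SHIFT   # constant ANDed with the data (ignores lowest code bit)
PATTERN = 0                 # constant the masked data is compared against


def make_word(code, payload):
    """Pack a type code and a 29-bit payload into one word."""
    return (code << TAG_SHIFT) | (payload & ((1 << TAG_SHIFT) - 1))


def write_expected(word):
    """True when the word is a variable or a variable with pointer."""
    return (word & MASK) == PATTERN


print(write_expected(make_word(VAR, 0)))                    # True
print(write_expected(make_word(VAR_WITH_POINTER, 0x1234)))  # True
print(write_expected(make_word(INTEGER, 42)))               # False
```

The design point is that the encoding is chosen so that the words after which a write is likely form one maskable group, which keeps the check to one AND and one comparison.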

[0044] By predicting the processor's write requests in this way, the parallel cache memory of the present invention uses the memory bus B for the first read by process P2, but because the corresponding cache line data in the other cache memories is invalidated at that time, the subsequent write does not need to issue a command on the memory bus B. The load on the memory bus B is likewise reduced for the later read and write of word A by process P1. As a result, the writes to word A by processes P2 and P1 are faster and the load on the memory bus B is reduced, improving overall performance accordingly.

[0045] Next, symbolic processing languages allocate many data items in memory during execution and reclaim the memory that was allocated to data that is no longer needed. One effective way of finding data that is no longer needed is to use a reference counter. As shown in FIG. 7, this is a counter that records how many pointers point to the data. In FIG. 7, data A is pointed to by three pointers, so "3" is stored in its reference counter.

[0046] Each time a processor creates or deletes a pointer, the reference counter of the data that the pointer points to must be incremented or decremented by 1. To update a reference counter, the processor therefore first reads it and then writes the updated result. Conversely, a reference counter is never read without then being written. Accordingly, when the data read is a reference counter, a write can be predicted to follow. Whether the data read is a reference counter can be determined by whether the data has a specific pattern.
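
The read-then-write behaviour of a reference counter update can be sketched as below. The tag value `RC_TAG`, the 28-bit count field, and the dictionary standing in for memory are all hypothetical choices for illustration.

```python
RC_TAG = 0xE0000000         # hypothetical tag marking a reference-counter word
RC_TAG_MASK = 0xF0000000    # selects the assumed 4-bit tag field
RC_COUNT_MASK = 0x0FFFFFFF  # remaining bits hold the count


def is_reference_counter(word: int) -> bool:
    """Pattern test: a word is treated as a reference counter if its tag matches."""
    return (word & RC_TAG_MASK) == RC_TAG


def adjust_reference_count(memory: dict, addr: int, delta: int, log: list) -> None:
    """Read-modify-write of a tagged counter; log records the memory operations."""
    word = memory[addr]                    # the read that triggers the prediction
    log.append(("read", addr))
    count = (word & RC_COUNT_MASK) + delta
    memory[addr] = RC_TAG | (count & RC_COUNT_MASK)  # the write that always follows
    log.append(("write", addr))


memory = {0x100: RC_TAG | 3}  # data A is pointed to by three pointers
log = []
adjust_reference_count(memory, 0x100, +1, log)  # a fourth pointer is created
```

Every update leaves a read immediately followed by a write to the same address in `log`, which is the access pattern the prediction relies on.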

[0047] Next, a multiprocessor operating system uses part of memory as an area for synchronization, for example for semaphores. In such a synchronization area, a read is usually followed by a write to the same address. Whether the data targeted by a read request lies in a synchronization area can therefore be used to predict a write by the processor, and this can be determined by whether the address of the word falls within a specific range.
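
A minimal sketch of the address-range test follows. The bounds of the synchronization area are hypothetical values chosen for the example.

```python
SYNC_AREA_START = 0x00008000  # hypothetical first address of the synchronization area
SYNC_AREA_END = 0x00009000    # hypothetical end address (exclusive)


def predicts_write_by_address(addr: int) -> bool:
    """Predict a following write when the read address lies in the synchronization area."""
    return SYNC_AREA_START <= addr < SYNC_AREA_END
```

Unlike the data-pattern test, this check needs only the address, so in principle it could be evaluated before the data itself arrives.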

[0048] The present invention is not limited to the embodiment described above and can be modified as appropriate without departing from its gist. For example, in the above embodiment, a write following a read by the processor is predicted by taking the bitwise logical AND of the data to be read and a constant (the value stored in register 109) and checking whether the result matches another constant (the value of register 110), but there are various other ways of determining whether the data has a specific pattern. As one example, a memory of 2^N entries × 1 bit may be provided and accessed using N bits of the data to be read as the address; if the value read out is 1, a write by the processor may be predicted.
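
The table-based variant can be sketched as follows. The choice of N = 4, the use of the top N bits of a 32-bit word as the index, and the particular tag values are assumptions for illustration; the text does not say which N bits are used.

```python
N = 4           # assumed number of index bits
DATA_BITS = 32  # assumed word width


def build_table(write_predicted_tags):
    """Build a 2^N-entry, 1-bit-per-entry table; entry is 1 where a write is predicted."""
    table = [0] * (1 << N)
    for tag in write_predicted_tags:
        table[tag] = 1
    return table


def predicts_write(data: int, table) -> bool:
    """Index the table with the top N bits of the data and test the stored bit."""
    index = (data >> (DATA_BITS - N)) & ((1 << N) - 1)
    return table[index] == 1


# Hypothetical tags: 0x1 variable, 0x3 variable with a pointer, 0xE reference counter.
table = build_table({0x1, 0x3, 0xE})
```

A table of this kind can mark any subset of the 2^N index values, including sets that a single mask-and-compare cannot express.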

[0049] In the above description, whether the data to be read has a specific pattern was used to predict a write by the processor. Instead, the data to be read and its address may together be treated as a single data item (that is, 64-bit data) and the same kind of prediction means applied to it, which in some cases enables more accurate prediction. This is effective, for example, when the area in which variables reside in a parallel logic program is confined to a certain address range. That is, prediction accuracy can be improved by judging a word to be a variable only when its data pattern matches that of a variable and its address falls within the address range in which variables reside.
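
A sketch of the combined test is given below. It assumes a 32-bit address placed in the upper half and 32-bit data in the lower half of the 64-bit value, a variable area identified by the top 4 address bits, and a variable tag in the top 4 data bits; none of these layout details are stated in the text.

```python
def combine(addr: int, data: int) -> int:
    """Treat address (upper 32 bits) and data (lower 32 bits) as one 64-bit value."""
    return ((addr & 0xFFFFFFFF) << 32) | (data & 0xFFFFFFFF)


# Hypothetical constants: variables live at addresses 0x4xxxxxxx and carry tag 0x1.
MASK64 = 0xF0000000_F0000000
PATTERN64 = 0x40000000_10000000


def predicts_write(addr: int, data: int) -> bool:
    """Same mask-and-compare test as before, applied to the 64-bit combination."""
    return (combine(addr, data) & MASK64) == PATTERN64
```

A mask-and-compare on the address half can only express ranges that are aligned to a power of two; an arbitrary range would need a different comparison.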

[0050] Furthermore, when whether an address lies within a certain range is used to predict a write by the processor, an MMU (Memory Management Unit), which translates logical addresses into physical addresses, may be used in place of the memory described above, with the information held for the page containing the address including whether a write is predicted. In addition, in the above description cache memory 10 predicts a write by processor 1 by taking the bitwise logical AND of the read-requested data portion and the value stored in register 109 (using arithmetic unit 111) and then comparing the result with the value stored in register 110. Instead, when cache memories 11 and 12 transfer cache line data, they may also transfer the prediction results they have made, and cache memory 10 may refer to those results.
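
The MMU-based variant can be pictured as a page table whose entries carry one extra flag. The page size, the `PageEntry` structure, and the table contents below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class PageEntry:
    physical_page: int     # physical page number for this logical page
    write_predicted: bool  # assumed per-page flag: reads here are usually followed by writes


def translate(page_table: dict, logical_addr: int):
    """Return (physical address, write prediction) for a logical address."""
    entry = page_table[logical_addr // PAGE_SIZE]
    physical = entry.physical_page * PAGE_SIZE + logical_addr % PAGE_SIZE
    return physical, entry.write_predicted


# Logical page 1 is assumed to hold synchronization data such as semaphores.
page_table = {0: PageEntry(7, False), 1: PageEntry(2, True)}
```

Because the flag is looked up as part of address translation, the prediction becomes available together with the physical address at no extra lookup cost in this model.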

[0051]

[Effects of the Invention] According to the parallel cache memory of the present invention, write processing can be sped up and the load on the memory bus reduced, so a high-performance data processing system can be realized without changing the number of processors.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of an embodiment of a parallel cache memory according to the present invention.

FIG. 2 is a diagram for explaining the operation of the embodiment shown in FIG. 1.

FIG. 3 is a diagram showing the operation sequence of the cache memories when a read and a write are performed in succession on a given data item in the embodiment shown in FIG. 1.

FIG. 4 is a diagram showing the operation sequence of the cache memories when a read and a write are performed in succession on a given data item in the embodiment shown in FIG. 1.

FIG. 5 is a conceptual diagram illustrating synchronization between processes mediated by word A in the embodiment shown in FIG. 1.

FIG. 6 is a conceptual diagram illustrating the wait operation of a process using a variable with a pointer in the embodiment shown in FIG. 1.

FIG. 7 is a conceptual diagram for explaining a reference counter in the embodiment shown in FIG. 1.

FIG. 8 is a configuration diagram showing a conventional shared-memory multiprocessor.

FIG. 9 is a configuration diagram showing a multiprocessor provided with conventional parallel cache memories.

FIG. 10 is a diagram showing the operation sequence of each cache memory in response to a read request from a processor in the multiprocessor shown in FIG. 9.

FIG. 11 is a diagram showing the operation sequence of each cache memory in response to a write request from a processor in the multiprocessor shown in FIG. 9.

[Explanation of symbols]

1-3: processors; 4-6: address lines; 7-9: data lines; 10-12: cache memories; 101, 109, 110, 112: registers; 102, 107: comparison circuits; 103, 108: selectors; 104: cache line; 105, 106: control circuits; 111: arithmetic circuit.

Claims

[Claims]

1. A parallel cache memory in which a cache memory is connected to each of a plurality of processors and each cache memory is connected to a shared memory via a memory bus, and in which each cache memory, in response to a read request from the processor connected to it, determines whether the cache line data targeted by the read request is present in its own cache line and, if it is not, requests transfer of that cache line data from another cache memory or the shared memory, characterized in that: each cache memory has means for predicting, when a read request is made, whether the processor that issued the read request will issue a write request for the same cache line data;
another cache memory that receives a request to transfer cache line data from the cache memory connected to the processor that issued the read request predicts whether that processor will issue a write request for the cache line data it holds and, if it predicts that a write request will be issued, invalidates that cache line data; and the cache memory connected to the processor that issued the read request predicts whether the processor will issue a write request for the cache line data transferred from another cache memory or the shared memory and, if it predicts that a write request will be issued, stores the fact that no other cache memory holds that cache line data.

2. The parallel cache memory according to claim 1, wherein the prediction of whether a processor that has issued a read request will issue a write request for the same cache line data is made based on whether the data targeted by the read request has a certain pattern.

3. The parallel cache memory according to claim 1, wherein the prediction of whether a processor that has issued a read request will issue a write request for the same cache line data is made based on whether the address targeted by the read request is within a certain range.