JPH07210526A

JPH07210526A - Parallel computer

Info

Publication number: JPH07210526A
Application number: JP6019905A
Authority: JP
Inventors: Naonobu Sukegawa; 直伸助川; Toshiaki Tarui; 俊明垂井; Keimei Fujii; 啓明藤井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-01-20
Filing date: 1994-01-20
Publication date: 1995-08-11
Anticipated expiration: 2018-05-26
Also published as: JP3410535B2

Abstract

PURPOSE: To reduce directory capacity in a parallel computer that has a distributed shared memory and uses a directory for cache coherence control. CONSTITUTION: In a parallel computer that has a plurality of processors and a plurality of distributed shared memories accessible by each processor, and in which each processor has a cache memory that registers shared-memory data in units of lines, a page directory 70 is provided for each page of the shared memory, and the page directory 70 stores the processors that have registered lines of that page in their cache memories. In addition, a line directory is provided for each line of the shared memory, and the line directory stores, as a bitmap, the positions on the page directory of the processors that have registered that line in their cache memories.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel computer that has a plurality of cache memories and a shared memory, with a directory provided for the shared memory.

[0002]

[Prior Art] Parallel computers, which run many processors in parallel, are regarded as a promising way to improve computer performance. A parallel computer needs a means of communication between processors. Two approaches exist: message passing, in which messages are exchanged over a network, and shared memory, in which a shared memory area accessible from every processor is provided. In message passing, a message exchange is generally carried out by invoking the operating system. Invoking the operating system is a very heavy overhead, especially for short messages. With shared memory, by contrast, communication takes place without invoking the operating system. The shared memory approach can therefore reduce the load that communication places on the processors.

[0003] For a large-scale parallel computer, a distributed shared memory scheme, in which the shared memory is divided and distributed, is effective. Distributing the shared memory lets multiple processors access shared memory simultaneously and enables highly parallel processing. Two kinds of distributed shared memory system are known:
(A) a homogeneous distributed shared memory system, in which components containing a processor and components containing a portion of the shared memory are connected by a network;
(B) a heterogeneous distributed shared memory system, in which components each containing both a processor and a portion of the shared memory are connected by a network.
An example of a homogeneous system (no variation in access time) is JP-A-5-128071, and an example of a heterogeneous system (variation in access time) is JP-A-5-89056. In these systems, when a processor accesses the shared memory, data is transferred over the network every time in the homogeneous system, and with high probability even in the heterogeneous system. Processor speeds have improved in recent years far more than network speeds. Latency therefore becomes a major problem for accesses that go through the network.

[0004] Providing each processor with a cache memory is an effective way to speed up its shared memory accesses. A cache memory is a high-speed buffer that holds part of the contents of the shared memory. In a shared-memory parallel computer with cache memories, several processors may register the same data in their respective caches. A means of guaranteeing that the cache contents of the processors agree is then required. Bus-connected parallel computers guarantee this coherence by snooping. In a network-based distributed shared memory system, however, snooping is hard to use, because the communication overhead of delivering coherence management information to every processor, which snooping requires, becomes very large on a network. As a means of guaranteeing cache coherence in a distributed shared memory system, JP-A-4-328653 discloses a scheme that provides a bus for addresses and commands and a network for data. In this scheme address and command transmissions cannot proceed in parallel, so highly parallel processing is difficult to achieve.

[0005] As a coherence management scheme with greater parallelism, "The Stanford Dash Multiprocessor", IEEE Computer, March 1992, pp. 63-78, discloses the directory scheme. In the directory scheme, a directory is provided for each line (the unit in which data is registered from shared memory into a cache memory), and it records the processors that have registered that line in their cache memories. A directory control circuit that controls the directories is also provided for each distributed shared memory. When one or more processors register a line in their cache memories, the directory control circuit registers all of those processors in the directory provided for that line. When a processor updates the contents of a line, the directory control circuit examines the directory of that line and identifies the processors, other than the updating processor, that have the line registered in their cache memories. The directory control circuit then instructs the cache memory of each identified processor, over the network, to do one of the following:
(A) erase the line being updated from that cache memory (after writing it back to shared memory if necessary);
(B) update the information in that cache memory as well.
With this management, the same information is registered for a given line in every cache memory.
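The directory bookkeeping described above can be illustrated with a minimal Python sketch. It models only option (A), invalidation, with a full-map bitmap per line. The class name `FullMapDirectory` and its methods are hypothetical names chosen for illustration, not taken from the cited paper or from this patent, and network messages and write-back are left out.

```python
class FullMapDirectory:
    """Illustrative per-line sharer bitmap: one bit per processor."""

    def __init__(self, num_processors, num_lines):
        self.num_processors = num_processors
        self.sharers = [0] * num_lines  # one integer bitmap per line

    def register(self, line, proc):
        # Record that processor `proc` has registered `line` in its cache.
        self.sharers[line] |= 1 << proc

    def write(self, line, writer):
        """Return the processors that must be told to erase their copy."""
        targets = [p for p in range(self.num_processors)
                   if (self.sharers[line] >> p) & 1 and p != writer]
        # Assumption: after invalidation only the writer holds the line.
        self.sharers[line] = 1 << writer
        return targets


directory = FullMapDirectory(num_processors=8, num_lines=4)
for proc in (1, 3, 5):
    directory.register(0, proc)
invalidated = directory.write(0, writer=3)
```

In this sketch the list returned by `write` stands in for the set of caches that the directory control circuit would contact over the network.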

[0006] Directory schemes include the full-map scheme and the limited-pointer scheme. In the full-map scheme, the directory is a bitmap in which one bit corresponds to one processor. In the limited-pointer scheme, the directory is a finite-length pointer array that can hold a few processor IDs. The limited-pointer scheme exploits the fact that the number of processors holding the same line in their caches at the same time is generally small. It removes the redundancy of the full-map directory, in which most bits are 0, and so allows a smaller directory. In the limited-pointer scheme, when the directory of a line overflows, a countermeasure such as replacement is needed. In replacement, one of the processors already stored in the directory writes the line back from its cache to shared memory, that processor is then erased from the directory, and the new processor is registered. The full-map scheme is used in "The Stanford Dash Multiprocessor", IEEE Computer, March 1992, pp. 63-78, and the limited-pointer scheme is used in JP-A-5-128071.

[0007]

[Problems to Be Solved by the Invention] Building a large-scale parallel computer with distributed shared memory using the directory schemes above requires an extremely large directory capacity. As an example, consider a full-map directory in a parallel computer in which 4096 processors are connected by a network. Assume 64 lines per page and 512 bits per line (4 Kbytes per page). In the full-map scheme each line needs a directory of 4096 bits = 512 bytes, so the directory capacity required per page of shared memory is 512 bytes × 64 lines = 32 Kbytes. In other words, the directory must be eight times the size of the shared memory, which is impractical. Next consider the limited-pointer scheme on the same parallel computer. If up to 16 processors can be registered in the directory of one line, the directory capacity required per page of shared memory is (12 + 1)/8 bytes × 16 pointers × 64 lines = 1664 bytes. (With 4096 processors, each pointer of a line directory needs 12 bits to represent a processor number, plus 1 Valid bit indicating whether the pointer is in use.) With the limited-pointer scheme the directory is about 2/5 the size of the shared memory, far smaller than with the full-map scheme but still considerable. As shown above, the full-map scheme needs a directory of impractical size in a large-scale parallel computer, and even with the limited-pointer scheme the directory size is not negligible. An object of the present invention is to realize a large-scale parallel computer with directory-based distributed shared memory and a small directory capacity.
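The per-page figures in paragraph [0007] can be checked with a short calculation. The following Python sketch simply reproduces that arithmetic; the function and constant names are hypothetical, and the page geometry (64 lines of 512 bits) is the one assumed in the text.

```python
LINES_PER_PAGE = 64
LINE_BITS = 512
PAGE_BYTES = LINES_PER_PAGE * LINE_BITS // 8  # 4 Kbytes per page


def full_map_bytes_per_page(num_processors):
    # One bit per processor for every line of the page.
    return num_processors * LINES_PER_PAGE // 8


def limited_pointer_bytes_per_page(num_processors, pointers_per_line):
    # Each pointer holds a processor ID plus one Valid bit.
    id_bits = (num_processors - 1).bit_length()  # 12 bits for 4096 processors
    return (id_bits + 1) * pointers_per_line * LINES_PER_PAGE // 8


full_map = full_map_bytes_per_page(4096)
limited = limited_pointer_bytes_per_page(4096, 16)
full_map_ratio = full_map / PAGE_BYTES  # directory size relative to the page
limited_ratio = limited / PAGE_BYTES
```

Under these assumptions the full-map directory is 8 times the page size and the limited-pointer directory is roughly 0.4 times the page size, matching the "eight times" and "about 2/5" statements in the text.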

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention provides a parallel computer comprising a plurality of processors and a plurality of distributed shared memories accessible by each processor, each processor having a cache memory that registers shared-memory data in units of lines, wherein the shared memory comprises a page directory, provided for each page of the shared memory, that stores the processors that have registered some or all of the lines of that page in their cache memories, and a line directory, provided for each line of the shared memory, that stores in bitmap form the positions on the page directory of the processors that have registered that line in their cache memories. Also, the line directory, provided for each line of the shared memory, stores in pointer form the positions on the page directory of the processors that have registered that line in their cache memories. Also, the plurality of shared memories and the plurality of processors are connected by a network. Also, a shared memory and a processor form a node, and a plurality of such nodes are connected by a network. Also, each page directory is provided with a page directory pointer that indicates, among the processors stored in that page directory, the one stored earliest. Also, for each processor stored in the page directory, a lock bit is provided that indicates whether that processor may be erased from the page directory.

[0009]

[Operation] According to the present invention, providing the shared memory with a page directory and line directories greatly reduces the capacity needed for the line directories, and thereby greatly reduces the capacity needed for the directory as a whole.

[0010]

[Embodiment] FIGS. 1 to 6 show an embodiment of the present invention. FIG. 9 shows an example of the contents of the page directory 70 and the line directories 90 according to this embodiment. First, an outline of the embodiment is given with reference to FIGS. 7 to 9. FIG. 7 shows an example of cache contents when eight processors I to VIII each have a cache memory (20-0 to 20-7) that can register only six lines from shared memory. FIG. 9 shows the page directory 70 and the line directories 90 that result when, with the cache memories 20 in the registration state of FIG. 7, the processors holding each line are recorded by the method of this embodiment. Here the eight lines 0 to 7 are taken to form one page of shared memory, and only the page directory 70 of that page and the line directories 90-0 to 90-7 corresponding to lines 0 to 7 are shown.

[0011] The processors holding lines 0 to 7 are processors I, II, III, and VI, so the pointers 94 of the page directory 70 store the four processor IDs of processors I, II, III, and VI. In addition, a line directory 90-0 to 90-7 is provided for each line. Each line directory 90-0 to 90-7 records, in bitmap form, which of the processors stored in the page directory 70 have that line registered in their cache memories 20-0 to 20-7. In this example each line directory 90-0 to 90-7 is a 4-bit bitmap corresponding to the four processors stored in the page directory 70. For example, if the line directory 90-0 for line 0 is all 1s, then all of the processors stored in the page directory (I, II, III, and VI) have line 0 registered in their caches. The line directories 90-0 to 90-7 can also be built in pointer form, that is, as a sequence of pointers. Each pointer then indicates a processor registration position in the page directory 70. In the case of FIG. 9, for example, four processor IDs are registered in the page directory; their registration positions are assigned the pointers 00, 01, 10, and 11, and these pointers are stored in the line directories. Each pointer 94 of the page directory 70 may or may not be given a flag indicating whether the pointer is in use. Without such a flag, the page directory always stores a fixed number of processors. In that case there can be processors that are stored in the page directory 70 but whose bits in the corresponding line directories 90-0 to 90-7 are all 0.
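The two-level lookup described in paragraph [0011] can be sketched in Python as follows. This is a software illustration only, not the hardware of the embodiment: the class name `PageEntry`, the use of `None` for an unused slot, and the integer processor IDs 1, 2, 3, and 6 (standing in for processors I, II, III, and VI) are all assumptions made for the example.

```python
class PageEntry:
    """One page: a page directory of processor IDs plus per-line bitmaps."""

    def __init__(self, slots, lines_per_page):
        self.page_dir = [None] * slots         # processor ID stored in each slot
        self.line_dirs = [0] * lines_per_page  # per line: bitmap over the slots

    def register(self, line, proc):
        if proc in self.page_dir:
            slot = self.page_dir.index(proc)
        else:
            # Raises ValueError when no slot is free; replacement is not modeled.
            slot = self.page_dir.index(None)
            self.page_dir[slot] = proc
        self.line_dirs[line] |= 1 << slot

    def sharers(self, line):
        # Translate the line's slot bitmap back into processor IDs.
        return [p for s, p in enumerate(self.page_dir)
                if p is not None and (self.line_dirs[line] >> s) & 1]


entry = PageEntry(slots=4, lines_per_page=8)
for proc in (1, 2, 3, 6):
    entry.register(0, proc)
entry.register(5, 3)
```

Each line directory here costs only as many bits as the page directory has slots, regardless of how many processors the machine has, which is the source of the capacity saving.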

[0012] FIG. 8 shows the contents of the directories 92-0 to 92-7 provided for lines 0 to 7 when the registration state of FIG. 7 is recorded in a limited-pointer directory. The pointers 94 of each line's directory 92-0 to 92-7 store, as they are, the IDs of the processors that have that line registered in their cache memories 20-0 to 20-7. For example, line 0 is registered in the caches of the four processors I, II, III, and VI, so the directory 92-0 stores the four processor IDs of processors I, II, III, and VI. Each pointer 94 needs a Valid flag 96 indicating whether it is in use.

[0013] In general, a processor that registers a line in its cache memory is likely to register neighboring lines as well. This is because programs with loop structures often process contiguous regions (and programs are commonly optimized to do so). For example, as with the cache memory 20-4 of processor V in FIG. 7, consecutive lines are likely to be registered in a cache memory. Given this property, the limited-pointer scheme registers the same processor ID repeatedly in neighboring directories, so the information held in the directories is redundant. For example, the eight directories 92-0 to 92-7 of FIG. 8 store 22 processor IDs in total, yet only four distinct processors (I, II, III, and VI) appear. The scheme of this embodiment exploits this redundancy to reduce directory capacity. Consider the directory capacity required in the examples of FIGS. 8 and 9. There are eight processors in all, so a pointer 94 needs 3 bits. The limited-pointer scheme, including the Valid bits 96, needs (3 + 1) bits × 4 pointers × 8 lines = 128 bits of directory. The scheme of this embodiment needs 3 bits × 4 pointers = 12 bits for the page directory 70 and 4 bits × 8 lines = 32 bits for the line directories 90-0 to 90-7, 44 bits in total, about 1/3 of the limited-pointer scheme. Next, consider the directory capacity required by this scheme in a parallel computer with 4096 processors connected by a network. As in the full-map and limited-pointer calculations above, assume 64 lines per page and 512 bits per line (4 Kbytes per page). If 16 processors can be registered in the page directory of each page, the capacity per page is 12/8 bytes × 16 pointers + 16/8 bytes × 64 lines = 152 bytes. Even if 64 processors can be registered in the page directory of each page, only 12/8 bytes × 64 pointers + 64/8 bytes × 64 lines = 608 bytes are needed. Both figures are well below the 32 Kbytes of the full-map scheme and the 1664 bytes of the limited-pointer scheme under the same conditions.
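The capacity figures in paragraph [0013] follow from two simple formulas, reproduced in the Python sketch below. The function names are hypothetical, and the sketch follows the text in counting no Valid or lock bits for the page directory.

```python
def id_bits(num_processors):
    # Bits needed to name one processor: 3 for 8, 12 for 4096.
    return (num_processors - 1).bit_length()


def limited_pointer_bits(num_processors, pointers_per_line, lines_per_page):
    # Every pointer stores a processor ID plus one Valid bit.
    return (id_bits(num_processors) + 1) * pointers_per_line * lines_per_page


def two_level_bits(num_processors, slots, lines_per_page):
    page_dir_bits = id_bits(num_processors) * slots  # processor IDs, once per page
    line_dir_bits = slots * lines_per_page           # one bit per slot, per line
    return page_dir_bits + line_dir_bits


small_limited = limited_pointer_bits(8, 4, 8)          # example of FIG. 8
small_two_level = two_level_bits(8, 4, 8)              # example of FIG. 9
large_16_bytes = two_level_bits(4096, 16, 64) // 8     # 16 slots per page
large_64_bytes = two_level_bits(4096, 64, 64) // 8     # 64 slots per page
```

The page directory term grows with the number of processors only through the ID width, while the dominant line directory term depends on the slot count alone.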

[0014] In this embodiment, the question is what to do when every entry of the page directory is in use and a further processor must be stored in it. An outline of this handling is given with reference to FIG. 1. When every entry of the page directory 70 is in use and a new processor must be stored, the message interpretation/execution unit 50 requests replacement of the processor indicated by the page directory pointer 80, which points to the processor that was stored earliest among those in the page directory 70. For example, to replace processor 15, already stored in the page directory 70, with processor 215, the message interpretation/execution unit 50 asks processor 15 to write back to the shared memory 45 every line registered in its cache memory 20 from the page of the shared memory 45 to which the page directory 70 belongs, erases processor 15 from the line directories 90 and the page directory 70 to free an entry, and then stores the new processor 215 in the page directory 70. By being incremented (or decremented) at each replacement, the page directory pointer 80 always points to the processor stored earliest. In addition, giving each processor stored in the page directory 70 a lock bit 75, which indicates whether it may be erased, makes it possible to prohibit replacement of processors whose erasure should be avoided. In that case, when replacement becomes necessary, the page directory pointer 80 is incremented (or decremented) until it points to a processor whose replacement is not prohibited by its lock bit 75. The page directory pointer 80 and the lock bits 75 avoid frequent replacement and reduce its time cost.
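The replacement procedure with a page directory pointer and lock bits can be sketched in Python as below. This is a software illustration under stated assumptions, not the circuit of the embodiment: the function names are hypothetical, the pointer is assumed to wrap around modulo the number of slots, the write-back request to the evicted processor is represented only by a comment, and the case where every slot is locked (which the text does not discuss) is treated as an error.

```python
def choose_victim(pointer, lock_bits):
    """Advance the pointer past locked slots and return the victim slot."""
    if all(lock_bits):
        raise ValueError("every slot is locked")  # case not covered by the text
    while lock_bits[pointer]:
        pointer = (pointer + 1) % len(lock_bits)  # assumed wrap-around
    return pointer


def replace(page_dir, line_dirs, lock_bits, pointer, new_proc):
    """Evict one processor from the page directory and store new_proc."""
    victim = choose_victim(pointer, lock_bits)
    evicted = page_dir[victim]
    # Here the evicted processor would write back its cached lines of this page.
    for i in range(len(line_dirs)):
        line_dirs[i] &= ~(1 << victim)  # clear the victim's bit in every line
    page_dir[victim] = new_proc
    next_pointer = (victim + 1) % len(page_dir)
    return evicted, next_pointer


page_dir = [7, 15, 9, 3]                 # slot 0 is locked in this example
line_dirs = [0b1111, 0b0001, 0b0010]
lock_bits = [True, False, False, False]
evicted, pointer = replace(page_dir, line_dirs, lock_bits, 0, 215)
```

In this example the pointer starts at the locked slot 0, skips to slot 1, and processor 15 is replaced by processor 215, mirroring the example in the text.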

[0015] The embodiment is described in detail with reference to FIGS. 1 to 6. FIG. 1 shows a distributed shared memory parallel computer with system units 0, 200, and 1000. Each system unit consists of a processor node (10, 210, 1010) and a memory node (40, 240, 1040). This embodiment is a homogeneous distributed shared memory in which all the processor nodes 10, 210, 1010 and all the memory nodes 40, 240, 1040 are connected by the network 1500. The embodiment is also applicable to a heterogeneous distributed shared memory in which the processor node and memory node within the same system unit (processor node 10 and memory node 40 in system unit 0, processor node 210 and memory node 240 in system unit 200, processor node 1010 and memory node 1040 in system unit 1000) are connected by a separately provided link rather than by the network 1500. In this embodiment the page directory 70 is always operated with every entry in use. Consequently, storing a new processor 15, 215, or 1015 in the page directory 70 always requires replacement.

[0016] The configurations of the processor node 10 and the memory node 40 are described for system unit 0 only. The other system units 200 and 1000 have the same configuration as system unit 0. The processor node 10 consists of a processor 15, a cache memory 20, a processor network connection circuit 25, a message assembly circuit 30, and a message disassembly circuit 35. The processor network connection circuit 25 has a distributed memory map 27 for determining which memory node 40, 240, 1040 a request from the processor 15 to the shared memory 45, 245, 1045 is directed to, and it also connects processor 0 to the message assembly circuit 30 and the message disassembly circuit 35. The message assembly circuit 30 generates a message packet from the memory node number, address, data, and command that the processor network connection circuit 25 issues toward the network 1500, and sends it to the network 1500. The message disassembly circuit 35 breaks a message packet from the network down into address, data, and command and passes them to the processor network connection circuit 25.

[0017] The memory node 40 consists of a shared memory 45, a page directory 70, lock bits 75, a page directory pointer 80, a line directory 90, a page directory control circuit 65, a line directory control circuit 85, a message assembly circuit 55, and a message disassembly circuit 60. In response to requests from the processors 15, 215, 1015, the message interpretation/execution unit 50 sends a page number over address line 173 to the page directory 70, the lock bits 75, and the page directory pointer 80, and sends address information, data, and control signals to the page directory control circuit 65 and the line directory control circuit 85 in order to write or read the page directory 70 and the line directory 90. It also writes and reads the shared memory 45. In this embodiment the message interpretation/execution unit 50 is a control system containing a processor, memory, and I/O, but it can also be implemented as a circuit. The page directory control circuit 65 operates on the page directory 70, the lock bits 75, and the page directory pointer 80 according to control signals from the message interpretation/execution unit 50. The line directory control circuit 85 operates on the line directory 90 according to control signals from the message interpretation/execution unit 50.

【００１８】[0018]
FIG. 2 shows the configuration of the page directory control circuit 65. Here, m denotes the number of processors that can be registered in the page directory 70 for each page. The processor number register 101 holds a processor number sent from the message interpretation/execution unit 50 over the data line 157. The processor number registers 110 to 114, of which there are m, hold the processor numbers stored in the page directory 70, received over the data line 170. The comparators 105 to 109 compare the processor number in the processor number register 101 with those in the processor number registers 110 to 114 and send the match results to the bit calculator 120 and, via the data line 160, to the bit calculator 180 of the line directory control circuit 85. If every comparison is a mismatch, the mismatch is reported to the message interpretation/execution unit 50 over the data line 145. The multiplexer 117 outputs to the message interpretation/execution unit 50 the one processor number, among those held in the processor number registers 110 to 114, that is selected by the signal from the selector 118. The demultiplexer 119, in response to the control signal 150 from the message interpretation/execution unit 50, writes the processor number held in the processor number register 101 into the one register among 110 to 114 indicated by the pointer information register 122. The lock information register 121 holds one page's worth of lock information from the lock bits 75, received over the data line 171. The bit calculator 120 sets a bit of the lock information register 121 in response to the control signal 151 from the message interpretation/execution unit 50 and resets it in response to the control signal 152. Which bit of the lock information register 121 is set or reset is determined by the match signals from the comparators 105 to 109. The bit calculator 120 also, in response to the control signal 153 from the message interpretation/execution unit 50, tests whether a bit of the lock information register 121 is 0 or 1 and sends the result to the message interpretation/execution unit 50 over the data line 146. The bit to be tested is determined by the data from the pointer information register 122. The pointer information register 122 holds one page's worth of information from the page directory pointer 80, received over the data line 172. Its contents are also output, via the data line 161, to the bit calculator 180 of the line directory control circuit 85. The increment circuit 123 increments the contents of the pointer information register 122 in response to the control signal 156 from the message interpretation/execution unit 50. The priority encoder 124 encodes the contents of the line information register 179, sent from the line directory control circuit 85 over the data line 165. When encoding yields more than one result, it selects one by a fixed or random method and outputs it to the selector 118. It can also, under the control line 155 from the message interpretation/execution unit 50, output the multiple encoded results to the selector 118 one after another. The selector 118, in response to the control signal 154 from the message interpretation/execution unit 50, switches the signal supplied to the multiplexer 117 between the contents of the pointer information register 122 and the output of the priority encoder 124.
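As a rough illustration of the per-page state that the page directory control circuit 65 operates on, the following Python sketch models one page's m processor slots, its lock bits, and its pointer. The class name `PageEntry`, the value `M = 4`, and the wrap-around of the pointer on increment are assumptions made for illustration; the text specifies only that there are m slots per page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

M = 4  # assumed slot count; the description only calls it m


@dataclass
class PageEntry:
    """Hypothetical model of one page: m processor slots, m lock bits, one pointer."""
    slots: List[Optional[int]] = field(default_factory=lambda: [None] * M)
    locks: List[bool] = field(default_factory=lambda: [False] * M)
    pointer: int = 0

    def match(self, proc_id: int) -> List[bool]:
        # Role of comparators 105-109: one match flag per slot.
        return [s == proc_id for s in self.slots]

    def mismatch(self, proc_id: int) -> bool:
        # Role of data line 145: true when no slot holds proc_id.
        return not any(self.match(proc_id))

    def set_lock(self, proc_id: int, value: bool) -> None:
        # Role of bit calculator 120: set/reset the lock bit chosen by the match flags.
        for i, hit in enumerate(self.match(proc_id)):
            if hit:
                self.locks[i] = value

    def test_lock_at_pointer(self) -> bool:
        # Test the lock bit selected by the pointer information register 122.
        return self.locks[self.pointer]

    def increment_pointer(self) -> None:
        # Role of increment circuit 123; wrapping modulo m is an assumption.
        self.pointer = (self.pointer + 1) % M
```

The match flags double as a position within the page entry, which is what lets the line directory store a short bitmap instead of full processor numbers.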

【００１９】[0019]
FIG. 3 shows the configuration of the line directory control circuit 85. The line number generator 175 generates all line numbers in sequence in response to the control line 188 from the message interpretation/execution unit 50. Each generated line number is output to the selector 176. The line number generator 175 also issues test requests to the bit calculator 180. The selector 176, in response to the control line 187 from the message interpretation/execution unit 50, outputs either the line number produced by the line number generator 175 or the line number supplied by the message interpretation/execution unit 50 over the address line 185. The mixer 178 combines the page number sent from the message interpretation/execution unit 50 over the address line 186 with the line number from the selector 176 and outputs the result on the address line 195 of the line directory 90. The line information register 179 holds one line's worth of information from the line directory 90, received over the data line 197. The contents of the line information register 179 are output over the data line 165 to the priority encoder 124 of the page directory control circuit 65. The bit calculator 180 sets a bit of the line information register 179 in response to the control signal 189 from the message interpretation/execution unit 50 and resets it in response to the control signal 190. Which bit of the line information register 179 is set or reset is determined by the match signal 160 from the comparators 105 to 109 of the page directory control circuit 65. The bit calculator 180 also, in response to a control signal from the line number generator 175, tests whether a bit of the line information register 179 is 0 or 1 and outputs the result to the line number latch 177. The bit to be tested is determined by the data supplied over the data line 161 from the pointer information register 122 of the page directory control circuit 65. The line number latch 177 outputs the line number from the selector 176 to the message interpretation/execution unit 50 when the test result from the bit calculator 180 is true.
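The set, reset, and test operations on the line directory can be sketched as bit operations on a per-line bitmap whose bit positions correspond to slot positions in the page directory. This is a minimal Python sketch under stated assumptions: `LINES_PER_PAGE = 8`, the class name `LineDirectory`, and the address formula in `mix` are hypothetical; the text says only that the mixer 178 combines the page number and the line number.

```python
LINES_PER_PAGE = 8  # assumed; the description does not fix the page size


def mix(page: int, line: int) -> int:
    """Role of mixer 178: combine page and line number into one address (assumed layout)."""
    return page * LINES_PER_PAGE + line


class LineDirectory:
    """Hypothetical model: one small bitmap per line, indexed by page-directory position."""

    def __init__(self, pages: int) -> None:
        self.bits = [0] * (pages * LINES_PER_PAGE)

    def set_bit(self, page: int, line: int, pos: int) -> None:
        # Record that the processor in slot `pos` has cached this line.
        self.bits[mix(page, line)] |= 1 << pos

    def reset_bit(self, page: int, line: int, pos: int) -> None:
        # Remove the record for the processor in slot `pos`.
        self.bits[mix(page, line)] &= ~(1 << pos)

    def test_bit(self, page: int, line: int, pos: int) -> bool:
        # Check whether the processor in slot `pos` has cached this line.
        return bool((self.bits[mix(page, line)] >> pos) & 1)
```

Because each line stores only m bits rather than a full processor map, the per-line cost is independent of the total number of processors in the machine.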

【００２０】[0020]
The operation of the memory control mechanism of the present invention when the processor 15 executes a load, a store, or a flush is described below. The processor 15 is assumed to have a store-in cache memory 20. The operations that require the page directory control circuit 65 and the line directory control circuit 85, namely page directory check 2000, page directory storage 2100, processor search 2200, all-processor search 2300, line directory storage 2400, line directory deletion 2500, lock bit set, and lock bit reset, are described in detail later.

【００２１】[0021]
[1] When the processor 15 loads data from the shared memory 45
If the line containing the target data is registered in the cache memory 20, the processor 15 issues no load request to the processor network connection circuit 25, and the operation ends there. If the line containing the target data is not registered in the cache memory 20, the processor 15 issues a load request to the processor network connection circuit 25, and the following takes place. The processor network connection circuit 25 checks the distributed memory map 27 and determines that the request is a load request to the shared memory 45. The processor network connection circuit 25 then sends a load command to the message interpretation/execution unit 50 via the message assembly circuit 30, the network 1500, and the message decomposition circuit 60.

【００２２】[0022]
FIG. 4 shows how the message interpretation/execution unit 50 handles the load command, as described below. On receiving the load command, the message interpretation/execution unit 50 performs page directory check 2000 to determine whether the processor 15 is stored in the page directory 70. If it is not, the unit performs page directory storage 2100 to store the processor 15 in the page directory 70. Next, it performs processor search 2200 to determine whether another processor 215 or 1015 has the line registered in its cache memory 220 or 1020. If, for example, the line is registered in the cache memory 220 of the processor 215, the message interpretation/execution unit 50 asks the processor 215, via the message assembly circuit 55, the network 1500, the message decomposition circuit 235, and the processor network connection circuit 225, to transfer the line to the processor 15. The processor 215 reads the line from the cache memory 220 and transfers it to the processor 15 via the processor network connection circuit 225, the message assembly circuit 230, the network 1500, the message decomposition circuit 35, and the processor network connection circuit 25. If the line is registered in neither the cache memory 220 nor the cache memory 1020, the message interpretation/execution unit 50 reads the line from the shared memory 45 and sends it to the processor 15 via the message assembly circuit 55, the network 1500, the message decomposition circuit 35, and the processor network connection circuit 25. Finally, the message interpretation/execution unit 50 performs line directory storage 2400 to store the processor 15 in the line directory 90. This completes the description of the operation when the processor 15 loads data from the shared memory 45.
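The load handling sequence (page directory check 2000, page directory storage 2100, processor search 2200, line directory storage 2400) can be sketched in Python as follows. The function name `handle_load`, the dictionary-based directories, and `M = 4` are assumptions for illustration. Replacement of an existing processor when the page entry is full is deliberately not modeled here; the sketch simply fails in that case.

```python
M = 4  # assumed number of processor slots per page


def handle_load(page_dir, line_dir, page, line, requester):
    """Return ("cache", owner) or ("memory", None) for a load by `requester`.

    page_dir maps page -> list of M slots (processor ID or None).
    line_dir maps (page, line) -> bitmap over slot positions.
    """
    slots = page_dir.setdefault(page, [None] * M)
    # Page directory check 2000, then storage 2100 if the requester is absent.
    if requester not in slots:
        free = slots.index(None)  # ValueError if full: replacement is not modeled
        slots[free] = requester
    bitmap = line_dir.get((page, line), 0)
    # Processor search 2200: find one other processor that holds the line.
    owner = None
    for pos, proc in enumerate(slots):
        if proc is not None and proc != requester and (bitmap >> pos) & 1:
            owner = proc
            break
    source = ("cache", owner) if owner is not None else ("memory", None)
    # Line directory storage 2400: record the requester for this line.
    line_dir[(page, line)] = bitmap | (1 << slots.index(requester))
    return source
```

A first load of a line is served from shared memory; a later load of the same line by another processor is redirected to the cache that already holds it.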

【００２３】[0023]
[2] When the processor 15 stores to data of the shared memory 45
If the line containing the data to be stored is not registered in the cache memory 20, the load operation described above is performed first. The following then takes place with the line containing the target data registered in the cache memory 20. When executing the store, the processor 15 issues to the processor network connection circuit 25 a request to invalidate the copies of that line registered in the cache memories 220 and 1020 of the other processors 215 and 1015. The processor network connection circuit 25 checks the distributed memory map 27, determines that the data to be stored originates from the shared memory 45, and sends an invalidate command via the message assembly circuit 30, the network 1500, and the message decomposition circuit 60. FIG. 5 shows how the message interpretation/execution unit 50 handles the invalidate command, as described below. On receiving the invalidate command, the message interpretation/execution unit 50 performs all-processor search 2300, which extracts the numbers of all processors that have the line registered in their cache memories. If, for example, the line is registered in the cache memories 20, 220, and 1020 of the processors 15, 215, and 1015, the message interpretation/execution unit 50 requests the processors 215 and 1015, that is, all of them except the requesting processor 15, to invalidate the line, via the message assembly circuit 55, the network 1500, the message decomposition circuits 235 and 1035, and the processor network connection circuits 225 and 1025. After invalidating the line in the cache memories 220 and 1020, the processors 215 and 1015 report completion of the invalidation to the message interpretation/execution unit 50 via the processor network connection circuits 225 and 1025, the message assembly circuits 230 and 1030, the network 1500, and the message decomposition circuit 60. Each time it receives a completion report, the message interpretation/execution unit 50 performs line directory deletion 2500, which removes the record of the invalidating processor 215 or 1015 from the line directory 90 entry for that line. Once it has received the completion reports from the processors 215 and 1015 (all processors other than the requesting processor 15) and finished line directory deletion 2500 for them, the message interpretation/execution unit 50 notifies the processor 15 that the invalidate command is complete, via the message assembly circuit 55, the network 1500, the message decomposition circuit 35, and the processor network connection circuit 25. This completes the description of the operation when the processor 15 stores to data of the shared memory 45.
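The invalidate handling can be sketched as follows: all-processor search 2300 yields every holder of the line, the requester is excluded, and each acknowledgement clears one bit (line directory deletion 2500). The function name `handle_invalidate` and the list/bitmap representation are assumptions; acknowledgements are modeled as arriving immediately, which the real message exchange over the network does not guarantee.

```python
def handle_invalidate(slots, bitmap, requester):
    """Return (processors asked to invalidate, line bitmap after all acknowledgements).

    slots  : page-directory slots for the page (processor ID or None)
    bitmap : line-directory bitmap for the line, indexed by slot position
    """
    # All-processor search 2300, excluding the requesting processor.
    targets = [
        proc
        for pos, proc in enumerate(slots)
        if proc is not None and (bitmap >> pos) & 1 and proc != requester
    ]
    for proc in targets:
        # Each acknowledgement triggers line directory deletion 2500.
        bitmap &= ~(1 << slots.index(proc))
    return targets, bitmap
```

After the call only the requester's bit remains set, which matches the state the text describes once the invalidate command completes.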

【００２４】[0024]
[3] When the processor 15 flushes a line to the shared memory 45
The processor 15 issues a flush request, together with the line to be flushed, to the processor network connection circuit 25. The processor network connection circuit 25 checks the distributed memory map 27, determines that the line to be flushed originates from the shared memory 45, and sends a flush command together with the line via the message assembly circuit 30, the network 1500, and the message decomposition circuit 60. FIG. 6 shows how the message interpretation/execution unit 50 handles the flush command, as described below. On receiving the flush command, the message interpretation/execution unit 50 writes the line back to the shared memory 45. It then performs line directory deletion 2500 to remove the record of the processor 15 from the line directory 90 entry for that line. Finally, the message interpretation/execution unit 50 notifies the processor 15 that the flush command is complete, via the message assembly circuit 55, the network 1500, the message decomposition circuit 35, and the processor network connection circuit 25. This completes the description of the operation when the processor 15 flushes a line to the shared memory 45.
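The three flush steps (write back, line directory deletion 2500, completion notice) are simple enough to sketch directly. The function name `handle_flush`, the dictionary representations, and the returned string are illustrative assumptions, not names from the specification.

```python
def handle_flush(memory, line_dir, slots, page, line, requester, data):
    """Write a flushed line back and drop the requester from the line's bitmap.

    memory   : dict (page, line) -> line data, standing in for shared memory 45
    line_dir : dict (page, line) -> bitmap over slot positions
    slots    : page-directory slots for `page` (processor ID or None)
    """
    # Step 1: write the line back to shared memory.
    memory[(page, line)] = data
    # Step 2: line directory deletion 2500 for the flushing processor.
    pos = slots.index(requester)
    line_dir[(page, line)] = line_dir.get((page, line), 0) & ~(1 << pos)
    # Step 3: the completion notice sent back to the processor (placeholder value).
    return "flush-complete"
```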

【００２５】[0025]
The detailed operation of page directory check 2000, page directory storage 2100, processor search 2200, all-processor search 2300, line directory storage 2400, line directory deletion 2500, lock bit set, and lock bit reset, all of which require the page directory control circuit 65 and the line directory control circuit 85, is described below.

【００２６】[0026]
<1> Page directory check 2000
Page directory check 2000 determines whether a processor 15, 215, or 1015 is stored in the page directory 70. As an example, consider checking the processor 15. The message interpretation/execution unit 50 stores the ID number of the processor 15 in the processor number register 101 of the page directory control circuit 65 over the data line 157. It also supplies the page number to be checked to the page directory 70 over the address line 173. The page directory information for that page is loaded into the processor number registers 110 to 114 over the data line 170. The comparators 105 to 109 compare the contents of the processor number register 101 with the contents of the processor number registers 110 to 114; if every comparison is a mismatch, the mismatch is reported to the message interpretation/execution unit 50 over the data line 145. This completes the check of whether the processor 15 is stored in the page directory 70.

【００２７】[0027]
<2> Page directory storage 2100
Page directory storage 2100 newly registers a processor 15, 215, or 1015 in the page directory 70 by carrying out the replacement processing described earlier. As an example, the following describes removing the processor 215 from the page directory 70 and registering the processor 15 in its place. The message interpretation/execution unit 50 supplies the page number to be examined to the page directory 70, the lock bits 75, and the page directory pointer 80 over the address line 173. For that page, the page directory information is loaded into the processor number registers 110 to 114 over the data line 170, the lock information into the lock information register 121 over the data line 171, and the pointer information into the pointer information register 122 over the data line 172. The message interpretation/execution unit 50 then asks the bit calculator 120, over the control line 153, to test the bit of the lock information register 121 indicated by the pointer information register 122. The bit calculator 120 returns the result to the message interpretation/execution unit 50 over the data line 146. If the result shows the locked state, the message interpretation/execution unit 50 issues an increment request to the increment circuit 123 over the control line 156, and the increment circuit 123 increments the contents of the pointer information register 122. After an increment, the message interpretation/execution unit 50 again asks the bit calculator 120, over the control line 153, to test the bit of the lock information register 121 indicated by the pointer information register 122. This is repeated until the test result shows the unlocked state. If no unlocked state is found after m repetitions, the message interpretation/execution unit 50 terminates abnormally.
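The search for a replaceable entry is a bounded scan over the lock bits starting at the page's pointer. A minimal Python sketch follows; the function name `find_unlocked_slot`, the wrap-around modulo m, and the use of `RuntimeError` to stand in for abnormal termination are assumptions.

```python
def find_unlocked_slot(locks, pointer):
    """Scan lock bits from `pointer`, at most m times, and return the first unlocked position.

    locks   : list of m booleans (True = locked), one per page-directory slot
    pointer : starting position, as held in the pointer information register
    """
    m = len(locks)
    for _ in range(m):
        if not locks[pointer]:
            return pointer  # unlocked: this slot may be replaced
        pointer = (pointer + 1) % m  # increment circuit; wrap-around is assumed
    # No unlocked slot after m tests: the text describes this as abnormal termination.
    raise RuntimeError("all slots of the page are locked")
```

Starting from the stored pointer rather than from position 0 spreads replacements across slots, assuming the pointer persists between replacements as the page directory pointer 80 suggests.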

【００２８】[0028]
When an unlocked entry is found, here the one for the processor 215, the message interpretation/execution unit 50 uses the control line 154 to make the selector 118 output the data from the pointer information register 122. Driven by the output of the selector 118, the multiplexer 117 outputs the contents of whichever of the processor number registers 110 to 114 the pointer information register 122 indicates, that is, the ID number of the processor 215 to be removed by the replacement processing. This output is sent to the message interpretation/execution unit 50 over the data line 140. The message interpretation/execution unit 50 thereby obtains the ID number of the processor to be removed by the replacement processing.

【００２９】[0029]
Next, the message interpretation/execution unit 50 uses the control line 187 to switch the selector 176 of the line directory control circuit 85 to the line number generator 175 side. It also sends the page number to the mixer 178 over the address line 186. Further, it starts the line number generator 175 via the control line 188.

【００３０】[0030]
A line number produced by the line number generator 175 passes through the selector 176, is combined with the page number in the mixer 178, and is output to the line directory 90 over the address line 195. The information read from the line directory 90 is loaded into the line information register 179. The line number generator 175 sends a test request signal to the bit calculator 180. At this point the bit calculator 180 is receiving, over the data line 161, the contents of the pointer information register 122 of the page directory control circuit 65, that is, the position in the page directory 70 of the processor 215 to be replaced. The bit calculator 180 therefore checks whether the processor 215 to be replaced is recorded in the line directory 90 entry for the line output by the line number generator 175. If it is recorded, the bit calculator 180 sends an output request signal to the line number latch 177. On receiving the output request signal, the line number latch 177 outputs the output of the selector 176 to the message interpretation/execution unit 50 over the data line 181. The line number generator 175 keeps generating new line numbers and issuing test request signals to the bit calculator 180 until the line numbers have gone full circle. The message interpretation/execution unit 50 thereby obtains the numbers of all lines of that page that the processor 215, which is to be removed by the replacement processing, had registered in its cache memory 220.
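The scan performed by the line number generator 175 and the bit calculator 180 amounts to filtering the page's line bitmaps by one bit position. A minimal Python sketch, assuming the bitmaps of one page are available as a list indexed by line number (the function name `lines_held_by` is hypothetical):

```python
def lines_held_by(line_bitmaps, pos):
    """Return the line numbers of one page whose bitmap has bit `pos` set.

    line_bitmaps : list of per-line bitmaps for a single page, indexed by line number
    pos          : position of the processor in the page directory entry
    """
    # Generate every line number in turn and keep those that pass the bit test.
    return [line for line, bitmap in enumerate(line_bitmaps) if (bitmap >> pos) & 1]
```

The cost of this scan is proportional to the number of lines in a page, not to the size of the shared memory, because a processor removed from one page entry can only appear in that page's line bitmaps.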

【００３１】[0031]
After obtaining the ID number of the processor 215 to be removed by the replacement processing and the numbers of all lines of that page registered in its cache memory 220, the message interpretation/execution unit 50 requests the processor 215 to flush those lines, via the message assembly circuit 55, the network 1500, the message decomposition circuit 235, and the processor network connection circuit 225. When the flushed data arrives from the processor 215 at the message interpretation/execution unit 50 via the processor network connection circuit 225, the message assembly circuit 230, the network 1500, and the message decomposition circuit 60, the message interpretation/execution unit 50 writes the data back to the shared memory 45.

【００３２】[0032]
The message interpretation/execution unit 50 then uses the data line 157 to load the ID number of the processor 215 to be replaced into the processor number register 101 of the page directory control circuit 65. Because the processor number registers 110 to 114 hold the page directory 70 information for that page, the comparison by the comparators 105 to 109 places on the data line 160 the position in the page directory 70 of the processor 215 to be replaced. This position is thus supplied over the data line 160 to the bit calculator 180 of the line directory control circuit 85. In this state, the message interpretation/execution unit 50 issues the control signal 187 so that the selector 176 outputs the signal on the address line 185.

【００３３】[0033]
Further, the message interpretation/execution unit 50 sends the number of a flushed line to the mixer 178 through the address line 185 and the selector 176. Because the page number is present on the address line 186, the number of the line whose flush has completed is supplied to the line directory 90 over the address line 195. The information for that line is read from the line directory 90 into the line information register 179 over the data line 197. The message interpretation/execution unit 50 then issues a reset request over the control line 190. Because the bit calculator 180 is receiving, over the data line 160, the position in the page directory 70 of the processor 215 to be replaced, the record of the processor 215 is removed from the line information register 179. The updated data is written back to the line directory 90 over the data line 197. The message interpretation/execution unit 50 repeats this deletion for every flushed line.

[0034] After the processor 215 has been erased from the line directory 90, the message interpreting/executing unit 50 loads the ID number of the processor 15, which is to be newly stored in the page directory 70, into the processor number register 101 of the page directory control circuit through the data line 157. At this point the pointer information register 122 points to whichever of the processor number registers 110 to 114 holds the processor 215. In this state, the message interpreting/executing unit issues an output request to the demultiplexer 119 by the control signal 150. The demultiplexer writes the ID number of the processor 15 held in the processor number register 101 into the processor number register, among 110 to 114, that held the processor 215. The contents of the processor number registers 110 to 114 are then written back to the page directory 70 through the data line 170. This completes the operation of newly storing the processor 15 in the page directory 70.
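The replacement sequence described above can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the `PageEntry` class, the function name, and the processor IDs other than 15 and 215 are hypothetical, and the five slots are assumed to correspond to the processor number registers 110 to 114.

```python
from dataclasses import dataclass, field


@dataclass
class PageEntry:
    """Hypothetical software model of the directory state for one page."""
    slots: list                                        # page directory 70: one processor ID per slot
    line_bitmaps: dict = field(default_factory=dict)   # line directory 90: line number -> bitmap


def replace_processor(page, old_id, new_id, flushed_lines):
    """Erase old_id from the flushed lines, then store new_id in its slot."""
    # Comparators 105 to 109: find the slot position that holds old_id.
    pos = page.slots.index(old_id)
    # Bit arithmetic unit 180: reset that position's bit for every flushed line.
    for line in flushed_lines:
        page.line_bitmaps[line] = page.line_bitmaps.get(line, 0) & ~(1 << pos)
    # Demultiplexer 119: overwrite the freed slot with the new processor ID.
    page.slots[pos] = new_id
    return pos


# Processor 215 sits in slot 1 and has lines 0, 3 and 7 registered.
page = PageEntry(slots=[1015, 215, 2015, 3015, 4015],
                 line_bitmaps={0: 0b00011, 3: 0b00010, 7: 0b00110})
pos = replace_processor(page, old_id=215, new_id=15, flushed_lines=[0, 3, 7])
```

Because the line directory stores slot positions rather than processor IDs, the bits of the evicted processor have to be cleared before the slot is reused; otherwise they would be read as belonging to the new processor.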

[0035] <3> Processor search 2200, all-processor search 2300
The processor search 2200 is an operation that finds just one of the processors 15, 215, 1015 that have a given line registered in their cache memories 20, 220, 1020; the all-processor search 2300 finds all of them. The message interpreting/executing unit 50 notifies the page directory 70 of the page number to be examined through the address line 173. The page directory information for that page passes through the data line 170 and is stored in the processor number registers 110 to 114. Next, the message interpreting/executing unit 50 uses the control line 187 to switch the selector 176 of the line directory control circuit 85 to the address line 185 side. The message interpreting/executing unit 50 also sends the page number to the mixer 178 through the address line 186, and sends the line number to be examined to the mixer 178 through the address line 185 and the selector 176. The mixer 178 combines the page number and the line number and outputs the result to the line directory 90 through the address line 195.

[0036] The information read from the line directory 90 enters the line information register 179, and the contents of the line information register 179 are input to the priority encoder 124 of the page directory control circuit 65. Of the page directory positions of the processors 15, 215, 1015 recorded in the line information, the priority encoder 124 outputs one to the selector 118. The message interpreting/executing unit 50 uses the control line 155 to switch the selector 118 to the priority encoder 124 side. As a result, of the processor ID numbers stored in the processor number registers 110 to 114, the one at the position selected by the priority encoder 124 is sent to the message interpreting/executing unit 50 by the multiplexer 117. For the all-processor search 2300, which needs every processor number, the message interpreting/executing unit 50 issues a priority change request to the priority encoder 124 on the control signal 155. By cycling the priority through one full round, the message interpreting/executing unit 50 obtains all of the processors. This completes the operation of finding one or all of the processors that have a given line registered in their cache memories 20, 220, 1020.
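The two searches can be sketched as a lookup from bitmap positions to processor IDs. This is a minimal model under stated assumptions: the function names are hypothetical, the mixer is assumed to concatenate page number and line number into a single index, and the priority encoder is assumed to prefer the lowest set position, neither of which is stated in the text.

```python
def line_directory_address(page_number, line_number, lines_per_page):
    """Mixer 178 (assumed behaviour): combine page and line number into one index."""
    return page_number * lines_per_page + line_number


def search_one(slots, bitmap):
    """Processor search 2200: return one processor ID that has the line, or None."""
    for pos, proc_id in enumerate(slots):
        # Priority encoder 124 (assumed to pick the lowest set position).
        if (bitmap >> pos) & 1:
            return proc_id
    return None


def search_all(slots, bitmap):
    """All-processor search 2300: cycle the priority through every position."""
    return [proc_id for pos, proc_id in enumerate(slots) if (bitmap >> pos) & 1]


# slots models processor number registers 110 to 114 for one page;
# bitmap models the line information register 179 for one line of that page.
slots = [1015, 15, 215, 2015, 3015]
bitmap = 0b00110
one = search_one(slots, bitmap)
everyone = search_all(slots, bitmap)
```

The processor IDs are read from the page directory only once per page, while each line entry stays as small as the number of slots.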

[0037] <4> Line directory store 2400, line directory erase 2500
The line directory store 2400 is an operation that records one of the processors 15, 215, 1015 in the line directory 90 of a given line; the line directory erase 2500 is an operation that erases such a record. The message interpreting/executing unit 50 stores the ID number of the processor 15 in the processor number register 101 of the page directory control circuit 65 through the data line 157. It also notifies the page directory 70 of the page number to be examined through the address line 173. The page directory information for that page passes through the data line 170 and is stored in the processor number registers 110 to 114. The comparators 105 to 109 compare the contents of the processor number register 101 with the contents of the processor number registers 110 to 114. If none of them match, the mismatch is reported to the message interpreting/executing unit 50 on the data line 145 and the operation ends abnormally. If there is a match, the position at which the processor 15, 215, 1015 is to be stored or erased is input to the bit arithmetic unit 180 of the line directory control circuit 85 through the data line 160. Next, the message interpreting/executing unit 50 uses the control line 187 to switch the selector 176 of the line directory control circuit 85 to the address line 185 side. It sends the page number to the mixer 178 through the address line 186, and sends the number of the line to be stored or erased to the mixer 178 through the address line 185 and the selector 176. The mixer 178 combines the page number and the line number and outputs the result to the line directory 90 through the address line 195. The information read from the line directory 90 enters the line information register 179. In this state, the message interpreting/executing unit 50 issues a request to the bit arithmetic unit 180: a store request on the control signal 189 performs the store, and an erase request on the control signal 190 performs the erase. Finally, the information is written back from the line information register 179 to the line directory 90. This completes the operation of storing one of the processors 15, 215, 1015 in, or erasing it from, the line directory 90 of a given line.
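As an illustration, the store and erase operations reduce to setting or clearing one bit at the position where the processor appears in the page directory. The sketch below is a hypothetical software model; the function name and the use of an exception for the abnormal termination are assumptions.

```python
def update_line_directory(slots, bitmap, proc_id, store):
    """Line directory store 2400 (store=True) or erase 2500 (store=False).

    slots  : processor IDs held in the page directory entry for the page
    bitmap : line directory entry for the line, one bit per slot position
    """
    # Comparators 105 to 109: a processor absent from the page directory
    # cannot be recorded in the line directory (abnormal termination).
    if proc_id not in slots:
        raise LookupError("processor is not stored in the page directory")
    pos = slots.index(proc_id)
    # Bit arithmetic unit 180: set the bit on a store request, clear it on erase.
    if store:
        return bitmap | (1 << pos)
    return bitmap & ~(1 << pos)


slots = [1015, 15, 215, 2015, 3015]
after_store = update_line_directory(slots, 0b00001, 15, store=True)
after_erase = update_line_directory(slots, after_store, 15, store=False)
```

The caller writes the returned bitmap back to the line directory, mirroring the final write-back from the line information register 179.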

[0038] <5> Lock bit set, lock bit reset
The lock bit set is an operation that puts the lock bit 75 of a given page into the locked state for a specific processor 15, 215, 1015; the lock bit reset puts it into the unlocked state. The message interpreting/executing unit 50 stores the ID number of the processor 15 in the processor number register 101 of the page directory control circuit 65 through the data line 157. It also notifies the page directory 70 and the lock bit 75 of the page number through the address line 173. The page directory information passes through the data line 170 and is stored in the processor number registers 110 to 114, and the lock information passes through the data line 171 and is stored in the lock information register 121. The comparators 105 to 109 compare the contents of the processor number register 101 with the contents of the processor number registers 110 to 114. If none of them match, the mismatch is reported to the message interpreting/executing unit 50 on the data line 145 and the operation ends abnormally. If there is a match, the position of the processor 15, 215, 1015 to be locked or unlocked is input from the comparators 105 to 109 to the bit arithmetic unit 120 of the page directory control circuit 65. In this state, the message interpreting/executing unit 50 issues a request to the bit arithmetic unit 120: a lock request on the control signal 151 sets the lock, and an erase request on the control signal 152 clears it. Finally, the information is written back from the lock information register 121 to the lock bit 75. This completes the operation of putting the lock bit 75 of a given page into the locked or unlocked state for a specific processor 15, 215, 1015.
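A minimal sketch of the lock bit operations follows, assuming the lock bit 75 is held as one bit per page directory slot. The function names are hypothetical. The `erasable_positions` helper is an assumption based on the role of the lock bit stated in the claims, namely that it indicates whether a processor's record may be erased from the page directory.

```python
def set_lock(slots, lock_bits, proc_id, locked):
    """Lock bit set (locked=True) or lock bit reset (locked=False) for proc_id."""
    # Comparators 105 to 109: no match means abnormal termination.
    if proc_id not in slots:
        raise LookupError("processor is not stored in the page directory")
    pos = slots.index(proc_id)
    # Bit arithmetic unit 120: set or clear the bit at the matching position.
    if locked:
        return lock_bits | (1 << pos)
    return lock_bits & ~(1 << pos)


def erasable_positions(slots, lock_bits):
    """Slot positions whose processor may be erased (assumed use of the lock bit)."""
    return [pos for pos in range(len(slots)) if not (lock_bits >> pos) & 1]


slots = [1015, 15, 215, 2015, 3015]
locks = set_lock(slots, 0, 215, locked=True)     # lock the slot holding processor 215
candidates = erasable_positions(slots, locks)    # processor 215's slot is excluded
locks = set_lock(slots, locks, 215, locked=False)
```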

[0039] This concludes the description of the operation of the memory control mechanism according to the present invention when the processors 15, 215, 1015 execute loads, stores, and flushes. The circuits and control methods described above realize a shared-memory parallel computer having the page directory 70, the lock bit 75, the page directory pointer 80, and the line directory 90.

【００４０】[0040]

[Effects of the Invention] According to the present invention, the capacity of the directory that is provided in the shared memory and performs coherence control of the cache memories can be significantly reduced compared with conventional schemes.
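The size of the reduction can be illustrated with a rough bit count. The sketch below compares a full map directory (one bit per processor for every line) with the page directory plus line directory arrangement. All numbers are assumptions chosen for illustration: 1024 processors, 1024 pages of 64 lines each, and five page directory slots matching the processor number registers 110 to 114. Other fields, such as the directory valid bit 96, are ignored.

```python
def full_map_bits(n_proc, n_pages, lines_per_page):
    """Full map directory: one presence bit per processor for every line."""
    return n_proc * n_pages * lines_per_page


def two_level_bits(n_proc, n_pages, lines_per_page, slots=5):
    """Page directory, lock bits and pointer per page, plus a bitmap per line."""
    id_bits = (n_proc - 1).bit_length()       # bits needed to name one processor
    pointer_bits = (slots - 1).bit_length()   # page directory pointer 80
    per_page = slots * id_bits + slots + pointer_bits   # IDs + lock bits + pointer
    per_line = slots                          # line directory 90: one bit per slot
    return n_pages * (per_page + lines_per_page * per_line)


# Assumed configuration, not taken from the patent text.
full = full_map_bits(1024, 1024, 64)
two_level = two_level_bits(1024, 1024, 64)
ratio = full / two_level
```

Under these assumed numbers the full map needs 67,108,864 bits and the two-level arrangement 387,072 bits. The saving comes from storing each processor ID once per page and only a slot-wide bitmap per line.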

[Brief description of drawings]

FIG. 1 is a diagram showing the configuration of a parallel computer having the shared memory mechanism of the embodiment.

FIG. 2 is a diagram showing the page directory control circuit in the shared memory mechanism of the embodiment.

FIG. 3 is a diagram showing the line directory control circuit in the shared memory mechanism of the embodiment.

FIG. 4 is a flowchart of the processing performed by the message interpreting/executing unit for a load command in the shared memory mechanism of the embodiment.

FIG. 5 is a flowchart of the processing performed by the message interpreting/executing unit for an invalidate command in the shared memory mechanism of the embodiment.

FIG. 6 is a flowchart of the processing performed by the message interpreting/executing unit for a flush command in the shared memory mechanism of the embodiment.

FIG. 7 is a diagram showing the storage states of a line in a cache memory of a parallel computer.

FIG. 8 is a diagram showing a limited pointer directory.

FIG. 9 is a diagram showing the directory of a parallel computer having the shared memory mechanism of the embodiment.

[Explanation of symbols]

0, 200, 1000: system unit
10, 210, 1010: processor node
15, 215, 1015: processor
20, 220, 1020: cache memory
25, 225, 1025: processor network connection circuit
27: distributed memory map
30, 55, 230, 1030: message assembly circuit
35, 60, 235, 1035: message decomposition circuit
40, 240, 1040: memory node
45, 245, 1045: shared memory
50: message interpreting/executing unit
65: page directory control circuit
70: page directory
75: lock bit
80: page directory pointer
85: line directory control circuit
90: line directory
92: full map directory
94: limited pointer directory
96: directory valid bit
101: processor number register
105 to 109: comparators
110 to 114: processor number registers
117: multiplexer
118: selector
119: demultiplexer
120: bit arithmetic unit
121: lock information register
122: pointer information register
123: increment circuit
124: priority encoder
175: line number generator
176: selector
177: line number latch
178: mixer
179: line information register
180: bit arithmetic unit
1500: network

Claims

[Claims]

1. A parallel computer comprising a plurality of processors and a plurality of distributed shared memories accessible by the respective processors, each processor comprising a cache memory for registering data of the shared memory in line units, the parallel computer further comprising: a page directory, provided for each page of the shared memory, that stores the processors that have registered some or all lines of the page in their cache memories; and a line directory, provided for each line of the shared memory, that stores, in bitmap format, the positions on the page directory of the processors that have registered the line in their cache memories.

2. A parallel computer comprising a plurality of processors and a plurality of distributed shared memories accessible by the respective processors, each processor comprising a cache memory for registering data of the shared memory in line units, the parallel computer further comprising: a page directory, provided for each page of the shared memory, that stores the processors that have registered some or all lines of the page in their cache memories; and a line directory, provided for each line of the shared memory, that stores, in pointer format, the positions on the page directory of the processors that have registered the line in their cache memories.

3. The parallel computer according to claim 1, wherein the plurality of shared memories and the plurality of processors are connected via a network.

4. The parallel computer according to claim 1, wherein the shared memory and the processor constitute a node, and a plurality of the nodes are connected by a network.

5. The parallel computer according to claim 1, further comprising, for each page directory, a page directory pointer that points to the processor stored earliest among the processors stored in the page directory.

6. The parallel computer according to claim 1, further comprising, for each processor stored in the page directory, a lock bit indicating whether the record of that processor may be erased from the page directory.