JP2001222517A

JP2001222517A - Distributed shared memory multiplex processor system having direction-separated duplex ring structure

Info

Publication number: JP2001222517A
Application number: JP2000111165A
Authority: JP
Inventors: Seitai Cho; 星泰張; Shushoku Zen; 洲植全; Meichu Kin; 明柱金
Original assignee: Individual
Current assignee: Individual
Priority date: 2000-02-11
Filing date: 2000-04-12
Publication date: 2001-08-17
Also published as: KR20010081224A; KR100319708B1

Abstract

PROBLEM TO BE SOLVED: To provide a distributed shared memory multiprocessor system that has a direction-separated duplex ring structure and uses a snooping scheme. SOLUTION: The system comprises a plurality of processor nodes arranged in a ring. Any one of the processor nodes generates a request signal for a data block, and the remaining processor nodes snoop their own internal elements, so that one of the remaining processor nodes supplies the data block. The system further comprises a direction-separated duplex ring structure that includes first and second ring buses and provides two opposite routes along them. The processor nodes are connected through the first and second routes. The request signal is broadcast through one of the routes to each of the remaining processor nodes, and the data block is unicast through one of the routes to the processor node that generated the request signal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a distributed shared memory multiprocessor system, and more particularly to a distributed shared memory multiprocessor system having a direction-separated dual ring structure.

[0002]

[Prior art] In general, multiprocessor systems are divided into distributed-memory architectures, which implement communication between processors through explicit message passing, and shared-memory architectures, which share memory to present a single system image. At present, the most popular interconnect for shared-memory multiprocessor systems is the shared bus. Shared buses are widely used because of their low implementation complexity and low cost, but they cannot keep pace with rapidly improving processor speeds. They also suffer from limited scalability, due to the physical characteristics of the bus, and from limited bus bandwidth as bus usage increases.

[0003] Various attempts to overcome these limitations of the bus structure led to the standardization of IEEE SCI, which uses unidirectional point-to-point links. A commercial system has been disclosed in which processor nodes, each containing up to four processors arranged in UMA (Uniform Memory Access) form on a snooping bus, are connected in a unidirectional ring using SCI links and kept coherent with a directory-based cache protocol (see Tom Lovett and Russell Clapp, "STiNG: A CC-NUMA Computer System for the Commercial Marketplace," in Proceedings of the 23rd International Symposium on Computer Architecture, pp. 308-317, May 1996). As a further improvement on that system, and as shown in FIGS. 1 and 2, a multiprocessor system in which processor nodes, each containing up to four processors in UMA form on a snooping bus, are connected in a unidirectional ring using SCI links and kept coherent with a snooping cache protocol is disclosed in 張星泰, 全洲植, and 鄭盛宇, "PANDA: Ring-Based Multiprocessor System using New Snooping Protocol" (in the Proceedings of ICPADS 1998, pp. 10-17, Dec. 1998) (see Japanese Patent Application No. 224423/98, filed in Japan on August 7, 1998).

[0004] However, as the clock speeds of processors and local buses continue to rise, distributed shared memory multiprocessor systems that adopt such high-performance processors and local buses need greater point-to-point link bandwidth and system scalability. One way to expand the point-to-point link bandwidth of an existing system is simply to use links with twice the bandwidth, but developing a new link with doubled bandwidth and applying it to commercial products within a short time is difficult in practice.

[0005]

[Problem to be solved by the invention] Accordingly, a primary object of the present invention is to provide a distributed shared memory multiprocessor system having a direction-separated dual ring structure that improves performance while maintaining cache coherency between the processor nodes in the system using a snooping scheme.

[0006]

[Means for solving the problem] To achieve the above object, the present invention provides a distributed shared memory multiprocessor system comprising: a plurality of processor nodes arranged in a ring, wherein any one of the processor nodes generates a request signal for a data block and the remaining processor nodes snoop their own internal elements, so that one of the remaining processor nodes supplies the data block; and a direction-separated dual ring structure that includes first and second ring buses and provides two opposite paths along the first and second ring buses, wherein the plurality of processor nodes are connected via the first and second paths, the request signal is broadcast to each of the remaining processor nodes via one of the paths, and the data block is unicast via one of the paths to the processor node that generated the request signal.

[0007]

[Embodiments of the invention] The present invention is described in detail below with reference to the accompanying drawings.

[0008] FIG. 3 shows a distributed shared memory multiprocessor system 100 based on a point-to-point direction-separated dual ring structure, for example 90A and 90B, that supports a snooping scheme. The direction-separated dual ring buses 90A and 90B are implemented with point-to-point links, and each point-to-point link can be implemented with an optical fiber, a coaxial cable, or an optical connection capable of carrying multiple signals. According to a preferred embodiment of the present invention, the distributed shared memory multiprocessor system 100 includes eight processor nodes, PN1 to PN8 (10 to 80). The processor nodes PN1 (10) through PN8 (80) are connected via the point-to-point direction-separated dual rings 90A and 90B, which support snooping.

[0009] Here, a "direction-separated dual ring" means a ring bus structure that has two separate ring buses whose directions of travel are opposite to each other, and that is configured to convey a data request signal from a processor node, or the data block corresponding to that request signal, either in a first direction, which has the nearest path, or in a second direction opposite to the first direction. As shown in FIG. 3, the direction-separated dual ring structure includes a first ring bus 90A and a second ring bus 90B. The request signal and the retrieved data block are transmitted via the first ring bus 90A or the second ring bus 90B depending on an attribute of the data block, for example, whether the data block containing the data requested by a processor node is stored at an even or an odd memory block address.

[0010] FIG. 4 shows in more detail the configuration of one of the identically configured processor nodes PN1 to PN8 (10 to 80), for example PN1 (10). As shown in FIG. 4, PN1 (10) includes a number of processor modules with built-in caches, an input/output (I/O) bridge 216, a local system bus 218, a local shared memory 220, a node controller 222, a remote cache 224, a memory directory 226, a link bus 228, and a ring interface 230.

[0011] For convenience of description, only two processor modules, a first processor module 212 and a second processor module 214, are shown in FIG. 4. The processor modules 212 and 214, the local shared memory 220, the I/O bridge 216, and the node controller 222 are connected to one another through the local system bus 218.

[0012] The ring interface 230 includes two link controllers 230A and 230B, through which each processor node is connected to the direction-separated dual ring structure, that is, to the ring buses 90A and 90B, respectively. In this embodiment, the ring interface 230 is connected to the node controller 222 via the link bus 228.

[0013] The node controller 222 checks whether the data block corresponding to a request signal from either of the processor modules 212 and 214 is stored in a valid state in the remote cache 224 or the local shared memory 220. If the data block is stored in a valid state in the remote cache 224, the node controller 222 supplies the data block stored in the remote cache 224 to the processor module that generated the request signal. If the data block is stored in a valid state in the local shared memory 220, the local shared memory 220 supplies the data block to the processor module that generated the request signal.

[0014] If the data block is stored in a valid state in neither the remote cache 224 nor the local shared memory 220, and the data block has an odd block address, that is, it is an odd block, the request signal for the data block is transmitted to the first ring bus 90A via the first link controller 230A. If, on the other hand, the data block has an even block address, that is, it is an even block, the request signal for the data block is transmitted to the second ring bus 90B via the second link controller 230B. In this way, the node controller 222 supplies the request signal for the data block to the other processor nodes PN2 to PN8 (20 to 80) via the ring interface 230, that is, via the first link controller 230A and the second link controller 230B.
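The lookup order of paragraphs [0013] and [0014] can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, its boolean parameters, and the returned strings are hypothetical, and the patent does not say how a block number is derived from an address.

```python
def route_local_request(block_number, valid_in_remote_cache, valid_in_local_memory):
    """Decide where a request from processor module 212 or 214 is served.

    Hypothetical helper: the parameters stand in for the checks that
    node controller 222 performs against remote cache 224 and local
    shared memory 220.
    """
    if valid_in_remote_cache:
        return "remote cache 224"
    if valid_in_local_memory:
        return "local shared memory 220"
    # Miss in both places: odd blocks leave through link controller 230A
    # onto ring bus 90A, even blocks through 230B onto ring bus 90B.
    if block_number % 2 == 1:
        return "ring bus 90A"
    return "ring bus 90B"
```

Splitting requests by block-address parity spreads the request traffic over both rings without any per-request arbitration.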

[0015] Next, when a request signal for a data block is received from any of the other processor nodes PN2 to PN8 (20 to 80) via the first link controller 230A or the second link controller 230B, the node controller 222 checks whether the data block corresponding to the request signal is stored in a valid state in its own remote cache 224 or local shared memory 220. If the data block is stored in a valid state in the remote cache 224 or the local shared memory 220, the node controller 222 receives the data block from the remote cache 224 or the local shared memory 220 via the local system bus 218 and transmits it via whichever of the first ring bus 90A and the second ring bus 90B provides the shortest path to the processor node that requested the data block.

[0016] As shown in FIG. 4, each of the processor nodes PN1 to PN8 (10 to 80) further includes a memory directory 226 that stores state information for the data blocks stored in the local shared memory 220, and the node controller 222 accesses the memory directory 226 directly. The memory directory 226 therefore lets the node controller 222 efficiently determine in what state a data block requested by either of the processor modules 212 and 214 is stored in the local shared memory 220, and in what state a data block requested by any of the other processor nodes PN2 to PN8 (20 to 80) is stored in its own local shared memory 220. More preferably, the memory directory 226 consists of two independent memory directories 226A and 226B, as shown in FIG. 4.

[0017] The first memory directory 226A responds to local shared memory access requests from the processor modules 212 and 214 conveyed over the local system bus 218. The second memory directory 226B responds to remote shared memory access requests conveyed from the other processor nodes PN2 to PN8 (20 to 80) through the ring interface 230, which is connected to the node controller 222 via the link bus 228. In this way, access requests to the local shared memory 220 can be processed in parallel.

[0018] The first link controller 230A and the second link controller 230B provide the data paths that connect PN1 (10) to the direction-separated dual ring buses 90A and 90B, and control the overall data flow required for packet transmission. They assemble request signals or data blocks from the node controller 222 into packets and transmit them to the other processor nodes PN2 to PN8 (20 to 80) via the first ring bus 90A or the second ring bus 90B, and they select the request signals and data blocks arriving from the other processor nodes PN2 to PN8 (20 to 80) via the first ring bus 90A or the second ring bus 90B and pass them to the node controller 222. In addition, when an incoming request signal is a broadcast packet, the ring interface 230 not only passes the broadcast packet to the node controller 222 for snooping but also bypasses it to the next processor node, PN2 (20) or PN8 (80). More specifically, the first link controller 230A forwards a broadcast packet received from the adjacent processor node PN8 (80) via the first ring bus 90A to the other adjacent processor node PN2 (20) via the first ring bus 90A, and the second link controller 230B forwards a broadcast packet received from the adjacent processor node PN2 (20) via the second ring bus 90B to the other adjacent processor node PN8 (80) via the second ring bus 90B.

[0019] Meanwhile, the remote cache 224 caches only data blocks stored in the local shared memories of the other processor nodes PN2 to PN8 (20 to 80) (hereinafter referred to as remote shared memories). When either of the processor modules 212 and 214 connected to the local system bus 218 requests a data block stored in the remote shared memory of any of the other processor nodes PN2 to PN8 (20 to 80), that data block is allocated in the remote cache 224; data blocks stored in the local shared memory 220 are not cached there. In other words, the remote cache 224 of processor node PN1 (10) caches only data blocks stored in the remote shared memories of the other processor nodes PN2 to PN8 (20 to 80), which reduces remote memory access time.

[0020] Because the remote cache 224 satisfies the MLI property (Multi-Level Inclusion Property) with respect to the caches in the processor modules 212 and 214 of processor node PN1 (10) and the remote shared memories in the other processor nodes PN2 to PN8 (20 to 80), it can perform a snoop filtering function for remote shared memory reference requests from the other processor nodes PN2 to PN8 (20 to 80). Here, the MLI property means that a data block stored in a valid state in the lower level of the hierarchy, that is, a local cache, must always also be stored in a valid state in the upper level, that is, the remote cache. To guarantee the MLI property, when a data block stored in the upper-level cache is replaced, that data block must not remain in a valid state in any lower-level cache.

[0021] Accordingly, the remote cache 224 holds the remote data blocks that are stored in a valid state in the caches of the processor modules 212 and 214 of processor node PN1 (10). When the data block targeted by a remote shared memory reference request from one of the other processor nodes PN2 to PN8 (20 to 80) is not stored in a valid state in the remote cache 224, the remote cache performs snoop filtering: the request for that data block need not be issued on the local system bus 218.
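A toy Python model may help show why the inclusion property of paragraph [0020] makes the snoop filter of paragraph [0021] safe. The class and method names are hypothetical, and the model tracks only which block numbers are valid, not their data or coherence state.

```python
class NodeCacheModel:
    """Illustrative model of one node's cache hierarchy (not from the patent)."""

    def __init__(self):
        self.remote_cache = set()      # blocks valid in remote cache 224 (upper level)
        self.processor_caches = set()  # blocks valid in any processor-module cache (lower level)

    def load_remote_block(self, block):
        # Inclusion: a remote block cached by a processor module is also
        # allocated in the remote cache.
        self.remote_cache.add(block)
        self.processor_caches.add(block)

    def replace_in_remote_cache(self, block):
        # Replacing a block in the upper level must invalidate every
        # lower-level copy, otherwise inclusion would be violated.
        self.remote_cache.discard(block)
        self.processor_caches.discard(block)

    def must_snoop_local_bus(self, block):
        # Snoop filter: a miss in the remote cache proves that no
        # processor-module cache holds the block, so the local system
        # bus 218 need not see the request.
        return block in self.remote_cache
```

Under these assumptions, a ring request for a block that misses in the remote cache never consumes local bus bandwidth.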

[0022] Preferably, the remote cache 224 includes a remote data cache 224-1, which stores the contents of data blocks, and a remote tag cache 224-2, which stores the state and part of the address of each data block. This makes it easy to update the state of a data block stored in the remote cache 224 or to supply the data block when needed. More preferably, the remote tag cache 224-2 includes two remote tag caches that store the addresses and states of remote data blocks: a first remote tag cache 224-2A and a second remote tag cache 224-2B. The first remote tag cache 224-2A responds to remote cache access requests from either of the processor modules 212 and 214, and the second remote tag cache 224-2B responds to remote cache access requests from any of the other processor nodes PN2 to PN8 (20 to 80). In this way, access requests to the remote cache 224 can be processed in parallel.

[0023] The operation of the distributed shared memory multiprocessor system having the direction-separated dual ring structure of the present invention, configured as described above, is described in detail below with reference to FIGS. 5 and 6.

[0024] A data block stored in the remote cache 224 is in one of four states: "Modified", "Modified-Shared", "Shared", or "Invalid". The four states are defined as follows.
* Modified: the data block is valid and modified, and is the only valid copy.
* Modified-Shared: the data block is valid and modified, and other remote caches may share it.
* Shared: the data block is valid, and other remote caches may share it.
* Invalid: the data block is not valid.

[0025] In the multiprocessor system according to the present invention, the first and second memory directories 226A and 226B maintain one of three states: C (Clean), S (Shared), or G (Gone). Doing so minimizes the cache coherence traffic generated in response to local shared memory access requests on the local system bus 218, reduces unnecessary transactions on the direction-separated dual ring buses 90A and 90B, allows requests from the local system bus 218 to be processed efficiently, and produces the snooping results for snooping requests arriving over the direction-separated dual ring buses 90A and 90B. The three states are defined as follows.
* C (Clean): the data block is not stored in a valid state in any remote cache of another processor node.
* S (Shared): the data block is valid and may be stored in an unmodified valid state in the remote cache of another processor node.
* G (Gone): the data block is not valid, and is stored in a modified valid state in the remote cache of another processor node.
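The two state sets can be written down as Python enumerations, which makes the properties stated in the definitions easy to query. The helper names are hypothetical. Treating the home copy as valid in state C is an assumption: the source says so explicitly only for S, and says the copy is not valid in G.

```python
from enum import Enum


class RemoteCacheState(Enum):
    MODIFIED = "Modified"
    MODIFIED_SHARED = "Modified-Shared"
    SHARED = "Shared"
    INVALID = "Invalid"


class DirectoryState(Enum):
    CLEAN = "C"
    SHARED = "S"
    GONE = "G"


def is_valid(state):
    """A remote-cache copy is usable in every state except Invalid."""
    return state is not RemoteCacheState.INVALID


def is_modified(state):
    """Both Modified and Modified-Shared hold a modified copy."""
    return state in (RemoteCacheState.MODIFIED, RemoteCacheState.MODIFIED_SHARED)


def home_memory_is_valid(directory_state):
    """Assumption: the local shared memory copy is valid unless the directory says Gone."""
    return directory_state is not DirectoryState.GONE
```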

[0026] Meanwhile, all communication on the direction-separated dual ring buses 90A and 90B, which connect the processor nodes PN1 to PN8 (10 to 80) in sequence, takes place in packets, which are classified into request packets, response packets, and acknowledgment packets. A request packet is sent by any of the processor nodes PN1 to PN8 (10 to 80) that needs a transaction on the direction-separated dual ring buses 90A and 90B, and is either a broadcast packet or a unicast packet. Of these, only broadcast packets are snooped by the other processor nodes.

[0027] A response packet is always unicast by a processor node that received the request packet. An acknowledgment packet is generated by the processor node that received the response packet and is unicast to the processor node that sent the response packet. A processor node that has unicast a response packet retains the response packet until the corresponding acknowledgment packet is received. According to another embodiment of the present invention, if a processor node that has unicast a response packet receives another request packet for the same data block from another processor node before receiving the corresponding acknowledgment packet, it can, if necessary, ask that processor node to retransmit the request packet.

[0028] FIGS. 5 and 6 show an exemplary operation of the distributed shared memory multiprocessor system.

[0029] In the first case, the first processor module 212 in processor node PN1 (10) generates a read request for a data block; the request packet is RQ12. If the data block belongs to a remote shared memory and is not stored in a valid state in the remote cache 224 of PN1 (10), PN1 (10) broadcasts RQ12 to the other processor nodes PN2 to PN8 (20 to 80) via ring bus 90A or 90B. Likewise, if the data block belongs to the local shared memory 220 but is not stored there in a valid state, PN1 (10) broadcasts RQ12 to the other processor nodes PN2 to PN8 (20 to 80) via ring bus 90A or 90B. In either case, if the data block is an odd block, RQ12 is transmitted via the first link controller 230A and the first ring bus 90A; if the data block is an even block, RQ12 is transmitted via the second link controller 230B and the second ring bus 90B.

[0030] As a result, RQ12 for an odd block circulates counterclockwise along the first ring bus 90A from PN2 (20) to PN8 (80), as shown in FIG. 5, while RQ12 for an even block circulates clockwise along the second ring bus 90B from PN8 (80) to PN2 (20), as shown in FIG. 6. While RQ12 circulates along ring bus 90A or 90B, each processor node responds to RQ12 by examining its internal remote cache or memory directory to snoop the state in which the data block is stored, and at the same time bypasses RQ12 to the next adjacent processor node.
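The order in which a broadcast request visits the other nodes follows directly from the block parity. A minimal Python sketch, assuming nodes are numbered 1 to 8 and that ring bus 90A steps to the next higher node number while 90B steps to the next lower one (the function name and parameters are hypothetical):

```python
def broadcast_visit_order(source, block_number, num_nodes=8):
    """List the nodes a broadcast request passes, in order, before returning to source."""
    # Odd blocks travel on ring bus 90A (ascending node numbers),
    # even blocks on ring bus 90B (descending node numbers).
    step = 1 if block_number % 2 == 1 else -1
    order = []
    node = source
    for _ in range(num_nodes - 1):
        # Node numbers are 1-based, so shift to 0-based for the modulo.
        node = (node - 1 + step) % num_nodes + 1
        order.append(node)
    return order
```

For a request from PN1, an odd block yields the order PN2 through PN8 and an even block yields PN8 through PN2, matching the circulation described for FIGS. 5 and 6.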

[0031] For example, when RQ12 reaches PN4 (40), the node controller of PN4 (40) snoops the remote cache and memory directory in PN4 (40). If the data block is found in the remote cache of PN4 (40) in the "Modified" or "Modified-Shared" state, the node controller of PN4 (40) determines that it is responsible for responding to RQ12. In this case, no processor node holds the data block in a valid state in its local shared memory.

[0032] The node controller of PN4 (40) therefore unicasts a response packet RSP42 containing the data block to PN1 (10) along whichever of the first ring bus 90A and the second ring bus 90B provides the shortest path to PN1. The node controller of PN4 (40) also changes the state of the data block in the remote cache of PN4 (40) to a valid state that is not "Modified", such as "Modified-Shared" or "Shared".

【００３３】If the corresponding data block is stored in a valid state in the local shared memory of PN4 (40), the node controller of PN4 (40) takes responsibility for responding to RQ12 and transmits a response packet containing the corresponding data block to PN1 (10) via whichever of the first ring bus 90A and the second ring bus 90B offers the shortest path to PN1. Thus, in FIGS. 5 and 6, PN4 (40) transmits the response packet RSP42 via the second ring bus 90B, which offers the shortest path to PN1 (10).
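The choice of the ring with the shortest path can be illustrated with a small Python sketch. The names `hops` and `nearest_ring` are hypothetical, the ring directions (90A running PN1 -> PN2 -> ... -> PN8, 90B the opposite way) are inferred from the circulation shown in FIGS. 5 and 6, and the tie-break in favor of 90A when both paths are equally long is an arbitrary assumption.

```python
NUM_NODES = 8  # PN1..PN8


def hops(src: int, dst: int, ring: str) -> int:
    """Number of hops from PN<src> to PN<dst> on the given ring bus."""
    forward = (dst - src) % NUM_NODES  # distance along 90A (assumed direction)
    return forward if ring == "90A" else (NUM_NODES - forward) % NUM_NODES


def nearest_ring(src: int, dst: int) -> str:
    """Ring bus with the fewest hops; 90A wins ties (assumption)."""
    return min(("90A", "90B"), key=lambda ring: hops(src, dst, ring))
```

With these assumptions, `nearest_ring(4, 1)` selects 90B (3 hops instead of 5), which matches the path taken by RSP42 in FIGS. 5 and 6.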

【００３４】At this point, the read request packet RQ12 is removed by PN1 (10) after it completes its circulation. Meanwhile, after receiving RSP42, the processor node PN1 (10) unicasts an acknowledgment packet ACK14 to PN4 (40) via whichever of the first ring bus 90A and the second ring bus 90B offers the shortest path to the processor node PN4 (40), that is, the first ring bus 90A in FIGS. 5 and 6. At the same time, PN1 (10) transmits the corresponding data block in RSP42 via its local system bus 218 to the first processor module 212, which generated the read request for that data block. If the corresponding data block belongs to a remote shared memory, PN1 (10) stores the data block in a valid state in the remote data cache 224-1 and at the same time updates the state in the remote tag cache 224-2 corresponding to the data block to a valid state. If the corresponding data block belongs to the local shared memory 220, PN1 (10) stores the data block in the local shared memory 220 and at the same time updates the state of the memory directory 226 to a state indicating that other processor nodes share the data block, for example, the "S" state.

【００３５】The second case is the one in which the first processor module 212 in PN1 (10) generates a write request. FIGS. 5 and 6 still apply in this case and, for simplicity, the request packet is again denoted RQ12. If PN1 (10) does not store the data block targeted by the write request in a valid state in either the remote cache 224 or the local shared memory 220, PN1 (10) broadcasts RQ12 to the other processor nodes PN2 to PN8 (20 to 80) via ring bus 90A or 90B. If the corresponding data block is an odd block, RQ12 is transmitted via the first ring bus 90A, as shown in FIG. 5. If the corresponding data block is an even block, RQ12 is transmitted via the second ring bus 90B, as shown in FIG. 6. While the write request packet RQ12 circulates via ring bus 90A or 90B, each processor node responds to RQ12 by examining its internal remote cache or memory directory, snooping the state in which the corresponding data block is stored, and at the same time bypasses RQ12 to the next adjacent processor node.

【００３６】For example, when RQ12 is supplied to PN4 (40), the node controller of PN4 (40) snoops its internal remote cache and memory directory. If the corresponding data block is stored in the remote cache of PN4 (40) in an updated state, for example, the "update" or "update-shared" state (in which case no processor node stores the block in a valid state in its local shared memory), or if the corresponding data block is stored in a valid state in the local shared memory, the node controller of PN4 (40) determines that it is responsible for responding to RQ12 and transmits a response packet RSP42 containing the requested data block via whichever of the first ring bus 90A and the second ring bus 90B offers the shortest path to PN1 (10). Referring to FIGS. 5 and 6, PN4 (40) transmits the response packet RSP42 via the second ring bus 90B, which offers the shortest path. The node controller of PN4 (40) also sets the state of the remote cache storing the corresponding data block to an invalidated state, for example, the "invalid" state, or updates the state of the memory directory to an invalidated state, for example, the "G" state. After circulating around ring bus 90A or 90B, the write request packet RQ12 is removed by the processor node PN1 (10).

【００３７】Meanwhile, if snooping shows that the corresponding data block is stored in the remote cache of any of the other processor nodes, namely PN2 to PN3 (20 to 30) and PN5 to PN8 (50 to 80), in a valid state that is not updated, for example, the "shared" state, the remote cache state in that processor node is changed to an invalidated state, for example, the "invalid" state.
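The snooping behavior of a node that sees a write request can be sketched as follows. This is a minimal illustration, not the specification's implementation: the `NodeState` class and the `snoop_write` function are hypothetical, and treating the directory states "C" and "S" as "valid in the local shared memory" is an assumption drawn from the way those states are described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    # Remote cache state for the block: "update", "update-shared", "shared", "invalid".
    remote_cache: str = "invalid"
    # Directory state ("C", "S", "G") if the block's home memory is in this node.
    directory: Optional[str] = None


def snoop_write(node: NodeState) -> bool:
    """Apply a snooped write request; return True if this node must respond."""
    if node.remote_cache in ("update", "update-shared"):
        node.remote_cache = "invalid"  # owner supplies the block, then invalidates
        return True
    if node.directory in ("C", "S"):   # assumed to mean: valid in local shared memory
        node.directory = "G"           # home memory copy is no longer valid
        return True
    if node.remote_cache == "shared":
        node.remote_cache = "invalid"  # plain sharers only invalidate
    return False
```

A node that returns True would then send the response packet over the ring with the shortest path to the requester; nodes that return False only forward the request.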

【００３８】When PN1 (10) receives RSP42 from PN4 (40), it unicasts an acknowledgment packet ACK14 to PN4 (40) via whichever of the first ring bus 90A and the second ring bus 90B offers the shortest path to PN4 (40), for example, the first ring bus 90A in the case of FIGS. 5 and 6, and at the same time transmits the corresponding data block via the local system bus 218 of the processor node PN1 (10) to the processor module 212 that generated the write request. If the requested data block belongs to a remote shared memory, PN1 (10) stores the data block in the remote data cache 224-1 in a modified valid state, for example, the "update" state. If the data block belongs to the local shared memory 220, PN1 (10) stores the data block in the local shared memory 220 and at the same time updates the state of the memory directory 226 to a state indicating that no remote cache of any other processor node shares the data block, for example, the "C" state.
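The state that the requesting node records once the response packet arrives can be summarized in a short Python sketch covering both the read case and the write case. The function name `requester_update` and the dictionary keys are hypothetical. For a read of a remote block the specification only says "a valid state"; using "shared" for that case is an assumption.

```python
def requester_update(op: str, block_is_local: bool) -> dict:
    """State recorded by the requester (PN1) after receiving the response.

    op is "read" or "write"; block_is_local is True when the block's home
    is the requester's own local shared memory 220.
    """
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    if block_is_local:
        # Directory: "S" = other nodes share the block, "C" = no remote sharer.
        return {"memory_directory": "S" if op == "read" else "C"}
    # Remote block: the tag state is kept in the remote tag cache 224-2.
    # "shared" for reads is an assumption (the text says only "valid").
    return {"remote_tag_cache": "shared" if op == "read" else "update"}
```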

【００３９】The third case is the one in which the first processor module 212 in PN1 (10) generates a write request or an invalidation request for a data block. FIGS. 5 and 6 still apply in this case and, for convenience of explanation, the request packet is again denoted RQ12.

【００４０】If PN1 (10) stores the corresponding data block in the remote cache 224 in a valid state, for example, the "shared" state, the request proceeds in the same way as when PN1 (10) does not store the data block in a valid state in either the remote cache 224 or the local shared memory 220.

【００４１】Meanwhile, in FIGS. 5 and 6, when either of the processor modules 212 and 214 of the processor node PN1 (10) issues a write request or an invalidation request and the processor node PN1 (10) stores the corresponding data block in a valid state in the local shared memory 220, or stores the block in the remote cache 224 in the "update-shared" state, the processor node PN1 (10) broadcasts an invalidation request packet to the other processor nodes PN2 to PN8 (20 to 80) via the first ring bus 90A or the second ring bus 90B. If the corresponding data block is an odd block, the invalidation request is broadcast via the first ring bus 90A, as shown in FIG. 5. If the corresponding data block is an even block, the invalidation request is broadcast via the second ring bus 90B, as shown in FIG. 6. While the request packet RQ12 circulates around ring bus 90A or 90B, each of the processor nodes PN2 to PN8 (20 to 80) responds to RQ12 by examining its internal remote cache or memory directory, snooping the state in which the corresponding data block is stored, and at the same time bypasses RQ12 to the next adjacent processor node, as described above. After circulating around the ring bus, RQ12 is removed by the processor node PN1 (10). If snooping shows that the corresponding data block is stored in the remote cache of another processor node, namely one of PN2 to PN8 (20 to 80), in a valid state that is not updated, for example, the "shared" state, the state of the remote cache of that node (PN4 (40) in this example) is changed to an invalidated state, for example, the "invalid" state. If the corresponding data block belongs to a remote shared memory, PN1 (10) changes the state of the data block stored in the remote data cache 224-1 to an updated state, for example, the "update" state. If the corresponding data block belongs to the local shared memory 220, PN1 (10) updates the state of the memory directory 226 to a state indicating that no remote cache of any other processor node shares the block, for example, the "C" state.
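The second and third cases differ only in which packet the requester broadcasts for a write, and that decision can be sketched in Python. The function name `write_side_request` and its return strings are hypothetical. The "update" case is not covered by this passage; returning "none" for it (no ring traffic) is an assumption.

```python
def write_side_request(remote_cache_state: str, local_memory_valid: bool) -> str:
    """Packet the requester broadcasts for a write or invalidation request."""
    states = ("update", "update-shared", "shared", "invalid")
    if remote_cache_state not in states:
        raise ValueError(f"unknown remote cache state: {remote_cache_state}")
    if local_memory_valid or remote_cache_state == "update-shared":
        # The requester already holds valid data; other copies only need invalidating.
        return "invalidation request"
    if remote_cache_state in ("shared", "invalid"):
        # "shared" is handled like "no valid copy": a full write request is broadcast.
        return "write request"
    # "update": assumed to need no ring traffic (not described in this passage).
    return "none"
```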

【００４２】The fourth case is the one in which a data block evicted by a data block replacement in the remote cache 224 of the processor node PN1 (10) is in an updated state, for example, the "update" or "update-shared" state. In this case, PN1 (10) unicasts a packet containing the evicted block to the processor node whose local shared memory is the block's original home, for example, PN4 (40), via whichever of ring buses 90A and 90B offers the shortest path to that node. PN4 (40) then responds to RQ12 by updating its internal data memory and memory directory, and unicasts a response packet RSP42 to PN1 (10) via the first ring bus 90A or the second ring bus 90B. PN1 (10) then unicasts an acknowledgment packet ACK14 to PN4 (40).

【００４３】Meanwhile, according to the preferred embodiment of the present invention, a processor node, for example, PN1 (10), can process requests for more than one data block in the order in which the packets arrive. For example, if a request packet for a different data block arrives from one or more of the other processor nodes PN2 to PN8 (20 to 80) before the response packet for the requested data block is received from the corresponding processor node, namely PN4 (40), the processor node PN1 (10) first performs the operation for the request packet received from the other processor node or nodes and then performs the operation corresponding to the response packet for the requested data block.
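The arrival-order processing described above amounts to a first-in, first-out queue of incoming packets. The following Python sketch illustrates this with a hypothetical `handle_in_arrival_order` function and a hypothetical `(kind, sender, block)` packet tuple; neither is taken from the specification.

```python
from collections import deque


def handle_in_arrival_order(packets):
    """Process packets strictly in the order they arrived at the node."""
    pending = deque(packets)
    actions = []
    while pending:
        kind, sender, block = pending.popleft()  # oldest packet first
        if kind == "request":
            actions.append(f"snoop block {block} for {sender}")
        elif kind == "response":
            actions.append(f"complete block {block} with data from {sender}")
        else:
            raise ValueError(f"unknown packet kind: {kind}")
    return actions
```

For example, if a request from PN6 arrives before the response from PN4, the snoop for PN6 is handled first and the response from PN4 afterwards.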

【００４４】FIGS. 7 to 10 each show another embodiment of a processor node used in the multiprocessor system having the direction-separated dual ring structure according to the present invention shown in FIG. 3.

【００４５】FIG. 7 shows a detailed view of a processor node 200-1 according to a second embodiment of the present invention. As shown in FIG. 7, the configuration of the processor node 200-1 is the same as that of the processor node according to the first embodiment of the present invention shown in FIG. 4, except that the first link controller 230A and the second link controller 230B are connected directly to the node controller 222 without the link bus 228.

【００４６】FIG. 8 shows in detail a processor node 200-2 according to a third embodiment of the present invention. The configuration of the processor node 200-2 is the same as that of the processor node according to the first embodiment of the present invention shown in FIG. 4, except that the processor node 200-2 does not include a local shared memory or a memory directory.

【００４７】FIG. 9 shows in detail a processor node 200-3 according to a fourth embodiment of the present invention. The configuration of the processor node 200-3 is the same as that of the processor node according to the first embodiment of the present invention shown in FIG. 4, except that the processor node 200-3 does not include a remote cache.

【００４８】FIG. 10 shows in detail a processor node 200-4 according to a fifth embodiment of the present invention. The configuration of the processor node 200-4 is the same as that of the processor node according to the first embodiment of the present invention shown in FIG. 4, except that the internal processor modules are connected to one another via an interconnection network 240, such as a ring or a crossbar switch, instead of the local system bus.

【００４９】Because the configurations of the embodiments shown in FIGS. 7 to 10 are almost identical to that of the first embodiment, a description of their operation is omitted for convenience. According to the present invention, each of the processor nodes PN1 to PN8 (10 to 80) can be implemented by a processor node having a configuration selected from those of FIGS. 7 to 10.

【００５０】Although the above description covers the case in which the remote cache has the "update", "update-shared", "shared", and "invalid" states, the present invention applies equally to a variety of other cases, including cases in which the remote cache has other, modified states. Likewise, although the embodiments of the present invention describe the case in which the directory for the local shared memory maintains the "C", "S", and "G" states, it should be understood that the present invention applies equally when the directory has a variety of other, modified states.

【００５１】The embodiments of the present invention described above cover the case in which the response packet and the acknowledgment packet for a broadcast request packet are transmitted via whichever of the first ring bus 90A and the second ring bus 90B offers the shortest path. It is evident, however, that the present invention applies equally to the case in which the response packet and the acknowledgment packet for a broadcast request transmitted via the first ring bus 90A are transmitted via the first ring bus 90A, and those for a broadcast request transmitted via the second ring bus 90B are transmitted via the second ring bus 90B, and to the case in which the response and acknowledgment packets are transmitted via the first ring bus 90A or the second ring bus 90B in a predetermined order.

【００５２】[0052]

[Effect of the Invention] Thus, according to the present invention, the distributed shared memory multiprocessor system not only maintains cache coherence among the processor nodes by using a snooping scheme, but also delivers better performance than even an existing system having a unidirectional ring bus whose ring bandwidth has been doubled.

[Brief description of the drawings]

【図１】FIG. 1 is a configuration diagram of a conventional distributed shared memory multiprocessor system having a single ring and using a snooping scheme.

【図２】FIG. 2 is a detailed configuration diagram of the processor node shown in FIG. 1.

【図３】FIG. 3 is a configuration diagram of a distributed shared memory multiprocessor system having a direction-separated dual ring according to the present invention.

【図４】FIG. 4 is a detailed configuration diagram of the processor node shown in FIG. 3 according to the first embodiment of the present invention.

【図５】FIG. 5 is a drawing illustrating, by way of example, the operation of the distributed shared multiprocessor system according to the first embodiment of the present invention.

【図６】FIG. 6 is a drawing illustrating, by way of example, the operation of the distributed shared multiprocessor system according to the first embodiment of the present invention.

【図７】FIG. 7 is a detailed configuration diagram of the processor node shown in FIG. 3 according to the second embodiment of the present invention.

【図８】FIG. 8 is a detailed configuration diagram of the processor node shown in FIG. 3 according to the third embodiment of the present invention.

【図９】FIG. 9 is a detailed configuration diagram of the processor node shown in FIG. 3 according to the fourth embodiment of the present invention.

【図１０】FIG. 10 is a detailed configuration diagram of the processor node shown in FIG. 3 according to the fifth embodiment of the present invention.

[Explanation of symbols]

90A: First ring bus
90B: Second ring bus
212: First processor module
214: Second processor module
216: I/O bridge
218: Local system bus
220: Local shared memory
222: Node controller
224: Remote cache
224-1: Remote data cache
224-2: Remote tag cache
224-2A: First remote tag cache
224-2B: Second remote tag cache
226: Memory directory
226A: First memory directory
226B: Second memory directory
228: Link bus
230: Ring interface
230A: First link controller
230B: Second link controller

Continued from the front page

(71) Applicant: 500170526, 金明柱 (Meichu Kin), Junggye-dong Municipal Apartment 206-304, Nowon-gu, Seoul, Republic of Korea
(72) Inventor: 張星泰 (Seitai Cho), Gyeongnam Apartment 103-1701, 1015 Jeongneung 1-dong, Seongbuk-gu, Seoul, Republic of Korea
(72) Inventor: 全洲植 (Shushoku Zen), Woosung Apartment 1-103, Dogok-dong, Gangnam-gu, Seoul, Republic of Korea
(72) Inventor: 金明柱 (Meichu Kin), Junggye-dong Municipal Apartment 206-304, Nowon-gu, Seoul, Republic of Korea
F-terms (reference): 5B005 JJ01 KK14 MM01 NN53 PP21 PP26; 5B045 BB13 BB17 BB24 BB28 BB29 BB47 DD01 DD12 GG01; 5B060 KA01 KA06

Claims

[Claims]

1. A distributed shared memory multiprocessor system comprising: a plurality of processor nodes arranged in a ring, wherein any one of the plurality of processor nodes generates a request signal for a data block, the remaining processor nodes snoop their own internal elements, and any one of the remaining processor nodes supplies the data block; and a direction-separated dual ring structure that includes a first ring bus and a second ring bus and provides two opposite paths along the first and second ring buses, wherein the plurality of processor nodes are connected via the first and second paths, the request signal is broadcast to each of the remaining processor nodes via either of the paths, and the data block is unicast via either of the paths to the processor node that generated the request signal.

2. The distributed shared memory multiprocessor system according to claim 1, wherein the request signal is transmitted via the first ring bus or the second ring bus depending on whether the data block is stored at an even or an odd memory block address in the internal elements of the plurality of processor nodes.

3. The distributed shared memory multiprocessor system according to claim 1, wherein, when the processor node that generated the request signal receives, from at least one of the remaining processor nodes, a request signal for a data block different from said data block before receiving a response signal corresponding to its own request signal, the processor node first performs the operation corresponding to the request signal transmitted from the at least one of the remaining processor nodes and then performs the operation corresponding to the response signal.

4. The distributed shared memory multiprocessor system according to claim 1, wherein each of the plurality of processor nodes comprises: a plurality of processor modules that generate the request signal for the data block; a local shared memory that stores data blocks shared by each of the plurality of processor modules; a node controller that, in response to the request signal, examines whether the data block corresponding to the request signal is stored in a valid state in the local shared memory, transmits the data block to the plurality of processor modules if the data block is stored in a valid state in the local shared memory, and supplies the request signal to the next adjacent processor node among the plurality of processor nodes if the data block is not stored in a valid state in the local shared memory; a link controller that interfaces the node controller with the direction-separated dual ring structure; and an interconnection network that interconnects the plurality of processor modules, the local shared memory, and the node controller.

5. The distributed shared memory multiprocessor system according to claim 1, wherein each of the plurality of processor nodes comprises: a plurality of processor modules that generate the request signal for the data block; a remote cache; a node controller that, in response to the request signal, examines whether the data block corresponding to the request signal is stored in a valid state in the remote cache, transmits the data block to the plurality of processor modules if the data block is stored in a valid state in the remote cache, and supplies the request signal to the next adjacent processor node among the plurality of processor nodes if the data block is not stored in a valid state in the remote cache; a link controller that interfaces the node controller with the direction-separated dual ring structure; and an interconnection network that interconnects the plurality of processor modules, a local shared memory, and the node controller.

6. The distributed shared memory multiprocessor system according to claim 1, wherein each of the plurality of processor nodes comprises: a plurality of processor modules that generate the request signal for the data block; a local shared memory that stores data blocks shared by each of the plurality of processor modules; a remote cache; a node controller that, in response to the request signal, examines whether the data block corresponding to the request signal is stored in a valid state in the remote cache or the local shared memory, transmits the data block to the plurality of processor modules if the data block is stored in a valid state in the remote cache or the local shared memory, and supplies the request signal to the next adjacent processor node among the plurality of processor nodes if the data block is not stored in a valid state in the remote cache or the local shared memory; a link controller that interfaces the node controller with the direction-separated dual ring structure; and an interconnection network that interconnects the plurality of processor modules, the local shared memory, and the node controller.

7. The distributed shared memory multiprocessor system according to claim 4, wherein each of the plurality of processor nodes further comprises a memory directory containing state information on the data blocks stored in the local shared memory.

8. The distributed shared memory multiprocessor system according to claim 7, wherein the memory directory comprises two or more independent memory directories so that access requests to the local shared memory from the processor modules and from the remaining processor nodes are processed in parallel.

9. The distributed shared memory multiprocessor system according to claim 5 or claim 6, wherein the remote cache comprises a remote data cache that stores the contents of the data blocks and a remote tag cache that stores the tag addresses and states of the data blocks stored in the remote data cache.

10. The distributed shared memory multiprocessor system according to claim 9, wherein the remote tag cache comprises two or more independent remote tag caches so that access requests to the remote cache from the processor modules and from the remaining processor nodes are processed in parallel.

11. The distributed shared memory multiprocessor system according to any one of claims 4 to 6, wherein the link controller comprises a first link controller and a second link controller that connect the first ring bus and the second ring bus, respectively, to the node controller.

12. The distributed shared memory multiprocessor system according to claim 11, wherein each of the first link controller and the second link controller generates a packet containing the request signal and the data block, transmits the packet to the remaining processor nodes via the corresponding ring bus connected to that controller, and selectively supplies to the node controller a request or data block supplied from the remaining processor nodes via the corresponding ring bus.

13. The distributed shared memory multiprocessor system according to claim 12, wherein each of the processor nodes further comprises a link bus that connects the node controller to the first link controller and the second link controller.