JP2007241601A

JP2007241601A - Multiprocessor system

Info

Publication number: JP2007241601A
Application number: JP2006062261A
Authority: JP
Inventors: Fumihiko Hayakawa; 文彦早川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2006-03-08
Filing date: 2006-03-08
Publication date: 2007-09-20
Anticipated expiration: 2026-03-08
Also published as: JP5168800B2

Abstract

PROBLEM TO BE SOLVED: To speed up data transfer between the cache memories corresponding to a plurality of processor cores in a multiprocessor system while suppressing any increase in hardware implementation cost as much as possible.

SOLUTION: The multiprocessor system comprises a plurality of cache memories corresponding respectively to a plurality of processor cores, and a direct memory access control means for directly controlling data transfer between the plurality of cache memories.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a method of transferring data between caches in a multiprocessor system. More specifically, it relates to a multiprocessor system in which data updated by one of the processor cores constituting the system can be transferred directly to the cache of another processor core without passing through a lower memory hierarchy, thereby reducing the latency of data movement and cache access.

Multiprocessor systems, which divide processing among a plurality of processors and execute it in parallel, are widely used to achieve high-speed processing. FIG. 18 is a block diagram of a conventional example of such a multiprocessor system. In the figure, a plurality of processors 101_a to 101_d are provided on a chip 100, and each processor is connected to a main memory 105 by a bus 106. Cache memories 102_a to 102_d are provided for the processors 101_a to 101_d, respectively.

In a multiprocessor system such as that of FIG. 18, a distributed cache system, in which each processor is provided with its own cache memory, is used to improve the average performance of memory access. In such a distributed cache system, data coherency must be maintained so that no inconsistency arises between the cache memories and the lower memory hierarchy for data at the same address.

FIG. 19 illustrates a first prior-art data transfer technique for maintaining data coherency among the memories in such a multiprocessor system. In the figure, data updated on a processor (core) 101_a is written to the cache memory 102_a and is also written back to the lower memory hierarchy 105, generally under program instruction. At the same time, invalidation is instructed for the data at the corresponding address in the cache memory of another processor (core), for example the cache memory 102_b of processor 101_b, and the data in the cache memory 102_b is discarded. When the processor 101_b later needs the data, it reads the latest updated data from the lower memory hierarchy 105, which ensures data coherency.

FIG. 20 shows an example of a data transfer sequence in the first prior art. The sequence is described on the assumption that updated data in the transfer source cache, namely the cache memory 102_a of the processor 101_a of FIG. 19 (the transfer source execution unit), is first transferred to the lower memory hierarchy 105 and is then transferred to the transfer destination cache 102_b provided for the processor 101_b (the transfer destination execution unit).

First, the transfer source execution unit instructs the transfer source cache, namely the cache memory 102_a, to transfer the updated data, and the updated data is transferred from the cache memory 102_a to the lower memory hierarchy 105. The data in the transfer source cache 102_a is then invalidated in response to a data invalidation instruction issued to keep data consistent within the memory system. Next, the processor 101_a (the transfer source execution unit) sends the processor 101_b (the transfer destination execution unit) a notification that the updated data has been stored in the lower memory hierarchy 105. The processor 101_b, which needs the data, issues a data request to its cache memory 102_b (the transfer destination cache), but at this point the updated data is not yet stored in the cache memory 102_b. The cache memory 102_b therefore issues a move-in request for the data to the lower memory hierarchy, receives the data returned in response, and passes it on to the transfer destination execution unit, after which the processor 101_b can process the updated data.

In this first prior art, however, data updated in one processor core and stored in the corresponding cache memory must first be written back to the lower memory hierarchy, after which the processor core that needs the data must read it from the lower memory hierarchy through its cache memory. As a result, a long time elapses between the transfer source execution unit instructing the transfer of the updated data and the transfer destination execution unit actually obtaining it.

FIG. 21 illustrates a second conventional data transfer method for maintaining data coherency in a distributed cache system. In the figure, when data in the cache memory 102_a of a processor (core) 101_a is updated, the updated data written to the cache memory 102_a is broadcast to all the other processors, and the data in any cache memory that holds data at the same address is updated. This cache control method is called a snoop cache.

FIG. 22 shows an example of an updated-data transfer sequence in the second prior art, the snoop cache method. In the figure, when data is written to the transfer source cache, for example the cache memory 102_a, the snoop process writes the data from the transfer source cache to the transfer destination caches, namely those of the other cache memories that hold data at the same address. The processor corresponding to each transfer destination cache (the transfer destination execution unit) is notified that the data has been written; it then requests the data from the transfer destination cache as needed and performs processing using the returned data.

The second prior art, however, has the problem that the hardware implementation cost of broadcasting the updated data is high. Because the data is broadcast even to cache memories that do not hold data at the corresponding address, memory access traffic increases and a wide bus is required. In addition, each processor must be able to perform other processing while the data in its cache memory is being updated, which requires, for example, a dual-port random access memory. Both factors raise the hardware cost.

Patent Document 1 discloses a technique for managing a cache with a software program in order to reduce memory access latency in DMA transfers within such a multiprocessor system. Specifically, it discloses a method of providing cache management commands that enable software program management of a DMA cache provided to speed up the transfer of data in system memory to each processor.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2005-276199

However, the prior art of Patent Document 1 is a technique for speeding up the transfer of data in system memory to each processor. It cannot be applied to data transfer between the cache memories of a distributed cache memory system, which is the subject of the present invention.

In view of the above problems, an object of the present invention is to speed up data transfer between the cache memories corresponding respectively to a plurality of processor cores in a multiprocessor system, while suppressing any increase in hardware implementation cost as much as possible.

FIG. 1 is a block diagram of the principle configuration of the multiprocessor system of the present invention. In the figure, the multiprocessor system comprises a plurality of processor cores 1_a to 1_d and a direct memory access control means 3.

The processor cores 1_a to 1_d are provided with corresponding cache memories 2_a to 2_d, respectively, and the direct memory access control means 3 controls direct data transfer among the cache memories 2_a to 2_d.

The direct memory access control means 3 of FIG. 1 is, for example, a bus master. The bus master receives from the transfer source processor core the transfer conditions, which specify the transfer source processor core number, the transfer destination processor core number, the start address in cache memory at which the transfer data is stored, and the amount of data to transfer, together with an instruction to start the transfer. In response, it performs a read access to the transfer source cache memory, reads the transfer data and its flags on receiving a cache hit response, performs a write access to the transfer destination cache memory, and writes the transfer data and flags to the transfer destination cache memory.
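As a rough illustration of the transfer conditions listed above, the following Python sketch models them as a small record that a transfer source core could hand to the bus master. The class name `TransferConditions`, its field names, the byte unit, and the validation rules are assumptions made for illustration; the specification does not define a register layout or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferConditions:
    """Hypothetical record of the four transfer conditions named in the text."""
    source_core: int    # number of the transfer source processor core
    dest_core: int      # number of the transfer destination processor core
    start_address: int  # start address of the transfer data in cache memory
    size: int           # amount of data to transfer, in bytes (unit is assumed)

    def __post_init__(self):
        # Assumed sanity checks; the specification does not state them.
        if self.source_core == self.dest_core:
            raise ValueError("source and destination cores must differ")
        if self.start_address < 0 or self.size <= 0:
            raise ValueError("address must be non-negative and size positive")

    def end_address(self) -> int:
        """Address one past the last byte covered by the transfer."""
        return self.start_address + self.size


# Example: core 0 asks for 256 bytes starting at 0x1000 to be sent to core 1.
conditions = TransferConditions(source_core=0, dest_core=1,
                                start_address=0x1000, size=256)
print(hex(conditions.end_address()))
```

Keeping the record immutable reflects the idea that the conditions are set once before the start instruction is issued; that, too, is a design choice of this sketch rather than something the text requires.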

According to the present invention, data can be transferred directly between the cache memories corresponding to the respective processor cores of a multiprocessor system. This speeds up data transfer between the cache memories and contributes greatly to improving the data processing efficiency of the multiprocessor system as a whole.

FIG. 2 is a block diagram of the basic configuration of the multiprocessor system of the present invention. In the figure, the system consists of a chip 10 and a main memory 15, as in the conventional example of FIG. 18. A plurality of processors (cores) 11_a to 11_d are mounted on the chip 10, and cache memories 12_a to 12_d are provided for the respective processors. In the present invention, in addition to these components, a DMAC (direct memory access controller) 14 is provided to transfer data among the cache memories 12_a to 12_d without going through the main memory 15. The processors 11_a to 11_d, the main memory 15, and the DMAC 14 are connected by a bus 16.

In the present invention, as shown in FIG. 2, the DMAC 14 is newly provided to control the reading and writing, that is, the transfer, of data in the cache memories 12_a to 12_d corresponding to the processors (cores) 11_a to 11_d. Data updated in the cache memory 12_a by a computation on one processor (core), for example 11_a, is transferred directly to another processor (core), for example 11_b, under the control of the DMAC 14, and the corresponding data in the cache memory 12_b is updated. Which processor transfers to which processor is specified by the program that governs overall processing on the multiprocessor system, based, for example, on the fact that the result computed by processor 11_a is to be used in a computation by processor 11_b.

FIG. 3 illustrates how a data update is reflected in the main memory in the basic configuration of FIG. 2. In cache control methods for general multiprocessor systems other than the snoop cache method described with FIG. 21, once the data update has been reflected in the main memory as described with FIG. 19, control is performed so that, among the plurality of cache memories (here 12_a to 12_d), only one holds valid data for the address of the data whose update has been reflected in the main memory.

That is, in FIG. 3, when the updated data written to the cache memory 12_a of processor 11_a in FIG. 2 is transferred to processor 11_b and written to the cache memory 12_b, the update is finally reflected in the main memory 15 by transferring the data from processor 11_b to the main memory 15. At this time, the updated data in the cache memory 12_a of the original transfer source processor 11_a is either invalidated, making it unusable thereafter, or has its dirty bit (the flag indicating that the data has been updated) cleared, so that the data remains usable by processor 11_a but cannot be used, for example, for a transfer to another processor such as 11_c.
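The two options for the transfer source line can be sketched as follows. The `CacheLine` class, the `SourcePolicy` enum, and the helper functions are hypothetical names. In particular, treating "valid and dirty" as the condition for serving as a transfer source is an assumption drawn from the paragraph above, not a rule stated in the specification.

```python
from dataclasses import dataclass
from enum import Enum


class SourcePolicy(Enum):
    INVALIDATE = "invalidate"    # line becomes unusable on the source core
    CLEAR_DIRTY = "clear_dirty"  # line stays readable locally, but is not forwarded


@dataclass
class CacheLine:
    data: bytes
    valid: bool = True
    dirty: bool = True  # updated flag: set when the core has modified the line


def release_source_line(line: CacheLine, policy: SourcePolicy) -> None:
    """Apply the chosen policy to the source line after its data was transferred."""
    if policy is SourcePolicy.INVALIDATE:
        line.valid = False
        line.dirty = False
    else:
        line.dirty = False


def can_read_locally(line: CacheLine) -> bool:
    return line.valid


def can_forward(line: CacheLine) -> bool:
    # Assumption: only a valid, dirty line may serve as a transfer source.
    return line.valid and line.dirty


a = CacheLine(b"result")
release_source_line(a, SourcePolicy.CLEAR_DIRTY)
print(can_read_locally(a), can_forward(a))

b = CacheLine(b"result")
release_source_line(b, SourcePolicy.INVALIDATE)
print(can_read_locally(b), can_forward(b))
```

With the dirty bit cleared, line `a` can still be read by its own core but is no longer eligible for forwarding; the invalidated line `b` can be used for neither.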

FIG. 4 illustrates direct data transfer by the DMAC to a plurality of processor cores in the basic configuration of FIG. 2. In the figure, write data updated in the cache memory 12_a is transferred under the control of the DMAC 14 to, for example, all three other processors 11_b, 11_c, and 11_d, and the data at the corresponding address in each of the cache memories 12_b to 12_d can be updated. Except under the snoop cache method described above, however, only one of the transfer destination processors has the right to actually write to its cache as valid data. This control is guaranteed by software, but because it is not directly related to the present invention, its detailed description is omitted.

FIG. 5 is a block diagram of a first embodiment of the multiprocessor system of the present invention. In the figure, the system consists of a chip 20 and a lower memory hierarchy, for example a main memory 25. The chip 20 carries a plurality of processor cores 21_a to 21_d and a bus master 24, and the processor cores 21_a to 21_d, the bus master 24, and the lower memory hierarchy 25 are connected by a bus 26. Each processor core, for example processor core 21_a, has an execution unit 22_a and a cache memory 23_a. The bus master 24 corresponds, for example, to the DMAC 14 of FIG. 2.

Referring to FIG. 5, the flowcharts of FIGS. 6 to 8 are used to describe in detail the processing when data is transferred, under the control of the bus master 24 that handles direct data transfer between cache memories, from the cache memory 23_a on processor core 21_a (the transfer source cache memory) to the cache memory 23_b on processor core 21_b (the transfer destination cache memory).

FIG. 6 is a flowchart of the processing of the bus master 24. The processing starts in response to a data transfer request sent, for example, from the execution unit 22_a of processor core 21_a. In step S1, the bus master receives the settings from the transfer source core (processor core 21_a): the transfer source core number (here the number of processor core 21_a), the region of the cache memory 23_a to transfer (that is, the address), the transfer data size, and the transfer destination core number (for example the number of processor core 21_b). In step S2, it receives the transfer start instruction from processor core 21_a, and in response the read operation begins.

In the read operation, a read request is first issued in step S3 to the transfer source core, here the execution unit 22_a of processor core 21_a, and in step S4 it is determined whether the cache response received from the transfer source core is a hit. In general, the cache response to a transfer requested by the transfer source core is a hit in most cases. In that case, the data in the cache and its flags, for example the dirty bit (the updated flag mentioned above), are read from the transfer source core in step S5; in step S6 the transfer source core is requested either to invalidate the transferred data or to clear its dirty bit; and the write operation follows. Data indicating whether invalidation or dirty bit clearing should be requested is stored in a register (not shown) in the bus master, and the bus master requests the corresponding processing from the transfer source core.

In the write operation, a request to write the data to the cache memory 23_b is first issued in step S8 to the transfer destination core, here processor core 21_b. In step S9 the write data and flags are transferred to the transfer destination core, and in step S10 it is determined whether all the data to be transferred has been transferred. When a large amount of data is transferred, the data is generally divided into small units, and the data is read and transferred unit by unit. If not all the data has been transferred, the processing from step S3 onward is repeated.
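The unit-by-unit loop over steps S3 to S10 can be pictured with a short Python sketch. The 64-byte unit size and the function name `split_into_units` are assumptions; the specification says only that large transfers are divided into small units.

```python
from typing import Iterator, Tuple

UNIT_SIZE = 64  # assumed transfer unit in bytes (e.g. one cache line)


def split_into_units(start_address: int, size: int,
                     unit: int = UNIT_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (address, length) pairs that together cover the requested region."""
    if size < 0 or unit <= 0:
        raise ValueError("size must be non-negative and unit positive")
    offset = 0
    while offset < size:
        length = min(unit, size - offset)  # the last unit may be shorter
        yield start_address + offset, length
        offset += length


# A 160-byte transfer becomes two full units and one 32-byte remainder.
units = list(split_into_units(0x2000, 160))
print(units)
```

Each yielded pair would correspond to one pass through the read and write operations, and the loop ending corresponds to the "all data transferred" check in step S10.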

When all the data has been transferred, the transfer source core (here 21_a) is notified of the completion of the data transfer in step S11, the transfer destination core (21_b) is notified in step S12, and the processing ends. As described later, the transfer completion notification may instead be sent to the transfer source core 21_a not by the bus master 24 but by, for example, the execution unit 22_b of processor core 21_b acting as the transfer destination execution unit, in which case step S11 is omitted. In the flowchart, steps S11 and S12 are enclosed in parentheses to indicate that they can be omitted. Whether the bus master should send the transfer completion notifications is specified, as above, by data stored in a register (not shown) in the bus master.

If the cache response of the transfer source core in step S4 of the read operation is not a hit, the processing from step S14 onward is performed. As noted above, the cache response should in principle be a hit, because the transfer was requested by the transfer source core. However, the data at the address in the cache memory 23_a that holds the data to be transferred may have been judged, in subsequent processing within processor core 21_a, to have low locality of reference and been written back to a lower memory hierarchy such as the main memory. In that case the cache response is a miss rather than a hit.

When the cache response is a miss, it is determined in step S14 whether the data to be transferred should be obtained from the lower memory hierarchy, for example the main memory. Whether to obtain the data is specified, for example, in the settings made by the transfer source core in step S1. If the data is to be obtained from the main memory, a read operation is performed on the lower memory hierarchy in step S15, the read data is received in step S16, and processing continues with the write operation from step S8 onward.

If it is determined in step S14 that the data should not be obtained from the lower memory hierarchy on a cache miss, the data transfer is aborted in step S17, the transfer source core (processor core 21_a) is notified of the error in step S18, and the processing ends. Step S18 can also be omitted.
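The bus master flow of FIG. 6 can be condensed into a small software model. This is a minimal sketch under stated assumptions, not the hardware described in the specification: the `Core` and `BusMaster` classes, the dictionary-based caches, the two boolean fields standing in for the unillustrated registers, and the choice to mark data fetched from main memory as clean are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Line = Tuple[bytes, bool]  # (data, dirty flag)


@dataclass
class Core:
    cache: Dict[int, Line] = field(default_factory=dict)
    notices: List[str] = field(default_factory=list)


@dataclass
class BusMaster:
    cores: Dict[int, Core]
    main_memory: Dict[int, bytes]
    invalidate_source: bool = True   # stand-in for the policy register (S6)
    notify_completion: bool = True   # stand-in for the notification register

    def transfer(self, src: int, dst: int, addresses: List[int],
                 fetch_on_miss: bool) -> bool:
        source, dest = self.cores[src], self.cores[dst]
        for addr in addresses:                      # one unit per iteration
            line: Optional[Line] = source.cache.get(addr)   # S3: read request
            if line is not None:                    # S4: hit
                data, dirty = line                  # S5: read data and flag
                if self.invalidate_source:          # S6: invalidate ...
                    del source.cache[addr]
                else:                               # ... or clear the dirty bit
                    source.cache[addr] = (data, False)
            elif fetch_on_miss:                     # S14: fetch from lower level?
                # S15, S16: read main memory; "clean" flag is an assumption.
                data, dirty = self.main_memory[addr], False
            else:
                source.notices.append("error")      # S17, S18: abort and report
                return False
            dest.cache[addr] = (data, dirty)        # S8, S9: write data and flag
        if self.notify_completion:                  # S10 done: S11, S12
            source.notices.append("done")
            dest.notices.append("done")
        return True


cores = {0: Core(cache={0x100: (b"new", True)}), 1: Core()}
bus = BusMaster(cores, main_memory={0x100: b"old", 0x140: b"mem"})
ok = bus.transfer(src=0, dst=1, addresses=[0x100, 0x140], fetch_on_miss=True)
print(ok, cores[1].cache, cores[0].cache)
```

In the example, address `0x100` hits in the source cache and is moved with its dirty flag intact, while `0x140` misses and is fetched from the modeled main memory because `fetch_on_miss` is set.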

FIG. 7 is a flowchart of the processing on the transfer source cache side, here the cache memory 23_a side. This is the processing that the execution unit 22_a of processor core 21_a performs as the transfer source execution unit. When the processing starts, a data read request from the bus master 24 is accepted in step S21; this corresponds to step S3 of FIG. 6. In step S22 the cache is checked for the received address, and in step S23 it is determined whether the access address is a cache hit.

On a cache hit, a hit response is returned to the bus master 24 in step S24, the data and flags at the address are returned to the bus master 24 in step S25, and in step S26 the cache line is invalidated or its dirty bit is cleared, after which the processing ends. Under snoop cache control, for example, step S26 is omitted. If step S23 determines that the access is not a cache hit, a cache miss response is sent to the bus master 24 in step S27 and the processing ends.
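The source-side handling of FIG. 7 might look like the following in a software model. The `Line` class, the function name `handle_read_request`, and the tuple-style response are illustrative assumptions; the specification describes the steps but not an interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Line:
    data: bytes
    dirty: bool
    valid: bool = True


def handle_read_request(cache: Dict[int, Line], address: int,
                        invalidate: bool = True,
                        snoop_controlled: bool = False
                        ) -> Tuple[str, Optional[bytes], Optional[bool]]:
    """Return (response, data, dirty flag) for a bus master read request (S21)."""
    line = cache.get(address)                       # S22: look up the address
    if line is None or not line.valid:              # S23: not a hit
        return "miss", None, None                   # S27: miss response
    data, dirty = line.data, line.dirty             # S24, S25: hit, data, flag
    if not snoop_controlled:                        # S26 is skipped under snooping
        if invalidate:
            line.valid = False
        else:
            line.dirty = False
    return "hit", data, dirty


cache = {0x100: Line(b"new", dirty=True)}
first = handle_read_request(cache, 0x100, invalidate=False)
second = handle_read_request(cache, 0x180)
print(first, second, cache[0x100])
```

The flag is captured before step S26 modifies the line, so the bus master receives the dirty bit as it stood at the time of the read.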

FIG. 8 is a flowchart of the processing on the transfer destination cache side, here the cache memory 23_b on processor core 21_b. When the processing starts, a write request from the bus master 24 is accepted in step S31; this corresponds to step S8 of FIG. 6. In step S32 the cache is checked for the received address, and in step S33 it is determined whether the access address is a cache hit.

On a hit, the data and flags sent from the bus master 24 are written to the cache in step S34 and the processing ends. On a miss, it is determined in step S35 whether the cache has a free entry; if it does, the data and flags are written to that free entry in step S34 and the processing ends. If there is no free entry, a copy-back to the main memory is performed in step S36 to create one, the data and flags are written to it in step S34, and the processing ends. Steps S35 and S36 are not directly related to the present invention.
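The destination-side steps S31 to S36 can be sketched in the same style. The model below assumes a fully associative cache with a fixed number of entries, evicts the oldest inserted entry, and copies the victim back only when it is dirty; none of these policies is specified in the text, and `handle_write_request` is a hypothetical name.

```python
from typing import Dict, Tuple

Line = Tuple[bytes, bool]  # (data, dirty flag)


def handle_write_request(cache: Dict[int, Line], capacity: int,
                         main_memory: Dict[int, bytes],
                         address: int, data: bytes, dirty: bool) -> str:
    """Store a line sent by the bus master and report which path was taken (S31)."""
    if address in cache:                            # S32, S33: hit
        cache[address] = (data, dirty)              # S34
        return "hit"
    if len(cache) < capacity:                       # S35: free entry available
        cache[address] = (data, dirty)              # S34
        return "free-entry"
    # S36: copy back a victim to main memory to free an entry.
    victim = next(iter(cache))                      # assumed policy: oldest inserted
    victim_data, victim_dirty = cache.pop(victim)
    if victim_dirty:                                # assumed: clean lines are dropped
        main_memory[victim] = victim_data
    cache[address] = (data, dirty)                  # S34
    return "copy-back"


memory: Dict[int, bytes] = {}
dest_cache: Dict[int, Line] = {0x000: (b"a", True), 0x040: (b"b", False)}
path = handle_write_request(dest_cache, capacity=2, main_memory=memory,
                            address=0x080, data=b"new", dirty=True)
print(path, dest_cache, memory)
```

In the example the cache is full, so the dirty line at `0x000` is copied back to the modeled main memory before the incoming line is stored.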

The data transfer processing in the first embodiment basically follows the flowcharts of FIGS. 6 to 8. Examples of data transfer sequences in the first embodiment are described with reference to FIGS. 9 to 12. In FIG. 9, the transfer source execution unit, for example the execution unit 22_a of processor core 21_a, gives the bus master 24 the transfer conditions and a transfer start instruction; this corresponds to steps S1 and S2 of FIG. 6. The bus master 24 performs a data read operation on the transfer source cache, here the cache memory 23_a, receives a cache hit response from the transfer source cache, and then receives the read data and flags. The bus master 24 then transfers the write data and flags as a data write operation to the transfer destination cache, for example the cache memory 23_b. Meanwhile, the transfer source cache 23_a, for example, invalidates the transferred data.

FIG. 10 shows the operation sequence for a cache miss at the transfer source. As in FIG. 9, the transfer source execution unit gives the bus master 24 the transfer conditions and a transfer start instruction, and the bus master 24 performs a data read operation on the transfer source cache, but a cache miss response is returned and the data transfer is aborted. These operations correspond to steps S14 and S17 of FIG. 6, that is, to the case in which the data is not obtained from the lower memory hierarchy.

FIG. 11 shows the operation sequence in which, when a cache miss response is returned to the bus master 24, the data is read from the lower memory hierarchy, for example the main memory. As in FIG. 10, when the bus master 24 receives a cache miss response from the transfer source cache, it performs a data read operation on the lower memory hierarchy, corresponding to steps S15 and S16 in FIG. 6. On receiving the read data, the bus master 24 writes the data and flags to the transfer destination cache as a write operation, and the transfer destination cache, for example, notifies the transfer source execution unit that the data transfer is complete. As described above, this completion notification can also be sent from the bus master to the transfer source execution unit. When the bus master sends it, however, the data transfer may not actually have completed at that point, so sending the notification from the transfer destination cache, to which the data and flags have actually been transferred, is the more reliable operation.
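The two miss-handling behaviours of FIGS. 10 and 11 can be sketched as a single policy switch. The names `MissPolicy` and `bus_master_transfer`, the use of plain dictionaries for the caches and the lower memory hierarchy, and the flag value written on a fetch from lower memory are all assumptions made for illustration.

```python
from enum import Enum


class MissPolicy(Enum):
    SUSPEND = "suspend"          # FIG. 10: stop the transfer on a miss
    FETCH_LOWER = "fetch_lower"  # FIG. 11: read from the lower memory hierarchy


def bus_master_transfer(src, dst, lower, addr, policy):
    """src and dst map address -> (data, flags); lower maps address -> data."""
    entry = src.get(addr)
    if entry is None:                         # cache miss response from the source
        if policy is MissPolicy.SUSPEND:
            return "suspended"
        data = lower[addr]                    # data read from the lower hierarchy
        dst[addr] = (data, {"dirty": False})  # flag value here is an assumption
        return "completed_from_lower"
    dst[addr] = entry                         # cache hit: forward data and flags
    return "completed_from_cache"


# Hypothetical usage: the source cache is empty, so both calls miss.
main_memory = {0x200: b"from-memory"}
src_cache, dst_cache = {}, {}
r1 = bus_master_transfer(src_cache, dst_cache, main_memory, 0x200, MissPolicy.SUSPEND)
r2 = bus_master_transfer(src_cache, dst_cache, main_memory, 0x200, MissPolicy.FETCH_LOWER)
```

In this sketch the policy is passed as an argument; how the choice is actually configured is not shown here.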

FIG. 12 shows the sequence of a data transfer to a plurality of cache memories. As in FIG. 9, when the transfer source cache supplies the read data and flags to the bus master 24, the bus master 24 writes the data and flags, as write operations, to transfer destination cache 1 and transfer destination cache 2, for example the cache memories 23_b and 23_c. On the transfer source cache side, that is, the cache memory 23_a, the data is, for example, invalidated.
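A transfer to several destinations as in FIG. 12 can be sketched as one read followed by one write per destination. The function name `transfer_to_many` and the dictionary-based cache model are hypothetical; deleting the source entry stands in for the invalidation mentioned above.

```python
def transfer_to_many(src, destinations, addr):
    """src and each destination map address -> (data, flags)."""
    if addr not in src:
        return False                      # cache miss; not handled in this sketch
    data, flags = src[addr]               # single read from the source cache
    for dst in destinations:
        dst[addr] = (data, dict(flags))   # each destination gets its own flag copy
    del src[addr]                         # models invalidation of the source line
    return True


# Hypothetical usage: cache_a stands for 23_a, cache_b and cache_c for 23_b and 23_c.
cache_a = {0x40: (b"block", {"dirty": True})}
cache_b, cache_c = {}, {}
done = transfer_to_many(cache_a, [cache_b, cache_c], 0x40)
```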

FIG. 13 is a block diagram of the second embodiment of the multiprocessor system. Compared with the first embodiment of FIG. 5, the only difference is that the bus 26 of FIG. 5 is replaced by two separate buses: a dedicated bus 27 for direct data transfer between cache memories by the bus master 24, and a bus 28 that connects each cache memory to the lower memory hierarchy 25, for example the main memory.

As noted above, the second embodiment of FIG. 13 differs from the first embodiment only in that the dedicated bus 27 is provided for data transfer between cache memories by the bus master 24. The processing on the bus master 24 side and on each cache memory side is the same as in the first embodiment, and its description is omitted.

FIGS. 14 and 15 show examples of data transfer sequences in the second embodiment. In FIG. 14, as in FIG. 9 for example, the bus master 24 writes the data and flags to the transfer destination cache as a write operation. In addition, the transfer destination cache evicts an unneeded line to the lower memory hierarchy 25 and notifies the transfer source execution unit that the data transfer is complete. The eviction corresponds, for example, to the case where no free entry exists in the cache at step S35 in FIG. 8 and the data of the unneeded line is written out to the main memory, serving as the lower memory hierarchy 25, by copy-back processing at step S36. The eviction uses the bus 28 that connects each cache memory to the lower memory hierarchy 25. The data transfer completion notification from the transfer destination cache to the transfer source execution unit is the same as in FIG. 11.
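The destination-side behaviour of FIG. 14 (write, evict an unneeded line when no free entry exists, then report completion) can be sketched as follows. The class name `DestinationCache`, the oldest-first victim choice, and the single `dirty` flag are assumptions; the text does not say how the unneeded line is selected.

```python
from collections import OrderedDict


class DestinationCache:
    def __init__(self, capacity, lower_memory):
        self.capacity = capacity
        self.lower = lower_memory    # dict standing in for main memory (25)
        self.lines = OrderedDict()   # address -> (data, dirty)

    def write_from_bus_master(self, addr, data, dirty):
        """Accept a write; copy back a victim line first if the cache is full."""
        if addr not in self.lines and len(self.lines) >= self.capacity:
            # Assumed policy: evict the oldest line.
            victim, (vdata, vdirty) = self.lines.popitem(last=False)
            if vdirty:
                self.lower[victim] = vdata   # copy-back over the memory bus (28)
        self.lines[addr] = (data, dirty)
        return "transfer_complete"           # notification sent after the write


# Hypothetical usage with a one-entry cache so the second write forces an eviction.
memory = {}
cache = DestinationCache(1, memory)
cache.write_from_bus_master(0x10, b"old", True)
note = cache.write_from_bus_master(0x20, b"new", True)
```

Returning the completion notification only after the line has been stored mirrors the point that a notification from the destination cache is more reliable than one from the bus master.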

FIG. 15 shows the operation sequence in which the transfer destination cache sends the data transfer completion notification to the transfer destination execution unit. Unlike FIG. 14, the notification is sent from the transfer destination cache to the transfer destination execution unit. The transfer destination execution unit can therefore reliably learn, from the notification issued by the transfer destination cache, that the data transfer by the bus master 24 has completed. As needed, it then requests the data required for subsequent processing from the transfer destination cache and executes the processing using the returned data.

FIG. 16 is an explanatory diagram comparing the prior art described with reference to FIGS. 19 and 21 with the present invention. In the figure, performance is basically determined by the cache access latency and the data movement latency.

FIG. 17 is an explanatory diagram of the cache access latency and the data movement latency. The figure corresponds to the operation sequence of FIG. 15, with the only addition that the data transfer completion notification is also sent from the transfer destination cache to the transfer source execution unit. The data movement latency is the time from when the transfer source execution unit gives the bus master 24 the transfer instruction, that is, the transfer condition instruction and the transfer start instruction, until the transfer data has actually been written to the transfer destination cache and the transfer destination cache has sent the transfer completion notification to the transfer destination execution unit and the transfer source execution unit. The cache access latency corresponds, for example, to the time from when the transfer destination execution unit, after the data transfer has ended, requests data from the transfer destination cache as needed until it receives that data.
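The two latencies defined above can be written compactly. The symbols are introduced here only for illustration and are not used in the specification: t_instr is the time the transfer instruction reaches the bus master 24, t_notify the time the completion notification is issued by the transfer destination cache, t_req the time the transfer destination execution unit requests data from its cache, and t_data the time it receives that data.

```latex
L_{\mathrm{move}} = t_{\mathrm{notify}} - t_{\mathrm{instr}}, \qquad
L_{\mathrm{access}} = t_{\mathrm{data}} - t_{\mathrm{req}}
```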

In FIG. 16, the first prior art has no DMAC, unlike the present invention, so its hardware implementation cost is small. However, when the transfer destination execution unit needs the data after the data transfer, it must request the data from the lower memory hierarchy, that is, the main memory, and receive it from there. The cache access latency is therefore large and, as a result, performance is low.

In the second prior art, by contrast, the use of a snoop cache makes the data movement latency small and performance high, but the hardware implementation cost, for example for broadcasting, becomes large.

In the present invention, by contrast, the hardware implementation cost is somewhat higher because a DMAC must be added, but both the data movement latency and the cache access latency are small, and performance improves.

In the second embodiment in particular, the bus for direct data transfer between the cache memories is provided independently of the bus that connects the main memory and the cache memories. Memory bus traffic can therefore be kept low even if, for example, the number of processor cores increases, and high performance can be obtained.

That is, in the present invention, in contrast to the prior art, data transfer between cache memories does not go through the main memory and so does not consume bus bandwidth to the main memory. Compared with a snoop cache, data is transferred only between, for example, the cache memories specified by the program, so bus traffic is minimized and practical operation can be maintained even as the number of processor cores increases. Furthermore, the transfer destination processor core accesses its cache when it needs the data and does not need to access the main memory, so it can receive the required data with a small cache access latency.

FIG. 1 is a block diagram showing the principle configuration of the multiprocessor system of the present invention.
FIG. 2 is a block diagram showing the basic configuration of the multiprocessor system of the present invention.
FIG. 3 is an explanatory diagram of how the data update result in FIG. 2 is reflected in the main memory.
FIG. 4 is an explanatory diagram of direct data transfer to a plurality of cache memories in FIG. 2.
FIG. 5 is a block diagram of the first embodiment of the multiprocessor system.
FIG. 6 is a detailed flowchart of the processing by the bus master.
FIG. 7 is a detailed flowchart of the processing on the transfer source cache side.
FIG. 8 is a detailed flowchart of the processing on the transfer destination cache side.
FIG. 9 is an example (part 1) of the data transfer sequence in the first embodiment.
FIG. 10 is an example (part 2) of the data transfer sequence in the first embodiment.
FIG. 11 is an example (part 3) of the data transfer sequence in the first embodiment.
FIG. 12 is an example (part 4) of the data transfer sequence in the first embodiment.
FIG. 13 is a block diagram of the second embodiment of the multiprocessor system.
FIG. 14 is an example (part 1) of the data transfer sequence in the second embodiment.
FIG. 15 is an example (part 2) of the data transfer sequence in the second embodiment.
FIG. 16 is an explanatory diagram comparing the present invention with the prior art.
FIG. 17 is an explanatory diagram of the data movement latency and the cache access latency as measures of data transfer performance.
FIG. 18 is a block diagram of a conventional example of a multiprocessor system.
FIG. 19 is an explanatory diagram of the first prior art for data transfer.
FIG. 20 is an explanatory diagram of the data transfer sequence according to the first prior art.
FIG. 21 is an explanatory diagram of the second prior art for data transfer.
FIG. 22 is an explanatory diagram of the data transfer sequence in the second prior art.

Explanation of symbols

1, 21 Processor core
2, 12, 23 Cache memory
3 Direct memory access control means
10, 20 Chip
11 Processor
14 Direct memory access controller (DMAC)
15 Main memory
16, 26 Bus
22 Execution unit
24 Bus master
25 Lower memory hierarchy
27 Direct data transfer bus between cache memories
28 Bus between cache memory and lower memory hierarchy

Claims

A multiprocessor system comprising a plurality of processor cores, the system comprising:
a plurality of cache memories respectively corresponding to the plurality of processor cores; and
direct memory access control means for directly controlling data transfer among the plurality of cache memories.

The multiprocessor system according to claim 1, wherein the direct memory access control means comprises a register for storing data indicating whether, after the transfer data is output from the data transfer source cache memory, the output data in the data transfer source cache memory should be invalidated or the updated flag for the data should be cleared, and
the direct memory access control means requests the data transfer source cache memory, according to the contents stored in the register, to invalidate the output data or to clear the updated flag for the data.

The multiprocessor system according to claim 1, wherein the direct memory access control means comprises a register for storing data indicating whether completion of the data transfer from the data transfer source cache memory to the data transfer destination cache memory should be notified to the data transfer source processor core or to the transfer destination processor core, and
according to the contents stored in the register, the direct memory access control means notifies the transfer source processor core or the transfer destination processor core of the completion of the data transfer when the data write access to the transfer destination cache memory is completed.

2. The direct memory access control means suspends all data transfer processing when receiving a cache miss response from the transfer source cache memory during data read access to the data transfer source cache memory. The described multiprocessor system.

The multiprocessor system according to claim 1, wherein, when the direct memory access control means receives a cache miss response from the transfer source cache memory during a data read access to the data transfer source cache memory, the direct memory access control means reads the data to be transferred from the lower memory hierarchy and transfers the read data to the transfer destination cache memory.