JP2015210659A

JP2015210659A - Arithmetic processing unit and control method of arithmetic processing unit

Info

Publication number: JP2015210659A
Application number: JP2014091683A
Authority: JP
Inventors: Naohiro Kiyota (清田直宏)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-04-25
Filing date: 2014-04-25
Publication date: 2015-11-24
Anticipated expiration: 2034-04-25
Also published as: JP6287545B2

Abstract

PROBLEM TO BE SOLVED: To provide an arithmetic processing unit with improved throughput, and a control method of the arithmetic processing unit. SOLUTION: A cache RAM 133 has RAMs #0-#15, and consecutive addresses of each way are assigned across the RAMs #0-#15. An instruction control section 11 acquires an arithmetic processing instruction that includes a plurality of cache access requests 200. Each request instructs processing of data at predetermined address intervals within the same way, and the instruction control section 11 transmits the acquired cache access requests 200. A cache control section 13 receives the cache access requests 200 transmitted from the instruction control section 11. It then executes the access processing on the data at the predetermined address intervals within the way specified by the cache access requests 200. An arithmetic control section 12 executes arithmetic processing based on the result of the access processing performed by the cache control section 13.

Description

The present invention relates to an arithmetic processing device and a control method for the arithmetic processing device.

In recent years, arithmetic processing devices sometimes perform processing called SIMD (Single Instruction Multiple Data) to improve computational performance. SIMD is a technique in which operations on multiple pieces of data are processed together by a single instruction. To improve the performance of SIMD arithmetic processing, it is also preferable to extend load/store processing to support SIMD.

In SIMD load/store processing, high throughput between the arithmetic unit and the cache is key to improving performance. Preferably, for each load/store instruction, data amounting to the SIMD data width is transferred between the arithmetic unit and the cache in one cycle.

For example, load/store processing with a SIMD data width of 32 bytes can be realized in a 4-way set-associative cache with the following configuration.

In this case, the cache consists of four ways, and each way has four RAMs (Random Access Memory). Each RAM is, for example, a single-port RAM that performs either one read or one write per cycle (a 1read-or-1write RAM), with a read width of 8 bytes. The read width per access to one way is therefore 8 bytes × 4 = 32 bytes, so this cache RAM can read a SIMD data width of 32 bytes in one cycle. In other words, this cache RAM can realize load/store processing with a SIMD data width of 32 bytes.

When executing a matrix product operation or the like, data may be arranged in memory at specific address intervals. The interval between data arranged in memory in this way may be called a stride, and access to data arranged in such a stride pattern may be called stride access.

With such stride access in mind, one technique for further improving computational performance performs a SIMD operation on multiple pieces of stride-pattern data with a single instruction. The accompanying load/store processing may be called stride load/stride store processing.

In stride load/stride store processing as well, high throughput between the arithmetic unit and the cache is important for performance, and it is preferable to transfer the stride-pattern data in one cycle.

As an example of stride access techniques, one conventional technique accesses multiple pieces of data simultaneously by matching the memory interleave configuration to the stride interval. Another conventional technique improves throughput by changing the memory access unit according to the stride interval.

JP 2000-163316 A
JP 2008-250926 A

However, stride access may be performed while keeping the conventional cache configuration, in which RAMs are assigned independently to each way. In that case, it is difficult to access the data of all elements in one cycle.

As an example, consider the configuration described above, in which four 1read-or-1write RAMs with a read width of 8 bytes are assigned to one way. In this configuration, 8-byte data is loaded starting from the start address 0x0, with a SIMD data width of 32 bytes and a stride interval of 2. Here, the four SIMD cache access requests are referred to as elements 0 to 3.

In this configuration, accesses return to the same RAM every 32 bytes. The data of element 2 is stored at the location starting 32 bytes after 0x0, so the data of element 0 and the data of element 2 reside in the same RAM. For the same reason, the data of element 1 and the data of element 3 reside in the same RAM.
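The conflict in this example can be checked with a short Python sketch. As described above, it assumes four 8-byte-wide RAMs per way, so consecutive 8-byte words rotate across RAMs 0 to 3 and wrap every 32 bytes. The function names are illustrative and are not taken from the patent.

```python
RAMS_PER_WAY = 4  # conventional layout: four RAMs per way
WORD_BYTES = 8    # read width of one RAM


def conventional_ram(addr: int) -> int:
    """RAM index (0..3) holding the 8-byte word at byte address addr."""
    return (addr // WORD_BYTES) % RAMS_PER_WAY


def element_addresses(base: int, stride: int, n_elems: int = 4) -> list:
    """Byte addresses of the SIMD elements: base + i * stride * 8."""
    return [base + i * stride * WORD_BYTES for i in range(n_elems)]


def conflicts(addrs: list) -> dict:
    """Map each RAM hit by more than one element to the element numbers that hit it."""
    by_ram = {}
    for elem, addr in enumerate(addrs):
        by_ram.setdefault(conventional_ram(addr), []).append(elem)
    return {ram: elems for ram, elems in by_ram.items() if len(elems) > 1}


addrs = element_addresses(0x0, stride=2)
print([hex(a) for a in addrs])  # ['0x0', '0x10', '0x20', '0x30']
print(conflicts(addrs))         # {0: [0, 2], 2: [1, 3]}
```

With a stride of 1, the four elements fall in four different RAMs. The problem therefore arises only for strides that bring an element back to a RAM already in use.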

Thus, with the conventional way arrangement, it is difficult to access the data of all elements in one cycle, and throughput may decrease.

Consider first the conventional technique that matches the memory interleave configuration to the stride interval. Even with it, stride access still causes multiple accesses to the same RAM, which makes it difficult to access the data of all elements in one cycle. Likewise, the conventional technique that changes the memory access unit according to the stride interval leaves the RAM configuration of each way unchanged. Stride access therefore still causes multiple accesses to the same RAM, and it is difficult to access the data of all elements in one cycle. That is, throughput may decrease with either conventional technique.

The disclosed technology has been made in view of the above. Its object is to provide an arithmetic processing device with improved throughput and a control method for the arithmetic processing device.

In one aspect of the arithmetic processing device and the control method disclosed in the present application, a cache memory has a plurality of storage elements for each way. Consecutive addresses are assigned across the plurality of storage elements belonging to each way. An instruction control unit acquires an arithmetic processing instruction including a plurality of cache access requests, each instructing processing of data at a predetermined address interval within the same way, and transmits the acquired cache access requests. A cache control unit receives each of the cache access requests transmitted from the instruction control unit. It then executes a plurality of access processes on the data at the predetermined address interval within the way specified by each cache access request. An arithmetic control unit performs arithmetic processing based on the results of the access processing by the cache control unit.

According to one aspect of the arithmetic processing device and the control method disclosed in the present application, throughput can be maintained even when reading data located at predetermined address intervals.

FIG. 1 is a block diagram of an arithmetic processing unit.
FIG. 2 is a schematic configuration diagram of the cache control unit.
FIG. 3 is a diagram illustrating the allocation of addresses to the cache RAM in the embodiment.
FIG. 4 is a diagram for explaining that no address interference occurs.
FIG. 5 is a block diagram of the read function of the RAM address generation circuit.
FIG. 6 is a diagram showing the amount added by the adders for each element.
FIG. 7 is a circuit diagram showing details of the selector used for reading.
FIG. 8 is a circuit diagram of the RAM #0 read selection condition circuit.
FIG. 9 is a diagram showing a RAM selection truth table, by way and address, for reading.
FIG. 10 is a diagram showing the conditions under which RAM #0 is selected during reading.
FIG. 11 is a block diagram of the write function of the RAM address generation circuit.
FIG. 12 is a diagram showing a RAM selection truth table, by way and address, for writing.
FIG. 13 is a circuit diagram showing details of the selector used for writing.
FIG. 14 is a circuit diagram of the RAM #0 write selection condition circuit.
FIG. 15 is a diagram showing the conditions under which RAM #0 is selected during writing.
FIG. 16 is a flowchart of the load/store processing performed by the arithmetic processing apparatus according to the embodiment.

Embodiments of the arithmetic processing device and the control method disclosed in the present application are described in detail below with reference to the drawings. The following embodiments do not limit the arithmetic processing device and the control method disclosed in the present application.

FIG. 1 is a block diagram of an arithmetic processing unit. In this embodiment, a CPU (Central Processing Unit) is described as an example of the arithmetic processing device.

The CPU 1 includes an instruction control unit 11, an arithmetic control unit 12, a cache control unit 13, a secondary cache control unit 14, and a memory control unit 15. The memory control unit 15 is connected to the memory 2.

The instruction control unit 11 acquires an instruction from an application or the like and determines whether the acquired instruction includes a cache access request. A cache access request is, for example, a load/store request. Such a request asks either to read data from memory into the arithmetic unit via the cache, or to write data from the arithmetic unit to memory via the cache.

The cache access request according to this embodiment is a request for stride access, which accesses multiple non-contiguous pieces of data in memory.

The instruction control unit 11 transmits the cache access requests included in the instruction to the cache control unit 13. Hereinafter, a cache access request is simply referred to as a "request".

The instruction control unit 11 also sends requests to execute the arithmetic processing included in the instruction to the arithmetic control unit 12.

The cache control unit 13 has a primary cache and receives requests indicating stride access from the instruction control unit 11.

When a received request is a load request, the cache control unit 13 determines whether it hits in the primary cache. On a miss, the cache control unit 13 sends a data transfer request to the secondary cache control unit 14. It then receives the response data for that request from the secondary cache control unit 14 and writes the response data into the primary cache. The cache control unit 13 then determines again whether the request hits in the primary cache. On a hit, the cache control unit 13 reads the hit data from the primary cache and transmits it to the arithmetic control unit 12.
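The load path just described can be modelled as three steps: a hit check, a fill from the secondary cache on a miss, and a second hit check. The following Python sketch is a toy model under stated assumptions. The line size, the dictionary used as the primary cache, and the `fetch_line` callback are all hypothetical. `fetch_line` stands in for the secondary cache control unit 14, and loads are assumed to be 8-byte aligned.

```python
from typing import Callable, Dict, Optional

WORD_BYTES = 8    # one 8-byte element per load (assumed aligned)
LINE_BYTES = 128  # hypothetical line size for this sketch


class PrimaryCacheModel:
    """Toy model of the load path of the cache control unit 13."""

    def __init__(self, fetch_line: Callable[[int], bytes]) -> None:
        self.lines: Dict[int, bytes] = {}  # line number -> line data
        self.fetch_line = fetch_line       # stand-in for the secondary cache
        self.misses = 0

    def _lookup(self, addr: int) -> Optional[bytes]:
        line = self.lines.get(addr // LINE_BYTES)
        if line is None:
            return None                    # cache miss
        offset = addr % LINE_BYTES
        return line[offset:offset + WORD_BYTES]

    def load(self, addr: int) -> bytes:
        data = self._lookup(addr)
        if data is None:
            self.misses += 1
            # Data transfer request to the secondary cache, then fill the primary cache.
            line_no = addr // LINE_BYTES
            self.lines[line_no] = self.fetch_line(line_no * LINE_BYTES)
            data = self._lookup(addr)      # determine the hit again
        assert data is not None
        return data                        # handed on to the arithmetic control unit


memory = bytes(range(256))
cache = PrimaryCacheModel(lambda a: memory[a:a + LINE_BYTES])
cache.load(8)   # first access to the line: miss, then fill
cache.load(16)  # same line: hit, misses stays at 1
```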

When a received request is a store request, the cache control unit 13 updates the data at the storage destination in the primary cache.

The cache control unit 13 is now described in detail with reference to FIG. 2. FIG. 2 is a schematic configuration diagram of the cache control unit. The cache control unit 13 includes a tag determination unit 131, a RAM address generation circuit 132, a cache RAM 133, a hit determination unit 134, and a memory access control unit 135. In this embodiment, stride widths from 1 to 7 are assumed to be used, and a stride width of 4 is described as an example. One instruction contains four elements, E1 to E4. Hereinafter, the number of requests contained in one instruction is referred to as the "number of elements"; in this embodiment, the number of elements is 4. The cache RAM 133 is described as having four ways.

In this embodiment, the cache RAM 133, which serves as the primary cache, contains sixteen 1read-or-1write RAMs, RAM #0 to #15. The read width of each of RAMs #0 to #15 is 8 bytes. Thus, in the example with a stride width of 4 and four elements, one instruction can read 32 bytes at a time; that is, the read width per way is 32 bytes. Hereinafter, RAMs #0 to #15 are simply referred to as "RAMs" when they need not be distinguished. A RAM is an example of a "storage element".

The number of RAMs in the cache RAM 133 is chosen so that, for each stride width used, one of the following two conditions is satisfied. The first condition is that the stride width and the number of RAMs are coprime. The second condition is that the stride width multiplied by the number of elements does not exceed the least common multiple of the number of RAMs and the stride width. In this embodiment, stride widths from 1 to 7 are used. Stride widths of 1 to 4 and 6 satisfy the second condition, and stride widths of 5 and 7 satisfy the first condition.
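The two conditions can be evaluated directly for 16 RAMs and 4 elements. The Python sketch below does this; the function names are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
from math import gcd, lcm

NUM_RAMS = 16   # RAM #0 to #15
NUM_ELEMS = 4   # elements E1 to E4


def condition1(stride: int, num_rams: int = NUM_RAMS) -> bool:
    """First condition: the stride width and the number of RAMs are coprime."""
    return gcd(stride, num_rams) == 1


def condition2(stride: int, num_elems: int = NUM_ELEMS, num_rams: int = NUM_RAMS) -> bool:
    """Second condition: stride * elements <= lcm(number of RAMs, stride)."""
    return stride * num_elems <= lcm(num_rams, stride)


for s in range(1, 8):
    print(f"stride {s}: condition1={condition1(s)} condition2={condition2(s)}")
```

The loop reports condition2=True for every stride from 1 to 7, and condition1=True for strides 1, 3, 5 and 7, so each stride used in the embodiment passes at least one check. A stride of 8, by contrast, fails both: gcd(8, 16) = 8, and 8 × 4 = 32 exceeds lcm(16, 8) = 16.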

Addresses are assigned to the cache RAM 133 as shown in FIG. 3. FIG. 3 is a diagram illustrating the allocation of addresses to the cache RAM in the embodiment. Specifically, in this embodiment, addresses are assigned so that no two 8-byte units within 128 contiguous bytes of one way fall in the same RAM among RAMs #0 to #15. In FIG. 3, w0, w1, w2, and w3 each denote a way.

Further, a number of RAMs equal to the stride width is treated as one memory group, and within each RAM of the memory group, addresses of the ways w0 to w3 are assigned consecutively.

FIG. 3 shows how 128 bytes of addresses are assigned for each way. In practice, this assignment pattern is repeated over the entire storage area of RAMs #0 to #15.

For example, in way w0, the 8-byte range 0x00 to 0x07 is assigned to RAM #0 and 0x08 to 0x0F to RAM #1. Likewise, 0x10 to 0x17 goes to RAM #2, 0x18 to 0x1F to RAM #3, 0x20 to 0x27 to RAM #4, 0x28 to 0x2F to RAM #5, 0x30 to 0x37 to RAM #6, and 0x38 to 0x3F to RAM #7. This assignment continues in the same manner up to RAM #15.

Likewise, RAM #0 is assigned the range 0x00 to 0x07 of way w0 and the range 0x20 to 0x27 of way w1. It is also assigned the range 0x40 to 0x47 of way w2 and the range 0x60 to 0x67 of way w3.
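The assignment can be tabulated with a small Python sketch. FIG. 3 is not reproduced here, so the sketch rests on an assumption: way wN uses the same sequence as way w0 shifted by 4N RAMs. That is, RAM = ((address / 8) - 4N) mod 16 within a 128-byte block. This reproduces every assignment listed above for way w0 and for RAM #0, but the rule for the other RAMs is inferred.

```python
NUM_RAMS = 16    # RAM #0 to #15
NUM_WAYS = 4     # ways w0 to w3
WORD_BYTES = 8   # read width of one RAM
BLOCK_BYTES = NUM_RAMS * WORD_BYTES  # 128 bytes; the pattern repeats beyond this


def ram_index(addr: int, way: int) -> int:
    """RAM holding the 8-byte word at addr in way (assumed: rotated by 4 RAMs per way)."""
    word = (addr % BLOCK_BYTES) // WORD_BYTES  # address bits [6:3]
    return (word - 4 * way) % NUM_RAMS


def ram_contents(ram: int) -> list:
    """(way, start address) of every 8-byte range that ram holds in one 128-byte block."""
    return [(way, addr)
            for way in range(NUM_WAYS)
            for addr in range(0, BLOCK_BYTES, WORD_BYTES)
            if ram_index(addr, way) == ram]


print([(f"w{w}", hex(a)) for w, a in ram_contents(0)])
# [('w0', '0x0'), ('w1', '0x20'), ('w2', '0x40'), ('w3', '0x60')]
```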

Returning to FIG. 2, the description continues. A request 200 output from the instruction control unit 11 contains an address, a SIMD element number, and a stride. The address indicates the target of the load/store processing. The element number indicates which request (element) this is. The stride indicates the stride interval. In addition, the upper 2 bits of the address indicate the way.

The tag determination unit 131 acquires the address information of the request 200. It then determines whether a cache tag hits in the cache set of the cache RAM 133 corresponding to the acquired address. If no cache tag hits, the tag determination unit 131 determines a cache miss. If a cache tag hits, the tag determination unit 131 outputs information on the hit way to the RAM address generation circuit 132.
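The tag check is a standard set-associative lookup. As an illustration only, the following Python sketch models a 4-way tag array. The line size, the number of sets, and how the address is split into set index and tag are hypothetical choices, not values from the embodiment.

```python
from typing import List, Optional, Tuple

NUM_WAYS = 4
LINE_BYTES = 128  # hypothetical line size
NUM_SETS = 64     # hypothetical number of sets


class TagArray:
    """Minimal 4-way set-associative tag lookup (illustrative sizes)."""

    def __init__(self) -> None:
        self.tags: List[List[Optional[int]]] = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

    @staticmethod
    def split(addr: int) -> Tuple[int, int]:
        line = addr // LINE_BYTES
        return line % NUM_SETS, line // NUM_SETS  # (set index, tag)

    def lookup(self, addr: int) -> Optional[int]:
        """Return the hit way, or None on a cache miss."""
        index, tag = self.split(addr)
        for way, stored in enumerate(self.tags[index]):
            if stored == tag:
                return way
        return None

    def fill(self, addr: int, way: int) -> None:
        """Record the line containing addr in the given way."""
        index, tag = self.split(addr)
        self.tags[index][way] = tag


tags = TagArray()
tags.fill(0x1234, 2)
print(tags.lookup(0x1234))  # 2 (hit way passed on to the RAM address generation circuit)
```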

The RAM address generation circuit 132 acquires the address, element number, and stride from the request 200, and receives the way information from the tag determination unit 131.

From the address, element number, stride, and way information, the RAM address generation circuit 132 obtains each address of the stride access. The address for element E1 is the address stored in the request 200. The address for element E2 is the address stored in the request 200 plus the stride multiplied by 8 bytes. The address for element E3 is the address stored in the request 200 plus twice the stride multiplied by 8 bytes.

The cache RAM 133 has sixteen RAMs, RAM #0 to #15. When the stride width and the number of RAMs are coprime, the same RAM is never accessed more than once in a single operation. Likewise, the same RAM is never accessed more than once in a single operation when the stride width multiplied by the number of elements does not exceed the least common multiple of the number of RAMs and the stride width. Hereinafter, multiple accesses to the same RAM in a single operation are referred to as "address interference".

FIG. 4 is a diagram for explaining that no address interference occurs. Graphs 201 to 207 in FIG. 4 each show the positions of the addresses used in the processing for a different stride width. The sixteen positions dividing each circle represent RAMs #0 to #15. A number written at a position indicates that the address used by the corresponding element, E1 to E4, resides in that RAM. Here, the address of element E1 is shown as residing in RAM #0.

As graphs 201 to 207 in FIG. 4 show, the addresses corresponding to elements E1 to E4 do not interfere for any stride width from 1 to 7. That is, the addresses of elements E1 to E4 in the cache RAM 133 according to this embodiment do not interfere with each other.
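The non-interference shown in FIG. 4 can be reproduced numerically. The Python sketch below assumes that element i (counting E1 as 0) is at the base address plus i × stride × 8 bytes, extending the E1 to E3 rule above to E4. It also assumes that, within one way, consecutive 8-byte words occupy consecutive RAMs and wrap every 128 bytes. All elements of one request use the same way, so a per-way rotation of the layout does not change their relative positions; the way is therefore ignored here.

```python
NUM_RAMS = 16
WORD_BYTES = 8


def element_rams(base: int, stride: int, n_elems: int = 4) -> list:
    """RAM position (0..15) of each element, as on the 16-slot circle of FIG. 4."""
    base_word = base // WORD_BYTES
    return [(base_word + i * stride) % NUM_RAMS for i in range(n_elems)]


for s in range(1, 8):
    rams = element_rams(0, s)
    status = "no interference" if len(set(rams)) == len(rams) else "interference"
    print(f"stride {s}: RAMs {rams} -> {status}")
```

Every stride from 1 to 7 prints "no interference". For comparison, `element_rams(0, 8)` yields `[0, 8, 0, 8]`, the kind of interference the RAM count is chosen to avoid.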

Returning again to FIG. 2, the description continues. The RAM address generation circuit 132 acquires the values of bits 5 and 6 of the request. From these values and the way number, it identifies which of RAMs #0 to #15 holds that address.

For example, the following addresses are assigned to RAM #0. Below, the range of the request from bit x to bit y is written [y:x], and the bit values of a range are written as a digit string in quotation marks, such as "001". RAM #0 is assigned the following addresses of element n (n = E1, E2, E3, E4):
- addresses of way w0 whose bits [6:3] are "0000";
- addresses of way w1 whose bits [6:3] are "0100";
- addresses of way w2 whose bits [6:3] are "1000";
- addresses of way w3 whose bits [6:3] are "1100".

Bits [6:3] of the address of element n take 16 possible values, and for each way these values are assigned to RAMs #0 to #15 without overlap.

Therefore, the RAM address generation circuit 132 can identify the RAM to use from bits [6:3] of the address of element n and the way number.
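This sketch uses the same rotation assumption as for FIG. 3: way wN is shifted by 4N RAMs, which is inferred rather than stated. Under that assumption, only bits [6:5] interact with the way, while bits [4:3] select the RAM within a group of four. That would be consistent with the statement above that bits 5 and 6 are combined with the way number. Below is a minimal Python sketch; `field` and `select_ram` are hypothetical helper names.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of value (bit 0 is the least significant)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def select_ram(addr: int, way: int) -> int:
    """RAM number (0..15) from address bits [6:3] and the 2-bit way ID.

    Assumed mapping: bits [6:5] are offset by the way modulo 4 to pick a
    group of four RAMs; bits [4:3] pick the RAM inside that group.
    """
    group = (field(addr, 6, 5) - way) % 4
    return (group << 2) | field(addr, 4, 3)


# The four RAM #0 cases listed above, from ("0000", w0) to ("1100", w3).
for bits_6_3, way in [(0b0000, 0), (0b0100, 1), (0b1000, 2), (0b1100, 3)]:
    print(format(bits_6_3, "04b"), f"w{way}", select_ram(bits_6_3 << 3, way))  # RAM 0 each time
```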

Address generation for reads by the RAM address generation circuit 132 is now described in detail with reference to FIG. 5. FIG. 5 is a block diagram of the read function of the RAM address generation circuit.

The RAM address generation circuit 132 includes adders 301 to 303 and selectors 311 to 314. FIG. 5 shows only four selectors, but the RAM address generation circuit 132 actually has one selector for each of RAMs #0 to #15.

The adders 301 to 303 receive the address in bits [13:3] of the request of element E1. They also receive the stride width in bits [2:0] of the request of element E1. These 3 bits, [2:0], indicate a stride width from 1 to 7.

The adder 301 adds the input stride width multiplied by 8 to bits [13:3] of the request of element E1. The result is the address represented by bits [13:3] of element E2. The adder 301 outputs the result to the selectors 311 to 314.

The adder 302 adds the input stride width multiplied by 16 to bits [13:3] of the request of element E1. The result is the address represented by bits [13:3] of element E3. The adder 302 outputs the result to the selectors 311 to 314.

The adder 303 adds the input stride width multiplied by 24 to bits [13:3] of the request of element E1. The result is the address represented by bits [13:3] of element E4. The adder 303 outputs the result to the selectors 311 to 314.

Specifically, the adders 301 to 303 add the values shown in FIG. 6 to the address of each element according to the stride width. FIG. 6 is a diagram showing the amount added by the adders for each element.

The selector 311 determines the selection for RAM #0, the selector 312 for RAM #1, the selector 313 for RAM #2, and the selector 314 for RAM #15.

The selectors 311 to 314 receive the address in bits [13:3] of the request of element E1, and receive the addresses of elements E2 to E4 from the adders 301 to 303.

The operation of a selector is described below, taking the selector 311 as an example. FIG. 7 is a circuit diagram showing details of the selector used for reading. The selector 311 includes RAM #0 read selection condition circuits 321 to 324, AND circuits 331 to 334, and an OR circuit 340.

The RAM #0 read selection condition circuits 321 to 324 receive the addresses in bits [13:3] of elements E1 to E4, respectively, and also receive the way ID.
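The selector structure can be modelled in a few lines of Python: one condition circuit per element, followed by AND gating and an OR merge. This is a behavioural sketch, not the gate-level circuit of FIG. 8. The condition is computed from the inferred rotation mapping rather than from individual AND terms, byte addresses are used instead of the [13:3] field, and the names are hypothetical.

```python
from typing import List, Optional

NUM_RAMS = 16
WORD_BYTES = 8


def read_select_condition(ram: int, addr: int, way: int) -> bool:
    """Condition circuit for ram: true when the 8-byte word at addr in way is
    stored in that RAM (assumed rotation of 4 RAMs per way)."""
    return ((addr // WORD_BYTES) - 4 * way) % NUM_RAMS == ram


def selector(ram: int, element_addrs: List[int], way: int) -> Optional[int]:
    """One selector (e.g. selector 311 for RAM #0): gate each element address
    by its condition (AND) and merge the gated values (OR)."""
    merged = 0
    hits = 0
    for addr in element_addrs:
        cond = read_select_condition(ram, addr, way)
        if cond:
            hits += 1
        merged |= addr if cond else 0  # AND gate, then OR into the output
    if hits > 1:
        raise ValueError("address interference: two elements map to one RAM")
    return merged if hits else None


# Stride 4 from address 0x0 in way w0: E1 to E4 at 0x00, 0x20, 0x40, 0x60.
addrs = [0x00, 0x20, 0x40, 0x60]
print({ram: selector(ram, addrs, 0) for ram in (0, 4, 8, 12)})
# {0: 0, 4: 32, 8: 64, 12: 96}
```

Because the stride addresses do not interfere, at most one condition is true per selector. This is why a plain OR can merge the gated addresses.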

Each of the RAM #0 read selection condition circuits 321 to 324 has the circuit configuration shown in FIG. 8: AND circuits 351 to 354 and an OR circuit 355. FIG. 8 is a circuit diagram of the RAM #0 read selection condition circuit.

The AND circuit 351 receives all four address bits [6:3] inverted and both way ID bits [1:0] inverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 355.

The AND circuit 352 receives address bits [6:3] with bits 3 and 4 inverted. It also receives way ID bits [1:0] with bit 1 inverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 355.

The AND circuit 353 receives address bits [6:3] with bits 3, 4, and 5 inverted. It also receives way ID bits [1:0] with bit 0 inverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 355.

The AND circuit 354 receives address bits [6:3] with bits 3, 4, and 6 inverted. It also receives way ID bits [1:0] uninverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 355.

The OR circuit 355 computes the logical OR of the outputs of the AND circuits 351 to 354 and outputs the result.
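The gate description above can be checked with a small Python sketch. It models one RAM #0 read selection condition circuit as a sum of products: one AND term per circuit 351 to 354, and a final OR for circuit 355. The function names (`bit`, `inv`, `ram0_read_select`) are hypothetical. Bit numbering follows the text: address bits 3 to 6 and way ID bits 0 and 1.

```python
def bit(value: int, n: int) -> int:
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def inv(x: int) -> int:
    """Inverter on a single bit."""
    return x ^ 1


def ram0_read_select(addr: int, way: int) -> int:
    """Sum-of-products model of one RAM #0 read selection condition circuit.

    addr: element address; only bits [6:3] are examined.
    way:  way ID; only bits [1:0] are examined.
    """
    a3, a4, a5, a6 = (bit(addr, n) for n in (3, 4, 5, 6))
    w0, w1 = bit(way, 0), bit(way, 1)
    and351 = inv(a6) & inv(a5) & inv(a4) & inv(a3) & inv(w1) & inv(w0)  # way 00, 0000
    and352 = a6 & a5 & inv(a4) & inv(a3) & inv(w1) & w0                 # way 01, 1100
    and353 = a6 & inv(a5) & inv(a4) & inv(a3) & w1 & inv(w0)            # way 10, 1000
    and354 = inv(a6) & a5 & inv(a4) & inv(a3) & w1 & w0                 # way 11, 0100
    return and351 | and352 | and353 | and354                            # OR circuit 355


# Enumerate every (way ID, [6:3]) pair for which the circuit outputs 1.
hits = [(w, a) for w in range(4) for a in range(16) if ram0_read_select(a << 3, w)]
print(hits)  # [(0, 0), (1, 12), (2, 8), (3, 4)]
```

Each AND term fully decodes all six input bits. As a result, exactly four input combinations select RAM #0, one per way.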

Each combination of way ID bits [1:0] and address bits [6:3] corresponds to one of RAMs #0 to #15, as shown in the "RAM used" column of FIG. 9. FIG. 9 shows the RAM selection truth table for reading, indexed by way and address. The RAM #0 read selection condition circuits 321 to 324 output 1 when they receive a way ID and address combination that corresponds to RAM #0 in this table. The same applies to the other RAMs #1 to #15.

Consider the selection of RAM #0 as an example. Extracting from FIG. 9 the cases in which RAM #0 is selected for reading yields FIG. 10, which shows the conditions under which RAM #0 is selected during reading. The AND circuits 351 to 354 behave as follows for each case in FIG. 10:

- Way ID "00" with address bits [6:3] "0000": the AND circuit 351 outputs 1.
- Way ID "01" with [6:3] "1100": the AND circuit 352 outputs 1.
- Way ID "10" with [6:3] "1000": the AND circuit 353 outputs 1.
- Way ID "11" with [6:3] "0100": the AND circuit 354 outputs 1.

Thus, each RAM #0 read selection condition circuit 321 to 324 outputs 1 when it receives any combination in FIG. 10, that is, when the selection condition for RAM #0 is satisfied. It outputs 0 otherwise.
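The four RAM #0 rows above all satisfy one simple relation: ([6:3] + 4 × way) mod 16 = 0. The Python sketch below assumes, as a hypothesis, that the whole of FIG. 9 follows the same rotation: RAM index = ([6:3] + 4 × way) mod 16. The text gives only the RAM #0 rows, so this closed form and the name `read_ram_index` are assumptions, not the patent's stated rule.

```python
def read_ram_index(addr: int, way: int) -> int:
    """Hypothetical closed form for the read-time RAM selection.

    chunk = address bits [6:3] (one of 16 eight-byte chunks); each way is
    assumed to be rotated by 4 RAMs relative to the previous one.
    """
    chunk = (addr >> 3) & 0xF
    return (chunk + 4 * (way & 0x3)) % 16


# The four RAM #0 conditions listed for FIG. 10, as (way ID, [6:3]).
fig10_rows = [(0b00, 0b0000), (0b01, 0b1100), (0b10, 0b1000), (0b11, 0b0100)]
assert all(read_ram_index(chunk << 3, way) == 0 for way, chunk in fig10_rows)

# Within any one way, the 16 chunks land on 16 different RAMs.
for way in range(4):
    assert sorted(read_ram_index(c << 3, way) for c in range(16)) == list(range(16))
```

Under this model, the RAM #0 read selection condition reduces to `read_ram_index(addr, way) == 0`, which is what the four AND terms decode.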

Returning to FIG. 7, the AND circuits 331 to 334 receive the outputs of the RAM #0 read selection condition circuits 321 to 324. They also receive the addresses given by bits [13:5] of elements E1 to E4, respectively. Each of the AND circuits 331 to 334 computes the logical AND of its inputs and outputs the result to the OR circuit 340. In effect, an AND circuit that receives 1 from its selection condition circuit passes its address through, and an AND circuit that receives 0 outputs all zeros.

The OR circuit 340 computes the logical OR of the outputs of the AND circuits 331 to 334. Therefore, when one of the RAM #0 read selection condition circuits 321 to 324 outputs 1, the OR circuit 340 outputs the address of the corresponding element (one of E1 to E4) as the read address of RAM #0.
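The AND/OR structure of the selector 311 is a one-hot multiplexer. A minimal Python sketch follows, in which the select bit is fanned out across the 9-bit [13:5] address before the AND. The function name and the sample addresses are illustrative only.

```python
from typing import Sequence


def and_or_mux(selects: Sequence[int], addresses: Sequence[int], width: int = 9) -> int:
    """AND-OR selector modelled on selector 311.

    selects:   outputs of the selection condition circuits (each 0 or 1)
    addresses: [13:5] address of each element E1..E4
    width:     bits per address; [13:5] is 9 bits wide
    """
    mask = (1 << width) - 1
    result = 0
    for sel, addr in zip(selects, addresses):
        # AND circuit: the select bit is replicated across all address bits,
        # so the address passes through when sel == 1 and becomes 0 otherwise.
        result |= addr & (mask * sel)
    return result  # OR circuit 340


print(and_or_mux([0, 0, 1, 0], [0x011, 0x023, 0x035, 0x047]))  # 53 (0x035, element E3)
```

This structure only works when at most one select bit is 1. If two elements selected the same RAM, the OR would merge their addresses. The address assignment described in this embodiment is what keeps each RAM to one element per access.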

Returning to FIG. 5, the selector 311 outputs the address produced by the OR circuit 340 to the hit determination unit 134. Likewise, the selectors 312, 313, and 314 each output an address to the hit determination unit 134 when the combination of way ID bits [1:0] and address bits [6:3] matches the selection condition of RAM #1, RAM #2, or RAM #15, respectively.
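To show how the per-RAM selectors work together, the Python sketch below routes the four element addresses of one stride access to sixteen read ports, one per RAM #0 to #15. It reuses a hypothesis: the RAM index is (address bits [6:3] + 4 × way) mod 16. This rotation reproduces the RAM #0 rows of FIG. 10, but the text does not give the full table. The function names are hypothetical. A collision, which the address assignment in this embodiment is meant to avoid, is reported as an error.

```python
from typing import List, Optional

NUM_RAMS = 16


def ram_index(addr: int, way: int) -> int:
    # Assumed rotation: address bits [6:3] offset by 4 RAMs per way.
    return (((addr >> 3) & 0xF) + 4 * (way & 0x3)) % NUM_RAMS


def route_read_addresses(element_addrs: List[int], way: int) -> List[Optional[int]]:
    """For each RAM #0..#15, the [13:5] address of the element that selects it,
    or None if no element selects that RAM."""
    ports: List[Optional[int]] = [None] * NUM_RAMS
    for addr in element_addrs:
        ram = ram_index(addr, way)
        if ports[ram] is not None:
            raise ValueError(f"two elements select RAM #{ram}")
        ports[ram] = (addr >> 5) & 0x1FF  # address bits [13:5]
    return ports


# Four 8-byte elements spaced 24 bytes (3 words) apart, all in way 0.
ports = route_read_addresses([0x100, 0x118, 0x130, 0x148], way=0)
print([i for i, p in enumerate(ports) if p is not None])  # [0, 3, 6, 9]
```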

The hit determination unit 134 uses the addresses received from the selectors 311 to 314 to determine cache hits in the corresponding RAMs #0 to #15.

Next, address generation for writing by the RAM address generation circuit 132 is described in detail with reference to FIG. 11. FIG. 11 is a block diagram of the write function of the RAM address generation circuit.

The RAM address generation circuit 132 includes selectors 361 to 364 for writing. The selectors 361 to 364 correspond to RAMs #0 to #15.

The selectors 361 to 364 receive the address given by bits [13:5] of the request and the way ID given by bits [1:0] of the request.

Each combination of way ID bits [1:0] and address bits [6:5] corresponds to one of RAMs #0 to #15, as shown in the "RAM used" column of FIG. 12. FIG. 12 shows the RAM selection truth table for writing, indexed by way and address.

The selectors 361 to 364 use the combination of way ID bits [1:0] and address bits [6:5] to select the corresponding RAM in FIG. 12 as the RAM to be written. They then output the address for the selected RAM to the hit determination unit 134.

The selectors 361 to 364 are described in detail below with reference to FIG. 13, taking the selector 361 as an example. FIG. 13 is a circuit diagram showing details of the selector used for writing. The selector 361 includes a RAM #0 write selection condition circuit 371 and an AND circuit 372.

For example, when element E1 is processed, the RAM #0 write selection condition circuit 371 receives bits [6:5] of the address given by bits [13:5] of element E1. It also receives way ID bits [1:0].

If FIG. 12 lists RAM #0 as the RAM used for this combination of way ID [1:0] and address bits [6:5], the RAM #0 write selection condition circuit 371 outputs 1 (RAM #0 selected) to the AND circuit 372. Otherwise, it outputs 0 to the AND circuit 372.

The configuration of the RAM #0 write selection condition circuit 371 is as follows. FIG. 14 is a circuit diagram of the RAM #0 write selection condition circuit. The RAM #0 write selection condition circuit 371 includes AND circuits 381 to 384 and an OR circuit 385.

The AND circuit 381 receives both address bits [6:5] inverted and both way ID bits [1:0] inverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 385.

The AND circuit 382 receives address bits [6:5] uninverted and way ID bits [1:0] with bit 1 inverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 385.

The AND circuit 383 receives address bits [6:5] with bit 5 inverted and way ID bits [1:0] with bit 0 inverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 385.

The AND circuit 384 receives address bits [6:5] with bit 6 inverted and way ID bits [1:0] uninverted. It computes the logical AND of its inputs and outputs the result to the OR circuit 385.

The OR circuit 385 computes the logical OR of the outputs of the AND circuits 381 to 384 and outputs the result.

Consider the selection of RAM #0 as an example. Extracting from FIG. 12 the cases in which RAM #0 is selected for writing yields FIG. 15, which shows the conditions under which RAM #0 is selected during writing. The AND circuits 381 to 384 behave as follows for each case in FIG. 15:

- Way ID "00" with address bits [6:5] "00": the AND circuit 381 outputs 1.
- Way ID "01" with [6:5] "11": the AND circuit 382 outputs 1.
- Way ID "10" with [6:5] "10": the AND circuit 383 outputs 1.
- Way ID "11" with [6:5] "01": the AND circuit 384 outputs 1.

Thus, the RAM #0 write selection condition circuit 371 outputs 1 when it receives any combination in FIG. 15, that is, when the selection condition for RAM #0 is satisfied. It outputs 0 otherwise.
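The write-side circuit can be modelled the same way as the read side. The Python sketch below implements AND circuits 381 to 384 and OR circuit 385 as a sum of products over address bits [6:5] and way ID bits [1:0]. It then checks that the selected combinations are exactly the four rows listed for FIG. 15. It also shows that those rows satisfy ([6:5] + way) mod 4 = 0. That modular observation is derived only from these four rows, and the function names are hypothetical.

```python
def bit(value: int, n: int) -> int:
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def inv(x: int) -> int:
    """Inverter on a single bit."""
    return x ^ 1


def ram0_write_select(addr: int, way: int) -> int:
    """Sum-of-products model of the RAM #0 write selection condition circuit 371."""
    a5, a6 = bit(addr, 5), bit(addr, 6)
    w0, w1 = bit(way, 0), bit(way, 1)
    and381 = inv(a6) & inv(a5) & inv(w1) & inv(w0)  # way 00, [6:5] = 00
    and382 = a6 & a5 & inv(w1) & w0                  # way 01, [6:5] = 11
    and383 = a6 & inv(a5) & w1 & inv(w0)             # way 10, [6:5] = 10
    and384 = inv(a6) & a5 & w1 & w0                  # way 11, [6:5] = 01
    return and381 | and382 | and383 | and384         # OR circuit 385


# Every (way ID, [6:5]) pair that selects RAM #0 for writing.
selected = [(w, a) for w in range(4) for a in range(4) if ram0_write_select(a << 5, w)]
print(selected)  # [(0, 0), (1, 3), (2, 2), (3, 1)]

# The same four rows satisfy ([6:5] + way) mod 4 == 0.
assert all((a + w) % 4 == 0 for w, a in selected)
```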

Returning to FIG. 13, the AND circuit 372 receives from the RAM #0 write selection condition circuit 371 a value indicating whether RAM #0 is selected. It also receives the address given by bits [13:5] of element E1.

When the value from the RAM #0 write selection condition circuit 371 is 1, the AND circuit 372 outputs the address given by bits [13:5] of element E1. When that value is 0, the AND circuit 372 outputs 0.

Returning to FIG. 11, the selector 361 passes the output of the AND circuit 372 to the hit determination unit 134. Likewise, each of the selectors 362 to 364 outputs an address to the hit determination unit 134 when its corresponding RAM (among RAMs #1 to #15) is selected as the write destination.

The hit determination unit 134 uses the addresses received from the selectors 361 to 364 to determine cache hits in the corresponding RAMs #0 to #15.

Returning to FIG. 2, the hit determination unit 134 receives from the RAM address generation circuit 132 the RAM targeted by the load/store processing and its address. Using this address, the hit determination unit 134 determines whether there is a cache hit in the specified RAM. On a cache hit, it obtains the hit data and outputs it to the arithmetic control unit 12. On a cache miss, it sends a data transfer request to the memory access control unit 135.

When a cache miss occurs, the memory access control unit 135 receives a data transfer request from the hit determination unit 134. The memory access control unit 135 then sends the secondary cache control unit 14 a data transfer request for the request that missed.

The memory access control unit 135 then receives the response data for the data transfer request from the secondary cache control unit 14 and stores it in the cache RAM 133. Finally, it notifies the part of the hit determination unit 134 that sent the data transfer request that the response data has been stored.

Returning to FIG. 1, the secondary cache control unit 14 has a secondary cache. It receives a data transfer request from the memory access control unit 135 and determines whether the specified data is stored in its secondary cache. If the data is stored there, the secondary cache control unit 14 reads it from the secondary cache and sends it to the memory access control unit 135 as response data.

If the data specified by the data transfer request is not in the secondary cache, the secondary cache control unit 14 sends the data transfer request to the memory control unit 15. It then receives the response data from the memory control unit 15 and forwards it to the memory access control unit 135.
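The miss path just described (cache RAM 133, then the secondary cache, then memory) can be sketched in Python with dictionaries standing in for each level. This is a hypothetical model. The function `load` and the dictionary names are illustrative. The "notify and retry" handshake is collapsed into a direct return. The secondary cache is not filled on a memory response, because the text does not say that it is.

```python
from typing import Dict, Tuple


def load(addr: int, l1: Dict[int, int], l2: Dict[int, int],
         memory: Dict[int, int]) -> Tuple[int, str]:
    """Return (data, level that supplied it) for one address.

    l1 stands for cache RAM 133, l2 for the secondary cache, and memory for
    the memory reached through memory control unit 15.
    """
    if addr in l1:                     # hit determination unit 134 reports a hit
        return l1[addr], "L1"
    if addr in l2:                     # secondary cache control unit 14 hits
        data, source = l2[addr], "L2"
    else:                              # request forwarded to memory control unit 15
        data, source = memory[addr], "memory"
    l1[addr] = data                    # memory access control unit 135 stores the response
    return data, source


l1, l2, mem = {}, {0x40: 7}, {0x40: 7, 0x80: 9}
print(load(0x80, l1, l2, mem))  # (9, 'memory')
print(load(0x80, l1, l2, mem))  # (9, 'L1')
print(load(0x40, l1, l2, mem))  # (7, 'L2')
```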

The arithmetic control unit 12 receives requests, such as arithmetic execution requests, from the instruction control unit 11, and receives data from the cache control unit 13. It then performs arithmetic processing and other operations using the data received from the cache control unit 13. However, the arithmetic control unit 12 may also execute processing without receiving data from the cache control unit 13, for example when the processing does not use cached data.

Next, the flow of load/store processing by the arithmetic processing unit according to this embodiment is described with reference to FIG. 16. FIG. 16 is a flowchart of the load/store processing performed by the arithmetic processing unit according to the embodiment.

The instruction control unit 11 obtains a request from an instruction and issues it to the cache control unit 13 (step S1).

The tag determination unit 131 of the cache control unit 13 extracts a tag from the received request and determines whether the tag hits in the cache RAM 133 (step S2). If the tag does not hit (step S2: No), the cache control unit 13 proceeds to step S8.

If the tag hits (step S2: Yes), the RAM address generation circuit 132 determines whether the request is a read (step S3).

For a read (step S3: Yes), the RAM address generation circuit 132 generates the addresses of elements E2 to E4 from the address of element E1 (step S4).

The RAM address generation circuit 132 then selects the RAMs to read from, according to the stride width, among RAMs #0 to #15 of the cache RAM 133. In the cache RAM 133, the consecutive addresses of each of the ways w0 to w3 are distributed across all the RAMs (step S5).

For a write rather than a read (step S3: No), the RAM address generation circuit 132 instead selects the RAMs to write to, according to the stride width, among RAMs #0 to #15 of the cache RAM 133. Here too, the consecutive addresses of each of the ways w0 to w3 are distributed across all the RAMs (step S6).

Next, the RAM address generation circuit 132 outputs the selected RAMs and the addresses within them to the hit determination unit 134. The hit determination unit 134 determines whether a cache hit occurs at the specified addresses in the selected RAMs (step S7). On a cache miss (step S7: No), the memory access control unit 135 sends a data transfer request to the secondary cache control unit 14. The secondary cache control unit 14 then transfers the response data for the request to the cache RAM 133 (step S8). After that, the cache control unit 13 returns to step S2.

On a cache hit (step S7: Yes), the hit determination unit 134 performs the load/store processing on the hit address (step S9).
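The flowchart steps S1 to S9 can be traced with a short Python sketch. It is a hypothetical model with several assumptions:

- Addresses are 8-byte word numbers, and the elements are spaced `stride` words apart.
- The tag check in step S2 is modelled on element E1 only.
- Step S8 simply makes the requested words resident before returning to step S2.

The function name and parameters are illustrative.

```python
from typing import List, Set


def load_store_flow(is_read: bool, base: int, stride: int,
                    resident: Set[int], elements: int = 4) -> List[str]:
    """Return the sequence of FIG. 16 steps taken for one request.

    resident: words currently held in cache RAM 133 (updated in place by S8).
    """
    addrs = [base + k * stride for k in range(elements)]  # E1..E4 (assumed spacing)
    trace = ["S1"]                        # S1: request issued to cache control unit 13
    while True:
        trace.append("S2")                # S2: tag check (modelled on E1 only)
        if base not in resident:
            trace.append("S8")            # S8: transfer from the secondary cache
            resident.update(addrs)
            continue                      # back to S2
        trace.append("S3")                # S3: read or write?
        if is_read:
            trace += ["S4", "S5"]         # S4: E2-E4 addresses, S5: select read RAMs
        else:
            trace.append("S6")            # S6: select write RAMs
        trace.append("S7")                # S7: hit check in the selected RAMs
        if not all(a in resident for a in addrs):
            trace.append("S8")
            resident.update(addrs)
            continue
        trace.append("S9")                # S9: perform the load/store
        return trace


print(load_store_flow(True, 0, 3, {0}))
# ['S1', 'S2', 'S3', 'S4', 'S5', 'S7', 'S8', 'S2', 'S3', 'S4', 'S5', 'S7', 'S9']
```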

As described above, the cache RAM of the arithmetic processing unit according to this embodiment satisfies one of two conditions. Either the stride width and the number of RAMs are relatively prime, or the product of the stride width and the number of elements is smaller than the least common multiple of the number of RAMs and the stride width. In addition, the consecutive addresses of each way are assigned so that they are distributed across all the RAMs. As a result, even when a single instruction performs a stride access, multiple accesses to the same RAM are avoided and throughput is improved.
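The stride condition can be checked against a brute-force count of RAM collisions. The Python sketch below assumes 16 RAMs and 4 elements, with element k of a stride access touching word k × stride. It takes the RAM to be the word number modulo 16 within one way. A per-way rotation would add the same offset to every element, so it does not change whether the elements collide. The function names are hypothetical.

```python
from math import gcd


def conflict_free(stride: int, num_rams: int = 16, elements: int = 4) -> bool:
    """Brute force: do `elements` accesses spaced `stride` words apart hit distinct RAMs?"""
    banks = {(k * stride) % num_rams for k in range(elements)}
    return len(banks) == elements


def stated_condition(stride: int, num_rams: int = 16, elements: int = 4) -> bool:
    """Condition from the text: relatively prime, or stride * elements < lcm(num_rams, stride)."""
    lcm = num_rams * stride // gcd(num_rams, stride)
    return gcd(stride, num_rams) == 1 or stride * elements < lcm


# In this model the stated condition never admits a conflicting stride.
for s in range(1, 33):
    if stated_condition(s):
        assert conflict_free(s)

print([s for s in range(1, 33) if not conflict_free(s)])  # [8, 16, 24, 32]
```

In this model the condition is sufficient but not necessary. For example, a stride of 4 gives 4 × 4 = 16 = lcm(16, 4), so it fails the strict inequality. Yet its four elements still land on RAMs 0, 4, 8, and 12.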

1 CPU
2 Memory
11 Instruction control unit
12 Arithmetic control unit
13 Cache control unit
14 Secondary cache control unit
15 Memory control unit
131 Tag determination unit
132 RAM address generation circuit
133 Cache RAM
134 Hit determination unit
135 Memory access control unit
301 to 303 Adders
311 to 314 Selectors
321 to 324 RAM #0 read selection condition circuits
331 to 334 AND circuits
340 OR circuit
351 to 354 AND circuits
355 OR circuit
361 to 364 Selectors
371 RAM #0 write selection condition circuit
372 AND circuit
381 to 384 AND circuits
385 OR circuit

Claims

An arithmetic processing unit comprising:
a cache memory that has a plurality of storage elements for each way, consecutive addresses being assigned to the plurality of storage elements belonging to each way;
an instruction control unit that acquires an arithmetic processing instruction including a plurality of cache access requests, each instructing processing on data at a predetermined address interval within the same way, and transmits the acquired cache access requests;
a cache control unit that receives each of the cache access requests transmitted from the instruction control unit and executes a plurality of access processes on the data at the predetermined address interval within the way specified by each cache access request; and
an arithmetic control unit that performs arithmetic processing based on a result of the access processing by the cache control unit.

The arithmetic processing unit according to claim 1, wherein either the predetermined address interval is relatively prime to the number of storage elements, or the product of the predetermined address interval and the number of cache access requests included in the arithmetic processing instruction is smaller than the least common multiple of the number of storage elements and the predetermined address interval.

The arithmetic processing unit according to claim 1 or 2, wherein, in the cache memory, the storage elements are grouped into groups each containing as many storage elements as the number of cache access requests included in the arithmetic processing instruction, and the addresses of each way are assigned to the groups so as to be consecutive.

The arithmetic processing unit according to claim 3, wherein, in the cache memory, the addresses of the ways are consecutive within each group.

A control method for an arithmetic processing unit that has a cache memory having a plurality of storage elements for each way, consecutive addresses being assigned to the plurality of storage elements belonging to each way, the method comprising:
acquiring an arithmetic processing instruction including a plurality of cache access requests, each instructing processing on data at a predetermined address interval within the same way;
executing a plurality of access processes on the data at the predetermined address interval within the way specified by each acquired cache access request; and
performing arithmetic processing based on a result of the access processing.