JP2013210961A

JP2013210961A - Cache memory device, method of controlling cache memory, and information processing device

Info

Publication number: JP2013210961A
Application number: JP2012082322A
Authority: JP
Inventors: Atsushi Torii; 淳鳥居
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2012-03-30
Filing date: 2012-03-30
Publication date: 2013-10-10

Abstract

PROBLEM TO BE SOLVED: To solve the problem that it takes a long time to complete all responses to a plurality of access requests. SOLUTION: A cache memory device 100 includes: a plurality of tag memories 121 to 12n; a plurality of data memories 131 to 13n, each corresponding to one of the tag memories; and a control unit 110 that performs cache hit determination on the plurality of tag memories 121 to 12n for a first access request AR and, while access processing to any of the plurality of data memories based on a second access request is in progress, starts access processing to the data memory corresponding to the tag memory identified by a cache hit of the first access request AR when that data memory is not undergoing access processing based on the second access request.

Description

The present invention relates to a cache memory device, a method of controlling a cache memory, and an information processing device, and relates, for example, to a cache memory device, a cache memory control method, and an information processing device for processing a plurality of access requests.

Improvements in external memory speed are limited compared with improvements in processor speed. For this reason, a cache memory is commonly placed between the processor and the external memory to improve the memory latency and throughput of the system as a whole.

Cache memories are generally organized into multiple levels because of the trade-off between response speed and cost (capacity). The primary cache at the top of the hierarchy allows high-speed access, but its capacity is limited in order to maintain that speed and to limit power consumption. The primary cache is therefore often used as a private cache of an IP core (Intellectual Property Core) that serves as a processor.

In contrast, the secondary and lower-level caches can provide more capacity than the primary cache, but their access speed is relatively slow. They are therefore often used as shared cache memories shared by a plurality of IP cores. A shared cache memory must respond to a plurality of access requests. Techniques for doing so include, for example, multi-bank and multi-port designs.

Non-Patent Document 1 discloses a multi-bank technique. The L2 cache memory of Non-Patent Document 1 is a cache for instructions or data and is physically divided into eight banks. It is logically shared among all CPUs, and its banks are interleaved using the lower bits of the physical address of a cache line (64 bytes).

Patent Document 1 discloses a multi-port technique. The multi-port cache memory of Patent Document 1 has a plurality of input/output ports for each memory cell. Therefore, even when a plurality of access requests are issued to the same cache memory, a plurality of memory cells can be accessed concurrently.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2006-139401 (JP 2006-139401 A)

Non-Patent Document 1: Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing", International Symposium of Computer Architecture 2000, pp. 282-293

However, Non-Patent Document 1 has the problem that it may take a long time to complete all responses to a plurality of access requests. This is because, when the target addresses of a plurality of access requests fall in the same bank (a so-called bank conflict), the access requests cannot be processed in parallel. As a result, it may become necessary, for example, to tune programs that run in parallel so as to reduce bank conflicts.

The multi-port cache memory of Patent Document 1 is intended for a primary cache. Multi-port memory generally has a high area cost and is not suited to cache memories for which capacity matters, such as secondary and lower-level caches. Patent Document 1 therefore cannot solve the problem described above.

Other problems and novel features will become apparent from the description of this specification and the accompanying drawings.

According to one embodiment, a cache memory device performs cache hit determination on a plurality of tag memories for a first access request and, while access processing to any of a plurality of data memories based on a second access request is in progress, starts access processing to the data memory corresponding to the tag memory identified by the cache hit of the first access request when that data memory is not undergoing access processing based on the second access request.

According to another embodiment, while a data access to any of a plurality of ways based on a first access request to an associative cache memory is in progress, a cache memory device starts a data access to the way identified by a cache hit of a second access request, other than the first access request, when that way is not undergoing a data access based on the first access request.

According to yet another embodiment, a method of controlling a cache memory identifies the tag memory that produced a cache hit for a first access request; determines, while access processing to any of a plurality of data memories based on a second access request other than the first access request is in progress, whether the data memory corresponding to the identified tag memory is undergoing access processing based on the second access request; and, when it determines that the data memory is not undergoing access processing, starts access processing based on the first access request.

According to still another embodiment, an information processing device performs cache hit determination on a plurality of tag memories for a first access request from any of a plurality of processor cores and, while access processing to any of a plurality of data memories based on a second access request different from the first access request is in progress, starts access processing to the data memory corresponding to the tag memory identified by the cache hit of the first access request when that data memory is not undergoing access processing based on the second access request.

According to the one embodiment described above, the time required to complete all responses to a plurality of access requests can be shortened.

FIG. 1 is a block diagram showing the configuration of a cache memory device according to Embodiment 1.
FIG. 2 is a flowchart showing the flow of control processing in the cache memory device according to Embodiment 1.
FIG. 3 is a block diagram showing the configuration of a multiprocessor system according to Embodiment 2.
FIG. 4 is a flowchart showing the flow of control processing in the cache memory device according to Embodiment 2.
FIG. 5 is a timing chart illustrating the operation of the cache memory device according to Embodiment 2.
FIG. 6 is a timing chart illustrating the operation of the cache memory device according to Embodiment 2.
FIG. 7 is a timing chart illustrating the operation of the cache memory device according to Embodiment 2.
FIG. 8 is a flowchart showing the flow of control processing in the cache memory device according to Embodiment 3.
FIG. 9 is a block diagram showing the configuration of an information processing device according to Embodiment 6.
FIG. 10 is a block diagram showing the configuration of a multi-bank processor system according to the related art.

Specific embodiments to which the means for solving the above problem are applied are described in detail below with reference to the drawings. In the drawings, the same elements are denoted by the same reference numerals, and redundant description is omitted as necessary for clarity.

<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of a cache memory device 100 according to Embodiment 1. The cache memory device 100 accepts a plurality of access requests AR and processes, partially in parallel, the data transfers DT that serve as responses to the respective requesters. A response to an access request includes, in addition to data transfer, access processing to a data memory such as write processing. The cache memory device 100 includes a control unit 110, a tag memory group 120, and a data memory group 130.

The tag memory group 120 includes a plurality of tag memories 121 to 12n. The data memory group 130 includes a plurality of data memories 131 to 13n. The tag memories 121 to 12n correspond one-to-one to the data memories 131 to 13n. For example, the tag memory 121 corresponds to the data memory 131, and this pair is defined as way W1. Similarly, the tag memory 122 and the data memory 132 form way W2, and so on up to the tag memory 12n and the data memory 13n, which form way Wn. The tag memories 121 to 12n and the data memories 131 to 13n are each separate memory modules; SRAM (Static Random Access Memory), for example, can be used as the memory module. The tag memories 121 to 12n can therefore be accessed in parallel for cache hit determination, and the data memories 131 to 13n can undergo data access processing in parallel. For example, data can be transferred from the data memory 131 and the data memory 132 in parallel.

The control unit 110 performs cache hit determination on the plurality of tag memories 121 to 12n for a first access request AR. While access processing to any of the plurality of data memories based on a second access request is in progress, the control unit 110 starts access processing to the data memory corresponding to the tag memory identified by the cache hit of the first access request AR when that data memory is not undergoing access processing based on the second access request. In other words, the control unit 110 determines whether the way on which the access request AR hit differs from the way on which another access request hit, and starts data access processing to the former way when the two differ. The other access request is assumed to have been accepted before the access request AR. In particular, as the access processing, the control unit 110 burst-transfers the data read from the data memory to the requester of the access request AR.

This allows the data transfers of a plurality of access requests to be processed partially in parallel, so the overall data transfer processing time can be shortened.

FIG. 2 is a flowchart showing the flow of control processing in the cache memory device 100 according to Embodiment 1. First, the control unit 110 determines whether the accepted access request AR is a cache hit (S101). If it determines in step S101 that the request is a cache hit, the control unit 110 identifies the way that was hit (S102). That is, the control unit 110 identifies the tag memory that produced the cache hit for the first access request.

Assume that, at this point, access processing to one of the data memories in the data memory group 130 is in progress based on at least an access request accepted before, or at the same time as, the access request AR. The control unit 110 then determines whether the data memory of the identified way is undergoing access processing (S103).

If it determines in step S103 that access processing is in progress, the control unit 110 holds the access processing based on the access request AR (S104) and, after a fixed time, executes step S103 again. If it determines in step S103 that access processing is not in progress, the control unit 110 starts access processing based on the access request AR (S105). For example, even while the data memory 132 is undergoing access processing based on a second access request other than the first access request, the data memory 131 corresponding to the tag memory identified in step S102 is not undergoing access processing based on the second access request, so access processing based on the first access request is started. If the access processing based on the second access request completes during the wait in step S104, step S103 determines that access processing is not in progress.

If it determines in step S101 that the request is a cache miss, the control unit 110 loads the corresponding data (S106).
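The decision made in steps S101 to S106 can be sketched in software as follows. This is only an illustrative model, not the circuit described in the specification: it assumes a single cache set whose per-way tags are held in a list, and the names `Action`, `lookup_way`, and `decide` are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class Action(Enum):
    START_ACCESS = "S105"  # hit way is idle: begin the access
    WAIT = "S104"          # hit way is busy: retry S103 later
    LOAD = "S106"          # cache miss: load the data


def lookup_way(tags: List[Optional[int]], tag: int) -> Optional[int]:
    """S101/S102: return the index of the way whose tag matches, or None on a miss."""
    for way, stored in enumerate(tags):
        if stored == tag:
            return way
    return None


def decide(tags: List[Optional[int]], busy: List[bool], tag: int) -> Action:
    """One pass through the flowchart for a single access request.

    tags[w] is the tag held by way w (assumption: one set only);
    busy[w] is True while way w's data memory is being accessed.
    """
    way = lookup_way(tags, tag)
    if way is None:
        return Action.LOAD          # S101: miss
    if busy[way]:
        return Action.WAIT          # S103: hit way is being accessed
    return Action.START_ACCESS      # S103: hit way is idle
```

A request that hits an idle way proceeds even when another way is busy, which is the behavior the example with data memories 131 and 132 describes.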

Thus, for example, when a subsequent access request is accepted while data is being transferred from one of the data memories based on a preceding access request, the cache memory device 100 starts the data transfer for the subsequent access request provided that the data memory identified as its access target is not transferring data for the preceding access request. The data transfers based on the preceding and subsequent access requests are then executed partially in parallel. The overall data transfer time can therefore be shortened compared with sequential processing, in which the subsequent request waits until the data transfer of the preceding access request completes.

The cache memory device 100 according to Embodiment 1 can also be restated as follows: while a data access to any of a plurality of ways based on a first access request to an associative cache memory is in progress, the cache memory device starts a data access to the way identified by a cache hit of a second access request, other than the first access request, when that way is not undergoing a data access based on the first access request.

Embodiment 1 can thus be realized, for example, by extending an ordinary set-associative cache having a plurality of ways. Cache tags are searched sequentially and, depending on the result, data accesses can proceed in parallel if the target way is not undergoing access processing. In other words, when the ways being accessed differ, the target way is not undergoing access processing, so different initiators can access the cache in parallel and throughput improves. This is particularly effective when burst transfer makes the access cycle of the data array longer than the tag lookup cycle.

As described above, a shared cache memory must respond to access requests from a plurality of IP cores. Consequently, when running an application with a low private-cache hit rate, for example, the shared cache memory can become an access bottleneck and limit performance.

The problems in Non-Patent Document 1, and how Embodiment 1 solves them and to what effect, are described in detail below. As noted above, a shared cache memory can be a bottleneck, and the multi-bank cache memory described above is often used to address this. In the multi-bank scheme, specific bits of the address are assigned as identification bits, and the cache memory to be accessed is selected from among a plurality of cache memories according to those identification bits. This allows access requests from different IP cores to proceed on cache memories whose identification bits (bank addresses) differ.

FIG. 10 is a block diagram showing the configuration of a multi-bank processor system according to Non-Patent Document 1. The L2 cache memories L2_0 to L2_7 are caches for instructions or data and are physically divided into eight banks. They are logically shared among all CPUs and are interleaved using the lower bits of the physical address of a cache line (64 bytes). This is clear from the statement in Non-Patent Document 1, p. 285, left column, lines 15-17: "The L2 banks are interleaved using the lower address bits of a cache line's physical address (64-byte line)."

Therefore, in Non-Patent Document 1, when, for example, two access requests target the same L2 cache memory (bank), cache hit determination is performed for one of the access requests and, on a hit, the corresponding data is read from the L2 cache memory and transferred to the requester. Only after that data transfer completes does processing such as hit determination begin for the other access request. That is, in a bank conflict the access requests have to be processed sequentially, and as a result the time to complete all responses to the plurality of access requests becomes long.
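To make the bank-conflict condition concrete, the following sketch computes a bank index for the eight-bank, 64-byte-line arrangement described above. The exact bit positions are an assumption: the source says only that the lower bits of the cache-line address are used, so the sketch takes the three bits just above the line offset. The names `bank_of` and `conflicts` are hypothetical.

```python
LINE_BYTES = 64  # cache line size quoted from Non-Patent Document 1
NUM_BANKS = 8    # number of L2 banks in FIG. 10


def bank_of(phys_addr: int) -> int:
    """Pick a bank from the low-order bits of the cache-line address.

    Assumption: the bank bits sit directly above the 6-bit line offset.
    """
    return (phys_addr // LINE_BYTES) % NUM_BANKS


def conflicts(addr_a: int, addr_b: int) -> bool:
    """Two requests conflict when their addresses map to the same bank."""
    return bank_of(addr_a) == bank_of(addr_b)
```

Under this mapping, consecutive cache lines fall in different banks, while addresses that are a multiple of 512 bytes apart share a bank and would be serialized.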

In a multi-bank scheme such as that of Non-Patent Document 1, the cache memory to be accessed is uniquely determined by the address, which makes the configuration simple. However, there are three main problems. First, multiple instances of control logic, such as cache hit/miss determination, are required. Second, tuning is required to distribute the bank addresses, because if the bank addresses are not distributed, accesses concentrate on particular cache memory banks. Third, it is difficult to adapt to the access load on the cache memory by, for example, increasing or decreasing the cache capacity through power-off.

It is therefore conceivable to use a set-associative scheme in which a single bank is managed as a plurality of ways. In the set-associative scheme, data at a given address can be stored in any one of a plurality of cache entries, that is, ways. However, in the set-associative scheme, when one access request produces a cache hit, processing of other access requests cannot proceed until the data transfer from the hit way completes. In other words, the scheme does not support parallel access requests.

Because such a shared cache memory is accessed mainly through the higher-level private caches, most transfers are burst transfers, in which multiple data items are transferred over several cycles. In single data transfer, the cache tag must be compared for every transfer. In burst transfer, by contrast, the cache tag needs to be compared only at the start of the transfer. That is, in burst transfer the cache tag is compared in the first cycle and, on a cache hit, the data stored in the data memory is burst-transferred over multiple cycles. Thus, although hit/miss determination on the cache tag takes only one cycle, hit/miss determination for other access requests was kept waiting throughout the cycles of the burst transfer.

In Embodiment 1, therefore, the plurality of cache entry candidates, that is, the ways, are accessed in parallel after the cache tag access. In other words, Embodiment 1 is a cache memory device that, for a plurality of access requests, accesses the tag memories serially and the data memories in parallel. This improves throughput to a level comparable to that of the multi-bank scheme, while also resolving the problems of the multi-bank cache such as tuning cost, capacity scaling, and logic size.
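The benefit of serial tag access with parallel data access can be illustrated with a small cycle-count model. The model is a simplification made for this sketch, not a timing given in the specification: every request hits, a tag check takes one cycle, tag checks are issued one per cycle in arrival order, and each burst occupies its way for `burst` cycles. The function names are hypothetical.

```python
from typing import Dict, List


def sequential_cycles(ways: List[int], burst: int) -> int:
    """Baseline: each request's tag check and burst finish before the next starts."""
    return len(ways) * (1 + burst)


def way_parallel_cycles(ways: List[int], burst: int) -> int:
    """Tag checks are serial (one per cycle); bursts on different ways may overlap.

    ways[i] is the way that request i hits (assumption: all requests hit).
    """
    way_free_at: Dict[int, int] = {}  # way -> cycle at which it becomes idle
    finish = 0
    for i, way in enumerate(ways):
        tag_done = i + 1  # request i is tag-checked during cycle i
        start = max(tag_done, way_free_at.get(way, 0))
        end = start + burst
        way_free_at[way] = end
        finish = max(finish, end)
    return finish
```

For two requests that hit different ways with a four-cycle burst, this model yields 6 cycles for the overlapped case against 10 for the sequential baseline. When both requests hit the same way, the second burst still has to wait for the first, so most of the gain disappears.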

<Embodiment 2>
Embodiment 2 is a specific example of Embodiment 1 described above, in which access requests are arbitrated when a plurality of access requests are accepted simultaneously.

FIG. 3 is a block diagram showing the configuration of a multiprocessor system 200 according to Embodiment 2. The multiprocessor system 200 includes IP cores 211 to 214, L1 caches 221 to 224, a cache memory device 230, and an SDRAM (Synchronous Dynamic Random Access Memory) 240. That is, the multiprocessor system 200 has a memory hierarchy consisting of the L1 caches 221 to 224, the cache memory device 230 serving as an L2 cache, and the SDRAM 240 at a lower level still.

The IP cores 211 to 214 issue access requests to the memory hierarchy in order to read and write data. The IP cores 211 to 214 have the L1 caches 221 to 224, respectively. On an L1 cache miss, each of the IP cores 211 to 214 issues an access request to the arbiter scheduler 231 via the access request bus B20. The number of IP cores is not limited to this example.

The arbiter scheduler 231 accepts a plurality of access requests, arbitrates among them, and issues the access requests one at a time to the L2 HIT/MISS determination unit 232. Any known algorithm may be used for the arbitration. The L2 HIT/MISS determination unit 232 performs cache hit determination on the WAY0 tag 2330 and the WAY1 tag 2331 in response to an access request.
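The specification leaves the arbitration algorithm open. As one possible choice, and purely as an assumption for illustration, a round-robin arbiter that grants one pending requester per call could look like the sketch below. The class name `RoundRobinArbiter` and its interface are hypothetical and are not taken from the source.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one pending requester per call, rotating priority after each grant."""

    def __init__(self, num_requesters: int) -> None:
        self.num_requesters = num_requesters
        self.next_priority = 0  # requester checked first on the next grant

    def grant(self, pending: List[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if nothing is pending."""
        for offset in range(self.num_requesters):
            idx = (self.next_priority + offset) % self.num_requesters
            if pending[idx]:
                # The requester after the winner gets top priority next time.
                self.next_priority = (idx + 1) % self.num_requesters
                return idx
        return None
```

With four requesters standing in for the IP cores 211 to 214, repeated calls hand out grants in rotation, so no core is starved while others keep requesting.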

The WAY0 data array 2350 and the WAY1 data array 2351 are data memories, each built from an independent SRAM, that store, line by line, the data corresponding to some of the addresses in the SDRAM 240. Data corresponding to a given address in the SDRAM 240 can be stored in either the WAY0 data array 2350 or the WAY1 data array 2351.

The WAY0 access state flag 2340 is access state information indicating whether read or write access processing is in progress on the WAY0 data array 2350. Similarly, the WAY1 access state flag 2341 is access state information indicating whether read or write access processing is in progress on the WAY1 data array 2351. The WAY0 access state flag 2340 and the WAY1 access state flag 2341 are data that at least the L2 HIT/MISS determination unit 232 can reference. For example, the WAY0 access state flag 2340 is stored in a control area within the WAY0 data array 2350, and the WAY1 access state flag 2341 is likewise stored in a control area within the WAY1 data array 2351. Alternatively, the access state information may be stored as a management table in a storage area within the cache memory device 230. Here, the management table stores information indicating whether each data memory is undergoing access processing.

The access state flag may be any value that indicates either "Free" or "Busy". For example, "0" may denote "Free" and "1" may denote "Busy", or another encoding may be used. Here, "Free" indicates that access processing is not in progress, and "Busy" indicates that access processing is in progress. The L2 HIT/MISS determination unit 232 updates the access state flag to "1" when locking a way and to "0" when unlocking it.

The WAY0 tag 2330 is address information corresponding to each line of the WAY0 data array 2350. Similarly, the WAY1 tag 2331 is address information corresponding to each line of the WAY1 data array 2351. Note that the WAY0 tag 2330 and the WAY1 tag 2331 may be implemented in a single SRAM.

The WAY0 tag 2330, the WAY0 access state flag 2340, and the WAY0 data array 2350 are associated with one another as WAY0. Similarly, the WAY1 tag 2331, the WAY1 access state flag 2341, and the WAY1 data array 2351 are associated with one another as WAY1. Although FIG. 3 illustrates a two-way cache memory device, the number of ways may be any number equal to or greater than two.

The SDRAM controller 236 controls access to the SDRAM 240 in response to an access request determined to be a cache miss by the L2 HIT/MISS determination unit 232, transfers the response data to the request source, and loads the response data into the WAY0 data array 2350 or the WAY1 data array 2351.

The SDRAM 240 is a main memory and stores data including data that is not stored in the L1 caches 221 to 224, the WAY0 data array 2350, or the WAY1 data array 2351.

The L2 HIT/MISS determination unit 232 described above is an example of the control unit 110. Accordingly, when the control unit according to the second embodiment starts access processing based on an access request that is a cache hit, it updates the access state information to mark the target data memory as being processed, and after the access processing is completed, it updates the access state information to mark the target data memory as not being processed. In other words, at the start of access processing, the control unit locks the target data memory for exclusive control, and after the access processing is completed, it unlocks that data memory. In addition, when an access request is a cache hit, the control unit refers to the access state information for the data memory corresponding to the identified tag memory and starts access processing to that data memory if the access state information does not indicate that processing is in progress. By using the access state information in this way, whether access processing is in progress can easily be checked per data memory, that is, per way, and exclusive control can be implemented easily.
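As a non-limiting illustration, the lock and unlock behavior of the control unit may be sketched in Python as follows. The names `AccessState`, `WayLocks`, `try_lock`, and `unlock` are hypothetical and are introduced only for this sketch; the encoding 0 = Free, 1 = Busy follows the example given for the access state flag.

```python
from enum import Enum


class AccessState(Enum):
    FREE = 0  # no access in progress
    BUSY = 1  # a read or write access is in progress


class WayLocks:
    """Per-way access state flags, one flag per data memory (hypothetical helper)."""

    def __init__(self, num_ways: int) -> None:
        self.flags = [AccessState.FREE] * num_ways

    def try_lock(self, way: int) -> bool:
        """Lock the way if it is Free; return False if it is already Busy."""
        if self.flags[way] is AccessState.BUSY:
            return False
        self.flags[way] = AccessState.BUSY
        return True

    def unlock(self, way: int) -> None:
        """Release the way after its access processing has completed."""
        self.flags[way] = AccessState.FREE


locks = WayLocks(num_ways=2)
first = locks.try_lock(0)   # preceding request starts on WAY0
second = locks.try_lock(0)  # subsequent request to WAY0 is refused
other = locks.try_lock(1)   # a request to WAY1 may proceed in parallel
locks.unlock(0)             # access to WAY0 completes
retry = locks.try_lock(0)   # WAY0 can be locked again
```

In this sketch a refused `try_lock` corresponds to the case in which the request is not started because the target way is being processed.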

The arbiter scheduler 231 described above is an arbitration unit that accepts a plurality of access requests and arbitrates among them. When the data memory corresponding to the tag memory identified by a cache hit of an arbitrated subsequent access request is being accessed based on an arbitrated preceding access request, the control unit returns the subsequent access request to the arbitration unit. By using the arbitration unit in this way, even when a plurality of access requests are issued simultaneously, the hit/miss determination is performed on one arbitrated request per cycle while the access processing itself is partially parallelized as described above.

FIG. 4 is a flowchart showing the flow of control processing of the cache memory device 230 according to the second embodiment. First, the arbiter scheduler 231 accepts a plurality of access requests (S201). Next, the arbiter scheduler 231 arbitrates among the access requests (S202). The arbiter scheduler 231 then selects the access request to be processed (S203). That is, the arbiter scheduler 231 notifies the L2 HIT/MISS determination unit 232 of the access request that arbitration has determined to have the highest priority. For example, when two access requests are accepted, the access request that arbitration schedules first corresponds to the second access request described above, and the access request that arbitration schedules after it corresponds to the first access request described above.

The L2 HIT/MISS determination unit 232 then performs the L2 cache hit/miss determination (S204). That is, it determines whether the access request is a cache hit. Specifically, the L2 HIT/MISS determination unit 232 compares the address included in the access request with the WAY0 tag 2330 and the WAY1 tag 2331 and determines a cache hit if they match.
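As a non-limiting illustration, the comparison of step S204 may be sketched as follows. The line size, the number of lines per way, the split of the address into tag and index, and the names `split_address` and `hit_way` are assumptions made for this sketch and are not specified in the embodiment.

```python
from typing import List, Optional, Tuple

LINE_BYTES = 64  # assumed line size
NUM_LINES = 4    # assumed number of lines per way


def split_address(addr: int) -> Tuple[int, int]:
    """Split an address into (tag, line index) under the assumed geometry."""
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // (LINE_BYTES * NUM_LINES)
    return tag, index


def hit_way(addr: int, way_tags: List[List[Optional[int]]]) -> Optional[int]:
    """Compare the request address with every way's tag (S204).

    Returns the number of the way that hits, or None on a cache miss.
    """
    tag, index = split_address(addr)
    for way, tags in enumerate(way_tags):
        if tags[index] == tag:
            return way
    return None


# Two ways with empty tags; register address 0x1234 in WAY1.
way_tags: List[List[Optional[int]]] = [[None] * NUM_LINES for _ in range(2)]
tag, index = split_address(0x1234)
way_tags[1][index] = tag

hit = hit_way(0x1234, way_tags)   # hits in WAY1
miss = hit_way(0x2000, way_tags)  # same line index, different tag: miss
```

The tags of all ways are compared for every request, which is why the determination can identify the single way that must then be checked for its access state.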

If a cache hit is determined in step S204, the L2 HIT/MISS determination unit 232 determines whether the flag of the way to be accessed is "Free" (S205), referring to the access state information. For example, when the cache hit is in WAY0, the L2 HIT/MISS determination unit 232 determines "Free" if the WAY0 access state flag 2340 is "0" and "Busy" if it is "1". That is, the L2 HIT/MISS determination unit 232 determines whether the data memory corresponding to the tag memory identified based on the first access request is being accessed according to whether that data memory is locked.

If it is determined in step S205 that the flag of the way to be accessed is not "Free", that is, it is "Busy", the access request is returned to the arbiter scheduler 231 and arbitrated again. That is, when the arbiter scheduler 231 determines that the data memory targeted by the first access request is undergoing access processing based on the second access request, it makes the first access request subject to arbitration again. For example, if access processing for the same way has already been started by a preceding access request, the same SRAM cannot be processed concurrently, so the access must be made after a certain time has elapsed.

If it is determined in step S205 that the flag of the way to be accessed is "Free", the L2 HIT/MISS determination unit 232 updates that flag to "Busy" (S206). That is, when starting access processing based on the access request, the L2 HIT/MISS determination unit 232 locks the target data memory and marks it as being accessed. In other words, when the L2 HIT/MISS determination unit 232 determines that no lock is held, it locks the data memory targeted by the first access request. This excludes access processing for subsequent access requests to that way.

For example, when no other access request is being processed in parallel, every way is "Free". The case most relevant to this embodiment is one in which access processing has already been started by another access request. In that case, the way being accessed by the other access request is "Busy", but any other way that is "Free" can be accessed. In such a case, access processing for a plurality of access requests can therefore be executed in parallel.

The L2 HIT/MISS determination unit 232 then starts access processing to the target data array (S207). For example, when the access request is a read, data is transferred to the requesting IP core.

After the data transfer is completed, the L2 HIT/MISS determination unit 232 updates the target tag and updates the flag of the way to "Free" (S208). That is, after the access processing is completed, the L2 HIT/MISS determination unit 232 unlocks the target data memory and marks it as not being accessed. Subsequent access requests can then be processed for that way.

If a cache miss is determined in step S204, the SDRAM controller 236 determines the way to be loaded (S209). The SDRAM controller 236 then determines whether the flag of the way to be loaded is "Free" (S210). If it is determined in step S210 that the flag of the way to be loaded is "Free", the L2 HIT/MISS determination unit 232 updates that flag to "Busy" (S211). The SDRAM controller 236 then loads the data into the way to be loaded (S212).
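As a non-limiting illustration, one pass through steps S204 to S212 of FIG. 4 for a single arbitrated request may be sketched as follows. The function name `dispatch`, its string return values, and the `victim_way` parameter are hypothetical. The handling of a Busy load target in step S210 is not stated in the description above; returning the request to arbitration in that case is an assumption of this sketch.

```python
from typing import List, Optional

FREE, BUSY = 0, 1  # flag encoding used in the example of the embodiment


def dispatch(hit_way: Optional[int], victim_way: int, flags: List[int]) -> str:
    """Decide what happens to one arbitrated request.

    hit_way: way identified by the hit/miss determination, or None on a miss.
    victim_way: load target already chosen for the miss case (S209).
    flags: access state flag per way; updated in place when a way is locked.
    """
    if hit_way is not None:              # S204: cache hit
        if flags[hit_way] == BUSY:       # S205: way is locked
            return "rearbitrate"         # request goes back to the arbiter
        flags[hit_way] = BUSY            # S206: lock the way
        return f"access way {hit_way}"   # S207: start access processing
    if flags[victim_way] == BUSY:        # S210: load target is locked
        return "rearbitrate"             # assumption, see lead-in
    flags[victim_way] = BUSY             # S211: lock the load target
    return f"load way {victim_way}"      # S212: load from SDRAM


flags = [FREE, FREE]
r1 = dispatch(0, 1, flags)     # hit in WAY0 while it is Free
r2 = dispatch(0, 1, flags)     # second hit in WAY0 while it is Busy
r3 = dispatch(None, 1, flags)  # miss, load target WAY1 is Free
r4 = dispatch(None, 0, flags)  # miss, load target WAY0 is Busy
```

Unlocking (S208) is omitted here because it happens only after the data transfer has finished.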

FIG. 5 is a timing chart illustrating the operation of the cache memory device 230 according to the second embodiment. It shows an example in which access request X targets WAY0 and access request Y targets WAY1.

First, assume that the cache memory device 230 accepts access requests X and Y simultaneously from IP cores A and B, respectively (S201). The arbiter scheduler 231 therefore arbitrates between access requests X and Y (S202). Assume that arbitration places access request X first. In the next cycle, the arbiter scheduler 231 therefore selects access request X for processing (S203).

Next, the L2 HIT/MISS determination unit 232 performs the hit/miss determination for access request X (S204). That is, it performs the hit/miss determination against both the WAY0 tag 2330 and the WAY1 tag 2331. Here, assume a cache hit in WAY0 (YES in S204).

The L2 HIT/MISS determination unit 232 then determines whether the WAY0 access state flag 2340 is "Free" (S205). Here, assume that it is "Free". The L2 HIT/MISS determination unit 232 therefore updates the WAY0 access state flag 2340 to "Busy" (S206) and starts access processing to the WAY0 data array 2350 (S207). Assume that access request X is a data read. Starting from the cycle after the hit/miss determination, D0, D1, ..., D15 are read from the WAY0 data array 2350 and transferred to IP core A. In this example the data transfer takes 16 cycles, but in general it takes two or more cycles, such as 32 or 64 cycles, depending on the amount of data per line.

In the same cycle in which D0 is read from the WAY0 data array 2350, the arbiter scheduler 231 selects the subsequent access request Y for processing (S203), and the L2 HIT/MISS determination unit 232 performs the hit/miss determination for access request Y (S204). Here, assume a cache hit in WAY1 (YES in S204).

The L2 HIT/MISS determination unit 232 then determines whether the WAY1 access state flag 2341 is "Free" (S205). Here, assume that it is "Free". The L2 HIT/MISS determination unit 232 therefore updates the WAY1 access state flag 2341 to "Busy" (S206) and starts access processing to the WAY1 data array 2351 (S207). Assume that access request Y is a data read. Starting from the cycle after the hit/miss determination, that is, from the cycle in which D1 is read from the WAY0 data array 2350, D0, D1, ..., D15 are read from the WAY1 data array 2351 and transferred to IP core B. The data transfers for access requests X and Y are thus partially processed in parallel.

Thereafter, once D15 has been read from the WAY0 data array 2350, the L2 HIT/MISS determination unit 232 updates the WAY0 access state flag 2340 to "Free" (S208). Likewise, once D15 has been read from the WAY1 data array 2351, it updates the WAY1 access state flag 2341 to "Free" (S208).

Thus, in the example of FIG. 5, all data transfers for the simultaneously issued access requests X and Y complete only one cycle apart.

FIG. 6 is a timing chart illustrating the operation of the cache memory device 230 according to the second embodiment. It shows an example in which access requests X and Y both target WAY0. The description below focuses on the differences from FIG. 5.

First, D0 is read from the WAY0 data array 2350 as the access processing for the preceding access request X. In the same cycle, the arbiter scheduler 231 selects the subsequent access request Y for processing (S203), and the L2 HIT/MISS determination unit 232 performs the hit/miss determination for access request Y (S204). Here, assume that the cache hit is again in WAY0 (YES in S204).

The L2 HIT/MISS determination unit 232 then determines whether the WAY0 access state flag 2340 is "Free" (S205). Here, because the WAY0 access state flag 2340 is "Busy" (NO in S205), access request Y is arbitrated again by the arbiter scheduler 231. Thereafter, once D15 has been read from the WAY0 data array 2350, the L2 HIT/MISS determination unit 232 updates the WAY0 access state flag 2340 to "Free" (S208). Access request Y can then access WAY0, and its data transfer starts.

In this way, the access state flag properly excludes simultaneous access to the same way and preserves the consistency of the response data.

FIG. 7 is a timing chart illustrating the operation of the cache memory device 230 according to the second embodiment. It shows an example in which access requests X and Y both target WAY0 and access request Z targets WAY1. The description below focuses on the differences from FIGS. 5 and 6.

First, assume that the cache memory device 230 accepts access requests X and Y simultaneously from IP cores A and B, respectively (S201), and that arbitration places access request X first. The L2 HIT/MISS determination unit 232 performs the hit/miss determination for access request X, and a cache hit in WAY0 is assumed (YES in S204).

At the same time, assume that the cache memory device 230 accepts access request Z from IP core C. The arbiter scheduler 231 then arbitrates between access requests Y and Z (S202). Assume that arbitration places access request Y first. In the next cycle, the arbiter scheduler 231 therefore selects access request Y for processing (S203).

Next, the L2 HIT/MISS determination unit 232 performs the hit/miss determination for access request Y (S204). In parallel, as described above, the data transfer for access request X starts. As in FIG. 6, because the WAY0 access state flag 2340 is "Busy", access request Y is arbitrated again by the arbiter scheduler 231.

Next, in the same cycle in which D1 is read from the WAY0 data array 2350 for access request X, the arbiter scheduler 231 selects the subsequent access request Z for processing (S203), and the L2 HIT/MISS determination unit 232 performs the hit/miss determination for access request Z (S204). Here, assume a cache hit in WAY1 (YES in S204). Data transfer from the WAY1 data array 2351 therefore starts for access request Z. The data transfers for access requests X and Z are thus partially processed in parallel.

Thereafter, as in FIG. 6, the data transfer for access request Y starts after the data transfer for access request X is completed. The data transfers for access requests Y and Z are partially processed in parallel.
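As a non-limiting illustration, the timing behavior described for FIGS. 5 to 7 may be reproduced with the following cycle-level sketch. The function `simulate`, the cycle numbering (cycle 0 is the cycle in which the requests are accepted and arbitrated), and the policy of returning a refused request to the tail of the arbitration queue are assumptions of this sketch; the embodiment does not specify the arbitration policy. Every request is assumed to be a cache hit with a 16-beat read burst.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

BURST = 16  # data beats per line (D0 to D15), as in the timing charts


def simulate(requests: List[Tuple[str, int]]) -> Dict[str, int]:
    """Return the cycle in which the last data beat of each request is read.

    requests: (name, way that the request hits), in arbitration order.
    One request undergoes the hit/miss determination per cycle.
    """
    pending: Deque[Tuple[str, int]] = deque(requests)
    busy_until: Dict[int, int] = {}  # way -> cycle of its last data beat
    done: Dict[str, int] = {}
    cycle = 0
    while pending:
        cycle += 1
        name, way = pending.popleft()        # S203: select one request
        if busy_until.get(way, 0) >= cycle:  # S205: flag is Busy
            pending.append((name, way))      # returned to the arbiter
            continue
        busy_until[way] = cycle + BURST      # S206, S207: lock, start burst
        done[name] = cycle + BURST           # S208 follows the last beat
    return done


fig5 = simulate([("X", 0), ("Y", 1)])            # different ways
fig6 = simulate([("X", 0), ("Y", 0)])            # same way
fig7 = simulate([("X", 0), ("Y", 0), ("Z", 1)])  # Y waits, Z overtakes
```

Under these assumptions the two transfers of FIG. 5 finish one cycle apart (cycles 17 and 18), whereas in FIG. 6 request Y finishes only at cycle 34 because it must wait for WAY0 to be unlocked. In the FIG. 7 case, request Z finishes at cycle 19, ahead of request Y.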

As described above, parallel access processing is achieved in the second embodiment by implementing the WAY0 data array 2350 and the WAY1 data array 2351 as separate SRAM macros and by providing the response bus B21 and the response bus B22.

Furthermore, the cache memory device 230 uses access state information that manages BUSY and FREE for each of WAY0 and WAY1. The arbiter scheduler 231 reschedules an access request directed to a way determined to be BUSY.

The second embodiment is particularly effective when burst transfers are performed. It is therefore applicable to cache memories at the L2 level and below. In addition, unlike the multi-bank scheme, the second embodiment requires no address distribution.

The second embodiment is summarized again here. First, when a plurality of access requests to the shared cache occur, the arbiter scheduler arbitrates among them, and the tags of the plurality of ways are then looked up. A cache hit or miss is determined, and the way to be accessed is decided. If that way is free, its state is changed to busy and the data access is performed. If it is busy, the cache access request enters a wait state and is rescheduled by the arbiter scheduler; while being re-arbitrated against other access requests, the access request repeats the cache hit/miss determination until the way becomes free.

Even while WAY0 is performing a burst transfer, the tags of both WAY0 and WAY1 remain accessible. When a subsequent access request turns out to be an access to WAY1, WAY0 and WAY1 can each exchange data with a different IP core.

The second embodiment therefore enables partially parallel access to the cache memory. In addition, because the parallel access is per way, no program tuning is required, unlike the multi-bank scheme described above.

The cache memory device 230 according to the second embodiment is shared by a plurality of processor cores and accepts access requests from each processor core individually. This provides a shared cache memory for a plurality of access request issuers while allowing access processing to be partially parallelized. As described above, the cache memory device 230 according to the second embodiment is preferably set-associative or fully associative. Accesses can then be distributed even in the case of a bank conflict.

<Embodiment 3>
The cache memory device according to the third embodiment determines the load destination on a cache miss after checking the WAY0 access state flag 2340 and the WAY1 access state flag 2341. Because its configuration is the same as that shown in FIG. 3, illustration and detailed description are omitted.

FIG. 8 is a flowchart showing the flow of control processing of the cache memory device according to the third embodiment. In FIG. 8, processes equivalent to those in FIG. 4 are given the same reference signs, and their detailed description is omitted.

If a cache miss is determined in step S204, the SDRAM controller 236 determines whether any way has a flag of "Free" (S309). For example, the SDRAM controller 236 refers to the WAY0 access state flag 2340 and the WAY1 access state flag 2341 and determines whether at least one of them is "Free".

If no way is "Free" in step S309, that is, if all ways are being accessed, the process returns to step S202.

If it is determined in step S309 that a "Free" way exists, the SDRAM controller 236 determines the way to be loaded from among the "Free" ways (S310). That is, when the first access request is determined to be a cache miss, a data memory that is not locked among the plurality of data memories is determined as the load destination. Conventionally, whether to load was determined for a way selected based on LRU (Least Recently Used) or the like, and if that way was in use, the load had to wait. In the third embodiment, by contrast, the access state flag of each way is referenced in advance, so the way determined as the load target is certain to be loadable.

That is, the SDRAM controller 236 is a load/store unit that, when an access request is determined to be a cache miss by the cache hit determination, loads the data requested by that access request from lower-level memory into one of the plurality of data memories. The SDRAM controller 236 refers to the access state information and determines a data memory that is not being processed as the load destination.

Alternatively, a way may be selected by an existing technique such as LRU as in step S209 of FIG. 4, and if that way is BUSY in step S210, the load target may be determined from among the ways whose access state flag is "Free". Thus, when new data is to be loaded because of a cache miss and the selected way is BUSY, another way is used as the load target so that parallel access is possible as often as possible, which increases the performance gain from parallel access.
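As a non-limiting illustration, the load target selection of the third embodiment may be sketched as follows. The function name `choose_load_way` and the `lru_order` parameter are hypothetical. The embodiment does not state how one way is chosen when several are "Free"; preferring the least recently used among the "Free" ways is an assumption of this sketch.

```python
from typing import List, Optional

FREE, BUSY = 0, 1  # flag encoding used in the example of the embodiment


def choose_load_way(flags: List[int], lru_order: List[int]) -> Optional[int]:
    """Pick the load target on a cache miss.

    flags: access state flag per way.
    lru_order: way numbers ordered from least to most recently used
               (assumed to be tracked elsewhere).
    Returns None when every way is Busy, in which case the request
    goes back to arbitration (S309: no Free way).
    """
    for way in lru_order:       # take the LRU choice if it is Free,
        if flags[way] == FREE:  # otherwise fall back to another Free way
            return way
    return None


a = choose_load_way([FREE, FREE], lru_order=[0, 1])  # LRU way is Free
b = choose_load_way([BUSY, FREE], lru_order=[0, 1])  # LRU way is Busy
c = choose_load_way([BUSY, BUSY], lru_order=[0, 1])  # no Free way
```

The second call shows the point of the embodiment: the load is redirected to a way that can accept it immediately instead of waiting for the way chosen by LRU.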

<Embodiment 4>
In existing methods of determining the way to be loaded, data is stored into one way until it is full before data is stored into another way. Embodiments 1 to 3 partially parallelize access requests directed to a plurality of different ways. Consequently, until the first way has been filled, accesses to that same way cannot be processed in parallel by Embodiments 1 to 3.

In the fourth embodiment, therefore, a way different from the way most recently selected as the load target is selected as the load target. For example, with two ways, the way to be loaded is selected alternately. The fourth embodiment includes a load/store unit that, when an access request is determined to be a cache miss by the cache hit determination, loads the data requested by that access request from lower-level memory into one of the plurality of data memories. The load/store unit according to the fourth embodiment determines, as the load destination, a data memory different from the data memory loaded immediately before. That is, when loads occur consecutively, a way different from the immediately preceding load destination is determined as the load destination. For example, in an initial stage in which a plurality of ways are empty, newly allocated ways alternate as WAY0, WAY1, WAY0, and so on. Data is thereby stored evenly across the ways, which raises the probability that parallel processing can be performed.
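As a non-limiting illustration, the alternating selection of the fourth embodiment may be sketched as follows. The class name `AlternatingSelector` is hypothetical, and cycling through the ways in ascending order for more than two ways is an assumption of this sketch; the embodiment requires only that the load destination differ from the immediately preceding one.

```python
from typing import Optional


class AlternatingSelector:
    """Select a load destination different from the one used immediately before."""

    def __init__(self, num_ways: int) -> None:
        self.num_ways = num_ways
        self.last: Optional[int] = None  # no load has been performed yet

    def next_way(self) -> int:
        """Return the next load destination and remember it."""
        if self.last is None:
            self.last = 0  # first load goes to WAY0
        else:
            self.last = (self.last + 1) % self.num_ways
        return self.last


selector = AlternatingSelector(num_ways=2)
order = [selector.next_way() for _ in range(4)]  # WAY0, WAY1, WAY0, WAY1
```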

<Embodiment 5>
The hit rate of accesses to the plurality of ways in the cache memory may fluctuate depending on the application issuing the access requests and on the stage of processing. Therefore, in the fifth embodiment, the data loaded on a cache miss is distributed across the plurality of ways. Specifically, the fifth embodiment includes a load store unit that, when an access request is determined to be a cache miss by the cache hit determination, loads the data requested by the access request from a lower-layer memory into one of the plurality of data memories. The load store unit according to the fifth embodiment determines the data memory to be the load destination according to the access frequency of the plurality of data memories. The access frequency includes, for example, the usage ratio of each way. As a result, data is stored in a distributed manner across the ways, the hit rate is spread across the ways, and the probability that parallel access processing can be performed is improved.
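The fifth embodiment states only that the load destination is chosen according to access frequency; it does not fix a concrete policy. The following Python sketch assumes one plausible policy, namely loading into the way with the fewest recorded accesses so that hits tend to even out across ways. The function name `choose_load_way` and the per-way counter list are hypothetical.

```python
def choose_load_way(access_counts):
    """Return the index of the way with the fewest recorded accesses.

    Assumed policy (not stated in the specification): favour the least
    accessed way. Ties are resolved toward the lowest-numbered way.
    """
    if not access_counts:
        raise ValueError("need at least one way")
    return min(range(len(access_counts)), key=lambda w: access_counts[w])


counts = [120, 35, 80, 35]   # hypothetical per-way access counters
chosen = choose_load_way(counts)
print(chosen)  # 1
```

Other policies consistent with the text, such as weighting by a usage ratio measured over a recent window, would fit the same interface.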

<Embodiment 6>
FIG. 9 is a block diagram showing the configuration of an information processing apparatus 600 according to the sixth embodiment. The information processing apparatus 600 includes a plurality of processor cores 611 to 61m, a tag memory group 120, a data memory group 130, and a cache control unit 620.

Each of the plurality of processor cores 611 to 61m issues an access request AR to the cache control unit 620 and receives a data transfer DT or the like as the response to that access request AR.

The cache control unit 620 performs cache hit determination on the plurality of tag memories 121 to 12n for a first access request from any of the plurality of processor cores 611 to 61m. Then, while access processing to any of the plurality of data memories is in progress based on a second access request different from the first access request, the cache control unit 620 starts access processing to the data memory corresponding to the tag memory identified by the cache hit for the first access request, provided that this data memory is not undergoing access processing based on the second access request.
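The decision made by the cache control unit 620 can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the circuit of the specification: the names `Decision` and `decide`, and the representation of tag memories and in-progress accesses as plain lists, are hypothetical. What happens to a request whose way is busy is left open here.

```python
from enum import Enum


class Decision(Enum):
    START = "start access"   # hit, and the hit way is not being accessed
    WAIT = "way busy"        # hit, but the hit way is still being accessed
    MISS = "cache miss"      # no tag memory holds the requested tag


def decide(tag, way_tags, way_busy):
    """Return (decision, way) for one access request.

    way_tags[w] is the tag held by tag memory w for the indexed set
    (None if empty); way_busy[w] is True while data memory w is being
    accessed on behalf of another request. Both are assumed inputs.
    """
    for way, stored in enumerate(way_tags):
        if stored is not None and stored == tag:
            if way_busy[way]:
                return Decision.WAIT, way
            return Decision.START, way
    return Decision.MISS, None


# A second request is still transferring from way 0;
# the first request hits way 1, so its access can start at once.
result = decide(tag=0x2A, way_tags=[0x10, 0x2A], way_busy=[True, False])
print(result)
```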

The tag memory group 120 and the data memory group 130 are the same as those in FIG. 1, so their description is omitted.

As a result, the same effects as those of the first embodiment can be obtained.

<Other embodiments>
As another embodiment, a plurality of hit/miss determination units may be provided in parallel. This allows hit/miss determination to be performed in parallel for a plurality of access requests.

Although the second embodiment described above targets a multiprocessor system, the invention is also applicable where a single processor issues a plurality of access requests.

In the first to sixth embodiments described above, the SRAM macro of the data array is separated for each way, so shutdown in units of ways can be realized easily. Therefore, when the access frequency is low, for example, power can be saved by cutting power to some ways and thereby reducing the cache capacity.

After step 203 in FIG. 4, when the way is "Free", data reading may be started speculatively on the assumption that the cache will hit. This allows data reading to start at the same time as the hit/miss determination of the subsequent access request, and may also shorten latency.

Furthermore, other embodiments include the following cache memory system. The cache memory system has a cache memory shared by a plurality of IP cores. The shared cache memory has a set-associative configuration with a plurality of ways. A busy or free state is managed for each way. A cache access request accesses the tag memory of the cache memory to determine the way that needs to be accessed. If that way is free, access is permitted and, at the same time, the state of the way is set to busy. After the data transfer is completed, the way is returned to the free state. If a subsequent access request accesses the tag memory in a following cycle, is determined to access a different way, and that way is free, the data accesses based on the preceding access request and the subsequent access request are performed in parallel.
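The per-way busy/free management in this embodiment can be modelled with a short Python sketch. The class `WayStateTable` and its methods are hypothetical names introduced for illustration, and the model ignores cycle-level timing.

```python
class WayStateTable:
    """Tracks a busy/free flag per way (hypothetical helper)."""

    def __init__(self, num_ways):
        self.busy = [False] * num_ways  # all ways start in the free state

    def try_start(self, way):
        """Permit access and mark the way busy if it is currently free."""
        if self.busy[way]:
            return False
        self.busy[way] = True
        return True

    def finish(self, way):
        """Return the way to the free state after its data transfer."""
        self.busy[way] = False


table = WayStateTable(num_ways=2)
first = table.try_start(0)    # preceding request hits way 0
second = table.try_start(1)   # subsequent request hits way 1: runs in parallel
third = table.try_start(0)    # a further request for way 0 is not permitted yet
table.finish(0)               # data transfer from way 0 completes
retry = table.try_start(0)    # way 0 can now be accessed again
print(first, second, third, retry)  # True True False True
```

The first two requests both succeed because they target different free ways, which corresponds to the parallel data access the embodiment describes.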

This embodiment is applicable to an SoC (System on a Chip) that integrates a plurality of processors, or processors and other hardware IP, together with a shared cache memory.

The invention made by the present inventor has been described above specifically on the basis of the embodiments. Needless to say, however, the present invention is not limited to those embodiments, and various modifications can be made without departing from its gist.

100 cache memory device
110 control unit
120 tag memory group
121 tag memory
122 tag memory
12n tag memory
130 data memory group
131 data memory
132 data memory
13n data memory
W1 way
W2 way
Wn way
AR1 access request
AR2 access request
DT1 data transfer
DT2 data transfer
200 multiprocessor system
211 IP core
212 IP core
213 IP core
214 IP core
221 L1 cache
222 L1 cache
223 L1 cache
224 L1 cache
230 cache memory device
231 arbiter scheduler
232 L2 HIT/MISS determination unit
2330 WAY0 tag
2331 WAY1 tag
2340 WAY0 access status flag
2341 WAY1 access status flag
2350 WAY0 data array
2351 WAY1 data array
236 SDRAM controller
240 SDRAM
B20 access request bus
B21 response bus
B22 response bus
A IP core
B IP core
C IP core
X access request
Y access request
Z access request
600 information processing apparatus
611 processor core
612 processor core
61m processor core
620 cache control unit
L2_0 to L2_7 L2 cache memory

Claims

A cache memory device comprising:
a plurality of tag memories;
a plurality of data memories corresponding to the respective tag memories; and
a control unit that performs cache hit determination on the plurality of tag memories for a first access request and that, during access processing to any of the plurality of data memories based on a second access request, starts access processing to the data memory corresponding to the tag memory identified at the time of a cache hit for the first access request when that data memory is not undergoing access processing based on the second access request.

The cache memory device according to claim 1, wherein, as the access processing, the control unit performs burst transfer of data read from the data memory to the request source of the access request.

The cache memory device according to claim 1, further comprising a storage area that stores access state information indicating whether each data memory is undergoing access processing, wherein the control unit:
when starting access processing based on the second access request upon a cache hit, updates the access state information to indicate that the data memory targeted by the access processing is undergoing access processing and, after the access processing is completed, updates the access state information to indicate that the data memory is not undergoing access processing; and
when the first access request results in a cache hit, refers to the access state information for the data memory corresponding to the identified tag memory and starts access processing to that data memory if the access state information indicates that it is not undergoing access processing.

The cache memory device according to claim 1, further comprising an arbitration unit that accepts and arbitrates a plurality of access requests,
wherein, when the data memory corresponding to the tag memory identified upon a cache hit for an arbitrated subsequent access request is undergoing access processing based on an arbitrated preceding access request, the control unit returns the subsequent access request to the arbitration unit.

The cache memory device according to claim 3, further comprising a load store unit that, when an access request is determined to be a cache miss by the cache hit determination, loads the data requested by the access request from a lower-layer memory into one of the plurality of data memories,
wherein the load store unit refers to the access state information and determines, as the load destination, a data memory that is not undergoing access processing.

The cache memory device according to claim 1, further comprising a load store unit that, when an access request is determined to be a cache miss by the cache hit determination, loads the data requested by the access request from a lower-layer memory into one of the plurality of data memories,
wherein the load store unit determines, as the load destination, a data memory different from the data memory loaded immediately before.

The cache memory device according to claim 1, further comprising a load store unit that, when an access request is determined to be a cache miss by the cache hit determination, loads the data requested by the access request from a lower-layer memory into one of the plurality of data memories,
wherein the load store unit determines the data memory to be the load destination according to the access frequency of the plurality of data memories.

The cache memory device according to claim 1, wherein the cache memory device is of a set-associative type.

The cache memory device according to claim 1, wherein the cache memory device is shared by a plurality of processor cores and receives an access request from each processor core individually.

A cache memory device that, during data access to any of a plurality of ways based on a first access request to an associative cache memory, starts data access to the way identified by a cache hit for a second access request other than the first access request when that way is not undergoing data access based on the first access request.

A cache memory control method comprising:
identifying the tag memory in which a cache hit occurs based on a first access request;
during access processing to any of a plurality of data memories based on a second access request other than the first access request, determining whether the data memory corresponding to the tag memory identified for the first access request is undergoing access processing based on the second access request; and
starting access processing based on the first access request when it is determined that the data memory is not undergoing access processing.

The cache memory control method according to claim 11, wherein:
when access processing based on the second access request is started, the data memory targeted by the access processing is locked, and after the access processing is completed, that data memory is unlocked;
it is determined whether the data memory corresponding to the tag memory identified based on the first access request is locked; and
when it is determined that the data memory is not locked, the data memory targeted by the access processing of the first access request is locked.

The cache memory control method according to claim 11, wherein:
arbitration is performed when a plurality of access requests are accepted;
the access request processed first as a result of the arbitration is the second access request, and the access request processed after the second access request as a result of the arbitration is the first access request; and
when it is determined that the data memory targeted by the access processing of the first access request is undergoing access processing based on the second access request, the first access request is made subject to the arbitration again.

The cache memory control method according to claim 12, wherein, when the first access request is determined to be a cache miss, a data memory that is not locked among the plurality of data memories is determined as the load destination, and
the data requested by the first access request is loaded from a lower-layer memory into the load destination.

The cache memory control method according to claim 11, wherein, when the first access request is determined to be a cache miss, a data memory different from the data memory loaded immediately before is determined as the load destination, and
the data requested by the first access request is loaded from a lower-layer memory into the load destination.

The cache memory control method according to claim 11, wherein, when the first access request is determined to be a cache miss, the data memory to be the load destination is determined according to the access frequency of the plurality of data memories, and
the data requested by the first access request is loaded from a lower-layer memory into the load destination.

An information processing apparatus comprising:
a plurality of processor cores;
a plurality of tag memories;
a plurality of data memories corresponding to the respective tag memories; and
a cache control unit that performs cache hit determination on the plurality of tag memories for a first access request from any of the plurality of processor cores and that, during access processing to any of the plurality of data memories based on a second access request different from the first access request, starts access processing to the data memory corresponding to the tag memory identified at the time of a cache hit for the first access request when that data memory is not undergoing access processing based on the second access request.