JP2006139646A

JP2006139646A - Data transfer device and data transfer method

Info

Publication number: JP2006139646A
Application number: JP2004330088A
Authority: JP
Inventors: Mitsunari Todoroki; 晃成轟
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2004-11-15
Filing date: 2004-11-15
Publication date: 2006-06-01

Abstract

PROBLEM TO BE SOLVED: To improve processing efficiency by reducing power consumption in a cache memory.
SOLUTION: An information processor 1 detects, while the processor is reading data, whether the data expected to be read next are cached. When the data expected to be read next are stored in the cache, they are stored in a prefetch cache unit 30; when they are not stored in the cache, they are read from a main memory 7 and stored in the prefetch cache unit 30. Thereafter, when the address of the data actually read by the processor in the following cycle matches the address of the data stored in the prefetch cache unit 30, the data are output from the prefetch cache unit 30 to the processor.
COPYRIGHT: (C) 2006, JPO & NCIPI

Description

The present invention relates to a data transfer device and a data transfer method for transferring data between a processor and a memory device provided outside the processor.

Conventionally, a cache memory has been used to speed up the process by which a processor reads data from a memory device such as a main memory.
A cache memory is built from storage elements from which the processor can read data at high speed. The cache memory holds part of the data stored in the memory device (hereinafter referred to as “memory device data” where appropriate). When the processor reads data from the memory device and that data is held in the cache memory, the data is read from the cache memory instead, which allows it to be read at high speed.

Cache memories come in various schemes, of which the set-associative scheme is common.
In the set-associative scheme, the cache memory is divided into a plurality of areas (ways), and each way stores data from a different address on the memory device, which improves the hit rate.

FIG. 13 is a schematic diagram showing the configuration of a conventional set-associative cache memory 100.
In FIG. 13, the cache memory 100 includes a tag table 110, a data memory 120, a hit detection unit 130, and a multiplexer (MUX) 140. The storage area of the cache memory 100 can hold N elements, each of which is called an “entry”. The cache memory 100 is a two-way set-associative cache, and each entry stores two items of memory device data (the data of way A and way B).

The tag table 110 stores address information indicating the address on the memory device at which the memory device data of each of ways A and B in the data memory 120 is stored. This address information is referred to by the hit detection unit 130, described later, to determine whether the cache has been hit.

The data memory 120 stores predetermined memory device data, such as frequently accessed data. The data memory 120 can store memory device data for each of ways A and B.
The hit detection unit 130 detects whether memory device data stored in the cache memory 100 has been hit by a read command from the processor. Specifically, it refers to each item of address information stored in the tag table 110 and, when it finds address information corresponding to the address indicated in the read command, determines that the cache has been hit. The hit detection unit 130 then outputs information indicating the hit way to the MUX 140.

Based on the way information input from the hit detection unit 130, the MUX 140 selects one of the items of memory device data output from the data memory 120 and supplies it as the output data to the processor (the data read by the processor).
Some processors have, in addition to a memory device provided outside the processor such as a main memory, a local memory inside the processor that can be accessed faster than that memory device.

Part of the data is read from the memory device outside the processor into the local memory and processed at high speed, which improves overall processing efficiency.
In this case, a DMAC (Direct Memory Access Controller) is sometimes provided so that data can be transferred between the local memory and the external memory device without going through the processor.

By transferring data between the local memory and the external memory device without going through the processor, the DMAC reduces the processing load on the processor.
However, in a processor provided with a cache memory, the data stored in the cache memory does not always match the data stored in the external memory device.

Consequently, the data on the external memory device that the DMAC accesses may not reflect the contents of the cache memory and may therefore be invalid.
One conceivable remedy is for the DMAC to access the external memory device via the cache memory, so that it can refer to the latest (valid) data in the same way as when the processor accesses the external memory device.
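The coherency problem described above can be illustrated with a minimal sketch. It assumes a write-back style cache in which a processor write is held in the cache and not yet reflected in main memory; that policy, and every name in the code, is a hypothetical chosen for illustration and is not taken from the patent.

```python
# Hypothetical model: names and the write-back assumption are illustrative only.
main_memory = {0x100: "old"}   # external memory device
cache = {}                     # address -> newer value not yet written back

def cpu_write(address, value):
    """The processor writes into the cache only, leaving main memory stale."""
    cache[address] = value

def dma_read_direct(address):
    """DMAC bypasses the cache, so it may see data that is no longer valid."""
    return main_memory[address]

def dma_read_via_cache(address):
    """DMAC goes through the cache, so it sees the same data the processor would."""
    if address in cache:
        return cache[address]
    return main_memory[address]

cpu_write(0x100, "new")
print(dma_read_direct(0x100))     # value held in main memory
print(dma_read_via_cache(0x100))  # value held in the cache
```

In this sketch the direct path and the cached path return different values for the same address, which is the mismatch that routing DMA through the cache is meant to avoid.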

As techniques for guaranteeing coherency (consistency) between the data in the external memory device and the data in the cache memory when a DMAC is provided, those described in Patent Documents 1 to 3 are known.
JP-A-8-69410, JP-A-6-12363, JP-A-8-115269

However, in a set-associative cache memory such as the one described above, when an entry address (an address that selects one of the entries stored in the cache memory) is input from the processor, the tag table 110 and the data memory 120 are accessed for each of the plurality of ways of the cache memory 100 to detect whether the data is hit.
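The conventional lookup can be sketched roughly as follows. This is a simplified model, not the circuit of FIG. 13: the entry count, the use of the full address as the tag, the `address % NUM_ENTRIES` indexing, and all function names are assumptions made for illustration. The counters show that both the tag table and the data memory are touched for every way on every read.

```python
NUM_ENTRIES = 4          # assumed number of entries, kept small for illustration
WAYS = ("A", "B")        # two-way set-associative

# tag_table[way][entry] holds the cached address (None if empty);
# data_memory[way][entry] holds the corresponding memory device data.
tag_table = {way: [None] * NUM_ENTRIES for way in WAYS}
data_memory = {way: [None] * NUM_ENTRIES for way in WAYS}
access_count = {"tag": 0, "data": 0}

def fill(way, address, value):
    """Place a value in the given way (hypothetical helper for the sketch)."""
    entry = address % NUM_ENTRIES
    tag_table[way][entry] = address
    data_memory[way][entry] = value

def conventional_read(address):
    """Access tag and data of every way, then select the hit way's output."""
    entry = address % NUM_ENTRIES
    hit_way = None
    outputs = {}
    for way in WAYS:
        access_count["tag"] += 1
        access_count["data"] += 1          # data is read even if this way misses
        outputs[way] = data_memory[way][entry]
        if tag_table[way][entry] == address:
            hit_way = way                  # role of the hit detection unit
    # Role of the MUX: pass through the hit way's data, or nothing on a miss.
    return outputs[hit_way] if hit_way is not None else None
```

Every call adds one data-memory access per way regardless of which way, if any, holds the data; these are the unneeded accesses the text refers to.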

Therefore, even if the DMAC accesses the cache memory 100, accesses to unneeded parts of the cache memory 100 increase, which increases power consumption or lowers processing efficiency.
The techniques described in the above publications do not solve this problem, which arises when the DMAC accesses an external memory device via the cache memory.

Thus, when data is transferred between the local memory and an external memory device without going through the processor, as in DMA (Direct Memory Access), it has been difficult to prevent an increase in power consumption or a decrease in processing efficiency while guaranteeing coherency with the data stored in the cache memory.
An object of the present invention is to prevent an increase in power consumption or a decrease in processing efficiency, while guaranteeing coherency with the data stored in the cache memory, when data is transferred between the local memory and an external memory device without going through the processor.

In order to solve the above problems, the present invention provides a data transfer device including: a first memory provided outside a processor (for example, the main memory 7 in FIG. 1); a cache memory (for example, the cache memory 5 in FIG. 1) that is provided between the first memory and the processor and can cache at least part of the data stored in the first memory in a plurality of ways; and a second memory (for example, the local memory 3 in FIG. 1) that the processor can access faster than the first memory. The device comprises: access control means (for example, the DMAC 4 in FIG. 1) that transfers data between the first memory and the second memory without going through the processor by outputting to the cache memory at least one of a command to read data from the first memory into the second memory and a command to write data from the second memory into the first memory; cache determination means (for example, the access management unit 20 and the tag table 40 in FIG. 2) that determines whether scheduled data, which is expected to be accessed after the data being accessed (the data that the access control means is currently reading or writing), is cached in any way of the cache memory; and prefetch cache means (for example, the access management unit 20 and the prefetch cache unit 30 in FIG. 2) that, when the cache determination means determines that the scheduled data is cached in one of the ways, accesses, among the plurality of ways, the way in which the scheduled data is stored, and reads and stores the scheduled data. When a read command targeting the scheduled data is input after the data being accessed, the prefetch cache means outputs the stored scheduled data to the access control means; when a write command targeting the scheduled data is input after the data being accessed, the prefetch cache means writes to the scheduled data.

With this configuration, data is transferred via the cache memory even when it is transferred between the first memory and the second memory without going through the processor. While the access control means is reading the data to be read, the cache memory detects whether the data expected to be read next is cached. If that data is stored in the cache memory, the cache memory stores it in the prefetch cache means; when the address of the data that the DMAC actually reads next matches the address of the data stored in the prefetch cache means, the read or write is performed on that data.

Therefore, when the address of data to be read or written is input from the access control means, the cache memory does not need to access every way each time; it only needs to access a way when the data to be read or written is stored in the cache memory.
Consequently, when data is transferred between the first memory and the second memory without going through the processor, an increase in power consumption or a decrease in processing efficiency can be prevented while coherency with the data stored in the cache memory is guaranteed.

The cache memory includes address storage means (for example, the tag table 40 in FIG. 2) that stores, for the plurality of ways, the addresses of the cached data, and data storage means (for example, the data memory 50 in FIG. 2) that stores the data corresponding to each of those addresses. The cache determination means determines whether the scheduled data is cached according to whether the address of the scheduled data is stored in any way of the address storage means. The prefetch cache means accesses, among the plurality of ways of the data storage means, the way corresponding to the way of the address storage means that stores the address of the scheduled data.

With this configuration, whether the scheduled data hits the cache can be determined by accessing the address storage means alone, which reduces the unnecessary power consumed by accessing the data storage means during hit determination. In addition, because only the way of the data storage means in which the scheduled data is stored is accessed, power consumption can be reduced further.
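A minimal sketch of this tag-first access pattern follows. The class name `TagFirstCache`, the entry count, the use of the full address as the tag, and the `address % NUM_ENTRIES` indexing are all assumptions for illustration. The hit determination consults only the tags, and the data storage is then read for the single way that hit.

```python
NUM_ENTRIES = 4          # assumed number of entries
WAYS = ("A", "B")        # two ways, as in the embodiment's example

class TagFirstCache:
    """Hypothetical model: tags play the role of the address storage means,
    data plays the role of the data storage means."""

    def __init__(self):
        self.tags = {way: [None] * NUM_ENTRIES for way in WAYS}
        self.data = {way: [None] * NUM_ENTRIES for way in WAYS}
        self.tag_reads = 0
        self.data_reads = 0

    def fill(self, way, address, value):
        entry = address % NUM_ENTRIES
        self.tags[way][entry] = address
        self.data[way][entry] = value

    def find_way(self, address):
        """Cache determination: look at the tags only, never at the data."""
        entry = address % NUM_ENTRIES
        self.tag_reads += len(WAYS)
        for way in WAYS:
            if self.tags[way][entry] == address:
                return way
        return None

    def prefetch(self, address):
        """Read the scheduled data from the hit way only; no data access on a miss."""
        way = self.find_way(address)
        if way is None:
            return None
        self.data_reads += 1
        return self.data[way][address % NUM_ENTRIES]
```

In this model a hit costs one data-storage access and a miss costs none, instead of one access per way on every lookup.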

The scheduled data is data that is expected to be read or written immediately after the data being read (for example, the data at the address following the address of the data being read).
Accordingly, processing such as determining whether data is cached and storing it in the prefetch cache means needs to be performed only for the data expected to be read next after the data being read, which improves processing efficiency.

The data accessed by the access control means is organized into blocks each containing a plurality of words, and the determination of whether the scheduled data is cached, or the reading or writing of the scheduled data, is performed in units of blocks.
With this configuration, the access control means does not need to issue a read or write command for each word; an entire block can be read or written with a single command, which reduces power consumption and improves processing efficiency.
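As a rough illustration of block-unit handling, the sketch below maps word addresses to blocks and treats the block following the one being accessed as the scheduled data, in line with the "following address" example in the text. The block size of four words and the function names are assumptions, not values from the patent.

```python
WORDS_PER_BLOCK = 4   # assumed block size in words

def block_of(word_address):
    """Block number that contains the given word address."""
    return word_address // WORDS_PER_BLOCK

def scheduled_block(word_address):
    """Block expected to be accessed next: the one after the current block."""
    return block_of(word_address) + 1

def block_word_addresses(block):
    """All word addresses covered by one block-unit read or write."""
    start = block * WORDS_PER_BLOCK
    return list(range(start, start + WORDS_PER_BLOCK))
```

With this layout a single block-unit command stands in for `WORDS_PER_BLOCK` separate word commands.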

The cache determination means determines whether the scheduled data is cached in response to the access control means instructing a read or write of the last word among the plurality of words constituting the data being accessed.
In general, a prediction of the scheduled data is more likely to be correct when it is made at the time a later word of the data being accessed is accessed by the access control means.

With this configuration, therefore, data that is more likely to be accessed by the access control means can be prefetched as the scheduled data.
Further, the cache determination means determines whether the scheduled data is cached in response to the access control means instructing a read or write of a word that precedes the last word among the plurality of words constituting the data being accessed.

With this configuration, whether the scheduled data is cached can be determined at an earlier time, so the processing required when it is not cached (for example, reading it from the memory device) can start earlier, which prevents wait cycles or reduces the number that occur.
Further, when the cache determination means determines that the scheduled data is cached in one of the ways, the prefetch cache means accesses the way in which the scheduled data is stored, and reads or writes the scheduled data, in response to the access control means instructing a read or write of the last word among the plurality of words constituting the data being accessed.

With this configuration, even when whether the scheduled data is cached has been determined at an earlier time, the scheduled data is actually read or written at the time when the probability that it will be read or written has become high. This reduces the proportion of cases in which scheduled data stored in the prefetch cache means is never read or written by the access control means, and thus prevents a decrease in processing efficiency.
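The split between an early hit determination and a late way access can be sketched as a per-word schedule for one block. The block size, the action labels, and the function `schedule_events` are hypothetical; the sketch assumes that on a miss the main memory read starts at the time of the determination, and that on a hit the way is accessed at the last word.

```python
def schedule_events(check_word, cached, words_per_block=4):
    """List the prefetch actions triggered as the DMAC walks one block.

    check_word: index of the word whose access triggers the tag check
                (the last word, or a word that precedes it).
    cached:     whether the scheduled data is found in some way.
    """
    last_word = words_per_block - 1
    if not 0 <= check_word <= last_word:
        raise ValueError("check_word must lie inside the block")
    events = []
    for word in range(words_per_block):
        actions = []
        if word == check_word:
            actions.append("check_tags")
            if not cached:
                # Miss known early: begin fetching from the memory device now.
                actions.append("start_main_memory_read")
        if word == last_word and cached:
            # Hit: touch the data way only once the prediction is most reliable.
            actions.append("read_hit_way")
        events.append((word, actions))
    return events
```

For example, `schedule_events(2, cached=True)` checks the tags at word 2 and reads the hit way at word 3, while `schedule_events(2, cached=False)` starts the main memory read one word earlier than a last-word check would.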

The device further includes power reduction means (for example, the power consumption control unit 80 in FIG. 11) that operates, at low power consumption, those ways among the plurality of ways in the cache memory that are not involved in the data access.
With this configuration, the power consumed by unneeded parts can be reduced, further reducing the power consumption of the data transfer device.

The power reduction means has a clock gating function that prevents a clock signal from being supplied to ways not involved in reading data.
With this configuration, the unnecessary power consumed by supplying a clock signal to unneeded parts can be reduced.
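Clock gating per way can be modeled in software as a set of enable flags, as in the sketch below. This is only a behavioral illustration; the function names are hypothetical, and counting clocked cycles is used here as a stand-in for power, which the patent does not quantify in this passage.

```python
WAYS = ("A", "B")

def clock_enables(active_way):
    """Per-way clock enable: only the way involved in the access is clocked.
    Passing None (no way involved) gates the clock to every way."""
    return {way: (way == active_way) for way in WAYS}

def run_accesses(active_ways):
    """Count how many cycles each way is clocked over a sequence of accesses."""
    clocked_cycles = {way: 0 for way in WAYS}
    for active_way in active_ways:
        for way, enabled in clock_enables(active_way).items():
            clocked_cycles[way] += int(enabled)
    return clocked_cycles
```

Without gating, each way would be clocked on every access in the sequence; with gating, a way is clocked only for the accesses that actually involve it.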

The cache memory is a set-associative cache memory.
With this configuration, in a set-associative cache memory, the power consumed by unnecessarily accessing the address storage means (tag table) and the data storage means (data memory) of each way included in an entry can be greatly reduced, and processing efficiency can be improved.

When the cache determination means determines that the scheduled data is not cached in any way of the cache memory, the prefetch cache means accesses the memory device, reads the scheduled data, and stores it.
With this configuration, when the scheduled data is not cached, reading it from the memory device can start earlier, which prevents wait cycles or reduces the number that occur.

The present invention also provides a data transfer method for a first memory provided outside a processor, a cache memory that is provided between the first memory and the processor and can cache at least part of the data stored in the first memory in a plurality of ways, and a second memory that the processor can access faster than the first memory. The method includes: an access control step of transferring data between the first memory and the second memory without going through the processor by outputting to the cache memory at least one of a command to read data from the first memory into the second memory and a command to write data from the second memory into the first memory; a cache determination step of determining whether scheduled data, which is expected to be accessed after the data being read or written in the access control step, is cached in any way of the cache memory; and a prefetch cache step of, when the cache determination step determines that the scheduled data is cached in one of the ways, accessing, among the plurality of ways, the way in which the scheduled data is stored and reading and storing the scheduled data. When a read command targeting the scheduled data is issued after the data being accessed, the stored scheduled data is read out; when a write command targeting the scheduled data is issued after the data being accessed, the scheduled data is written to.

Thus, according to the present invention, when data is transferred between the local memory and an external memory device without going through the processor, an increase in power consumption or a decrease in processing efficiency can be prevented while coherency with the data stored in the cache memory is guaranteed.

Hereinafter, an embodiment of a data transfer device according to the present invention will be described with reference to the drawings.
First, the configuration will be described.
FIG. 1 is a block diagram showing a schematic configuration of an information processing apparatus 1 that includes a data transfer device 1A according to the present invention.
In FIG. 1, the information processing apparatus 1 includes a CPU (Central Processing Unit) core 2, a local memory 3, a DMAC (Direct Memory Access Controller) 4, a cache memory 5, a memory interface (hereinafter referred to as the “memory I/F”) 6, and a main memory 7, which are connected via a bus. In the configuration shown in FIG. 1, the local memory 3, the DMAC 4, the cache memory 5, the memory I/F 6, and the main memory 7 constitute the data transfer device 1A.

The CPU core 2 controls the entire information processing apparatus 1 and performs various kinds of processing by executing predetermined programs. For example, the CPU core 2 executes an input program by repeatedly reading the data or instruction code to be processed from a predetermined address in the main memory 7, performing an arithmetic operation, and writing the result to a predetermined address in the main memory 7.

At this time, data is input and output via the cache memory 5 to speed up the accesses of the CPU core 2 to the main memory 7.
The CPU core 2 is also provided with a local memory 3 that it can access at high speed. The DMAC 4 transfers data between the main memory 7 and the local memory 3 without going through the CPU core 2, which reduces the burden on the CPU core 2 of transferring data to and from the main memory 7 and speeds up processing.

The local memory 3 is configured as memory built into the CPU and consists of storage elements, such as SRAM (Static Random Access Memory), that the CPU core 2 can access faster than the main memory 7. The local memory 3 has a smaller capacity than the main memory 7. The DMAC 4 reads predetermined data to be processed by the CPU core 2 from the main memory 7 into the local memory 3, and writes data resulting from processing by the CPU core 2 from the local memory 3 out to the main memory 7.

The DMAC 4 controls DMA between the local memory 3 and the main memory 7; it places the CPU core 2 in a wait state while DMA is in progress and notifies the CPU core 2 when DMA has finished.
When transferring data between the local memory 3 and the main memory 7, the DMAC 4 reads or writes the data via the cache memory 5.

Because data transferred between the local memory 3 and the main memory 7 passes through the cache memory 5 in this way, no processing to guarantee coherency between the cache memory 5 and the main memory 7 is needed before a DMA transfer involving the local memory 3. DMA transfers between the main memory 7 and the local memory 3 can therefore be performed faster, which improves processing efficiency.

The cache memory 5 consists of storage elements that the CPU core 2 can access faster than the main memory 7, and speeds up data input and output between the CPU core 2 and the main memory 7.
The cache memory 5 also controls data input and output when the DMAC 4 transfers data between the local memory 3 and the main memory 7.

Cache memories come in various schemes, but because the set-associative scheme is common, a two-way (ways A and B) set-associative cache memory is used as the example here.
In the set-associative scheme, the cache memory is divided into a plurality of areas (ways), and each way stores data from a different address on the memory device, which improves the hit rate.

FIG. 2 is a block diagram showing the functional configuration of the cache memory 5.
In FIG. 2, the cache memory 5 includes an access arbitration unit 10, an access management unit 20, a prefetch cache unit 30, a tag table 40, a data memory 50, a hit detection unit 60, and a MUX 70.
FIG. 3 shows the structure of the data stored in the tag table 40 and the data memory 50: (a) shows the structure of the data in the tag table 40, and (b) shows the structure of the data in the data memory 50.

The configuration of the cache memory 5 is described below with reference to FIG. 2, and to FIG. 3 where appropriate. The cache memory 5 is assumed here to be a two-way (ways A and B) set-associative cache.
The access management unit 20 controls the entire cache memory 5 and operates it in accordance with a state transition diagram described later.

例えば、アクセス管理部２０は、ＣＰＵコア２あるいはＤＭＡＣ４から入力された読み出し命令に基づいて、読み出し命令に示されたアドレスのデータが先読みキャッシュ部３０に記憶されている場合には、そのアドレスに対応するデータをＣＰＵコア２あるいはＤＭＡＣ４に対して出力させると共に、引き続き読み出されるデータを予想し、予想したデータを先読みキャッシュ部３０のプロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶させる。 For example, when a read command is input from the CPU core 2 or the DMAC 4 and the data at the address indicated in the read command is stored in the prefetch cache unit 30, the access management unit 20 causes the data corresponding to that address to be output to the CPU core 2 or the DMAC 4, predicts the data that will be read next, and stores the predicted data in the processor prefetch buffer 34 or the DMA prefetch buffer 35 of the prefetch cache unit 30.

一方、読み出し命令に示されたアドレスのデータが先読みキャッシュ部３０に記憶されていない場合、アクセス管理部２０は、タグ・テーブル４０を参照する。そして、そのアドレスがタグ・テーブル４０に記憶されている場合、アクセス管理部２０は、そのアドレスに対応するデータをデータ・メモリ５０からプロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶させる。 On the other hand, when the data at the address indicated in the read command is not stored in the prefetch cache unit 30, the access management unit 20 refers to the tag table 40. When the address is stored in the tag table 40, the access management unit 20 stores data corresponding to the address from the data memory 50 into the processor prefetch buffer 34 or the DMA prefetch buffer 35.

また、読み出し命令に示されたアドレスがタグ・テーブル４０に記憶されていない場合、アクセス管理部２０は、メインメモリ７にアクセスし、そのアドレスのデータを先読みキャッシュ部３０の外部メモリ先読みバッファ３６に記憶させる。
先読みキャッシュ部３０は、ＣＰＵコア２あるいはＤＭＡＣ４から入力された読み出し命令を受け取り、読み出し命令に示されたアドレスをアクセス管理部２０に出力する。また、先読みキャッシュ部３０は、アクセス管理部２０の指示に従って、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されると予想されたデータを、データ・メモリ５０あるいはメインメモリ７から予め読み出して記憶し、ＣＰＵコア２あるいはＤＭＡＣ４から実際に読み出された場合に、そのデータをＣＰＵコア２あるいはＤＭＡＣ４に出力する。 If the address indicated in the read command is not stored in the tag table 40, the access management unit 20 accesses the main memory 7 and stores the data at that address in the external memory prefetch buffer 36 of the prefetch cache unit 30.
The prefetch cache unit 30 receives a read command input from the CPU core 2 or the DMAC 4 and outputs the address indicated in the read command to the access management unit 20. In addition, in accordance with instructions from the access management unit 20, the prefetch cache unit 30 reads in advance, from the data memory 50 or the main memory 7, the data predicted to be read by the CPU core 2 or the DMAC 4 and stores it, and outputs that data to the CPU core 2 or the DMAC 4 when it is actually read by the CPU core 2 or the DMAC 4.
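
The lookup order described above (prefetch cache unit 30 first, then tag table 40 and data memory 50, then main memory 7) can be sketched as follows. This is a simplified illustration under stated assumptions: the three storage levels are modeled as plain dictionaries keyed by word address, the separate buffers 34, 35, and 36 are collapsed into one `prefetch_buffer`, and the function name `read` is hypothetical.

```python
def read(addr, prefetch_buffer, cache, main_memory):
    """Return (data, source) following the lookup order in the description."""
    if addr in prefetch_buffer:
        # Data already held in the prefetch cache unit 30.
        return prefetch_buffer[addr], "prefetch"
    if addr in cache:
        # Address found in the tag table 40: take the data from the
        # data memory 50 and keep a copy in the prefetch buffer.
        prefetch_buffer[addr] = cache[addr]
        return cache[addr], "cache"
    # Miss in both: access the main memory 7 and keep the data in the
    # (here merged) external memory prefetch buffer 36.
    data = main_memory[addr]
    prefetch_buffer[addr] = data
    return data, "main"
```

A second read of the same address is then served from the prefetch buffer without touching the tag table or the main memory.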

具体的には、先読みキャッシュ部３０は、アドレス制御部３１と、プロセッサ書き込みバッファ３２と、ＤＭＡ書き込みバッファ３３と、プロセッサ先読みバッファ３４と、ＤＭＡ先読みバッファ３５と、外部メモリ先読みバッファ３６とを含んで構成される。
アドレス制御部３１は、ＣＰＵコア２あるいはＤＭＡＣ４から入力された読み出し命令あるいは書き込み命令から、読み出しあるいは書き込みの対象となっているデータのアドレスを取得し、アクセス管理部２０に出力する。また、アドレス制御部３１は、データ・メモリ５０にキャッシュされているデータを読み出す際に、タグ・テーブル４０およびデータ・メモリ５０に読み出しあるいは書き込み対象のアドレスを出力したり、データ・メモリ５０にキャッシュされていないデータをメインメモリ７から読み出す際に、読み出し対象のアドレスをメインメモリ７に出力したりする。 Specifically, the prefetch cache unit 30 includes an address control unit 31, a processor write buffer 32, a DMA write buffer 33, a processor prefetch buffer 34, a DMA prefetch buffer 35, and an external memory prefetch buffer 36.
The address control unit 31 acquires the address of the data to be read or written from the read command or the write command input from the CPU core 2 or the DMAC 4 and outputs it to the access management unit 20. In addition, the address control unit 31 outputs the address to be read or written to the tag table 40 and the data memory 50 when reading data cached in the data memory 50, and outputs the address to be read to the main memory 7 when reading data that is not cached in the data memory 50 from the main memory 7.

なお、アドレス制御部３１は、アクセス管理部２０からデータの読み出し（先読み）が指示され、そのデータのアドレスがタグ・テーブル４０に記憶されている場合には、データ・メモリ５０の各ウェイのうち、そのデータが記憶されているウェイにのみアドレスを出力する。
そのため、不要なウェイへのアクセス回数を低減することができ、消費電力を低減することができると共に、処理効率の向上を実現することができる。 When the access management unit 20 instructs the address control unit 31 to read (prefetch) data and the address of that data is stored in the tag table 40, the address control unit 31 outputs the address only to the way, among the ways of the data memory 50, in which the data is stored.
As a result, the number of accesses to unneeded ways can be reduced, which lowers power consumption and improves processing efficiency.
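
To make the saving concrete, the following sketch counts data-memory way accesses for a sequence of cache hits, comparing the way-selective access described above with reading both ways on every access. It is an illustrative count only; the function name `data_way_accesses` and the comparison baseline are assumptions, and no power figures are implied.

```python
def data_way_accesses(hit_ways, selective=True):
    """Count accesses to each data-memory way for a sequence of hits.

    hit_ways: the way ("A" or "B") holding each requested block.
    selective: True  -> only the way holding the data is addressed;
               False -> both ways are addressed on every access.
    """
    counts = {"A": 0, "B": 0}
    for way in hit_ways:
        if selective:
            counts[way] += 1
        else:
            for w in counts:
                counts[w] += 1
    return counts
```

For three hits located in ways A, A, and B, the selective scheme addresses the data memory three times in total, whereas addressing both ways every time takes six accesses.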

プロセッサ書き込みバッファ３２は、アクセス調停部１０によって入力されたＣＰＵコア２の書き込みデータを記憶する。
ＤＭＡ書き込みバッファ３３は、アクセス調停部１０によって入力されたＤＭＡＣ４の書き込みデータを記憶する。
なお、プロセッサ書き込みバッファ３２およびＤＭＡ書き込みバッファ３３に記憶されたデータは、これらのデータに関する書き込み命令が終了した場合に、データ・メモリ５０およびメインメモリ７に転送される。 The processor write buffer 32 stores the write data of the CPU core 2 input by the access arbitration unit 10.
The DMA write buffer 33 stores the write data of the DMAC 4 input by the access arbitration unit 10.
Note that the data stored in the processor write buffer 32 and the DMA write buffer 33 is transferred to the data memory 50 and the main memory 7 when a write command relating to these data is completed.

プロセッサ先読みバッファ３４は、データ・メモリ５０から読み出されたデータを、ＭＵＸ７０を介して受け取り、ＣＰＵコア２に出力するデータとして記憶する。
ＤＭＡ先読みバッファ３５は、データ・メモリ５０から読み出されたデータを、ＭＵＸ７０を介して受け取り、ＤＭＡＣ４に出力するデータとして記憶する。
外部メモリ先読みバッファ３６は、メインメモリ７から読み出されたデータを受け取り、ＣＰＵコア２あるいはＤＭＡＣ４に出力するデータとして記憶する。また、外部メモリ先読みバッファ３６に記憶されたデータは、キャッシュメモリ５において処理が行われないクロックタイミングにデータ・メモリ５０に格納される。 The processor prefetch buffer 34 receives the data read from the data memory 50 via the MUX 70 and stores it as data to be output to the CPU core 2.
The DMA prefetch buffer 35 receives the data read from the data memory 50 via the MUX 70 and stores it as data to be output to the DMAC 4.
The external memory prefetch buffer 36 receives data read from the main memory 7 and stores it as data to be output to the CPU core 2 or the DMAC 4. The data stored in the external memory prefetch buffer 36 is stored in the data memory 50 at a clock timing at which no processing is performed in the cache memory 5.

タグ・テーブル４０は、図３（ａ）に示すように、各エントリ（０番〜５１１番、ただし、エントリ数Ｎ＝５１２の場合）について、データ・メモリ５０に記憶されたデータがキャッシュヒットしたか否かを示すフラグと、データ・メモリ５０に記憶されたデータが記憶されているメインメモリ７上のアドレスとを記憶している。また、各エントリには、ウェイＡ，Ｂに対応するフラグおよびアドレスが記憶されている。タグ・テーブル４０に記憶されているアドレスを参照することにより、データ・メモリ５０のデータが、キャッシュにヒットしたか否かを判定することができる。 As shown in FIG. 3A, the tag table 40 stores, for each entry (entries 0 to 511 when the number of entries is N = 512), a flag indicating whether or not the data stored in the data memory 50 has had a cache hit, and the address on the main memory 7 at which the data stored in the data memory 50 is stored. Each entry stores a flag and an address for each of the ways A and B. By referring to the addresses stored in the tag table 40, it can be determined whether or not the data in the data memory 50 hits the cache.

データ・メモリ５０は、各エントリについて、所定のメモリデバイスデータを記憶している。なお、データ・メモリ５０は、４ワードを１ブロックとして取り扱い、データ・メモリ５０からデータを読み出す場合には、エントリに含まれるいずれかのウェイの４ワード（ｗ０〜ｗ４）を１まとまりとして読み出すことが可能である。ただし、１ブロックにおける一部のワード（例えば、ワードｗ１〜ｗ３）を読み出すことも可能である。 The data memory 50 stores predetermined memory device data for each entry. The data memory 50 handles four words as one block, and when data is read from the data memory 50, the four words (w0 to w4) of any one way included in an entry can be read as a unit. However, it is also possible to read only some of the words in a block (for example, words w1 to w3).

ヒット検出部６０は、アドレス制御部３１からタグ・テーブル４０に対し、読み出し命令あるいは書き込み命令に示されたアドレスが入力された場合に、データ・メモリ５０に記憶されているメモリデバイスデータがヒットしたか否かを検出する。具体的には、タグ・テーブル４０に記憶されたアドレスそれぞれを参照し、アドレス制御部３１から入力されたアドレスが検出されると、キャッシュがヒットしたものと判定する。そして、ヒット検出部６０は、ヒットしたウェイを示す情報をＭＵＸ７０に出力する。 When the address indicated in a read command or a write command is input from the address control unit 31 to the tag table 40, the hit detection unit 60 detects whether or not the memory device data stored in the data memory 50 is hit. Specifically, it refers to each of the addresses stored in the tag table 40 and, when the address input from the address control unit 31 is found, determines that the cache is hit. The hit detection unit 60 then outputs information indicating the hit way to the MUX 70.

ＭＵＸ７０は、ヒット検出部６０からヒットしたウェイを示す情報を受け取り、データ・メモリ５０の各ウェイの記憶領域からメモリデバイスデータを受け取る。そして、ＭＵＸ７０は、ヒット検出部６０から入力されたウェイに対応するメモリデバイスデータを選択し、プロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に出力する。
次に、動作を説明する。 The MUX 70 receives information indicating the hit way from the hit detection unit 60, and receives memory device data from the storage area of each way in the data memory 50. Then, the MUX 70 selects memory device data corresponding to the way input from the hit detection unit 60 and outputs the memory device data to the processor prefetch buffer 34 or the DMA prefetch buffer 35.
Next, the operation will be described.

情報処理装置１は、主としてアクセス管理部２０の制御によって、所定の動作に対応する状態遷移を行う。
まず、情報処理装置１の基本的な動作について説明する。
なお、ここではＣＰＵコア２あるいはＤＭＡＣ４から読み出し命令が入力された場合について説明する。 The information processing apparatus 1 performs state transition corresponding to a predetermined operation mainly under the control of the access management unit 20.
First, the basic operation of the information processing apparatus 1 will be described.
Here, a case where a read command is input from the CPU core 2 or the DMAC 4 will be described.

情報処理装置１の基本的な動作においては、読み出し対象であるメモリデバイスデータの最後のワードのアドレスがプロセッサから出力されるタイミングで、タグ・テーブル４０を参照し、引き続き読み出されると予想されるデータ（以下、「予定データ」と言う。）がキャッシュにヒットするか否かを検出（先読み）する。したがって、実際に読み出しあるいは書き込みが行われる可能性の高いデータについて、キャッシュを先読みすることができるため、先読みキャッシュ部３０におけるデータのヒット率を向上させることができる。 In the basic operation of the information processing apparatus 1, at the timing when the address of the last word of the memory device data being read is output from the processor, the tag table 40 is referenced to detect (prefetch) whether or not the data expected to be read next (hereinafter referred to as “scheduled data”) hits the cache. Because the cache can thus be prefetched for data that is highly likely to actually be read or written, the data hit rate in the prefetch cache unit 30 can be improved.

図４は、情報処理装置１の基本的な動作を示す状態遷移図である。
図４において、情報処理装置１は、状態Ｓ１〜Ｓ４を遷移し、それぞれの状態間を遷移するための遷移条件Ｃ１〜Ｃ１２が定められている。
状態Ｓ１（ST-PRC-NORMAL）においては、予定データが、プロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶されている場合（先読みキャッシュにヒットしている場合）に、そのデータをブロック単位でＣＰＵコア２あるいはＤＭＡＣ４に出力する。 FIG. 4 is a state transition diagram showing the basic operation of the information processing apparatus 1.
In FIG. 4, the information processing apparatus 1 transitions between states S1 to S4, and transition conditions C1 to C12 are defined for transitioning between the states.
In the state S1 (ST-PRC-NORMAL), when the scheduled data is stored in the processor prefetch buffer 34 or the DMA prefetch buffer 35 (when the prefetch cache is hit), the data is output to the CPU core 2 or the DMAC 4 in units of blocks.

また、状態Ｓ１においては、予定データが、プロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶されていない場合には、読み出し対象のアドレスに基づいて、タグ・テーブル４０およびデータ・メモリ５０にアクセスし、そのアドレスと一致するデータをデータ・メモリ５０から読み込む状態（ST-PREREAD-ACTIVE）に移行する。
さらに、状態Ｓ１においては、メインメモリ７から読み込まれているデータのブロックのうち、末尾のワードの読み出しが完了するまでは、キャッシュのリード（データ・メモリ５０の読み出し）を行わない。 In the state S1, when the scheduled data is not stored in the processor prefetch buffer 34 or the DMA prefetch buffer 35, the tag table 40 and the data memory 50 are accessed based on the read target address. The state in which data matching the address is read from the data memory 50 (ST-PREREAD-ACTIVE) is entered.
Further, in the state S1, the cache reading (reading of the data memory 50) is not performed until the reading of the last word in the block of data read from the main memory 7 is completed.

状態Ｓ２（ST-PREREAD-ACTIVE）においては、予定データのアドレスに基づいて、タグ・テーブル４０にのみアクセスし、タグ・テーブル４０に記憶されたアドレスと一致（キャッシュにヒット）した場合、そのアドレスに対応するデータをデータ・メモリ５０から読み出す。
状態Ｓ３（ST-CACHE-HIT-TEST）においては、タグ・テーブル４０とデータ・メモリ５０にアクセスし、予定データのアドレスがタグ・テーブル４０のアドレスと一致するか否かを検出する。そして、状態Ｓ３においては、予定データのアドレスと一致したアドレスに対応するデータをデータ・メモリ５０から読み出す。 In the state S2 (ST-PREREAD-ACTIVE), only the tag table 40 is accessed based on the address of the scheduled data, and if that address matches an address stored in the tag table 40 (a cache hit), the data corresponding to the address is read from the data memory 50.
In the state S3 (ST-CACHE-HIT-TEST), the tag table 40 and the data memory 50 are accessed to detect whether the address of the scheduled data matches the address of the tag table 40. In the state S3, data corresponding to the address that matches the address of the scheduled data is read from the data memory 50.

状態Ｓ４（ST-EXMEM-ACCESS）においては、メインメモリ７を読み出すためのステートマシン“sm-exmem-access”（図５参照）を起動し、メインメモリ７を読み出す。状態Ｓ４から他の状態へ遷移するタイミングは、１ワードの読み込みが終了する時点であり、ステートマシン“sm-exmem-access”の動作の終了を待たない。即ち、他の状態においては、メインメモリ７の読み出しを待つためのウェイトサイクル（wait-cycle）が生ずる場合がある。 In the state S4 (ST-EXMEM-ACCESS), the state machine “sm-exmem-access” (see FIG. 5) for reading the main memory 7 is activated and the main memory 7 is read. The timing of transition from the state S4 to another state is the time when reading of one word is completed, and the end of the operation of the state machine “sm-exmem-access” is not waited. That is, in other states, a wait cycle (wait-cycle) for waiting for reading from the main memory 7 may occur.

遷移条件Ｃ１（CND-PRA-START）は、状態Ｓ１において、読み込み対象であるデータの末尾のワード（１６進数で表されたアドレスの末尾が“Ｃ”のワード）のアドレスが、ＣＰＵコア２あるいはＤＭＡＣ４から入力されることを意味している。
遷移条件Ｃ２（CND-PRA-END）は、状態Ｓ２において、ウェイトサイクルが発生しなければ、次のサイクルで状態Ｓ１に戻ることを意味している。 The transition condition C1 (CND-PRA-START) means that, in the state S1, the address of the last word of the data being read (the word whose hexadecimal address ends in “C”) is input from the CPU core 2 or the DMAC 4.
The transition condition C2 (CND-PRA-END) means that if no wait cycle occurs in the state S2, the state returns to the state S1 in the next cycle.

遷移条件Ｃ３（CND-CHT-START）は、状態Ｓ１において、予定データが、先読みキャッシュにヒットしない場合（プロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶されていない場合）を意味している。
遷移条件Ｃ４（CND-CHT-CNT）は、状態Ｓ３を継続するための条件である。即ち、先読みキャッシュ部３０に予定データが記憶されていないため、タグ・テーブル４０およびデータ・メモリ５０にアクセスしてキャッシュヒットを確認し続けるための条件である。なお、分岐命令について、分岐先のアドレスがブロックの末尾であり、かつ状態Ｓ３である場合には、次のサイクルでブロックの先頭のワードにアクセスすることから、連続して先読みキャッシュにヒットしない場合に、先読みキャッシュのミスヒットと判定される。 The transition condition C3 (CND-CHT-START) means that the scheduled data does not hit the prefetch cache in the state S1 (when it is not stored in the processor prefetch buffer 34 or the DMA prefetch buffer 35).
The transition condition C4 (CND-CHT-CNT) is a condition for continuing the state S3. That is, because the scheduled data is not stored in the prefetch cache unit 30, it is a condition for continuing to access the tag table 40 and the data memory 50 to check for a cache hit. For a branch instruction, if the branch destination address is the last word of a block and the state is S3, the first word of a block is accessed in the next cycle; therefore, when the prefetch cache is not hit consecutively, a prefetch cache miss is determined.

遷移条件Ｃ５（CND-CHT-PRA）は、状態Ｓ３から状態Ｓ２へ遷移するための条件である。即ち、先読みキャッシュ部３０に予定データが記憶されていないため、タグ・テーブル４０およびデータ・メモリ５０にアクセスしてキャッシュヒットを確認する状態から、タグ・テーブル４０にのみアクセスし、キャッシュヒットを確認する状態へ遷移するための条件である。なお、分岐命令について、分岐先のアドレスがブロックの末尾から２番目（１６進数で表されたアドレスの末尾が“８”のワード）であり、かつ状態Ｓ３である場合には、次のサイクルでブロックの末尾のデータにアクセスすることとなり、即ち、状態Ｓ２に遷移することとなるため、状態Ｓ１に戻ることなく、状態Ｓ２に直接遷移するものである。 The transition condition C5 (CND-CHT-PRA) is a condition for making a transition from the state S3 to the state S2. That is, since the scheduled data is not stored in the prefetch cache unit 30, only the tag table 40 is accessed and the cache hit is confirmed from the state where the cache hit is confirmed by accessing the tag table 40 and the data memory 50. It is a condition for making a transition to a state to perform. For the branch instruction, if the branch destination address is the second from the end of the block (the word expressed as a hexadecimal number with the end of the address “8”) and is in the state S3, in the next cycle Since the data at the end of the block is accessed, that is, the state transits to the state S2, the state directly transits to the state S2 without returning to the state S1.

遷移条件Ｃ６（CND-CHT-END）は、状態Ｓ３において、分岐先のアドレスがブロックの先頭および２番目（１６進数で表されたアドレスの末尾が“０”あるいは“４”のワード）である場合に、先読みキャッシュにヒットしている状態であれば、状態Ｓ１に戻ることを意味している。
遷移条件Ｃ７（CND-EMA-START）は、状態Ｓ３において、キャッシュにヒットしない場合（データ・メモリ５０に読み込み対象であるデータが記憶されていない場合）を意味している。 In the transition condition C6 (CND-CHT-END), in the state S3, the branch destination address is the head and the second of the block (the word whose hexadecimal address ends with “0” or “4”). In this case, if the prefetch cache is hit, it means that the process returns to the state S1.
The transition condition C7 (CND-EMA-START) means that the cache is not hit in the state S3 (when the data to be read is not stored in the data memory 50).

遷移条件Ｃ８（CND-PRA-EMA）は、状態Ｓ２において、キャッシュにヒットしない場合を意味している。
遷移条件Ｃ９（CND-PRA-CHT）は、状態Ｓ２において、先読みキャッシュにヒットしない場合を意味している。
遷移条件Ｃ１０（CND-NORM-CNT）は、状態Ｓ１において、先読みキャッシュにヒットしている場合あるいはメインメモリ７にアクセスしている場合を意味している。 The transition condition C8 (CND-PRA-EMA) means a case where the cache is not hit in the state S2.
The transition condition C9 (CND-PRA-CHT) means a case where the prefetch cache is not hit in the state S2.
The transition condition C10 (CND-NORM-CNT) means a case where the prefetch cache is hit or the main memory 7 is accessed in the state S1.

遷移条件Ｃ１１（CND-PRA-CNT）は、状態Ｓ２において、先読み処理を継続する場合を意味している。
遷移条件Ｃ１２（CND-EMA-END）は、状態Ｓ４において、メインメモリ７へのアクセスが終了した場合を意味している。
次に、メインメモリ７を読み出すためのステートマシン“sm-exmem-access”について説明する。 The transition condition C11 (CND-PRA-CNT) means a case where the prefetch process is continued in the state S2.
The transition condition C12 (CND-EMA-END) means a case where access to the main memory 7 is completed in the state S4.
Next, the state machine “sm-exmem-access” for reading the main memory 7 will be described.
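
Before turning to that state machine, the FIG. 4 transitions described above can be collected into a small table. This is a sketch for orientation, not a reproduction of FIG. 4: several destination states are not spelled out in the text and are inferred here from the condition names (marked in the comments), and the helper names `next_state` and `run` are hypothetical.

```python
# FIG. 4 transitions as (source state, destination state).
# Entries marked "inferred" or "assumed" are not stated explicitly in the
# text and should be checked against FIG. 4.
TRANSITIONS = {
    "C1":  ("S1", "S2"),  # CND-PRA-START (destination inferred)
    "C2":  ("S2", "S1"),  # CND-PRA-END
    "C3":  ("S1", "S3"),  # CND-CHT-START (destination inferred)
    "C4":  ("S3", "S3"),  # CND-CHT-CNT
    "C5":  ("S3", "S2"),  # CND-CHT-PRA
    "C6":  ("S3", "S1"),  # CND-CHT-END
    "C7":  ("S3", "S4"),  # CND-EMA-START (destination inferred)
    "C8":  ("S2", "S4"),  # CND-PRA-EMA (destination inferred)
    "C9":  ("S2", "S3"),  # CND-PRA-CHT (destination inferred)
    "C10": ("S1", "S1"),  # CND-NORM-CNT (destination inferred)
    "C11": ("S2", "S2"),  # CND-PRA-CNT
    "C12": ("S4", "S1"),  # CND-EMA-END (destination assumed)
}


def next_state(state, condition):
    """Apply one transition condition; reject it if it starts elsewhere."""
    source, destination = TRANSITIONS[condition]
    if source != state:
        raise ValueError(f"{condition} does not apply in {state}")
    return destination


def run(start, conditions):
    """Apply a sequence of transition conditions starting from `start`."""
    state = start
    for condition in conditions:
        state = next_state(state, condition)
    return state
```

For example, `run("S1", ["C3", "C7", "C12"])` follows a prefetch cache miss, then a cache miss, then the completion of the main memory access.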

図５は、情報処理装置１上に構築されるステートマシン“sm-exmem-access”の動作を示す状態遷移図である。
図５において、情報処理装置１は、状態Ｔ１〜Ｔ６を遷移する。
状態Ｔ１（ST-WAIT）においては、メインメモリ７へのアクセスを停止している。状態Ｔ１においては、所定のタイミングで、状態Ｔ２に移行する。 FIG. 5 is a state transition diagram showing the operation of the state machine “sm-exmem-access” constructed on the information processing apparatus 1.
In FIG. 5, the information processing apparatus 1 transitions between states T1 to T6.
In the state T1 (ST-WAIT), access to the main memory 7 is stopped. In the state T1, the state transitions to the state T2 at a predetermined timing.

状態Ｔ２（ST-EXMEM-READ-1W-S）においては、読み出し対象であるデータの第１番目のワードをメインメモリ７から読み出し、読み出す処理が終了すると、状態Ｔ３に移行する。
状態Ｔ３（ST-EXMEM-READ-1W-E-2W-S）においては、読み出し対象であるデータの第２番目のワードをメインメモリ７から読み出し、読み出す処理が終了すると、状態Ｔ４に移行する。 In the state T2 (ST-EXMEM-READ-1W-S), when the first word of the data to be read is read from the main memory 7 and the reading process is completed, the process proceeds to the state T3.
In the state T3 (ST-EXMEM-READ-1W-E-2W-S), when the second word of the data to be read is read from the main memory 7 and the reading process is completed, the process proceeds to the state T4.

状態Ｔ４（ST-EXMEM-READ-2W-E-3W-S）においては、読み出し対象であるデータの第３番目のワードをメインメモリ７から読み出し、読み出す処理が終了すると、状態Ｔ５に移行する。
状態Ｔ５（ST-EXMEM-READ-3W-E-4W-S）においては、読み出し対象であるデータの第４番目のワードをメインメモリ７から読み出し、読み出す処理が終了すると、状態Ｔ６に移行する。 In the state T4 (ST-EXMEM-READ-2W-E-3W-S), when the third word of the data to be read is read from the main memory 7 and the reading process is completed, the process proceeds to the state T5.
In the state T5 (ST-EXMEM-READ-3W-E-4W-S), when the fourth word of the data to be read is read from the main memory 7 and the reading process is completed, the process proceeds to the state T6.

状態Ｔ６（ST-EXMEM-READ-4W-E）においては、読み出し対象であるデータの第４番目のワードをメインメモリ７から読み出す処理が終了することに対応して、状態Ｔ１に戻る。
図４および図５に示すように各状態を遷移する結果、情報処理装置１は、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータに応じて、具体的に以下のような動作を行う。 In the state T6 (ST-EXMEM-READ-4W-E), the process returns to the state T1 in response to the completion of the process of reading the fourth word of the data to be read from the main memory 7.
As a result of transitioning between the states as shown in FIGS. 4 and 5, the information processing apparatus 1 specifically performs the following operation according to the data read by the CPU core 2 or the DMAC 4.
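
The cycle of states T1 to T6 in FIG. 5 can be sketched as a simple loop that reads one word per state from T2 to T5. This is a minimal illustration: the trigger for leaving T1 (the "predetermined timing") and per-word wait cycles are not modeled, and the names `ORDER`, `WORD_READ_IN`, and `read_block` are assumptions for this example.

```python
ORDER = ["T1", "T2", "T3", "T4", "T5", "T6"]
# State in which each word of the block (index 0 to 3) is read.
WORD_READ_IN = {"T2": 0, "T3": 1, "T4": 2, "T5": 3}


def read_block(block):
    """Walk T1 -> T2 -> ... -> T6 -> T1 once and return (words, trace)."""
    state = "T1"
    trace = [state]
    words = []
    while True:
        state = ORDER[(ORDER.index(state) + 1) % len(ORDER)]
        trace.append(state)
        if state in WORD_READ_IN:
            words.append(block[WORD_READ_IN[state]])
        if state == "T1":
            break  # back in ST-WAIT: the block read is complete
    return words, trace
```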

まず、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、連続して先読みキャッシュにヒットする場合の例について説明する。
図６は、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、連続して先読みキャッシュにヒットする場合の動作例を示すタイミングチャートである。
図６においては、ＣＰＵコア２あるいはＤＭＡＣ４によって、連続するアドレスのデータ（アドレス“Ａ００〜Ａ０Ｃ”、“Ａ１０〜Ａ１Ｃ”および“Ａ２０〜Ａ２Ｃ”のデータ）が読み出される場合を示している。なお、以下、アドレス“Ａ００〜Ａ０Ｃ”、“Ａ１０〜Ａ１Ｃ”および“Ａ２０〜Ａ２Ｃ”によって示されるデータをそれぞれ第１〜第３のデータと称する。 First, an example will be described in which data read by the CPU core 2 or the DMAC 4 continuously hits the prefetch cache.
FIG. 6 is a timing chart showing an operation example when data read by the CPU core 2 or the DMAC 4 continuously hits the prefetch cache.
FIG. 6 shows a case where data of continuous addresses (data of addresses “A00 to A0C”, “A10 to A1C”, and “A20 to A2C”) is read by the CPU core 2 or the DMAC 4. Hereinafter, data indicated by addresses “A00 to A0C”, “A10 to A1C”, and “A20 to A2C” are referred to as first to third data, respectively.

図６において、第１のデータにおける末尾のワードのアドレス（アドレス“Ａ０Ｃ”）がＣＰＵコア２あるいはＤＭＡＣ４から入力されるタイミング（サイクル“４”）で、予定データである第２のデータのアドレスがタグ・テーブル４０の各ウェイに入力される。
すると、次のクロックタイミング（サイクル“５”）において、タグ・テーブル４０の各ウェイに記憶されたアドレスが出力されると共に、それらのアドレスと第２のデータのアドレスとが一致するか否かが判定され、ここでは一致していることから、キャッシュにヒットしたことが検出される（CACHE-HIT＝１）。また、このとき、第２のデータのアドレスが、データ・メモリ５０の第２のデータが記憶されている方のウェイ（ここでは、図６中に実線で示されているウェイＡであるものとし、第２のデータが記憶されていないウェイＢについては点線で示す。以下、同様である。）に入力される（WAYA-DATA-ADRS，WAYB-DATA-ADRS）。 In FIG. 6, at the timing (cycle “4”) when the address of the last word of the first data (address “A0C”) is input from the CPU core 2 or the DMAC 4, the address of the second data, which is the scheduled data, is input to each way of the tag table 40.
Then, at the next clock timing (cycle “5”), the addresses stored in the respective ways of the tag table 40 are output, and it is determined whether or not these addresses match the address of the second data. Since they match here, a cache hit is detected (CACHE-HIT = 1). At this time, the address of the second data is input to the way of the data memory 50 in which the second data is stored (WAYA-DATA-ADRS, WAYB-DATA-ADRS). Here, this is assumed to be the way A, shown by a solid line in FIG. 6; the way B, in which the second data is not stored, is shown by a dotted line. The same applies hereinafter.

そして、引き続くクロックタイミング（サイクル“６”）において、データ・メモリ５０から、第２のデータを記憶しているウェイのデータ（WAYA-TAG-DATA，WAYB-TAG-DATA）が出力されると共に、ヒット検出部６０から、いずれかのウェイを選択する情報（WAY-SELECT）が出力される。その結果、選択されたウェイのメモリデバイスデータ（データ“Ｄ１０”）が、ＣＰＵコア２あるいはＤＭＡＣ４に対して出力される（PBUS-RDDATA）。 At the subsequent clock timing (cycle “6”), the data of the way (WAYA-TAG-DATA, WAYB-TAG-DATA) storing the second data is output from the data memory 50, and The hit detection unit 60 outputs information (WAY-SELECT) for selecting any way. As a result, the memory device data (data “D10”) of the selected way is output to the CPU core 2 or the DMAC 4 (PBUS-RDDATA).

即ち、サイクル“５”で行われたＣＰＵコア２あるいはＤＭＡＣ４からの読み出し命令に対し、情報処理装置１は、対応するメモリデバイスデータをサイクル“６”で出力している。
また、情報処理装置１においては、メモリデバイスデータをブロック単位で読み出すことができるため、データ“Ｄ１０”を読み出すことにより、同じブロックの他のデータ（データ“Ｄ１４”〜“Ｄ１Ｃ”）もまとめて読み出され、プロセッサ先読みバッファ３４に記憶される。その結果、データ“Ｄ１０”に引き続く３ワードは、それぞれを読み出すためにタグ・テーブル４０およびデータ・メモリ５０にアクセスすることなく、データ“Ｄ１０”に連続して、プロセッサ先読みバッファ３４からＣＰＵコア２あるいはＤＭＡＣ４に出力されることとなる。 That is, in response to a read command from the CPU core 2 or the DMAC 4 executed in the cycle “5”, the information processing apparatus 1 outputs corresponding memory device data in the cycle “6”.
In the information processing apparatus 1, memory device data can be read in units of blocks, so reading the data “D10” also reads the other data of the same block (data “D14” to “D1C”) together and stores them in the processor prefetch buffer 34. As a result, the three words following the data “D10” are output from the processor prefetch buffer 34 to the CPU core 2 or the DMAC 4 in succession to the data “D10”, without accessing the tag table 40 and the data memory 50 to read each of them.
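
The FIG. 6 behavior described above, in which the tag table is consulted once per block when the last word of the current block is addressed, can be sketched as follows. This is an illustration under the assumption of 4-byte words and 16-byte blocks (consistent with word addresses ending in 0, 4, 8, and C); the function name `prefetch_triggers` is hypothetical.

```python
WORD_BYTES = 4    # assumption: addresses of consecutive words differ by 4
BLOCK_BYTES = 16  # 4 words per block


def prefetch_triggers(addresses):
    """Return (trigger address, start address of the block to prefetch).

    A prefetch is triggered when the address of the last word of a block
    (hexadecimal address ending in "C") is input.
    """
    last_word_offset = BLOCK_BYTES - WORD_BYTES
    triggers = []
    for addr in addresses:
        if addr % BLOCK_BYTES == last_word_offset:
            triggers.append((addr, addr + WORD_BYTES))
    return triggers
```

For the twelve sequential word addresses A00 to A2C, the tag table is consulted three times (at A0C, A1C, and A2C) rather than once per word.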

なお、情報処理装置１は、第２のデータをＣＰＵコア２あるいはＤＭＡＣ４に出力しながら、上述のような処理によって第３のデータを先読みし、同様にＣＰＵコア２あるいはＤＭＡＣ４に出力する。
次に、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、先読みキャッシュにヒットしない場合の例について説明する。 Note that the information processing apparatus 1 prefetches the third data by the processing as described above while outputting the second data to the CPU core 2 or the DMAC 4, and similarly outputs the third data to the CPU core 2 or the DMAC 4.
Next, an example in which data read by the CPU core 2 or the DMAC 4 does not hit the prefetch cache will be described.

図７は、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、先読みキャッシュにヒットしない場合の動作例を示すタイミングチャートである。なお、図７におけるデータ名や信号名等は、図６における場合と同様である。
図７において、サイクル“６”までの動作は、図６に示すサイクル“６”までの動作とほぼ同様である。ただし、サイクル“５”で読み出される第２のデータの先頭ワードは分岐命令であり、その命令はサイクル“６”で実行される。 FIG. 7 is a timing chart showing an operation example when data read by the CPU core 2 or the DMAC 4 does not hit the prefetch cache. The data names and signal names in FIG. 7 are the same as those in FIG.
In FIG. 7, the operation up to cycle “6” is almost the same as the operation up to cycle “6” shown in FIG. However, the first word of the second data read in cycle “5” is a branch instruction, and the instruction is executed in cycle “6”.

そして、サイクル“７”において、分岐先であるアドレス“Ａ４４”のデータは、プロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶されていないことから、先読みキャッシュにヒットしないことが検出される（PRC-HIT＝０）。このとき、情報処理装置１は、ノーウェイトでデータの供給を行うため、次のサイクルでアドレス“Ａ４４”に対応するメモリデバイスデータ“Ｄ４４”を出力するべく、タグ・テーブル４０およびデータ・メモリ５０の各ウェイに、アドレス“Ａ４４”のワードを含むブロック（以下、「分岐先データ」と言う。）のアドレスを出力する。 In cycle “7”, since the data of the branch destination address “A44” is not stored in the processor prefetch buffer 34 or the DMA prefetch buffer 35, it is detected that the prefetch cache is not hit (PRC−). HIT = 0). At this time, since the information processing apparatus 1 supplies data without waiting, the tag table 40 and the data memory 50 are to output the memory device data “D44” corresponding to the address “A44” in the next cycle. The address of the block including the word of address “A44” (hereinafter referred to as “branch destination data”) is output to each of the ways.

そして、サイクル“８”において、タグ・テーブル４０から、各ウェイに記憶されたアドレス（WAYA-TAG-DATA，WAYB-TAG-DATA）が出力されると共に、データ・メモリ５０から、各ウェイのデータ（WAYA-TAG-DATA，WAYB-TAG-DATA）が出力される。このとき、タグ・テーブル４０の各ウェイに記憶されたアドレスと、分岐先データのアドレスとが一致することから、キャッシュにヒットしたことが検出される（CACHE-HIT＝１）。さらに、ヒット検出部６０から、いずれかのウェイを選択する情報（WAY-SELECT）が出力される。その結果、選択されたウェイのメモリデバイスデータ（データ“Ｄ４４”）がＣＰＵコア２あるいはＤＭＡＣ４に対して出力される（PBUS-RDDATA）。 In cycle “8”, the address (WAYA-TAG-DATA, WAYB-TAG-DATA) stored in each way is output from the tag table 40 and the data of each way is output from the data memory 50. (WAYA-TAG-DATA, WAYB-TAG-DATA) is output. At this time, since the address stored in each way of the tag table 40 matches the address of the branch destination data, it is detected that the cache is hit (CACHE-HIT = 1). Further, the hit detection unit 60 outputs information (WAY-SELECT) for selecting any way. As a result, the memory device data (data “D44”) of the selected way is output to the CPU core 2 or the DMAC 4 (PBUS-RDDATA).

即ち、サイクル“７”で行われたプロセッサからの分岐先の読み出し命令に対し、情報処理装置１は、対応するメモリデバイスデータをサイクル“８”で出力している。
ここで、分岐先であるアドレス“Ａ４４”は、ブロックの第２番目のワードであることから、情報処理装置１においては、そのブロックの第２番目から第４番目のワード（アドレス“Ａ４４”〜“Ａ４Ｃ”のワード）がまとめて読み出され、プロセッサ先読みバッファ３４あるいはＤＭＡ先読みバッファ３５に記憶される。 That is, in response to the branch destination read instruction from the processor executed in cycle “7”, the information processing apparatus 1 outputs the corresponding memory device data in cycle “8”.
Here, since the branch destination address “A44” is the second word of the block, the information processing apparatus 1 reads the second to fourth words of the block (the words at addresses “A44” to “A4C”) together and stores them in the processor prefetch buffer 34 or the DMA prefetch buffer 35.

この後、情報処理装置１は、図６における処理と同様に、分岐先データをＣＰＵコア２あるいはＤＭＡＣ４に出力する。
次に、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、先読みキャッシュにもキャッシュにもヒットしない場合の例について説明する。
図８は、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、先読みキャッシュにもキャッシュにもヒットしない場合の動作例を示すタイミングチャートである。なお、図８におけるデータ名や信号名等は、図６における場合と同様である。 Thereafter, the information processing apparatus 1 outputs the branch destination data to the CPU core 2 or the DMAC 4 as in the processing in FIG.
Next, an example in which data read by the CPU core 2 or the DMAC 4 does not hit the prefetch cache or the cache will be described.
FIG. 8 is a timing chart showing an operation example when data read by the CPU core 2 or the DMAC 4 does not hit the prefetch cache or the cache. The data names and signal names in FIG. 8 are the same as in FIG.

図８において、サイクル“７”までの動作は、図７に示すサイクル“７”までの動作と同様である。
そして、サイクル“８”において、タグ・テーブル４０から、各ウェイに記憶されたアドレス（WAYA-TAG-DATA，WAYB-TAG-DATA）が出力されると共に、データ・メモリ５０から、各ウェイのデータ（WAYA-TAG-DATA，WAYB-TAG-DATA）が出力される。このとき、タグ・テーブル４０の各ウェイに記憶されたアドレスと、分岐先データのアドレスとが一致しないことから、キャッシュにヒットしないことが検出される（CACHE-HIT＝０）。 In FIG. 8, the operation up to cycle “7” is the same as the operation up to cycle “7” shown in FIG.
In cycle “8”, the address (WAYA-TAG-DATA, WAYB-TAG-DATA) stored in each way is output from the tag table 40 and the data of each way is output from the data memory 50. (WAYA-TAG-DATA, WAYB-TAG-DATA) is output. At this time, since the address stored in each way of the tag table 40 does not match the address of the branch destination data, it is detected that the cache is not hit (CACHE-HIT = 0).

すると、アドレス“Ａ４４”のデータが読み出せないことから、情報処理装置１は、メインメモリ７からデータを読み出す。そのため、メインメモリ７からデータが取り込めるまでの３サイクル分、ウェイトサイクルが発生する。
なお、情報処理装置１は、メインメモリ７から取り込んだ分岐先データを外部メモリ先読みバッファ３６に順次記憶する。このとき、メインメモリ７からデータを取り込むためには、キャッシュの場合と異なり、１ワードにつき２サイクルを要している（データ“Ｄ４４〜Ｄ４８”）。そして、情報処理装置１は、分岐先データを外部メモリ先読みバッファ３６に記憶した後には、図６における処理と同様に、引き続くデータを先読みし、同様にＣＰＵコア２あるいはＤＭＡＣ４に出力する。また、外部メモリ先読みバッファ３６に記憶された分岐先データは、データ・メモリ５０へのアクセスが行われていないタイミングで、データ・メモリ５０にキャッシュされる。さらに、外部メモリ先読みバッファ３６にメインメモリ７から取り込んだデータが記憶されている状態で、そのデータに対する読み出し命令がＣＰＵコア２あるいはＤＭＡＣ４から入力された場合、外部メモリ先読みバッファ３６に記憶されたデータが、ＣＰＵコア２あるいはＤＭＡＣ４に出力される。 Then, since the data of the address “A44” cannot be read, the information processing apparatus 1 reads the data from the main memory 7. Therefore, a wait cycle is generated for three cycles until data can be taken in from the main memory 7.
The information processing apparatus 1 sequentially stores branch destination data fetched from the main memory 7 in the external memory prefetch buffer 36. At this time, in order to fetch data from the main memory 7, unlike the case of the cache, two cycles per word are required (data “D44 to D48”). Then, after storing the branch destination data in the external memory prefetch buffer 36, the information processing apparatus 1 prefetches the subsequent data and outputs it to the CPU core 2 or the DMAC 4 in the same manner as in the processing in FIG. The branch destination data stored in the external memory prefetch buffer 36 is cached in the data memory 50 at a timing when the data memory 50 is not accessed. Further, when the data fetched from the main memory 7 is stored in the external memory prefetch buffer 36 and a read command for the data is input from the CPU core 2 or the DMAC 4, the data stored in the external memory prefetch buffer 36 is stored. Is output to the CPU core 2 or the DMAC 4.

次に、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、連続するアドレスのデータであるにもかかわらず、先読みが行えない場合（キャッシュにヒットしない場合）の例について説明する。なお、このような場合には、予定データをメインメモリ７から取り込む必要がある。
図９は、ＣＰＵコア２あるいはＤＭＡＣ４によって読み出されるデータが、連続するアドレスのデータであるにもかかわらず、キャッシュにヒットしない場合の動作例を示すタイミングチャートである。なお、図９におけるデータ名や信号名等は、図６における場合と同様である。 Next, an example will be described in which prefetching cannot be performed (the cache is not hit) even though the data read by the CPU core 2 or the DMAC 4 is data at continuous addresses. In such a case, the scheduled data must be fetched from the main memory 7.
FIG. 9 is a timing chart showing an operation example when the data read by the CPU core 2 or the DMAC 4 does not hit the cache even though the data is continuous address data. The data names and signal names in FIG. 9 are the same as those in FIG.

図９において、サイクル“４〜５”の動作は、図８に示すサイクル“６〜７”の動作とほぼ同様である。ただし、図９の場合、キャッシュにヒットしないことが検出されるサイクル“５”において、メインメモリ７へのアクセスを直ちに開始している。
そして、メインメモリ７へのアクセスを行ってから３サイクルの後（サイクル“８”）、メインメモリ７から予定データの各ワードが順次取り込まれる。 In FIG. 9, the operation of cycles “4 to 5” is almost the same as the operation of cycles “6 to 7” shown in FIG. However, in the case of FIG. 9, the access to the main memory 7 is immediately started in the cycle “5” in which it is detected that the cache is not hit.
Then, after three cycles after accessing the main memory 7 (cycle “8”), each word of the scheduled data is sequentially fetched from the main memory 7.

That is, the data in the main memory 7 is output to the CPU core 2 or the DMAC 4 three cycles after cycle “5”, in which the address of the data to be read (address “A10”) is input from the CPU core 2 or the DMAC 4.
As a result, the data in the main memory 7 can be fetched one cycle earlier than in the conventional case, in which whether or not the cache is hit is detected only after the address of the data to be read is input from the CPU core 2 or the DMAC 4. That is, the conventional method requires four cycles from when a read command is input from the CPU core 2 or the DMAC 4 until the data is output to the CPU core 2 or the DMAC 4, whereas in FIG. 9 this is shortened to three cycles.
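The cycle counts quoted above can be checked with a small timing model. The following is only a sketch: the function and constant names are hypothetical, and it assumes that the conventional scheme loses exactly one cycle to hit detection before the main memory access can start, which is how the text describes the one-cycle difference.

```python
# Hypothetical timing model of the FIG. 9 case; all names are illustrative.
MAIN_MEMORY_LATENCY = 3  # cycles from starting the access to the first word (from the text)
HIT_CHECK_DELAY = 1      # assumption: conventional hit detection costs one extra cycle


def first_word_cycle(address_cycle: int, miss_detected_early: bool) -> int:
    """Cycle in which the first word from main memory becomes available."""
    # With prefetching, the miss is already known when the address arrives,
    # so the main memory access starts in the same cycle.
    if miss_detected_early:
        start = address_cycle
    else:
        start = address_cycle + HIT_CHECK_DELAY
    return start + MAIN_MEMORY_LATENCY


def wait_cycles(address_cycle: int, miss_detected_early: bool) -> int:
    """Cycles between the address input and the data output."""
    return first_word_cycle(address_cycle, miss_detected_early) - address_cycle
```

With the address input in cycle 5, this model gives the first word in cycle 8 when the miss is detected early, and a wait of four cycles rather than three in the conventional case.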

In the description of FIGS. 4 to 9, the timing for detecting whether or not the scheduled data hits the cache (prefetching) has been described as the timing at which the address of the last word of the memory device data to be read is output from the CPU core 2 or the DMAC 4. However, prefetching may instead be performed at the timing at which the address of the first word of the memory device data to be read is input from the processor. In this case, the probability that the prefetched data is actually read by the CPU core 2 or the DMAC 4 decreases, but the wait-cycle penalty incurred when the cache is not hit can be reduced.

The operation in the case where prefetching is performed at the timing at which the address of the first word of the memory device data to be read is input from the CPU core 2 or the DMAC 4 (hereinafter referred to as “preceding prefetch processing”) is described below.
FIG. 10 is a state transition diagram showing the operation of the preceding prefetch processing.
In FIG. 10, the information processing apparatus 1 transitions among states P1 to P4 and states P5 and P6, and transition conditions G1 to G14 for transitioning between the states are defined.

The states P1 to P4 and the transition conditions G2 to G12 in FIG. 10 are the same as the states S1 to S4 and the transition conditions C2 to C12 in FIG. 4, respectively, so their description is omitted and only the differences are described.
The state P5 (ST-PREREAD-IDLE) is an idle state for delaying the timing. That is, when prefetching is performed at the timing at which the address of the first word of a block is input from the CPU core 2 or the DMAC 4, the data to be read would be fetched into the processor prefetch buffer 34 or the DMA prefetch buffer 35 too early. To eliminate this timing “shift”, an idle state lasting a certain number of cycles is inserted.

In the state P6 (ST-PREREAD-EXE), data is transferred from the data memory 50 to the processor prefetch buffer 34 or the DMA prefetch buffer 35.
The transition condition G1 (CND-PRA-F-START) means that, in the state P1, the address of the first word of the data to be read (the word whose hexadecimal address ends in “0”) is input from the CPU core 2 or the DMAC 4.

The transition condition G13 (CND-PRA-READ-START) means that, in the state P5, the address of the last word of the data to be read (the word whose hexadecimal address ends in “C”) is input from the processor.
The transition condition G14 (CND-PRA-READ-END) means that the transfer of data from the data memory 50 to the processor prefetch buffer 34 or the DMA prefetch buffer 35 is completed.
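The states P5 and P6 and the conditions G1, G13, and G14 can be sketched as a small state machine. This is a minimal illustration and not the circuit of FIG. 10: the names `State`, `is_first_word`, `is_last_word`, and `next_state` are hypothetical, the states P2 to P4 and conditions G2 to G12 are left out, and the destination of G14 (back to P1) is an assumption, since the text states only that G14 marks the end of the transfer.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    P1 = "P1"               # starting state (its name is not given in this passage)
    P5 = "ST-PREREAD-IDLE"  # idle state that delays the buffer fill
    P6 = "ST-PREREAD-EXE"   # data memory -> prefetch buffer transfer


def is_first_word(address: int) -> bool:
    """First word of a block: hexadecimal address ends in 0."""
    return address & 0xF == 0x0


def is_last_word(address: int) -> bool:
    """Last word of a block: hexadecimal address ends in C."""
    return address & 0xF == 0xC


def next_state(state: State, address: Optional[int] = None,
               transfer_done: bool = False) -> State:
    if state is State.P1 and address is not None and is_first_word(address):
        return State.P5  # G1 (CND-PRA-F-START)
    if state is State.P5 and address is not None and is_last_word(address):
        return State.P6  # G13 (CND-PRA-READ-START)
    if state is State.P6 and transfer_done:
        return State.P1  # G14 (CND-PRA-READ-END); destination is an assumption
    return state
```

Feeding the word addresses 0x10, 0x14, 0x18, and 0x1C of one block moves the sketch from P1 into P5 on the first word, keeps it idle for the middle words, and enters P6 on the last word.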

As a result of transitioning between the states as shown in FIG. 10, the information processing apparatus 1 performs, for example, operations corresponding to FIGS. 6 to 9 described above.
In the description of FIGS. 4 to 10, the operation for a read command has been described. When a write command is input, the operation is almost the same except that the processor write buffer 32 or the DMA write buffer 33 is used, so its description is omitted here.

As described above, in the information processing apparatus 1 according to the present embodiment, data is transferred via the cache memory 5 even when data is transferred between the local memory 3 and the main memory 7 without going through the CPU core 2. While the data to be read is being read by the CPU core 2 or the DMAC 4, the cache memory 5 detects whether or not the data expected to be read next is cached (that is, whether or not it is stored in the data memory 50). If the data expected to be read next is stored in the cache, the cache memory 5 stores that data in the prefetch cache unit 30; if it is not stored in the cache, the cache memory 5 reads that data from the main memory 7 and stores it in the prefetch cache unit 30. Thereafter, if the address of the data actually read by the CPU core 2 or the DMAC 4 in the subsequent cycle matches the address of the data stored in the prefetch cache unit 30, that data is output from the prefetch cache unit 30 to the CPU core 2 or the DMAC 4. If the address of the data actually read by the processor in the subsequent cycle does not match the address of the data stored in the prefetch cache unit 30, the main memory 7 is accessed at that point.
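The behavior summarized in this paragraph can be modeled in a few lines. The sketch below is a simplified software analogy, not the hardware of the embodiment: the class `PrefetchCache`, its methods, and the use of dictionaries to stand in for the data memory 50 and the main memory 7 are all assumptions made for illustration, and a single buffer entry stands in for the prefetch cache unit 30.

```python
class PrefetchCache:
    """Illustrative model: one prefetch buffer in front of a cache and main memory."""

    def __init__(self, cache: dict, main_memory: dict):
        self.cache = cache              # stands in for the data memory 50
        self.main_memory = main_memory  # stands in for the main memory 7
        self.buffer_address = None      # address held in the prefetch buffer
        self.buffer_data = None
        self.main_memory_reads = 0      # counts the slow accesses

    def _read_main_memory(self, address):
        self.main_memory_reads += 1
        return self.main_memory[address]

    def prefetch(self, scheduled_address):
        """Fill the buffer with the data expected to be read next."""
        if scheduled_address in self.cache:
            data = self.cache[scheduled_address]              # cached: copy from the cache
        else:
            data = self._read_main_memory(scheduled_address)  # not cached: go to main memory
        self.buffer_address = scheduled_address
        self.buffer_data = data

    def read(self, address):
        """Serve the actual read request issued in the following cycle."""
        if address == self.buffer_address:
            return self.buffer_data             # address matches: output from the buffer
        return self._read_main_memory(address)  # mismatch: access main memory at that point
```

In this model a correctly predicted read never touches main memory at request time, which mirrors the reduction in wait cycles that the paragraph describes.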

Therefore, when the address of data to be read or written is input from the DMAC 4, the cache memory 5 does not always need to access every way of the tag table 40 and the data memory 50; it suffices to access the data memory 50 only when the data to be read or written is stored there.
Accordingly, when data is transferred between the local memory 3 and the main memory 7 without going through the CPU core 2, it is possible to prevent an increase in power consumption or a decrease in processing efficiency while guaranteeing coherency with the data stored in the cache memory 5.

In addition, the information processing apparatus 1 prefetches the scheduled data at the timing at which the address of the last word of the data to be read is input.
Data that is highly likely to be read in the subsequent cycle can therefore be stored in the prefetch cache unit 30, which reduces accesses to data that turns out not to be needed and thus reduces power consumption.

On the other hand, the information processing apparatus 1 can also prefetch the scheduled data earlier than the timing at which the address of the last word is input, for example at the timing at which the address of the first word of the data to be read is input.
In this case, whether or not the cache is hit is detected at an earlier timing. When the cache is not hit, the processing of reading the data to be read from the main memory 7 can therefore be started earlier, which makes it possible to prevent wait cycles from occurring or to reduce the number of wait cycles.

The power consumption of the information processing apparatus 1 can be reduced further by providing it with a clock gating function.
FIG. 11 is a diagram illustrating the configuration in which the information processing apparatus 1 has a clock gating function.
In FIG. 11, the information processing apparatus 1 includes a power consumption control unit 80 in addition to the configuration shown in FIG. 2.

The power consumption control unit 80 has a function of stopping the supply of a clock signal to portions of the information processing apparatus 1 that are not operating.
FIG. 12 is a diagram illustrating the configuration of the power consumption control unit 80.
In FIG. 12, the power consumption control unit 80 includes clock gating elements (hereinafter referred to as “CG elements”) 81-1 to 81-n, one for each of n memories.

The CG elements 81-1 to 81-n receive, from the access management unit 20, power consumption mode signals SG1 to SGn, respectively, which switch whether or not the clock signal is supplied. The access management unit 20 outputs a power consumption mode signal that stops the supply of the clock signal to a memory determined not to need to operate, and outputs a power consumption mode signal that supplies the clock signal to a memory determined to operate.

With this configuration, the clock signal can be supplied, for example, only to the way of the data memory 50 that stores the data to be read into the prefetch cache unit 30, which further reduces power consumption.
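The per-way gating can be pictured as a list of enable signals, one per memory. The sketch below is a logical illustration only: the function names are hypothetical, and representing SG1 to SGn as booleans (True meaning that the clock is supplied) is an assumption about polarity that the text does not specify.

```python
from typing import List, Optional


def power_mode_signals(num_ways: int, active_way: Optional[int]) -> List[bool]:
    """Return [SG1, ..., SGn]; True means the clock is supplied to that way.

    active_way is the zero-based index of the way that holds the data to be
    prefetched, or None when no way needs to operate.
    """
    return [way == active_way for way in range(num_ways)]


def gated_clocks(clock: bool, signals: List[bool]) -> List[bool]:
    """Model each CG element as an AND of the clock with its mode signal."""
    return [clock and enabled for enabled in signals]
```

For a four-way data memory in which the third way holds the scheduled data, only the third entry of the signal list is True, so only that way receives the clock.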

FIG. 1 is a block diagram showing a schematic configuration of the information processing apparatus 1 including the data transfer device 1A according to the present invention.
FIG. 2 is a block diagram showing a functional configuration of the cache memory 5.
FIG. 3 is a diagram showing the configuration of the data stored in the tag table 40 and the data memory 50.
FIG. 4 is a state transition diagram showing the basic operation of the information processing apparatus 1.
FIG. 5 is a state transition diagram showing the operation of the state machine “sm-exmem-access” constructed on the information processing apparatus 1.
FIG. 6 is a timing chart showing an operation example in which the data read by the CPU core 2 or the DMAC 4 continuously hits the prefetch cache.
FIG. 7 is a timing chart showing an operation example in which the data read by the CPU core 2 or the DMAC 4 does not hit the prefetch cache.
FIG. 8 is a timing chart showing an operation example in which the data read by the CPU core 2 or the DMAC 4 hits neither the prefetch cache nor the cache.
FIG. 9 is a timing chart showing an operation example in which the cache is not hit even though the data read by the CPU core 2 or the DMAC 4 is data at consecutive addresses.
FIG. 10 is a state transition diagram showing the operation of the preceding prefetch processing.
FIG. 11 is a diagram showing the configuration in which the information processing apparatus 1 has a clock gating function.
FIG. 12 is a diagram showing the configuration of the power consumption control unit 80.
FIG. 13 is a schematic diagram showing the configuration of a conventional set-associative cache memory 100.

Explanation of symbols

1 information processing apparatus, 1A data transfer device, 2 CPU core, 3 local memory, 4 DMAC, 5 cache memory, 6 memory I/F, 7 main memory, 10 access arbitration unit, 20 access management unit, 30 prefetch cache unit, 31 address control unit, 32 processor write buffer, 33 DMA write buffer, 34 processor prefetch buffer, 35 DMA prefetch buffer, 36 external memory prefetch buffer, 40 tag table, 50 data memory, 60 hit detection unit, 70 MUX, 80 power consumption control unit, 81 CG element

Claims

1. A data transfer device including a first memory provided outside a processor, a cache memory provided between the first memory and the processor and capable of caching at least a part of the data stored in the first memory in a plurality of ways, and a second memory accessible by the processor at a higher speed than the first memory, the data transfer device comprising:
access control means for transferring data between the first memory and the second memory without going through the processor, by outputting to the cache memory at least one of a command to read data from the first memory into the second memory or a command to write data from the second memory to the first memory;
cache determination means for determining whether or not scheduled data, which is expected to be accessed after the in-access data being read or written by the access control means, is cached in any way of the cache memory; and
prefetch cache means for, when the cache determination means determines that the scheduled data is cached in one of the ways, accessing the way in which the scheduled data is stored from among the plurality of ways, and reading and storing the scheduled data,
wherein the prefetch cache means outputs the stored scheduled data to the access control means when a read command for the scheduled data is input following the in-access data, and writes data to the stored scheduled data when a write command for the scheduled data is input following the in-access data.

2. The data transfer device, wherein the cache memory includes, for the plurality of ways, address storage means for storing the addresses of cached data and data storage means for storing the data corresponding to each of the addresses,
the cache determination means determines whether or not the scheduled data is cached by determining whether or not the address of the scheduled data is stored in any way of the address storage means, and
the prefetch cache means accesses, among the plurality of ways of the data storage means, the way corresponding to the way of the address storage means in which the address of the scheduled data is stored.

3. The data transfer apparatus according to claim 1, wherein the scheduled data is data expected to be read or written immediately after the data being accessed.

4. The data transfer device according to claim 1, wherein the data accessed by the access control means is configured as a block including a plurality of words, and the determination of whether or not the scheduled data is cached and the reading or writing of the scheduled data are performed in units of the block.

5. The data transfer device according to claim 4, wherein the cache determination means determines whether or not the scheduled data is cached in response to the access control means instructing reading or writing of the last word among the plurality of words constituting the in-access data.

6. The data transfer device according to claim 4, wherein the cache determination means determines whether or not the scheduled data is cached in response to the access control means instructing reading or writing of a word preceding the last word among the plurality of words constituting the in-access data.

7. The data transfer device according to claim 6, wherein, when the cache determination means determines that the scheduled data is cached in one of the ways, the prefetch cache means accesses the way in which the scheduled data is stored and reads or writes the scheduled data in response to an instruction to read or write the last word.

8. The data transfer device according to claim 1, further comprising low power consumption means for operating, with low power consumption, a way not related to data access among the plurality of ways in the cache memory.

9. The data transfer device according to claim 8, wherein the low power consumption means has a clock gating function that performs control so that no clock signal is supplied to a way not related to data access.

10. The data transfer apparatus according to claim 1, wherein the cache memory is a set associative cache memory.

11. The data transfer device according to claim 1, wherein the prefetch cache means accesses the memory device to read and store the scheduled data when the cache determination means determines that the scheduled data is not cached in any way of the cache memory.

12. A data transfer method using a first memory provided outside a processor, a cache memory provided between the first memory and the processor and capable of caching at least a part of the data stored in the first memory in a plurality of ways, and a second memory accessible by the processor at a higher speed than the first memory, the method comprising:
an access control step of transferring data between the first memory and the second memory without going through the processor, by outputting to the cache memory at least one of a command to read data from the first memory into the second memory or a command to write data from the second memory to the first memory;
a cache determination step of determining whether or not scheduled data, which is expected to be accessed after the in-access data being read or written in the access control step, is cached in any way of the cache memory;
a prefetch cache step of, when it is determined in the cache determination step that the scheduled data is cached in one of the ways, accessing the way in which the scheduled data is stored from among the plurality of ways, and reading and storing the scheduled data; and
a step of reading the stored scheduled data when a read command for the scheduled data is issued following the in-access data, and writing to the stored scheduled data when a write command for the scheduled data is issued following the in-access data.