WO2014068694A1 - Semiconductor device and data prefetching method for cache memory - Google Patents
Semiconductor device and data prefetching method for cache memory
- Publication number
- WO2014068694A1 (application PCT/JP2012/078139)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- way
- data
- access
- address
- cache
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
- G06F2212/1021—Hit rate improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/28—Using a specific disk cache architecture
- G06F2212/281—Single cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/602—Details relating to cache prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6028—Prefetching based on hints or prefetch instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6032—Way prediction in set-associative cache
Definitions
- the present invention relates to a data prefetching technique for a cache memory, and more particularly to a technique effective when applied to a semiconductor device having a cache memory.
- Various techniques are known for cache memories: for example, splitting the cache into an instruction cache and a data cache, configuring the cache memory with a plurality of ways, and prefetching data including instruction code. Adopting such techniques is also effective for secure microcomputers, which, however, must additionally satisfy severe restrictions on circuit scale and power consumption.
- Patent Documents 1 and 2 disclose data prefetching techniques for a cache memory.
- Patent Document 1 discloses a technique that, on a cache miss, reads the data block containing the accessed data and prefetches the data block in a set adjacent direction. Since the adjacent direction is updated to follow the access history, the prefetching direction tracks the access pattern as well.
- Patent Document 2 discloses a technique for a multi-way set-associative cache memory that predicts in advance from which of the plurality of ways the data expected to be read next will come, thereby preventing unnecessary ways from being accessed.
- Patent Document 3 discloses a technique for a non-volatile memory in which one page of data is read into a pre-read cache at a time by a row-address-system selection operation, after which part of it is selected and read out by a column-address-system operation.
- When a multi-way cache is used, continuous access to a single data series, such as in a decryption operation, may use only one way and leave the other ways idle. This is not a serious problem if each way can be given a large-capacity memory array. However, if one way can hold only one block (one line) of the non-volatile memory, there is no area in which to store prefetched data even if prefetching is performed: a cache miss occurs on every line, and the cache fills incur overhead. If a memory array with the capacity of two ways is instead used as a single way, prefetching can be executed within that one way and the per-line cache misses disappear, but continuous accesses alternating between two data series then miss on every access.
- The semiconductor device includes a processor, a memory, a plurality of tags and a plurality of ways that correspond to each other one-to-one, and a cache control unit, and is configured to be capable of the following operations.
- The cache control unit performs the following operation when a first access and a second access, executed in succession, are accesses to consecutive addresses and the second access goes through a first way:
- when the second access is in the address-increasing direction relative to the first access,
- data prefetching is performed for the way whose tag value is one smaller than the tag value corresponding to the first way;
- conversely, when the second access is in the address-decreasing direction, data prefetching is performed for the way whose tag value is one greater than the tag value corresponding to the first way.
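The direction rule above can be sketched as a small helper. This is an illustrative model only; the function name and tag arithmetic are assumptions for explanation, not taken from the patent:

```python
def choose_prefetch(current_tag: int, address_increasing: bool):
    """Pick the prefetch target way's tag and the data to load into it.

    On an increasing-address stream, the way holding tag (current - 1)
    has already been consumed, so it is reused to hold the data at tag
    (current + 1); a decreasing stream is the mirror image.
    """
    step = 1 if address_increasing else -1
    target_way_tag = current_tag - step  # way whose contents are stale
    prefetch_tag = current_tag + step    # line to fetch into that way
    return target_way_tag, prefetch_tag
```

For example, while reading upward through the line with tag 10, the way still holding tag 9 is overwritten with the line at tag 11.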
- FIG. 1 is a flowchart illustrating an operation example of the semiconductor device according to the embodiment.
- FIG. 2 is a block diagram illustrating a configuration example of a semiconductor device according to an embodiment.
- FIG. 3 is a flowchart (1/4) illustrating a detailed operation example of the semiconductor device according to the embodiment.
- FIG. 4 is a flowchart (2/4) illustrating a detailed operation example of the semiconductor device according to the embodiment.
- FIG. 5 is a flowchart (3/4) illustrating a detailed operation example of the semiconductor device according to the embodiment.
- FIG. 6 is a flowchart (4/4) illustrating a detailed operation example of the semiconductor device according to the embodiment.
- FIG. 7 is an explanatory diagram illustrating an operation example when the 2-way cache according to the embodiment operates independently.
- FIG. 10 is a timing chart (1/4) showing an operation example of the semiconductor device according to the comparative example.
- FIG. 11 is a timing chart (2/4) illustrating an operation example of the semiconductor device according to the comparative example.
- FIG. 12 is a timing chart (3/4) illustrating an operation example of the semiconductor device according to the comparative example.
- FIG. 13 is a timing chart (4/4) showing an operation example of the semiconductor device according to the comparative example.
- FIG. 14 is a timing chart (1/3) showing an operation example when the 2-way cache performs prefetching with the other way as the prefetch target.
- FIG. 15 is a timing chart (2/3) showing an operation example when the 2-way cache performs prefetching with the other way as the prefetch target.
- FIG. 16 is a timing chart (3/3) showing an operation example when the 2-way cache performs prefetching with the other way as the prefetch target.
- FIG. 17 is a timing chart (1/2) showing an operation example when the 2-way cache operates independently.
- FIG. 18 is a timing chart (2/2) illustrating an operation example when the 2-way cache operates independently.
- a processor 53
- a memory 55
- a plurality of tags 65
- a plurality of ways capable of storing a plurality of data at successive addresses of the memory using a tag value stored in the tag as a reference address (64)
- a semiconductor device 50
- a cache control unit 63
- Each of the plurality of ways is provided with an address change direction flag (66) indicating whether the two most recent accesses to the way were in the address-increasing or address-decreasing direction.
- When a first access and a second access from the processor to the memory are performed in succession to consecutive addresses (1), and the second access goes through a first way,
- the cache control unit is configured to perform a predetermined prefetching operation on a second way that satisfies the following condition.
- The condition is that the second way has an address change direction flag matching the address change direction flag corresponding to the first way, and has a tag value consecutive with the tag value corresponding to the first way in the direction opposite to the direction indicated by the address change direction flag (4).
- The predetermined prefetching operation is the prefetching (5) of the data indicated by the tag value consecutive, in the direction indicated by the address change direction flag, with the tag value corresponding to the first way.
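The candidate condition can be sketched as a predicate over two ways. This is a toy model; the dictionary field names `tag` and `decrement` (0 = increasing, 1 = decreasing) are illustrative assumptions:

```python
def is_prefetch_candidate(first_way: dict, second_way: dict) -> bool:
    """Apply the two conditions on the second way described above."""
    step = -1 if first_way["decrement"] else 1
    # Condition 1: both ways recorded the same access direction.
    same_direction = second_way["decrement"] == first_way["decrement"]
    # Condition 2: the second way's tag is consecutive in the direction
    # OPPOSITE to the access direction, i.e. it holds data that the
    # stream has already passed over and consumed.
    behind = second_way["tag"] == first_way["tag"] - step
    return same_direction and behind
```

A way whose tag lies ahead of the stream, or whose recorded direction differs, is rejected, since it is likely caching a different data series.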
- the address change direction flag (66) can be configured to be set autonomously by the cache control unit based on the access history to the cache memory.
- When a cache miss occurs, the cache control unit is configured to be able to execute a cache fill in units of a data length that is an integer fraction (1/integer) of the total data length stored in a way.
- Each of the plurality of ways includes a valid flag (67) for each cache-fill unit of data; when a cache miss occurs, the cache control unit clears all the valid flags corresponding to the way targeted for the cache fill (23), and sets the valid flag corresponding to the cache-filled data (25).
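The valid-flag bookkeeping can be sketched as a toy class, assuming (as in the later embodiment) a 16-byte way filled in 4-byte units, giving four bits V0 to V3. The class and method names are illustrative:

```python
class WayValidFlags:
    """Per-way valid bits, one per cache-fill unit (V0..V3 here)."""

    def __init__(self, n_units: int = 4):
        self.v = [False] * n_units

    def begin_cache_fill(self):
        # On a cache miss, all valid flags of the target way are cleared (23).
        self.v = [False] * len(self.v)

    def mark_filled(self, unit: int):
        # As each unit is written, its valid flag is set (25).
        self.v[unit] = True

    def all_valid(self) -> bool:
        return all(self.v)
```

Partial fills are thus visible bit by bit, which is what later allows idle cycles to top up the remaining units.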
- the cache control unit is configured to enable the following operations.
- When the address change direction flag corresponding to the second way matches the address change direction flag corresponding to the first way, and the tag value corresponding to the second way is consecutive with the tag value corresponding to the first way in the direction opposite to the direction indicated by the address change direction flag, data prefetching for the second way is performed after the data prefetching to the area in the first way.
- When the address change direction flag corresponding to the second way does not match the address change direction flag corresponding to the first way, or the tag value corresponding to the second way is not consecutive with the tag value corresponding to the first way, the following prefetching operation is performed: after the prefetching of data to the area in the first way following the second access, data is prefetched to the area in the first way lying in the direction opposite to the direction indicated by the address change direction flag corresponding to the first way.
- The processor is configured to be able to execute an access subsequent to the second access after the second access is completed and before the cache memory completes the prefetching.
- each of the plurality of ways is configured to be able to cache data at an arbitrary address in the memory.
- the semiconductor device further includes an instruction cache, and the plurality of ways are data caches for the processor.
- <Data prefetching method> A data prefetching method in a cache memory (60) provided between the processor (53) and the memory (55), the cache memory comprising a plurality of tags (65) and a plurality of ways (64) capable of storing a plurality of data at successive addresses of the memory using a tag value stored in the tag as a reference address, the method comprising the following steps.
- A first step (1) of determining whether a first access from the processor to the memory and a second access that follows it are accesses to consecutive addresses.
- When the determination result of the first step is that the accesses are to consecutive addresses, and the first access and the second access are accesses with the same tag value to the same way among the plurality of ways, a second step assigns to that way an address change direction attribute indicating whether the two most recent accesses were in the address-increasing or address-decreasing direction.
- A cache hit can be achieved on either way at all times by prefetching into the other way during idle cycles while one way is being hit.
- the address change direction attribute can be set autonomously by the cache control unit based on the access history to the cache memory.
- Each of the plurality of ways includes a valid flag (67) for each cache-fill unit of the data length; when a cache miss occurs in the third step, all the valid flags corresponding to the cache-fill target way are cleared (23), and the valid flag corresponding to the cache-filled data is set (25).
- The fourth step is executed after the sixth step, and, when the determination result of the fourth step is negative, the data prefetching method further comprises a loop that repeats the sixth step (13) in order to prefetch data into areas of the first way whose corresponding valid flags are not set, in the direction opposite to the direction indicated by the address change direction attribute corresponding to the first way.
- A semiconductor device (50) including a processor (53), a memory (55), a plurality of tags (65), a plurality of ways (64), and a cache control unit (63), the tags and ways corresponding to each other one-to-one. It is configured as follows.
- When the second access is in the address-increasing direction relative to the first access, the cache control unit prefetches data for the way having a tag value one smaller than the tag value corresponding to the first way (4, 40).
- When the second access is in the address-decreasing direction, data prefetching is performed for the way having a tag value one greater than the tag value corresponding to the first way (4, 45).
- FIG. 2 is a block diagram illustrating a configuration example of a semiconductor device according to an embodiment.
- a cache memory 60 is provided between the processor 53 and the memory 55.
- the memory 55 may be a nonvolatile memory such as a flash memory, and may be any memory allocated to the memory space of the processor 53.
- the cache memory 60 includes an instruction cache 62 and a data cache 61.
- the data cache 61 includes a plurality of tags 65_0 to 65_3 and a plurality of ways 64_0 to 64_3 capable of storing a plurality of data at consecutive addresses in the memory 55 using the tag value stored in the tag as a reference address.
- the tags 65_0 to 65_3 and the ways 64_0 to 64_3 are associated with 1: 1, and data D0 to D7 for 16 addresses, for example, are stored in the way 64_0, starting from the tag value stored in the tag 65_0.
- the number of ways and tags is arbitrary, and at least two sets are sufficient.
- The number of data items stored in one way and the data length (bit length) per item are also arbitrary; they can be determined appropriately in consideration of the access unit of the processor 53 and the interface with the non-volatile memory 55 to be cached.
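The tag/way structure can be sketched as a toy data model matching the example above (one tag per way, eight 2-byte data slots, 16 byte addresses per line). The class and field names are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Way:
    """One way: a tag value plus eight 2-byte data slots."""
    tag: Optional[int] = None                              # upper address bits
    data: List[Optional[int]] = field(default_factory=lambda: [None] * 8)

def covers(way: Way, address: int, line_bits: int = 4) -> bool:
    """A way caches the 2**line_bits consecutive byte addresses whose
    upper bits equal its stored tag value (the reference address)."""
    return way.tag is not None and (address >> line_bits) == way.tag
```

With `line_bits = 4`, a way with tag 3 covers byte addresses 0x30 through 0x3F.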
- FIG. 2 also illustrates other components, which will be described later.
- FIG. 1 is a flowchart showing an operation example of a semiconductor device according to an embodiment.
- When the processor 53 issues a data read request that accesses the non-volatile memory 55, it is determined whether this data read request and the immediately preceding one are requests to consecutive addresses (step 1). If they are, and the two consecutive read requests access the same way with the same tag value, an address change direction attribute is assigned to that way, indicating whether the two most recent accesses were in the address-increasing or address-decreasing direction (step 2). The target way is then accessed by the current data read request (step 3).
- The presence or absence of another way (a prefetch target way) having a matching address change direction attribute and a consecutive tag value is then determined (step 4).
- If such a way exists, the data indicated by the tag value consecutive, in the direction indicated by the address change direction attribute of the access target way, is prefetched into the prefetch target way (step 5).
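Steps 1 through 5 can be tied together in a toy two-way model. This is a sketch only: the miss path, LRU handling, valid flags, and actual data movement are omitted, and all names are illustrative assumptions:

```python
class MiniCache:
    """Two ways over 16-byte lines, modeling the FIG. 1 flow."""

    def __init__(self, tags):
        self.ways = [{"tag": t, "decrement": 0} for t in tags]
        self.prev_addr = None
        self.prefetched = []  # records of (way_index, new_tag)

    def read(self, addr: int):
        tag = addr >> 4
        i = next(j for j, w in enumerate(self.ways) if w["tag"] == tag)
        # Step 1: is this request continuous with the previous one?
        if self.prev_addr is not None and abs(addr - self.prev_addr) == 1:
            # Step 2: stamp the address change direction on the hit way.
            self.ways[i]["decrement"] = 1 if addr < self.prev_addr else 0
            step = -1 if self.ways[i]["decrement"] else 1
            # Step 4: find a way with the same direction whose tag sits
            # one line BEHIND the access direction (already consumed).
            for j, w in enumerate(self.ways):
                if (j != i and w["decrement"] == self.ways[i]["decrement"]
                        and w["tag"] == tag - step):
                    # Step 5: prefetch one line AHEAD into that way.
                    w["tag"] = tag + step
                    self.prefetched.append((j, tag + step))
        self.prev_addr = addr
        # Step 3 proper (outputting the read data) is omitted here.
```

Reading 0x4F then 0x50 crosses from the tag-4 line into the tag-5 line, so the way holding tag 4 is retargeted to prefetch tag 6.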
- Way 0 (64_0) corresponds to tag 0 (65_0)
- way 1 (64_1) corresponds to tag 1 (65_1).
- The address change direction attribute is first assigned to way 0 (64_0) (step 2). If the previous access to way 0 (64_0) was to address A-1, the direction is address-increasing; conversely, if it was to address A+1, the direction is address-decreasing. The address change direction attribute is assigned to each way individually.
- The tag 0 (65_0) corresponding to way 0 (64_0) stores the upper address of address A. If the access to way 0 (64_0) is a cache miss, the upper address of address A is stored into tag 0 (65_0) by the cache fill.
- way 1 (64_1)
- The determination condition is whether the candidate way has the same address change direction attribute as the access target way and a tag value consecutive with the access target way's tag value in the direction opposite to the direction indicated by the address change direction attribute (step 4).
- That way 1 (64_1) has the same address change direction attribute as way 0 (64_0) reflects an access history in which both ways have been continuously accessed in the same address change direction.
- That tag 1 (65_1) corresponding to way 1 (64_1) holds a tag value consecutive, in the direction opposite to the direction indicated by the address change direction attribute, with the tag value stored in tag 0 (65_0) corresponding to way 0 (64_0) indicates that way 1 (64_1) has already been accessed. If this condition is satisfied, way 1 (64_1) has been continuously accessed as part of the same data series and its contents have most likely already been consumed. The probability that way 1 (64_1) will be accessed again is therefore low, and it can be judged suitable as the prefetch target following the access to way 0 (64_0).
- prefetching of data is executed in the direction indicated by the address change direction attribute with way 1 (64_1) as a prefetch target (step 5).
- If way 1 (64_1) had already been accessed by the same series of accesses, the tag value A'-1 should be stored in the corresponding tag 1 (65_1).
- Otherwise, way 1 (64_1) is likely being used to cache another data series stored in another address area; it is therefore not appropriate as a prefetch target, and prefetching is not executed.
- Whether a way is appropriate as a prefetch target is thus determined from the address change direction attribute and tag value continuity. If a way suitable as the prefetch target exists, the access target way and the prefetch target way are combined to function as a single way with a prefetch function. If not, each way functions as one of a plurality of independent ways.
- the determination of whether or not to prefetch data is made based on the access history.
- Simply by assigning an address change direction attribute, the cache control unit makes this judgment autonomously, without any special external control, and switches the way prefetch function. Since the cache circuit itself collects and holds the access history, in the form of the address change direction attributes and tag value continuity, it can make the determination autonomously.
- The ways and tags are associated one-to-one, rather than providing a plurality of tags within one way.
- one way is composed of only one line and one tag corresponding thereto.
- With a cache of at least two lines, the circuit can function as either a 1-way cache with a look-ahead function or a 2-way cache, switching autonomously between the two.
- This embodiment is particularly effective when applied not to a large-scale cache circuit but to a small, low-power cache circuit subject to extremely strict requirements on circuit scale and power consumption, such as a secure microcomputer mounted on an IC card.
- FIG. 2 is a block diagram illustrating a configuration example of a semiconductor device according to an embodiment. A more detailed configuration example than that described in the outline of the embodiment will be described.
- The semiconductor device 50 according to the present embodiment is, for example, a microcomputer 50 used in an IC card, formed on a single semiconductor substrate using known semiconductor manufacturing techniques, or formed as a multichip module in which the non-volatile memory 55 or the like is a separate chip.
- the present embodiment is not limited by the semiconductor device mounting method.
- the microcomputer 50 includes a CPU 53, a DMAC 54, a ROM 56, a RAM 57, and a peripheral module 58 that are connected to each other via an address bus 51 and a data bus 52.
- a DMAC (Direct Memory Access Controller) 54 can access the memory independently of the CPU and execute data transfer based on the transfer parameters set by the CPU 53.
- a ROM (Read Only Memory) 56 and a RAM (Random Access Memory) 57 are memories.
- The ROM 56 stores the instruction code of programs executed by the CPU 53 and initial data values; the RAM 57 stores the values of variables used by the programs.
- the peripheral module 58 is, for example, a timer module or a communication interface module.
- a bus arbitration circuit, an interrupt control circuit, and the like can be provided.
- the nonvolatile memory 55 is connected to the address bus 51 and the data bus 52 via the cache memory 60, and can be accessed from a bus master such as the CPU 53 or the DMAC 54.
- the nonvolatile memory 55 and the cache memory 60 are connected to each other via a dedicated address bus 51_1 and a data bus 52_1.
- the non-volatile memory 55 is a non-volatile memory such as a flash memory, a phase change memory, and a ferroelectric memory, for example.
- the nonvolatile memory 55 can be replaced with any storage element allocated to the memory space of the CPU 53. It may be a volatile memory such as DRAM or SDRAM, or a memory other than a semiconductor.
- the nonvolatile memory 55 includes, for example, a read buffer 78 and a selector 79. The data at the address specified by the upper bits of the address bus 51_1 is read from the memory unit 77 and temporarily stored in the read buffer 78. A part of the data is selected by the selector 79 and output to the data bus 52_1.
- By making the size of the read buffer 78 coincide with the number of bits of the memory cells connected to one word line in the memory unit 77, the power consumption for accessing the non-volatile memory 55 can be kept low. Providing a pre-read cache as described in Patent Document 3 is further preferable, since it achieves a more effective reduction in power consumption.
- Although not particularly limited, the read buffer 78 is assumed here to be 16 bytes and the data bus 52_1 to be 4 bytes.
- the cache memory 60 includes a data cache 61, an instruction cache 62, and a cache control unit 63.
- the data cache 61 includes four ways 64_0 to 64_3. Each way is associated with a tag 65_0 to 65_3, a decrement flag 66_0 to 66_3, and a valid flag 67_0 to 67_3.
- When a cache miss occurs, a cache fill is performed on the least recently used (LRU) way.
- the cache control unit 63 is provided with an address control unit 68 and a data control unit 69, and controls an interface of an access command with the CPU 53 and the like via the address bus 51 and the data bus 52 and an interface with the nonvolatile memory 55. To do.
- the cache control unit 63 further includes an LRU flag 70, an LRU control unit 71, a tag control unit 72, a decrement flag control unit 73, and a valid flag control unit 74, and includes data cache tags 65_0 to 65_3 and decrement flags 66_0 to 66_3. And valid flags 67_0 to 67_3 are controlled.
- the LRU control unit 71, tag control unit 72, decrement flag control unit 73, and valid flag control unit 74 are connected to the internal address bus 75_1 and the internal data bus 76_1.
- Each of the ways 64_0 to 64_3 can hold a plurality of data.
- Data in each way is composed of data with continuous addresses.
- An upper address common to the address of the data stored in the way is stored in the corresponding tag.
- data per address is 1 byte (8 bits)
- the bit length of data constituting a way is 2 bytes (16 bits).
- Each way can store eight 2-byte data items, and the upper bits of the address, excluding the lower 4 bits, correspond to the tag value.
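Under the sizes just described (byte addressing, 2-byte data items, eight items per way), the address decomposition can be sketched as follows; the function name is an illustrative assumption:

```python
def split_address(addr: int):
    """Decompose a byte address for a 16-byte way of eight 2-byte slots.

    The upper bits excluding the lower 4 form the tag value; bits [3:1]
    of the offset select one of the eight data items D0..D7.
    """
    tag = addr >> 4        # compared against the stored tag value
    offset = addr & 0xF    # byte offset within the way
    slot = offset >> 1     # which 2-byte datum D0..D7
    return tag, slot, offset
```

For instance, address 0x123 falls in the line with tag 0x12, at byte offset 3, inside data item D1.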
- the data bus 52 has a width of 2 bytes.
- the cache can be fully associative.
- the data stored in each way may be any address in the memory space of the CPU 53.
- the decrement flag 66_0 to 66_3 is a flag indicating whether the access to each way 64_0 to 64_3 is an access in an address increasing direction or an access in a decreasing direction with respect to the previous access to the same way.
- the valid flags 67_0 to 67_3 are flags that indicate whether or not the data of the respective ways 64_0 to 64_3 is valid.
- A 1-bit flag is provided per cache-fill unit. For example, if the data D0 to D7 constituting each of the ways 64_0 to 64_3 are 2 bytes (16 bits) each and a cache fill is performed every 4 bytes (32 bits), one flag bit covers every 4 bytes.
- Each valid flag is then composed of a total of 4 bits, V0 to V3.
- Since addresses are allocated per byte, for 2-byte data read requests, addresses consecutive in steps of two (even-aligned) are treated as continuous addresses; for 4-byte data read requests, addresses consecutive in multiples of four are treated as continuous addresses.
- The bit length of the cache fill can be 4 bytes, for example. Accordingly, the internal data bus 76_1 and the interface circuits of the LRU control unit 71, tag control unit 72, decrement flag control unit 73, and valid flag control unit 74 connected to it need each be only 4 bytes wide. This suppresses circuit scale and reduces power consumption. Cache hit/miss determination, data output, and cache fill can then be performed in the 4-byte units stored in the way. Alternatively, the number of data items per way can be matched to the number of data bits connected to one word line of the non-volatile memory.
- In that case, the read buffer 78 and selector 79 provided in the non-volatile memory 55 become unnecessary, and the data read from one word line can be filled into the way at once, speeding up the cache fill.
- the valid flag at this time can be 1 bit for each way. In addition, a configuration without a valid flag may be possible.
- the bit length of the cache fill can be appropriately designed in consideration of the circuit size, power consumption, and cache fill speed as described above.
- A detailed operation example of the semiconductor device illustrated in FIG. 2 will now be described. FIGS. 3, 4, 5, and 6 are flowcharts showing detailed operation examples of the semiconductor device according to the embodiment.
- The cache control unit 63 of the cache memory 60 determines whether a bus command is an instruction fetch or a data access, i.e., whether there is a data read request to the non-volatile memory 55 (step 11). If there is no data read request to the non-volatile memory 55, it is determined whether all data areas of all ways 64_0 to 64_3 of the cache are valid (step 12), that is, whether any of the bits V0 to V3 constituting each of the valid flags 67_0 to 67_3 is not set to 1.
- If such a data area exists, data is read from the non-volatile memory 55 and written into that data area (step 13).
- The valid flag corresponding to the written data area of the way is then set to 1 (step 14). Using periods in which no bus command is issued, this makes it possible to pre-read, within the way indicated by the same tag, data lying in the direction opposite to the prefetch direction (backward data), reducing the probability of a cache miss.
- If there is a data read request, the cache control unit 63 of the cache memory 60 takes in the address of the bus command (step 15) and determines whether the address of the current read request is continuous with the address of the previous read request (step 16). If it is continuous, the process proceeds to step 2, which assigns the address change direction attribute to the way; if not, it proceeds to step 3, which accesses the way. In FIGS. 3 and 4 this transition is shown as connector C2.
- In step 2, which assigns the address change direction attribute to the way, the decrement flags 66_0 to 66_3 are set and updated. It is determined whether the address of the current read request matches the tag of any way in the cache (step 17). If not, the process proceeds to step 3, which accesses the way (shown as connector C2 in FIGS. 3 and 4). If there is a way with a matching tag value, it is determined whether the continuous access is in the increment or decrement direction (step 18). If it is the increment direction, the decrement flag of the target way is set to 0 (step 19); if the decrement direction, it is set to 1 (step 20). The process then proceeds to step 3, which accesses the way (shown as connector C3 in FIGS. 3 and 4).
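Steps 17 through 20 can be sketched as a single routine over the ways; the function name and the way representation are illustrative assumptions:

```python
def assign_direction(ways, prev_addr: int, addr: int):
    """Steps 17-20: on a continuous access, find the way whose tag
    matches the current address and record the stream direction in its
    decrement flag (0 = increment, 1 = decrement).

    Returns the matching way's index, or None when no tag matches
    (the path that continues via connector C2).
    """
    tag = addr >> 4  # 16-byte lines, as in the embodiment
    for i, way in enumerate(ways):
        if way["tag"] == tag:
            way["decrement"] = 1 if addr < prev_addr else 0  # steps 18-20
            return i
    return None
```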
- If the access is not continuous (step 16), or if it is continuous but no way has a tag value matching the address of the current read request (step 17), the process moves via connector C2 to step 3, which accesses the way (FIG. 4).
- In step 3, it is first determined whether there is a cache hit (step 22). In the case of a cache hit, the LRU flag 70 is updated so that the hit way becomes the most recent (step 33), and read data is output from the hit way (step 34). In the case of a cache miss, a cache fill is performed on the way storing the oldest data, as indicated by the LRU flag 70.
- All bits (V0 to V3) of the valid flag of the way storing the oldest data are cleared to 0 (step 23), data is read from the non-volatile memory 55 and written into the data area of the target way (step 24), and the valid flag corresponding to the written data area is set to 1 (step 25).
- The access target address is set in the tag of the written way (step 26), and the LRU flag 70 is updated so that the written way becomes the most recent (step 27).
- Read data is output from the way in which the data was written (step 28).
- When the process reaches step 3 via connector C3, after step 2 (assignment of the address change direction attribute) shown in FIG. 3, it is likewise first determined whether there is a cache hit (step 21). In the case of a cache hit, the LRU flag 70 is updated so that the hit way becomes the most recent (step 33), and read data is output from the hit way (step 34). Even in the case of a cache miss, since step 17 has established that a way with a tag value matching the address of the current read request exists, data is read from the non-volatile memory 55 and written into the data area of that way (step 29).
- step 35 it is determined whether the data area in the direction indicated by the decrement flag is filled starting from the data area of the accessed way (step 35).
- If the decrement flag of the accessed way is 0, it is checked whether any valid flag in the address increasing direction from the accessed data area is still cleared to 0.
- If the decrement flag of the accessed way is 1, it is checked whether any valid flag in the address decreasing direction from the accessed data area is still cleared to 0. If there is a data area whose valid flag is not set to 1, data is read from the nonvolatile memory 55 and written to that data area of the target way (step 36), and the valid flag 67 corresponding to the written data area of the way is set to 1 (step 37). This operation is repeated until valid data is written (filled) into all data areas from the accessed data area to the end of the way in the direction indicated by the decrement flag.
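The directional fill of steps 35 to 37 can be sketched as follows. Block indices 0 to 3 stand for the regions guarded by V0 to V3; the structures and `read_mem` are assumptions for illustration:

```python
def fill_remaining(way, start_block, read_mem):
    """Sketch of steps 35-37: starting from the just-accessed 4-byte block,
    fill every block toward the end of the way in the direction given by
    the way's decrement flag."""
    step = 1 if way['dec'] == 0 else -1        # increment vs. decrement direction
    base = way['tag'] << 4                     # base address covered by the way
    blk = start_block + step
    while 0 <= blk <= 3:                       # step 35: any block left in that direction?
        if not way['valid'][blk]:
            way['data'][blk * 4:blk * 4 + 4] = read_mem(base + blk * 4)  # step 36
            way['valid'][blk] = 1              # step 37
        blk += step
    return way['valid']
```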
- step 4 A way is searched for whose decrement flag value matches that of the latest way and whose tag value is continuous with that of the latest way in the direction opposite to the direction indicated by the decrement flag (step 4). If no such way exists, the process returns to step 11 for determining the presence or absence of a data read to the nonvolatile memory 55 (shown in the figure as the flow connected by connector C1). If, on the other hand, a way 64 satisfying these conditions exists, the process proceeds via connector C5 to the data prefetching step shown in FIG. 6.
- step 38 The decrement flag values of the latest way and of the way satisfying the above condition are checked (step 38). If both are 0, the process proceeds to step 39.
- The way satisfying the above condition is determined to be suitable for writing prefetched data.
- The tag value of that way is updated to the tag value of the latest way plus one line (step 39).
- step 43 While the data area of the prefetch target way is not yet filled in the direction indicated by the decrement flag, that is, the address increasing direction, the address of the data read from the nonvolatile memory is incremented (step 43), the next data is read from the nonvolatile memory 55 and written (step 40), and the valid flag of the corresponding data area is set to 1 (step 41). The operations of steps 40 to 43 are repeated until the data area of the prefetch target way is completely filled in the direction indicated by the decrement flag (step 42).
- step 38 The decrement flag values of the latest way and of the way satisfying the above condition are checked (step 38). If both are 1, the process proceeds to step 44.
- The way satisfying the above condition is, here too, determined to be suitable for writing prefetched data.
- The tag value of that way is updated to the tag value of the latest way minus one line (step 44). While the data area of the prefetch target way is not yet filled in the direction indicated by the decrement flag, that is, the address decreasing direction, the address of the data read from the nonvolatile memory is decremented (step 48), the next data is read from the nonvolatile memory 55 and written (step 45), and the valid flag 67 of the corresponding data area is set to 1 (step 46). The operations of steps 45 to 48 are repeated until the data area of the prefetch target way is completely filled in the direction indicated by the decrement flag (step 47).
- steps 42 and 47 When all data areas of the prefetch target way have been filled with data (steps 42 and 47), the process returns to step 11 for determining the presence or absence of a data read to the nonvolatile memory 55 (shown in the figure as the flow connected by connector C1).
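The cross-way prefetch of steps 38 to 48 can be sketched as follows, again as a hypothetical software model rather than the patent's circuit: once a neighbouring way qualifies, its tag is retargeted one line ahead of the latest way and all of its data areas are filled in the access direction:

```python
def prefetch_into_way(latest, target, read_mem):
    """Sketch of steps 38-48: retarget the qualifying way's tag one line
    ahead of the latest way and fill its data areas in the access
    direction."""
    assert latest['dec'] == target['dec']      # step 38: directions must agree
    if latest['dec'] == 0:
        target['tag'] = latest['tag'] + 1      # step 39: next line, increment direction
        offsets = range(0, 16, 4)              # fill D0/D1 ... D6/D7 upward (steps 40-43)
    else:
        target['tag'] = latest['tag'] - 1      # step 44: previous line, decrement direction
        offsets = range(12, -1, -4)            # fill D6/D7 ... D0/D1 downward (steps 45-48)
    base = target['tag'] << 4
    for off in offsets:
        target['data'][off:off + 4] = read_mem(base + off)
        target['valid'][off // 4] = 1          # steps 41/46
    return target['tag']
```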
- As a result, continuous cache hits occur during continuous access to a series of data.
- The two ways can also function as one way that continues the prefetching, or as two independent ways. Switching between these two modes is performed merely by comparing the decrement flags and the tag values.
- The determination of whether or not to prefetch data is made based on the access history.
- Merely by adding the new decrement flag indicating the access direction, the cache control unit can switch modes autonomously, without receiving any new control from the outside.
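The autonomous switching decision described above reduces to a flag-and-tag comparison (the step-4 test), which can be sketched as follows; the field names are illustrative:

```python
def is_prefetch_candidate(latest, other):
    """Sketch of the step-4 test: the two ways merge into one prefetch
    stream only when their decrement flags agree and `other` holds the
    tag adjacent to `latest` on the side opposite the access direction."""
    if latest['dec'] != other['dec']:
        return False                           # directions disagree: stay independent
    if latest['dec'] == 0:                     # increasing addresses
        return other['tag'] == latest['tag'] - 1
    return other['tag'] == latest['tag'] + 1   # decreasing addresses
```

With the FIG. 7 values (way 1 tag 0x00324) the test fails and the ways stay independent; with the FIG. 8 values (way 1 tag 0x001FF) it succeeds and way 1 becomes the prefetch target.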
- FIG. 7 is an explanatory diagram illustrating an operation example when the 2-way cache operates independently.
- the case where the number of ways is 2 is illustrated. Even if the number of ways is 3 or more, the same effect is obtained.
- Each of the two ways 64_0 and 64_1 holds eight 2-byte data D0 to D7, and includes corresponding tags 65_0 and 65_1 and decrement flags 66_0 and 66_1.
- 1-bit valid flags V0 to V3 are provided for two pieces of data (4 bytes).
- the LRU flag 70 indicates whichever of WAY 0 and WAY 1 was accessed least recently.
- One address is assigned to one byte of data, and eight pieces of 2-byte data D0 to D7 correspond to data for 16 addresses.
- the tag stores an address value higher than the lower 4 bits as a tag value.
- D0 stores the data at the address formed by appending 0x0 as the lower 4 bits to the tag value.
- D1 to D7 store the data at the addresses formed by appending 0x2, 0x4, ..., 0xE to the tag value, respectively. In each of FIGS. 7, 8, and 9, the case where a data read at address 0x002008 is requested is taken as an example; the state before the access is shown schematically in the upper half of each figure, and the state after the access in the lower half.
- the tag value corresponding to the address 0x002008 is 0x00200, and the data of the address 0x002008 is stored at the position D4.
- “0x” is a symbol indicating that the subsequent numerical value is in hexadecimal notation.
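The address decomposition described above can be sketched as follows (the field widths follow the 16-byte way of this example; the function name is illustrative):

```python
def split_address(addr):
    """Sketch of the address mapping: with one address per byte and 2-byte
    data slots D0-D7, the tag is everything above the lower 4 bits, the
    slot index is offset/2, and V0-V3 each guard a 4-byte pair."""
    tag = addr >> 4            # e.g. 0x002008 -> tag 0x00200
    offset = addr & 0xF        # byte position within the 16-byte way
    slot = offset // 2         # D0..D7 (offset 0x8 -> D4)
    valid_bit = offset // 4    # V0..V3 (offset 0x8 -> V2)
    return tag, slot, valid_bit
```

For address 0x002008 this yields tag 0x00200, slot D4, and valid flag V2, matching the example in the text.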
- the tag value corresponding to the address 0x002008 is 0x00200, which is different from the tag values stored in the tags 65_0 and 65_1, and is thus determined as a cache miss (steps 17 and 22). Since the LRU flag 70 indicates the way 0, the cache fill is performed on the way 0 (steps 23 to 28).
- the valid flags V0 to V3 of way 0 (64_0) are all cleared to 0 (step 23). Data is read from nonvolatile memory address 0x002008 and written to D4 and D5 of way 0 (64_0) (step 24).
- the corresponding valid flag V2 is set to 1 (step 25).
- 0x00200 is set to the tag 65_0 (step 26).
- the LRU flag is changed to WAY 1 (step 27).
- Data is output from D4 of way 0 (64_0) that has been cache-filled (step 28).
- prefetching of data proceeds to D6 and D7.
- 1 is set to V3 of the valid flag 67_0.
- the condition determination in step 4 is performed.
- the decrement flags 66_0 and 66_1 match, both indicating the address increasing direction.
- However, the tag 65_1 of way 1 is 0x00324, which differs from 0x001FF, the tag value continuous with the tag value 0x00200 of way 0 in the direction opposite to that indicated by the decrement flags 66_0 and 66_1.
- Therefore, way 1 is determined to be inappropriate as a prefetch target (step 4).
- the data of way 1 (64_1) is retained, and a cache hit occurs when there is an access to valid data D0 to D3 that matches the tag 65_1.
- Way 1 (64_1) functions as a way independent of way 0 (64_0).
- In FIG. 8, the tag of way 0 matches.
- The tag value 0x00200 corresponding to the address 0x002008 matches the tag value stored in the tag 65_0, so the decrement flag is updated (steps 17 to 20). If the continuous access is in the increment direction, the decrement flag 66_0 is set to 0 (step 19); if it is in the decrement direction, it is set to 1 (step 20). In FIG. 8, the access is in the increment direction.
- step 21 A cache hit or miss is determined (step 21). Although the tag 65_0 matches, V2 of the valid flag 67_0, which corresponds to D4 where the data for address 0x002008 would be stored, is 0, so the data for address 0x002008 is not held in way 0 (64_0). Therefore, a cache miss is determined. Four bytes of data starting at address 0x002008 are read from the nonvolatile memory 55 and written to D4 and D5 (step 29), and V2 of the corresponding valid flag 67_0 is set to 1 (step 30). The LRU flag 70 is updated to WAY 1 so that the accessed way 0 becomes the newest (step 31). Data is output from D4 of way 0 (64_0), which has been cache-filled (step 32).
- prefetching of data proceeds to D6 and D7.
- 1 is set to V3 of the valid flag 67_0.
- the condition determination in step 4 is performed.
- The tag 65_1 of way 1 is 0x001FF, which is continuous with the tag value 0x00200 of way 0 in the direction opposite to the address increasing direction indicated by the decrement flags 66_0 and 66_1, so way 1 is determined to be appropriate as a prefetch target (step 4).
- the decrement flags 66_0 and 66_1 of the way 0 that is the latest way and the way 1 that satisfies the condition as the prefetch target are both 0 (step 38).
- The value of the tag 65_1 of way 1, which satisfies the condition as a prefetch target, is updated to 0x00201, that is, the tag value 0x00200 of the tag 65_0 of the latest way 0 plus 1 (step 39).
- step 43 Data is then read sequentially from the nonvolatile memory 55 while incrementing the address (step 43) until all the data D0 to D7 of way 1 are filled (step 42); data at addresses 0x002010 to 0x00201F is written to D0 to D7 of way 1 (step 40), and V0 to V3 of the corresponding valid flag 67_1 are set to 1 in turn (step 41).
- way 1 functions as a prefetch target of way 0.
- way 0 and way 1 function alternately as an access target and a prefetch target, and no cache miss occurs.
- Fig. 9 shows an example of the operation when the continuous access direction is the address decreasing direction.
- Data is stored in D2 and D3 of way 0 (64_0).
- The valid flag V1 is set to 1, and the other valid flags V0, V2, and V3 are cleared to 0.
- the tag 65_0 stores 0x00200.
- the valid flags V0 and V1 are set to 1, and the other valid flags V2 and V3 are cleared to 0.
- the tag 65_1 stores 0x00201.
- the decrement flags 66_0 and 66_1 are both set to 1.
- prefetching of data proceeds to D0 and D1.
- 1 is set to V0 of the valid flag 67_0.
- the condition determination in step 4 is performed.
- The tag 65_1 of way 1 is 0x00201, which is continuous with the tag value 0x00200 of way 0 in the direction opposite to the address decreasing direction indicated by the decrement flags 66_0 and 66_1, so way 1 is determined to be appropriate as a prefetch target (step 4).
- the decrement flags 66_0 and 66_1 of the way 0 that is the latest way and the way 1 that satisfies the condition for the prefetch target are both 1 (step 38).
- The value of the tag 65_1 of way 1, which satisfies the condition as a prefetch target, is updated to 0x001FF, that is, the tag value 0x00200 of the tag 65_0 of the latest way 0 minus 1 (step 44).
- Data is then read sequentially from the nonvolatile memory 55 while decrementing the address (step 48) until all the data D0 to D7 of way 1 are filled (step 47); data at addresses 0x001FFF down to 0x001FF0 is written to D0 to D7 of way 1 (step 45), and V0 to V3 of the corresponding valid flag 67_1 are set to 1 in turn (step 46).
- Way 1 functions as the prefetch target of way 0, as in the case of continuous access in the address increasing direction described with reference to FIG. 8.
- way 0 and way 1 function alternately as an access target and a prefetch target, and no cache miss occurs.
- FIGS. 10 to 13 are timing charts showing an operation example of the semiconductor device according to the comparative example.
- FIGS. 14 to 16 are timing charts showing an operation example in which the two ways of the cache serve as each other's prefetch target and prefetch as a single way.
- FIGS. 17 to 18 are timing charts showing an operation example when the 2-way cache operates independently.
- the horizontal axis is a time axis and is expressed in order from T1 in units of clock cycles.
- a clock, a CPU access request, an address issued by the CPU, and read data to the CPU are shown in order from the top. Next, the operation of the cache memory 60 is shown.
- An access request of the cache control circuit 63, an address issued from the cache memory 60 to the nonvolatile memory 55, and read data output from the nonvolatile memory 55 to the cache memory 60 are shown.
- the internal states of way 0 and way 1 are shown.
- the decrement flags 66_0 and 66_1, the values of the tags [23:4] (65_0 and 65_1), and the values of the data D0 to D7 are shown in order from the top.
- the value of the LRU flag 70 is shown.
- the CPU 53 requests a 2-byte data read from the address 0x00200C in the cycle T1.
- the value of the tag 65_0 of the way 0 is 0x01219
- the value of the tag 65_1 of the way 1 is 0x001FF
- the LRU flag 70 points to the way 0. Since there is no tag that matches the tag value 0x00200 corresponding to the address 0x00200C for which data read is requested, a cache miss occurs. Assume that two cycles are required to determine a cache miss, and the cache memory 60 outputs the address 0x00200C to the nonvolatile memory 55 in cycle T3.
- the target of the cache fill is way 0 indicated by the LRU flag 70.
- the LRU flag 70 is updated to way 1, and in cycle T4, the tag 65_0 of way 0 is updated to 0x00200.
- Data reading is executed in the nonvolatile memory 55; assuming that three cycles are required, the value @00200C at address 0x00200C is read at cycle T6 and written to D6/D7 of way 0 (64_0) at cycle T7, and is also output to the CPU 53. Since the cache fill is performed in units of 4 bytes, the 4-byte data at addresses 0x00200C to 0x00200F is written to D6 and D7 of way 0 (64_0). Since the data read request from the CPU 53 is for 2 bytes, the 2 bytes stored in D6 are returned to the CPU 53.
- the CPU 53 requests a 2-byte data read from the address 0x00200E in cycle T8.
- The data at address 0x00200E has been written to D7 of way 0 (64_0) by the previous cache fill. Since the data read request in cycle T1 was a 2-byte read from address 0x00200C, the data read request in cycle T8 is a continuous read in the address increasing direction.
- the decrement flag 66_0 of way 0 is maintained as 0 indicating the address increasing direction. Since the data at address 0x00200E is stored in D7 of way 0 (64_0), it is a cache hit and is output as read data of the CPU as @ 00200E in cycle T9.
- cycle T12 the CPU 53 requests a 2-byte data read from 0x002010 which is the next continuous address. Since the data at address 0x002010 is not cached, a cache miss occurs. The cache control circuit does not issue an access request until then (cycles T9 to T13).
- the cache memory 60 outputs the address 0x002010 to the nonvolatile memory 55 at cycle T14.
- the target of the cache fill is way 1 indicated by the LRU flag 70.
- the LRU flag 70 is updated to way 0, and in cycle T15, the tag 65_1 of way 1 is updated to 0x00201.
- data reading is executed in 3 cycles.
- The value @002010 at address 0x002010 is read out, written to D0/D1 of way 1 (64_1) at cycle T18, and also output to the CPU 53.
- FIGS. 14 to 16 are timing charts showing an operation example in which the two ways of the cache serve as each other's prefetch target and prefetch as a single way.
- the CPU 53 requests a 2-byte data read from the address 0x00200C in the cycle T1.
- the value of the tag 65_0 of the way 0 is 0x01219
- the value of the tag 65_1 of the way 1 is 0x001FF
- the LRU flag 70 points to the way 0. Since there is no tag that matches the tag value 0x00200 corresponding to the address 0x00200C for which data read is requested, a cache miss occurs. Assume that two cycles are required to determine a cache miss, and the cache memory 60 outputs the address 0x00200C to the nonvolatile memory 55 in cycle T3.
- the target of the cache fill is way 0 indicated by the LRU flag 70.
- the LRU flag 70 is updated to way 1
- the tag 65_0 of way 0 is updated to 0x00200.
- Data reading is executed in the nonvolatile memory 55; assuming that three cycles are required, the value @00200C at address 0x00200C is read at cycle T6 and written to D6/D7 of way 0 (64_0) at cycle T7, and is also output to the CPU 53. Since the cache fill is performed in units of 4 bytes, the 4-byte data at addresses 0x00200C to 0x00200F is written to D6 and D7 of way 0 (64_0). Since the data read request from the CPU 53 is for 2 bytes, the 2 bytes stored in D6 are returned to the CPU 53.
- The CPU 53 requests a 2-byte data read from address 0x00200E in cycle T8. Since the data read request in cycle T1 was a 2-byte read from address 0x00200C, the data read request in cycle T8 is a continuous read in the address increasing direction.
- the decrement flag 66_0 of way 0 is maintained as 0 indicating the address increasing direction.
- the data at address 0x00200E is written to D7 of way 0 (64_0) by the previous cache fill. Since the data at address 0x00200E is stored in D7 of way 0 (64_0), it is a cache hit and is output as read data of the CPU as @ 00200E in cycle T9.
- the data prefetching operation shown in the flowcharts of FIGS. 3 to 6 is started.
- By the cache fill caused by the 2-byte data read request from address 0x00200C in cycle T1, the data area in the direction indicated by decrement flag 66_0, starting from D6/D7 of way 0, is already filled (step 35).
- the process proceeds to a determination step as to whether there is a way suitable for prefetching.
- a way is searched for which has the same decrement flag value as that of the latest way 0 and a tag value continuous in the direction opposite to the direction indicated by the decrement flag (step 4).
- The decrement flags 66_0 and 66_1 match, both indicating the address increasing direction, and the tag value of way 1 is 0x001FF, which is continuous with the tag value 0x00200 of way 0 in the address decreasing direction, opposite to the direction indicated by the decrement flags 66_0 and 66_1. Way 1 is therefore determined to be suitable as a prefetch target. In cycle T8, the tag 65_1 is updated to 0x00201, obtained by adding 1 to 0x00200 (step 39).
- cycle T12 the CPU 53 requests a 2-byte data read from 0x002010 which is the next continuous address.
- Since the data at address 0x002010 has already been prefetched into way 1, a cache hit occurs and the data is read out in cycle T13.
- In the comparative example, a cache miss occurs and the data at address 0x002010 is not read until cycle T18.
- the occurrence of a cache miss is suppressed and the execution cycle is shortened by 5 cycles.
- a 2-byte data read is executed in 2 cycles when a cache hit occurs, and 7 cycles are required when a cache miss occurs.
- Instruction fetches and data read cycles can be executed even during the data prefetch period.
- The cycles used for the prefetch can therefore be hidden.
- FIGS. 17 to 18 are timing charts showing an operation example when the 2-way cache operates independently.
- the CPU 53 requests a 2-byte data read from the address 0x002008 in the cycle T1.
- the value of the tag 65_0 of the way 0 is 0x01219
- the value of the tag 65_1 of the way 1 is 0x0121A
- the LRU flag 70 points to the way 0. Since there is no tag that matches the tag value 0x00200 corresponding to the address 0x002008 for which data read is requested, a cache miss occurs. Assume that two cycles are required to determine a cache miss, and the cache memory 60 outputs the address 0x002008 to the nonvolatile memory 55 in cycle T3.
- the target of the cache fill is way 0 indicated by the LRU flag 70.
- In cycle T3, the LRU flag 70 is updated to way 1, and in cycle T4, the tag 65_0 of way 0 is updated to 0x00200.
- Data reading is executed in the nonvolatile memory 55; assuming that three cycles are required, the value @002008 at address 0x002008 is read at cycle T6 and written to D4/D5 of way 0 (64_0) at cycle T7, and is also output to the CPU 53.
- The CPU 53 requests a 2-byte data read from address 0x00200A in cycle T8. Since the data read request in cycle T1 was a 2-byte read from address 0x002008, the data read request in cycle T8 is a continuous read in the address increasing direction.
- the decrement flag 66_0 of way 0 is maintained as 0 indicating the address increasing direction. Since the data at address 0x00200A has been written to D5 of way 0 (64_0) by the previous cache fill, it is a cache hit, and is output as read data of the CPU as @ 00200A at cycle T9.
- step 35 The address 0x00200C is issued to the nonvolatile memory 55 at cycle T6, and the data @00200C read at cycle T7, shown in FIG. 18, is written to data area D6/D7 at cycle T8 (step 36). Thereafter, the process proceeds to step 4.
- The cache control circuit of the present embodiment can autonomously switch, simply by referring to the tag values and the decrement flags, between operating as a 2-way cache and operating the two ways collectively as an access target and a prefetch target.
- the present invention is not limited to the data cache embodiment, and can be applied to an instruction cache and a unified cache.
- the number of ways is not limited to two, and can be arbitrarily determined.
- The number of cycles used to determine a cache hit or miss, and the number of cycles for reading data from the nonvolatile memory, can be designed appropriately and may take any values.
- the present invention relates to a technique for prefetching data into a cache memory, and can be widely applied to a semiconductor device equipped with a cache memory.
- 1 Step of determining whether or not consecutive access addresses are continuous
2 Step of assigning an address change direction attribute (decrement flag) to the way
3 Step of accessing the way
4 Step of determining whether or not a way is suitable for the prefetch operation
5 Step of executing the prefetch
50 Semiconductor device (secure microcomputer)
51 Address bus
52 Data bus
53 Processor (CPU)
54 DMA (Direct Memory Access) controller
55 Nonvolatile memory
56 ROM (Read Only Memory)
57 RAM (Random Access Memory)
58 Peripheral module
60 Cache memory
61 Data cache
62 Instruction cache
63 Cache control unit
64 Way
65 Tag
66 Decrement flag
67 Valid flag
68 Address control unit
69 Data control unit
70 LRU (Least Recently Used) flag
71 LRU control unit
72 Tag control unit
73 Decrement flag control unit
74 Valid flag control unit
75 Cache control unit internal address bus
76 Cache control unit internal data bus
77 Memory unit
78 Read buffer
79 Selector
Abstract
Description
First, an overview of representative embodiments of the invention disclosed in the present application will be given. Reference numerals in the drawings referred to in parentheses in this overview merely illustrate examples of what is included in the concepts of the constituent elements to which they are attached.
A semiconductor device (50) comprising a processor (53), a memory (55), a plurality of tags (65), a plurality of ways (64) each capable of storing a plurality of data at consecutive addresses of the memory with the tag value stored in the corresponding tag as a base address, and a cache control unit (63), configured as follows.
In item 1, the cache control unit is configured to be able to set (2) the address change direction flag when the second access is an access to the same way and the same tag value as the first access and the target address of the second access is smaller than the target address of the first access.
In item 1, the cache control unit is configured to be able to execute, on a cache miss, a cache fill in units of a data length equal to an integer fraction of the total data length stored in the way.
In item 3, the cache control unit provides, for each of the plurality of ways, a valid flag (67) for each unit of the cache fill data length, and is configured so that, when a cache miss occurs, all the valid flags corresponding to the way targeted for the cache fill can be cleared (23) and the valid flags corresponding to the cache-filled data can be set (25).
In item 4, the cache control unit is configured to enable prefetching (12, 13, 14) of data, after the second access, into areas of the first way whose corresponding valid flags are not set, in the direction indicated by the address change direction flag corresponding to the first way.
In item 5, the cache control unit is configured to enable the following operations.
In item 1, the processor is configured to be able to execute an access subsequent to the second access after the second access ends and before the cache memory completes the prefetch.
In item 1, each of the plurality of ways is configured to be able to cache data at an arbitrary address of the memory.
In item 1, the semiconductor device further comprises an instruction cache, and the plurality of ways serve as a data cache for the processor.
A data prefetch method in a cache memory (60) provided between a processor (53) and a memory (55) and comprising a plurality of tags (65) and a plurality of ways (64) each capable of storing a plurality of data at consecutive addresses of the memory with the tag value stored in the corresponding tag as a base address, the method comprising the following steps.
In item 10, the fourth step is executed after the third step regardless of whether the third step is a cache hit or a cache miss (21, 22 ... C2 ... 4). When the third step is a cache miss, the third step updates the tag value of the first way (26), and the fifth step determines whether or not to perform the prefetch based on the updated tag value of the first way.
In item 10, the first step sets the address decreasing direction as the address change direction attribute (20) when the second access is an access to the same way and the same tag value as the first access (17) and the target address of the second access is smaller than the target address of the first access (18).
In item 10, the third step performs, on a cache miss, a cache fill in units of a data length equal to an integer fraction of the total data length stored in the way.
In item 13, each of the plurality of ways has a valid flag (67) for each unit of the cache fill data length, and when a cache miss occurs in the third step, all the valid flags corresponding to the way targeted for the cache fill are cleared (23) and the valid flags corresponding to the cache-filled data are set (25).
In item 14, a sixth step (13) is further included of prefetching data, after the third step, into areas of the first way whose corresponding valid flags are not set, in the direction indicated by the address change direction attribute corresponding to the first way.
In item 15, the data prefetch method further includes a loop that executes the fourth step after the sixth step and, based on a negative determination result of the fourth step, repeats the sixth step (13) to prefetch data into areas of the first way whose corresponding valid flags are not set, in the direction opposite to that indicated by the address change direction attribute corresponding to the first way.
A semiconductor device (50) comprising a processor (53), a memory (55), a plurality of tags (65) and a plurality of ways (64) in one-to-one correspondence with each other, and a cache control unit (63), configured as follows.
The embodiments will now be described in further detail. In all the drawings for describing the modes for carrying out the invention, elements having the same function are given the same reference numerals, and repeated description thereof is omitted.
FIG. 2 is a block diagram showing a configuration example of a semiconductor device according to an embodiment. A more detailed configuration example than that given in the overview of the embodiments is described here. The semiconductor device 50 according to this embodiment is, for example, a microcomputer 50 used in an IC card, and is formed on a single semiconductor substrate using known semiconductor manufacturing techniques, or as a multi-chip module in which the nonvolatile memory 55 and the like are separate chips. This embodiment is not limited by the mounting method of the semiconductor device.
A detailed operation example of the semiconductor device shown in FIG. 2 will be described. FIGS. 3, 4, 5, and 6 are flowcharts showing a detailed operation example of the semiconductor device according to the embodiment.
An operation example of this embodiment will now be described. FIG. 7 is an explanatory diagram showing an operation example when the 2-way cache operates independently. FIG. 8 is an explanatory diagram showing an operation example when the two ways of the cache serve as each other's prefetch target and prefetch as a single way in the address increment direction (decrement flag = 0). FIG. 9 is an explanatory diagram showing an operation example when the two ways serve as each other's prefetch target and prefetch as a single way in the address decrement direction (decrement flag = 1). Each illustrates the case where the number of ways is two; the same applies when the number of ways is three or more. The two ways 64_0 and 64_1 each hold eight pieces of 2-byte data D0 to D7 and have corresponding tags 65_0 and 65_1 and decrement flags 66_0 and 66_1. A 1-bit valid flag V0 to V3 is provided for every two pieces of data (4 bytes). The LRU flag 70 indicates whichever of WAY0 and WAY1 was accessed least recently. One address is assigned to one byte of data, so the eight pieces of 2-byte data D0 to D7 correspond to data for 16 addresses. The tag stores, as the tag value, the address bits above the lower 4 bits. D0 stores the data at the address formed by appending 0x0 as the lower 4 bits to the tag value; D1 to D7 store the data at the addresses formed by appending 0x2, 0x4, ..., 0xE, respectively. In each of FIGS. 7, 8, and 9, the case where a data read at address 0x002008 is requested is taken as an example; the state before the access is shown schematically in the upper half of each figure and the state after the access in the lower half. The tag value corresponding to address 0x002008 is 0x00200, and the data at address 0x002008 is stored at position D4. Here, "0x" indicates that the following value is in hexadecimal notation.
An operation example of this embodiment will now be described in more detail using timing charts.
2 Step of assigning an address change direction attribute (decrement flag) to the way
3 Step of accessing the way
4 Step of determining whether a way is suitable for the prefetch operation
5 Step of executing the prefetch
50 Semiconductor device (secure microcomputer)
51 Address bus
52 Data bus
53 Processor (CPU)
54 DMA (Direct Memory Access) controller
55 Nonvolatile memory
56 ROM (Read Only Memory)
57 RAM (Random Access Memory)
58 Peripheral module
60 Cache memory
61 Data cache
62 Instruction cache
63 Cache control unit
64 Way
65 Tag
66 Decrement flag
67 Valid flag
68 Address control unit
69 Data control unit
70 LRU (Least Recently Used) flag
71 LRU control unit
72 Tag control unit
73 Decrement flag control unit
74 Valid flag control unit
75 Cache control unit internal address bus
76 Cache control unit internal data bus
77 Memory unit
78 Read buffer
79 Selector
Claims (17)
- A semiconductor device comprising a processor, a memory, a plurality of tags, a plurality of ways each capable of storing a plurality of data at consecutive addresses of the memory with the tag value stored in the corresponding tag as a base address, and a cache control unit, wherein
each of the plurality of ways is provided with an address change direction flag indicating whether the two most recent accesses to the way were in the address increasing or decreasing direction, and
the cache control unit is configured so that, when a first access and a second access performed consecutively from the processor to the memory are accesses to mutually consecutive addresses and the second access is an access via a first way, data indicated by a tag value consecutive, in the direction indicated by the address change direction flag, with the tag value corresponding to the first way can be prefetched into a second way having an address change direction flag matching the address change direction flag corresponding to the first way and having a tag value consecutive, in the direction opposite to that indicated by the address change direction flag, with the tag value corresponding to the first way. - The semiconductor device according to claim 1, wherein the cache control unit is configured to be able to set the address change direction flag when the second access is an access to the same way and the same tag value as the first access and the target address of the second access is smaller than the target address of the first access.
- The semiconductor device according to claim 1, wherein the cache control unit is configured to be able to execute, on a cache miss, a cache fill in units of a data length equal to an integer fraction of the total data length stored in the way.
- The semiconductor device according to claim 3, wherein the cache control unit provides, for each of the plurality of ways, a valid flag for each unit of the cache fill data length, and is configured so that, when a cache miss occurs, all the valid flags corresponding to the way targeted for the cache fill can be cleared and the valid flags corresponding to the cache-filled data can be set.
- The semiconductor device according to claim 4, wherein the cache control unit is configured to enable prefetching of data, after the second access, into areas of the first way whose corresponding valid flags are not set, in the direction indicated by the address change direction flag corresponding to the first way.
- The semiconductor device according to claim 5, wherein the cache control unit is configured to enable prefetching of data into the second way, after prefetching of data into the areas of the first way, when the address change direction flag corresponding to the second way matches the address change direction flag corresponding to the first way and the tag value corresponding to the second way is consecutive with the tag value corresponding to the first way in the direction opposite to that indicated by the address change direction flag, and
when the address change direction flag corresponding to the second way does not match the address change direction flag corresponding to the first way, or the tag value corresponding to the second way is not consecutive with the tag value corresponding to the first way in the direction opposite to that indicated by the address change direction flag, the cache control unit is configured to enable, after prefetching of data into the areas of the first way and after the second access within the first way, prefetching of data into areas of the first way in the direction opposite to that indicated by the address change direction flag corresponding to the first way. - The semiconductor device according to claim 1, wherein the processor is configured to be able to execute an access subsequent to the second access after the second access ends and before the cache memory completes the prefetch.
- The semiconductor device according to claim 1, wherein each of the plurality of ways is configured to be able to cache data at an arbitrary address of the memory.
- The semiconductor device according to claim 1, further comprising an instruction cache, wherein the plurality of ways serve as a data cache for the processor.
- A data prefetch method in a cache memory provided between a processor and a memory and comprising a plurality of tags and a plurality of ways each capable of storing a plurality of data at consecutive addresses of the memory with the tag value stored in the corresponding tag as a base address, the method comprising:
a first step of determining whether a first access and a consecutive second access from the processor to the memory are accesses to mutually consecutive addresses;
a second step of assigning, when the determination result of the first step is an access to consecutive addresses and the first access and the second access are accesses with the same tag value to the same way among the plurality of ways, an address change direction attribute indicating whether the two most recent accesses to the way were in the address increasing or decreasing direction;
a third step of accessing a first way by the second access;
a fourth step of determining whether there is a second way having the same address change direction attribute as the first way and having a tag value consecutive with the tag value of the first way in the direction opposite to that indicated by the same address change direction attribute; and
a fifth step of prefetching, based on a positive determination result of the fourth step, data indicated by tag values consecutive in the direction indicated by the address change direction attribute, with the second way as the target. - The data prefetch method according to claim 10, wherein the fourth step is executed after the third step regardless of whether the third step is a cache hit or a cache miss, and when the third step is a cache miss, the third step updates the tag value of the first way and the fifth step determines whether or not to perform the prefetch based on the updated tag value of the first way.
- The data prefetch method according to claim 10, wherein the first step sets the address decreasing direction as the address change direction attribute when the second access is an access to the same way and the same tag value as the first access and the target address of the second access is smaller than the target address of the first access.
- The data prefetch method according to claim 10, wherein the third step performs, on a cache miss, a cache fill in units of a data length equal to an integer fraction of the total data length stored in the way.
- The data prefetch method according to claim 13, wherein each of the plurality of ways has a valid flag for each unit of the cache fill data length, and when a cache miss occurs in the third step, all the valid flags corresponding to the way targeted for the cache fill are cleared and the valid flags corresponding to the cache-filled data are set.
- The data prefetch method according to claim 14, further comprising a sixth step of prefetching data, after the third step, into areas of the first way whose corresponding valid flags are not set, in the direction indicated by the address change direction attribute corresponding to the first way.
- The data prefetch method according to claim 15, further comprising a seventh step of executing the fourth step after the sixth step and, based on a negative determination result of the fourth step, prefetching data into areas of the first way whose corresponding valid flags are not set, in the direction opposite to that indicated by the address change direction attribute corresponding to the first way.
- A semiconductor device comprising a processor, a memory, a plurality of tags and a plurality of ways in one-to-one correspondence with each other, and a cache control unit, wherein
when a first access and a second access from the processor to the memory are executed consecutively, are accesses to consecutive addresses, and the second access is an access via a first way,
the cache control unit is configured to enable prefetching of data into a way having a tag value one smaller than the tag value corresponding to the first way when the second access is an access in the direction of increasing addresses relative to the first access, and to enable prefetching of data into a way having a tag value one larger than the tag value corresponding to the first way when the second access is an access in the direction of decreasing addresses relative to the first access.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/439,355 US9892049B2 (en) | 2012-10-31 | 2012-10-31 | Semiconductor device and method for prefetching to cache memory |
JP2014544114A JP5901787B2 (ja) | 2012-10-31 | 2012-10-31 | 半導体装置及びキャッシュメモリへのデータ先読み方法 |
PCT/JP2012/078139 WO2014068694A1 (ja) | 2012-10-31 | 2012-10-31 | 半導体装置及びキャッシュメモリへのデータ先読み方法 |
US15/864,479 US20180150399A1 (en) | 2012-10-31 | 2018-01-08 | Semiconductor device and method for prefetching to cache memory |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2012/078139 WO2014068694A1 (ja) | 2012-10-31 | 2012-10-31 | 半導体装置及びキャッシュメモリへのデータ先読み方法 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/439,355 A-371-Of-International US9892049B2 (en) | 2012-10-31 | 2012-10-31 | Semiconductor device and method for prefetching to cache memory |
US15/864,479 Continuation US20180150399A1 (en) | 2012-10-31 | 2018-01-08 | Semiconductor device and method for prefetching to cache memory |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014068694A1 true WO2014068694A1 (ja) | 2014-05-08 |
Family
ID=50626664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/078139 WO2014068694A1 (ja) | 2012-10-31 | 2012-10-31 | 半導体装置及びキャッシュメモリへのデータ先読み方法 |
Country Status (3)
Country | Link |
---|---|
US (2) | US9892049B2 (ja) |
JP (1) | JP5901787B2 (ja) |
WO (1) | WO2014068694A1 (ja) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10120808B2 (en) * | 2016-04-22 | 2018-11-06 | Arm Limited | Apparatus having cache memory disposed in a memory transaction path between interconnect circuitry and a non-volatile memory, and corresponding method |
JP2018106227A (ja) * | 2016-12-22 | 2018-07-05 | ルネサスエレクトロニクス株式会社 | キャッシュメモリ装置及び半導体装置 |
US11151043B2 (en) | 2019-08-12 | 2021-10-19 | Micron Technology, Inc. | Demand delay and data value correlated memory pre-fetching systems and methods |
US11741248B2 (en) * | 2019-08-20 | 2023-08-29 | Bank Of America Corporation | Data access control using data block level encryption |
KR20210066631A (ko) * | 2019-11-28 | 2021-06-07 | 삼성전자주식회사 | Apparatus and method for writing data to memory |
US11977738B2 (en) * | 2022-09-06 | 2024-05-07 | Arm Limited | Allocation of store requests |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001195304A (ja) * | 2000-01-13 | 2001-07-19 | Hitachi Ltd | Cache storage device |
WO2005091146A1 (ja) * | 2004-03-24 | 2005-09-29 | Matsushita Electric Industrial Co., Ltd. | Cache memory and control method thereof |
JP2008217825A (ja) * | 2008-04-30 | 2008-09-18 | Univ Waseda | Multiprocessor |
JP2010073029A (ja) * | 2008-09-19 | 2010-04-02 | Toshiba Corp | Instruction cache system |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0675853A (ja) | 1992-08-25 | 1994-03-18 | Oki Electric Ind Co Ltd | キャッシュメモリ装置 |
JP4374956B2 (ja) | 2003-09-09 | 2009-12-02 | セイコーエプソン株式会社 | キャッシュメモリ制御装置およびキャッシュメモリ制御方法 |
JP2007206806A (ja) * | 2006-01-31 | 2007-08-16 | Matsushita Electric Ind Co Ltd | キャッシュ観測装置、プロセッサの解析方法およびキャッシュメモリ |
US7529889B2 (en) * | 2006-08-14 | 2009-05-05 | Arm Limited | Data processing apparatus and method for performing a cache lookup in an energy efficient manner |
US7958317B2 (en) * | 2008-08-04 | 2011-06-07 | International Business Machines Corporation | Cache directed sequential prefetch |
US8762649B2 (en) * | 2010-03-29 | 2014-06-24 | Via Technologies, Inc. | Bounding box prefetcher |
JP2012038385A (ja) | 2010-08-06 | 2012-02-23 | Renesas Electronics Corp | データ処理装置 |
KR101788245B1 (ko) * | 2011-02-25 | 2017-11-16 | 삼성전자주식회사 | 다중 포트 캐시 메모리 장치 및 그 구동 방법 |
- 2012
- 2012-10-31 WO PCT/JP2012/078139 patent/WO2014068694A1/ja active Application Filing
- 2012-10-31 US US14/439,355 patent/US9892049B2/en active Active
- 2012-10-31 JP JP2014544114A patent/JP5901787B2/ja not_active Expired - Fee Related
- 2018
- 2018-01-08 US US15/864,479 patent/US20180150399A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
JP5901787B2 (ja) | 2016-04-13 |
JPWO2014068694A1 (ja) | 2016-09-08 |
US20150293850A1 (en) | 2015-10-15 |
US20180150399A1 (en) | 2018-05-31 |
US9892049B2 (en) | 2018-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5901787B2 (ja) | Semiconductor device and method for prefetching data to cache memory | |
US10067872B2 (en) | Memory speculation for multiple memories | |
US7139878B2 (en) | Method and apparatus for dynamic prefetch buffer configuration and replacement | |
JP5417879B2 (ja) | Cache device |
US8725987B2 (en) | Cache memory system including selectively accessible pre-fetch memory for pre-fetch of variable size data | |
US20100191918A1 (en) | Cache Controller Device, Interfacing Method and Programming Method Using the Same | |
US11301250B2 (en) | Data prefetching auxiliary circuit, data prefetching method, and microprocessor | |
JP5428687B2 (ja) | Memory control device |
US20090177842A1 (en) | Data processing system and method for prefetching data and/or instructions | |
KR102219288B1 (ko) | Memory device supporting cache mode and memory mode operations and operating method thereof |
JP2018534666A (ja) | Computing device with in-memory processing and narrow-width data port |
EP2524314B1 (en) | System and method to access a portion of a level two memory and a level one memory | |
US7313658B2 (en) | Microprocessor and method for utilizing disparity between bus clock and core clock frequencies to prioritize cache line fill bus access requests | |
EP1941373A1 (en) | Cache with high access store bandwidth | |
JP2007500402 (ja) | Data processing system with peripheral device access protection |
US20170147498A1 (en) | System and method for updating an instruction cache following a branch instruction in a semiconductor device | |
JP4024247B2 (ja) | Semiconductor data processor |
US9645825B2 (en) | Instruction cache with access locking | |
JP2011154528A (ja) | Data processing device |
JP4765249B2 (ja) | Information processing device and cache memory control method |
US20090164732A1 (en) | Cache memory system and cache memory control method | |
US11294821B1 (en) | Write-back cache device | |
JP4583981B2 (ja) | Image processing device |
US20120151150A1 (en) | Cache Line Fetching and Fetch Ahead Control Using Post Modification Information | |
JP2002149488A (ja) | Integrated circuit device and cache memory control method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12887400 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2014544114 Country of ref document: JP Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 14439355 Country of ref document: US |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 12887400 Country of ref document: EP Kind code of ref document: A1 |