US20010008009A1 - Set-associative cache-management method with parallel and single-set sequential reads - Google Patents
- Publication number
- US20010008009A1 (application US09/797,644)
- Authority
- US
- United States
- Prior art keywords
- cache
- read
- address
- line
- tag
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
- G06F12/0882—Page mode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1028—Power efficiency
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present invention relates to computers and, more particularly, to a method for managing a set-associative cache.
- A major objective of the present invention is to reduce the average power consumed during single-cycle read operations in a set-associative cache that employs parallel reads.
- Typically, main memory is in the form of random-access memory (RAM) modules.
- A processor accesses main memory by asserting an address associated with a memory location.
- For example, a 32-bit address can select any one of up to 2^32 address locations. In this example, each location holds eight bits, i.e., one “byte” of data, arranged in “words” of four bytes each, arranged in “lines” of four words each. In all, there are 2^30 word locations and 2^28 line locations.
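Under this layout, an asserted 32-bit address decomposes into tag, index, word, and byte fields. The sketch below extracts those fields with ordinary LSB-0 shifts (the helper name is illustrative; the patent numbers bits from the most-significant end, so its "bits 1-22" tag corresponds to the top 22 bits here):

```python
def split_address(addr: int):
    """Split a 32-bit address into the fields described above."""
    tag   = (addr >> 10) & 0x3FFFFF   # top 22 bits: the tag portion
    index = (addr >> 4)  & 0x3F       # next 6 bits: selects a line location
    word  = (addr >> 2)  & 0x3        # word position within the 4-word line
    byte  = addr         & 0x3        # byte position within the 4-byte word
    return tag, index, word, byte

# the tag and index together form the 28-bit line address
```

For example, `split_address(0x2C)` yields index 0b000010 and word position 0b11, matching the address asserted in the first read of the walkthrough below.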
- Accessing main memory tends to be much faster than accessing disk and tape-based memories; nonetheless, even main memory accesses can leave a processor idling while it waits for a request to be fulfilled.
- To minimize such latencies, a cache can intercept processor requests to main memory and attempt to fulfill them faster than main memory can.
- To fulfill processor requests to main memory, caches must contain copies of data stored in main memory. In part to optimize access times, a cache is typically much less capacious than main memory. Accordingly, it can represent only a small fraction of main-memory contents at any given time. To optimize the performance gain achievable by a cache, this small fraction must be selected strategically.
- In the event of a cache “miss”, i.e., when a request cannot be fulfilled by a cache, the cache fetches an entire line of main memory including the memory location requested by the processor. Addresses near a requested address are more likely than average to be requested in the near future. By fetching and storing an entire line, the cache acquires not only the contents of the requested main-memory location, but also the contents of the main-memory locations that are relatively likely to be requested in the near future.
- Where the fetched line is stored within the cache depends on the cache type. A fully-associative cache can store the fetched line in any cache storage location.
- Typically, any location not containing valid data is given priority as a target storage location for a fetched line. If all cache locations have valid data, the location with the data least likely to be requested in the near term can be selected as the target storage location. For example, the fetched line might be stored in the location with the least recently used data.
- The fully-associative cache stores not only the data in the line, but also the line address (the most-significant 28 bits) of the address as a “tag” in association with the line of data. The next time the processor asserts a main-memory address, the cache compares that address with all the tags stored in the cache. If a match is found, the requested data is provided to the processor from the cache.
- In a fully-associative cache, every cache-memory location must be checked for a tag match. Such an exhaustive match-checking process can be time-consuming, making it hard to achieve the access-speed gains desired of a cache. Another problem with a fully-associative cache is that the tags consume a relatively large percentage of cache capacity, which is limited to ensure high-speed accesses.
- In a direct-mapped cache, each cache storage location is given an index which, for example, might correspond to the least-significant line-address bits. In the 32-bit address example, a six-bit index might correspond to address bits 23-28.
- A restriction is imposed that a line fetched from main memory can only be stored at the cache location with an index that matches bits 23-28 of the requested address. Since those six bits are known, only the first 22 bits are needed as a tag. Thus, less cache capacity is devoted to tags.
- Also, when the processor asserts an address, only one cache location (the one with an index matching the corresponding bits of the address asserted by the processor) needs to be examined to determine whether or not the request can be fulfilled from the cache.
- In a direct-mapped cache, a line fetched in response to a cache miss must be stored at the one location having an index matching the index portion of the read address. Previously written data at that location is overwritten. If the overwritten data is subsequently requested, it must be fetched from main memory. Thus, a direct-mapped cache can force the overwriting of data that may be likely to be requested in the near future. The lack of flexibility in choosing the data to be overwritten limits the effectiveness of a direct-mapped cache.
- A set-associative cache has memory divided into two or more direct-mapped sets. Each index is associated with one memory location in each set. Thus, in a four-way set-associative cache, there are four cache locations with the same index, and thus four choices of locations to overwrite when a line is stored in the cache. This allows more optimal replacement strategies than are available for direct-mapped caches. Still, the number of locations that must be checked, e.g., one per set, to determine whether a requested location is represented in the cache is quite limited, and the number of bits that need to be compared is reduced by the length of the index. Thus, set-associative caches combine some of the replacement-strategy flexibility of a fully-associative cache with much of the speed advantage of a direct-mapped cache.
- The index portion of an asserted address identifies one cache-line location within each cache set.
- The tag portion of the asserted address can be compared with the tags at the identified cache-line locations to determine whether there is a hit (i.e., a tag match) and, if so, in what set the hit occurs. If there is a hit, the least-significant address bits are checked for the requested location within the line; the data at that location is then provided to the processor to fulfill the read request.
- A read operation can be hastened by starting the data access before a tag match is determined. While the relevant tags are checked for a match, the appropriately indexed data locations within each set are accessed in parallel. By the time a match is determined, data from all four sets are ready for transmission. The match is used, e.g., as the control input to a multiplexer, to select the data actually transmitted. If there is no match, none of the data is transmitted.
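A parallel tag-and-data read can be modeled as follows (a software sketch of hardware behavior; the `Line` structure and function names are illustrative, and a four-way cache with 64 indices is assumed, as in the embodiments described later):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Line:
    tag: Optional[int] = None     # stored 22-bit tag; None marks an invalid entry
    data: tuple = (0, 0, 0, 0)    # four 32-bit words

NUM_SETS, NUM_INDICES = 4, 64
# cache[s][i] is the line stored at index i in set s
cache = [[Line() for _ in range(NUM_INDICES)] for _ in range(NUM_SETS)]

def parallel_read(tag: int, index: int, word: int):
    """Access the indexed line in every set while comparing tags.

    In hardware, the four data accesses proceed concurrently with the four
    tag compares; the match result is the multiplexer control that selects
    which set's data is actually transmitted. Returns None on a miss.
    """
    lines = [cache[s][index] for s in range(NUM_SETS)]  # all sets accessed
    for line in lines:
        if line.tag == tag:       # mux select: the matching set's data wins
            return line.data[word]
    return None                   # no match: none of the data is transmitted
```

Note that all four sets are accessed on every read; the power cost of the three discarded accesses is exactly what the invention aims to avoid.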
- The parallel read operation is much faster, since the data is accessed at the same time as the match operation is conducted rather than after. For example, a parallel “tag-and-data” read operation might consume only one memory cycle, while a serial “tag-then-data” read operation might require two cycles. Alternatively, if the serial read operation consumes only one cycle, the parallel read operation permits a shorter cycle, allowing for more processor operations per unit of time.
- The gains of the parallel tag-and-data reads are not without some cost. The data accesses to the sets that do not provide the requested data consume additional power that can tax power sources and dissipate extra heat. The heat can fatigue, impair, and damage the incorporating integrated circuit and proximal components. Accordingly, larger batteries or power supplies and more substantial heat-removal provisions may be required. What is needed is a cache-management method that achieves the speed advantages of parallel reads but with reduced power consumption.
- The present invention provides for preselection of a set from which data is to be read. The preselection is based on a tag match with a preceding read. In this case, it is not necessary to access all sets, but only the preselected set. When only one set is selected, a power saving accrues.
- The invention provides for comparing a present line address with the line address asserted in an immediately preceding read operation. If the line addresses match, a single-set read can be implemented instead of a parallel read.
- The invention also provides for checking one or more line locations in a set, other than the location used to satisfy a current request, for a tag match.
- A tag match at such a “second” location does not immediately result in the included data being accessed; instead, a flag (or other indicator) is set indicating the tag match. This indication is used in an immediately succeeding read operation to determine whether the second line location can be preselected for a single-set read operation. If the tag portion of the next requested address matches the tag portion of the previously requested address, and the latter was matched by the tag at the second location, a single-set read can be performed.
- The invention has special application to computer systems in which the processor indicates whether a read address is sequential or non-sequential. By default, e.g., when a read is non-sequential, a parallel read is implemented. If the read is sequential to a previous read that resulted in a cache hit, the type of read can depend on the word position within the cache line.
- If the word position is not at the beginning of a line, the index and the tag are unchanged. Thus, a hit at the same index and set is assured. Accordingly, a “same-set” read is used. However, if the word position is at the beginning of a line, the index is different and a different tag may be stored at the indexed location. Accordingly, a parallel read can be used.
- When a read accesses the end of a line, the tag at the next index location can be checked. This makes use of the tag-match circuitry that would otherwise be idle during the sequential read.
- The tag matching can be limited to only the set selected for the current read; alternatively, all sets can be checked. If the next read is sequential, it will correspond to the beginning of a line. However, the tag matching for this read will already have been completed. Accordingly, a single-set read can be performed.
- For single-set reads, the present invention accesses only one set instead of all the sets that are accessed in a parallel read operation. Yet, there is no time penalty associated with the single-set reads provided by the invention. Thus, the power savings of single-set reads are achieved without sacrificing the speed advantages of parallel reads.
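The selection logic summarized above reduces to a small decision rule. The sketch below restates it (argument names and result strings are illustrative, not from the patent):

```python
def choose_read_type(sequential: bool, at_line_start: bool,
                     same_line_address: bool, next_tag_matched: bool) -> str:
    """Decide between a parallel read and a single-set read.

    Inputs mirror the conditions described above: whether the processor
    flags the read as sequential, whether the word position is at the start
    of a line, whether the line address matches the previous read, and
    whether the tag at the next index location was found to match earlier.
    """
    if not sequential:
        # non-sequential: single-set only when the line address repeats
        return "single-set" if same_line_address else "parallel"
    if not at_line_start:
        # sequential within a line: same index, same tag, hit assured
        return "single-set"
    # sequential read crossing a line boundary: rely on the earlier
    # next-index tag check; fall back to a parallel read otherwise
    return "single-set" if next_tag_matched else "parallel"
```

Only the boundary-crossing case needs the previously recorded tag-match indication; the other single-set cases follow directly from the current request.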
- FIG. 1 is a block diagram of a first computer system including a cache in accordance with the present invention.
- FIG. 2 is a flow chart of the method of the invention as implemented in the cache of FIG. 1.
- FIG. 3 is a block diagram of a second computer system including a cache in accordance with the present invention.
- FIG. 4 is a block diagram showing a cache-controller of the cache of FIG. 3.
- A computer system AP1 comprises a data processor 10, main memory 12, and a cache 20, as shown in FIG. 1.
- Data processor 10 issues requests along a processor address bus ADP, which includes address lines, a read-write control line, a memory-request line, and a sequential-address signal line. Data transfers between cache 20 and processor 10 take place along processor data bus DTP.
- Cache 20 can issue requests to memory 12 via memory address bus ADM.
- Data transfers between cache 20 and memory 12 are along memory data bus DTM.
- Cache 20 comprises a processor interface 21, a memory interface 23, a cache controller 25, a read-output multiplexer 27, and cache memory 30.
- Cache controller 25 includes a line-address memory 28 and a tag-match flag 29.
- Cache memory 30 includes four sets S1, S2, S3, and S4.
- Set S1 includes 64 memory locations, each with an associated six-bit index. Each memory location stores a line of data and an associated 22-bit tag. Each line of data holds four 32-bit words of data.
- Cache sets S2, S3, and S4 are similar and use the same six-bit indices.
- Line-address memory 28 includes registers for storing the previous line address and the present line address. In addition, line-address memory 28 provides a validity bit for the previous line address. If this bit indicates invalidity, any comparison results in an inequality.
- Step S1A involves determining whether or not a cache-related read operation is being asserted. If, for example, a write operation is asserted initially, method M1 terminates at step S1B; an alternative write method is invoked instead.
- For this example, assume that a word-wide read operation asserts an address with an index portion of 000010 and a word-address portion of 11 (the last word of a line).
- Step S2A involves determining whether or not the read is a sequential read.
- A read is sequential if the asserted address is the successor to the address asserted in an immediately prior read operation.
- A sequential read is indicated by a corresponding signal level on the sequential-address signal line of processor address bus ADP.
- In this example, the first read is non-sequential, in which case method M1 proceeds to step S3A.
- Step S3A involves comparing the present line address (the asserted address, ignoring the least-significant bits that indicate word position within a cache line and byte position within a word) with the line address of the immediately preceding read operation.
- Initially, the validity bit associated with the old line address is set to “invalid”, so during this first iteration the comparison indicated at step S4A is negative. If at any time during a sequence of reads the data at the line location indicated by the line-address memory is invalid, the validity bit is set to “invalid”, and any comparison with a new line address has a negative result.
- Thus, the first iteration of comparison step S4A has a negative result. Accordingly, the memory locations of all four sets S1, S2, S3, and S4 with the appropriate indices are accessed in parallel-read step S5A. Concurrently, the tags stored at these locations are compared with the tag portion (bits 1-22) of the asserted address. If there is a match, multiplexer 27 is controlled so that data from the set with the matching tag is provided to processor 10 via processor interface 21 and processor data bus DTP.
- If there is no match (a cache miss), cache 20 fetches the line with the requested data from memory 12 via memory interface 23.
- Cache 20 asserts the line address via memory address bus ADM and receives the requested data along memory data bus DTM.
- Cache 20 then writes the fetched line to the appropriately indexed location in a selected set in accordance with a replacement algorithm designed to optimize future hits. The read request is then satisfied from the cache location to which the fetched line was written. For this example, assume that the line is stored at set S1, index 000010. The four least-significant bits of the asserted read address determine the location within the line from which the requested data is provided to processor 10.
- The requested line address is stored at step S6A.
- At step S6B, the tag portion of this line address is compared to the tag stored in the same set at the next index location.
- In this example, the next index location is at set S1, index 000011. If the tags match, tag-match flag 29 is set to “true”; if the tags do not match, the flag is set to “false”.
- Method M1 then returns to step S1A for a second iteration.
- In the second iteration, the index portion is 000010 as in the first iteration, and the word position is 10 (the third word position of four).
- Assume the second read operation is non-sequential but the line address is the same.
- At step S2A, the result is negative, but the result of the comparison at step S3A is positive.
- Accordingly, method M1 proceeds to same-set read step S5B.
- In step S5B, only one set is accessed: the same set that provided the data to processor 10 in the immediately prior read operation.
- In this example, set S1 is accessed to the exclusion of sets S2, S3, and S4. This results in a power savings relative to a parallel read.
- Method M1 proceeds to step S6A, overwriting the previous line address with the current line address. (The net result is no change, since the new and old line addresses are the same.)
- At step S6B, the tag at set S1, index 000011, is compared to the tag portion of the requested address, and flag 29 is set accordingly. Again, there is no change, because the same comparison was performed in the previous iteration.
- Method M1 proceeds to step S1A for a third iteration.
- The third iteration involves a sequential read of the last word at the same line address as the second read.
- Since the read is sequential, method M1 proceeds through steps S1A and S2A to arrive at step S2B.
- Step S 2 B involves determining whether the current address points to the start of a line. If a sequential read points to the start of a line, then the previous address pointed to the end of the previous line. Therefore, the sequential read has crossed a line boundary.
- In this illustrative third iteration, a line boundary is not crossed. Accordingly, method M1 proceeds to step S5B, so that only set S1 is accessed. Method M1 then proceeds through steps S6A and S6B with no net change in line address or flag. A fourth iteration is begun with a return to step S1A.
- In the fourth iteration, at step S2B, the word-address bits 00 indicate that the requested data is at the start of a line, so method M1 proceeds to step S3B.
- Step S3B involves checking tag-match flag 29, which was set in the last iteration of step S6B. If the tag at set S1, index 000011, was the same as the tag at set S1, index 000010, the flag was set to “true”. This means that the sequential read of this fourth iteration can validly cross the line boundary between indices 000010 and 000011 in set S1; thus, method M1 proceeds to same-set read step S5B. On the other hand, if the tags differ, the line boundary cannot be validly crossed, and a parallel read is conducted at step S5C. (Step S5C is the same as step S5A.)
- Both steps S5B and S5C are followed by step S6A.
- In this fourth iteration, a new line address (corresponding to the new index 000011) is written at step S6A.
- The tag-match flag is re-determined at step S6B. In this case, the flag indicates whether the tag at set S1, index 000100, matches the tag at index 000011.
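The four iterations above can be replayed as a short trace (a sketch: the index stands in for the full line address since the tag is constant in this example, and the next-index tag is assumed to match, consistent with the same-set outcome of the fourth iteration):

```python
# (index bits, word bits, sequential?) for the four reads discussed above
reads = [(0b000010, 0b11, False),   # iteration 1: first read after reset
         (0b000010, 0b10, False),   # iteration 2: same line, non-sequential
         (0b000010, 0b11, True),    # iteration 3: sequential, within line
         (0b000011, 0b00, True)]    # iteration 4: sequential, crosses boundary

prev_line = None   # line-address memory 28; None models validity bit "invalid"
flag = False       # tag-match flag 29
trace = []
for index, word, sequential in reads:
    if sequential:
        # step S2B: at line start? if so, step S3B consults tag-match flag 29
        trace.append("same-set" if (word != 0b00 or flag) else "parallel")
    else:
        # steps S3A/S4A: compare line addresses
        trace.append("same-set" if index == prev_line else "parallel")
    prev_line = index              # step S6A: record the line address
    flag = True                    # step S6B: assume the next-index tag
                                   # matches, per the walkthrough
```

The trace comes out as one parallel read followed by three power-saving same-set reads.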
- An alternative computer system AP2 comprises a data processor 60, main memory 62, and a cache 70, as shown in FIG. 3.
- Data processor 60 issues requests along a processor address bus A2P, which includes address lines, a read-write control line, a memory-request line, and a sequential-address signal line. Data transfers between cache 70 and processor 60 take place along processor data bus D2P.
- Cache 70 can issue requests to main memory 62 via memory address bus A2M. Data transfers between cache 70 and memory 62 are along memory data bus D2M.
- Cache 70 comprises a processor interface 71, a memory interface 73, a cache controller 75, a read-output multiplexer 77, and cache memory 80.
- Cache memory 80 includes four sets SE1, SE2, SE3, and SE4.
- Set SE1 includes 64 memory locations, each with an associated six-bit index. Each memory location stores a line of data and an associated 22-bit tag. Each line of data holds four 32-bit words of data.
- Cache sets SE2, SE3, and SE4 are similar and use the same six-bit indices.
- Computer system AP2 differs from computer system AP1 primarily in the arrangement of the respective controllers.
- Controller 75 comprises a tag-matching function 79, a current-address register 81, a sequential-detect function 83, a beginning-of-line detect function 85, an end-of-line detect function 87, and last-address-type flags 89.
- Tag-matching function 79 has four flags F1, F2, F3, and F4, which correspond respectively to sets SE1, SE2, SE3, and SE4. Each flag indicates whether or not there is a tag match of interest for the respective set.
- Last-address-type flags 89 include a flag F5, which indicates whether or not the last address was sequential, and a flag F6, which indicates whether or not the last address pointed to the end of a cache line.
- Current-address register 81 stores not only the current address, but also control data reflecting the transfer type (sequential or non-sequential) and the transfer width (byte, doublet, or quadlet).
- Register 81 provides the transfer-type bit to sequential-detect function 83, the word-position bits to beginning-of-line detect function 85, and word-position and transfer-width data to end-of-line detect function 87.
- Each of the detect functions 83, 85, and 87 provides its respective detection data to tag-matching function 79.
- In addition, tag-matching function 79 can read last-address-type flags F5 (sequential) and F6 (end-of-line).
- Tag-matching function 79 can access cache storage 80 to identify tag matches.
- An iterated method M2, practiced in the context of cache controller 75, is indicated in the flow chart of FIG. 5.
- A read request is received at step T1.
- A determination is made at step T2 whether the read is sequential or non-sequential. If the read is sequential, the word position within the selected cache line is checked at step T3.
- If the word position of a sequential transfer is at the beginning of a cache line, last-address-type flags F5 and F6 are checked at step T4A. If, per step T5, the previous read request was both sequential and end-of-line, tag-match flags F1-F4 are checked at step T6A. If there is no match between the tag of the previous address and the tag at the cache location with an index one greater than that indicated by the previous address, a parallel read is performed at step T7A. If a flag F1-F4 indicates such a match, a one-set read is performed, at step T7B, at the incremented index in the set corresponding to the affirmative flag. In an alternative embodiment, there is only one flag, which indicates whether there is a match within the same set as in the previous read request.
- If the word position of step T3 is at the end of a line, end-of-line flag F6 is set; if the end-of-line read is sequential, sequential-type flag F5 is also set. In the next iteration of method M2, these flags can be used at step T4A. If the word position of step T3 is neither the beginning nor the end of a line, a same-set read is performed at step T7C. If, at step T2, the read is non-sequential, match flags F1-F4 and sequential flag F5 are reset to negative at step T6B, and method M2 proceeds to a parallel read at step T7A.
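Method M2's control flow can be restated as a decision function (step labels follow the description above; the argument names, the word-position encoding, and the returned strings are illustrative):

```python
def m2_read_type(sequential, word_pos, prev_sequential, prev_end_of_line,
                 tag_flags):
    """Restate method M2's read-type selection.

    word_pos is "begin", "mid", or "end"; tag_flags models flags F1-F4.
    Returns the read type and, for a boundary-crossing one-set read, the
    set whose flag indicated a match at the incremented index.
    """
    if not sequential:                              # step T2
        return "parallel", None                     # T6B resets flags; then T7A
    if word_pos == "begin":                         # step T3
        if prev_sequential and prev_end_of_line:    # steps T4A/T5
            for s, matched in enumerate(tag_flags): # step T6A
                if matched:                         # step T7B: one-set read at
                    return "one-set", s             # the incremented index
        return "parallel", None                     # step T7A
    return "same-set", None                         # step T7C (mid or end of line)
```

As in method M1, the only case that needs a parallel read on a sequential access is a line-boundary crossing for which no earlier tag match was recorded.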
- In method M2, tags at a successor index location are checked only when the present read is to the end of a line. This reduces the frequency of such tag checks.
- The asserted word location must therefore be checked to determine whether or not a tag comparison should be made. Where the processor provides for different transfer widths, e.g., byte, doublet, and quadlet (word, in this case), the bits to be checked to recognize an end-of-line data request are a function of this width.
- In a further alternative embodiment, there is a flag associated with each index.
- All tags in the set from which a read is provided are compared to the tag portion of the read request, and the flags are set according to the results.
- On a subsequent read, the associated flag can be checked. If the flag indicates true, a single-set read can be implemented. Otherwise, a parallel read operation is implemented. This approach reduces the number of parallel reads, but incurs a cost in cache complexity.
Abstract
Description
- The present invention relates to computers and, more particularly, to a method for managing a set-associative cache. A major objective of the present invention is to reduce the average power consumed during single-cycle read operations in a set-associative cache that employs parallel reads.
- Much of modern progress is associated with the increasing prevalence of computers. In a conventional computer architecture, a data processor manipulates data in accordance with program instructions. The data and instructions are read from, written to, and stored in the computer's “main” memory. Typically, main memory is in the form of random-access memory (RAM) modules.
- A processor accesses main memory by asserting an address associated with a memory location. For example, a 32-bit address can select any one of up to 232 address locations. In this example, each location holds eight bits, i.e., one “byte” of data, arranged in “words” of four bytes each, arranged in “lines” of four words each. In all, there are 230 word locations, and 228 line locations.
- Accessing main memory tends to be much faster than accessing disk and tape-based memories; nonetheless, even main memory accesses can leave a processor idling while it waits for a request to be fulfilled. To minimize such latencies, a cache can intercept processor requests to main memory and attempt to fulfill them faster than main memory can.
- To fulfill processor requests to main memory, caches must contain copies of data stored in main memory. In part to optimize access times, a cache is typically much less capacious than main memory. Accordingly, it can represent only a small fraction of main-memory contents at any given time. To optimize the performance gain achievable by a cache, this small fraction must be selected strategically.
- In the event of a cache “miss” , i.e., when a request cannot be fulfilled by a cache, the cache fetches an entire line of main memory including the memory location requested by the processor. Addresses near a requested address are more likely than average to be requested in the near future. By fetching and storing an entire line, the cache acquires not only the contents of the requested main-memory location, but also the contents of the main-memory locations that are relatively likely to be requested in the near future.
- Where the fetched line is stored within the cache depends on the cache type. A fully-associative cache can store the fetched line in any cache storage location. Typically, any location not containing valid data is given priority as a target storage location for a fetched line. If all cache locations have valid data, the location with the data least likely to be requested in the near term can be selected as the target storage location. For example, the fetched line might be stored in the location with the least recently used data.
- The fully-associative cache stores not only the data in the line, but also stores the line-address (the most-significant 28 bits) of the address as a “tag” in association with the line of data. The next time the processor asserts a main-memory address, the cache compares that address with all the tags stored in the cache. If a match is found, the requested data is provided to the processor from the cache.
- In a fully-associative cache, every cache-memory location must be checked for a tag match. Such an exhaustive match checking process can be time-consuming, making it hard to achieve the access speed gains desired of a cache. Another problem with a fully-associative cache is that the tags consume a relatively large percentage of cache capacity, which is limited to ensure high-speed accesses.
- In a direct-mapped cache, each cache storage location is given an index which, for example, might correspond to the least-significant line-address bits. For example, in the 32-bit address example, a six-bit index might correspond to address bits23-28. A restriction is imposed that a line fetched from main memory can only be stored at the cache location with an index that matches bits 23-28 of the requested address. Since those six bits are known, only the first 22 bits are needed as a tag. Thus, less cache capacity is devoted to tags. Also, when the processor asserts an address, only one cache location (the one with an index matching the corresponding bits of the address asserted by the processor) needs to be examined to determine whether or not the request can be fulfilled from the cache.
- In a direct-mapped cache, a line fetched in response to a cache miss must be stored at the one location having an index matching the index portion of the read address. Previously written data at that location is overwritten. If the overwritten data is subsequently requested, it must be fetched from main memory. Thus, a directed-mapped cache can force the overwritting of data that may be likely to be requested in the near future. The lack of flexibility in choosing the data to be overwritten limits the effectiveness of a direct-mapped cache.
- A set-associative cache has memory divided into two or more direct-mapped sets. Each index is associated with one memory location in each set. Thus, in a four-way set associative cache, there are four cache locations with the same index, and thus, four choices of locations to overwrite when a line is stored in the cache. This allows more optimal replacement strategies than are available for direct-mapped caches. Still, the number of locations that must be checked, e.g., one per set, to determine whether a requested location is represented in the cache is quite limited, and the number of bits that need to be compared is reduced by the length of the index. Thus, set-associative caches combine some of the replacement strategy flexibility of a fully-associative cache with much of the speed advantage of a direct-mapped cache.
- The index portion of an asserted address identifies one cache-line location within each cache set. The tag portion of the asserted address can be compared with the tags at the identified cache-line locations to determine whether there is a hit (i.e., tag match) and, if so, in what set the hit occurs. If there is a hit, the least-significant address bits are checked for the requested location within the line; the data at that location is then provided to the processor to fulfill the read request.
- A read operation can be hastened by starting the data access before a tag match is determined. While checking the relevant tags for a match, the appropriately indexed data locations within each set are accessed in parallel. By the time a match is determined, data from all four sets are ready for transmission. The match is used, e.g., as the control input to a multiplexer, to select the data actually transmitted. If there is no match, none of the data is transmitted.
- The parallel read operation is much faster since the data is accessed at the same time as the match operation is conducted rather than after. For example, a parallel “tag-and-data” read operation might consume only one memory cycle, while a serial “tag-then-data” read operation might require two cycles. Alternatively, if the serial read operation consumes only one cycle, the parallel read operation permits a shorter cycle, allowing for more processor operations per unit of time.
- The gains of the parallel tag-and-data reads are not without some cost. The data accesses to the sets that do not provide the requested data consume additional power that can tax power sources and dissipate extra heat. The heat can fatigue, impair, and damage the incorporating integrated circuit and proximal components. Accordingly, larger batteries or power supplies and more substantial heat removal provisions may be required. What is needed is a cache-management method that achieves the speed advantages of parallel reads but with reduced power consumption.
- The present invention provides for preselection of a set from which data is to be read. The preselection is based on a tag match with a preceding read. In this case, it is not necessary to access all sets, but only the preselected set. When only one set is selected, a power saving accrues.
- The invention provides for comparing a present line address with the line address asserted in an immediately preceding read operation. If the line addresses match, a single-set read can be implemented instead of a parallel read.
- The invention provides for checking one or more line locations in a set other than the location used to satisfy a current request for a tag match. A tag match at such a “second” location does not result immediately in included data being accessed; instead a flag (or other indicator) is set indicating the tag match. This indication is used in an immediately succeeding read operation to determine whether the second line location can be preselected for a single-set read operation. If the tag portion of the next requested address matches the tag portion of the previously requested address, and the latter was matched by the tag at the second location, a single-set read can be performed.
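A software model of this flag mechanism might look like the following; the class and method names are invented for illustration:

```python
class SecondLocationFlag:
    """Remember whether the tag at the 'second' (next-index) location
    in the hit set matched the tag of the most recent read (a sketch)."""

    def __init__(self):
        self.match = False
        self.prev_tag = None

    def after_read(self, sets, hit_set, tag, index):
        # While satisfying the current request, also compare the tag
        # stored at the next index location in the same set.
        stored_tag, _line = sets[hit_set][(index + 1) % len(sets[hit_set])]
        self.match = (stored_tag == tag)
        self.prev_tag = tag

    def allows_single_set_read(self, new_tag):
        # A single-set read of the second location is safe only if the
        # flag is set and the requested tag equals the tag that set it.
        return self.match and new_tag == self.prev_tag
```

With a toy one-set cache whose two line locations share a tag, `after_read` sets the flag and a succeeding read with the same tag qualifies for a single-set read.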
- The invention has special application to computer systems that have a processor that indicates whether a read address is sequential or non-sequential. By default, e.g., when a read is non-sequential, a parallel read is implemented. If the read is sequential to a previous read that resulted in a cache hit, the type of read can depend on word position within the cache line.
- If the word position is not at the beginning of the cache line, then the tag is unchanged. Thus, a hit at the same index and set is assured. Accordingly, a “same-set” read is used. However, if the word position is at the beginning of a line, the index is different and a different tag may be stored at the indexed location. Accordingly, a parallel read can be used.
- In a further refinement, if a read that is sequential to a read resulting in a hit corresponds to the end of a cache line, the next index location can be checked. This makes use of the tag-match circuitry that would otherwise be idle in the sequential read. The tag matching can be limited to only the set selected for the current read; alternatively, all sets can be checked. If the next read is sequential, it will correspond to the beginning of a line. However, the tag matching for this read will already have been completed. Accordingly, a single-set read can be performed.
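The selection rules of the preceding paragraphs can be condensed into one illustrative decision function (the name and string labels are assumptions, not terminology from the disclosure):

```python
def choose_read_type(sequential, word, flag=False):
    """Pick 'parallel', 'same-set', or 'single-set' for the next access.

    sequential: the processor's sequential-address signal
    word: word position within the cache line (0 = start of line)
    flag: whether the tag at the next index location matched on the
          preceding read (set while reading the end of a line)
    """
    if not sequential:
        return "parallel"            # default: access all sets
    if word != 0:
        return "same-set"            # same line, same set, hit assured
    # Crossing a line boundary: rely on the pre-checked tag match.
    return "single-set" if flag else "parallel"

assert choose_read_type(False, 2) == "parallel"
assert choose_read_type(True, 1) == "same-set"
assert choose_read_type(True, 0, flag=True) == "single-set"
```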
- For many read operations, the present invention accesses only one set instead of all the sets that are accessed in a parallel read operation. Yet, there is no time penalty associated with the single-set reads provided by the invention. Thus, the power savings of single-set reads are achieved without sacrificing the speed advantages of the parallel reads. These and other features and advantages of the invention are apparent from the description below with reference to the following drawings.
- FIG. 1 is a block diagram of a first computer system including a cache in accordance with the present invention.
- FIG. 2 is a flow chart of the method of the invention as implemented in the cache of FIG. 1.
- FIG. 3 is a block diagram of a second computer system including a cache in accordance with the present invention.
- FIG. 4 is a block diagram showing a cache-controller of the cache of FIG. 3.
- FIG. 5 is a flow chart of a method of the invention as implemented in the cache of FIG. 3.
- In accordance with the present invention, a computer system AP1 comprises a data processor 10, main memory 12, and a cache 20, as shown in FIG. 1. Data processor 10 issues requests along a processor address bus ADP, which includes address lines, a read-write control line, a memory-request line, and a sequential-address signal line. Data transfers between cache 20 and processor 10 take place along processor data bus DTP. Similarly, cache 20 can issue requests to memory 12 via memory address bus ADM. Data transfers between cache 20 and memory 12 are along memory data bus DTM.
-
Cache 20 comprises a processor interface 21, a memory interface 23, a cache controller 25, a read-output multiplexer 27, and cache memory 30. Cache controller 25 includes a line-address memory 28 and a tag-match flag 29. Cache memory 30 includes four sets S1, S2, S3, and S4. Set S1 includes 64 memory locations, each with an associated six-bit index. Each memory location stores a line of data and an associated 22-bit tag. Each line of data holds four 32-bit words of data. Cache sets S2, S3, and S4 are similar and use the same six-bit indices.
- Line-address memory 28 includes registers for storing a previous line address and the present line address. In addition, line-address memory 28 provides a validity bit for the previous line address. If this bit indicates invalidity, any comparison results in an inequality.
- A method M1 implemented by
cache 20 is flow charted in FIG. 2. Step S1A involves determining whether or not a cache-related read operation is being asserted. If, for example, a write operation is asserted initially, method M1 terminates at step S1B; an alternative write method is invoked instead. In an exemplary first iteration of method M1, a word-wide read operation asserts an address with an index portion of 000010 and a word-address portion of 11 (the last word of a line).
- When a read is asserted, step S2A involves determining whether or not the read is a sequential read. A read is sequential if the asserted address is the successor to the address asserted in an immediately prior read operation. In the case of processor 10, a sequential read is indicated by a corresponding signal level on the sequential-address signal line of processor address bus ADP. In this first iteration of method M1, the read is non-sequential, so method M1 proceeds to step S3A.
- Step S3A involves comparing the present line address (the asserted address, ignoring the least-significant bits that indicate word position within a cache line and byte position within a word) with the line address of the immediately preceding read operation. Upon initialization, the validity bit associated with the old line address is set to "invalid", so during this first iteration the comparison indicated at step S4A is negative. If at any time during a sequence of reads the data at the line location indicated by the line-address memory is invalid, the validity bit is set to "invalid" and any comparison with a new line address has a negative result.
- In the example, the first iteration of comparison step S4A has a negative result. Accordingly, the memory locations of all four sets S1, S2, S3, and S4 with the appropriate index are accessed in parallel-read step S5A. Concurrently, the tags stored at these locations are compared with the tag portion (bits 1-22) of the asserted address. If there is a match, multiplexer 27 is controlled so that data from the set with the matching tag is provided to processor 10 via processor interface 21 and processor data bus DTP.
- If there is a miss, cache 20 fetches the line with the requested data from memory 12 via memory interface 23. Cache 20 asserts the line address via memory address bus ADM and receives the requested data along memory data bus DTM. Cache 20 then writes the fetched line to the appropriately indexed location in a selected set in accordance with a replacement algorithm designed to optimize future hits. The read request is then satisfied from the cache location to which the fetched line was written. For this example, assume that the line is stored at set S1, index 000010. The four least-significant bits of the asserted read address determine the location within the line from which the requested data is provided to processor 10.
- Whether there was a hit or a miss, the requested line address is stored at step S6A. In addition, the tag portion of this line address is compared to the tag stored in the same set at the next index location. In this example, the next index location is at set S1, index 000011. If the tags match, tag-match flag 29 is set to "true"; if the tags do not match, the flag is set to "false". Method M1 then returns to step S1A for a second iteration.
- In this example, the index portion is 000010 as in the first iteration, and the word position is 10 (third word position of four). Thus, the second read operation is non-sequential, but the line address is the same. Accordingly, the result at step S2A is negative, while the result of the comparison at S3A is positive. At step S4A, method M1 therefore proceeds to same-set read step S5B.
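The line-address comparison that drives steps S3A and S4A can be modeled as follows (an illustrative sketch; the class and method names are invented, and the validity bit forces a mismatch after initialization or invalidation):

```python
class LineAddressMemory:
    """Previous-line-address register with a validity bit (a sketch)."""

    def __init__(self):
        self.prev_line = None
        self.valid = False

    def matches(self, line_addr):
        # Any comparison against an invalid previous address fails.
        return self.valid and self.prev_line == line_addr

    def store(self, line_addr):
        self.prev_line = line_addr
        self.valid = True

lam = LineAddressMemory()
line = 0x1ABCDE02                 # tag-plus-index portion of an address
assert not lam.matches(line)      # first read: parallel read required
lam.store(line)
assert lam.matches(line)          # same line again: same-set read suffices
```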
- In step S5B, only one set is accessed. That set is the same set that provided the data to processor 10 in the immediately prior read operation. In this example, set S1 is accessed to the exclusion of sets S2, S3, and S4. This results in a power saving relative to a parallel read.
- Method M1 proceeds to step S6A, overwriting the previous line address with the current line address. (The net result is no change, since the new and old line addresses are the same.) At step S6B, the tag at set S1, index 000011, is compared to the tag portion of the requested address. Flag 29 is set accordingly. Again, there is no change, because the same comparison was performed in the previous iteration.
- Method M1 proceeds to step S1A for a third iteration. In this example, the third iteration involves a sequential read of the last word at the same line address as the second read. Accordingly, method M1 proceeds through steps S1A and S2A to arrive at step S2B. Step S2B involves determining whether the current address points to the start of a line. If a sequential read points to the start of a line, then the previous address pointed to the end of the previous line; in that case, the sequential read has crossed a line boundary.
- In this illustrative third iteration, a line boundary is not crossed. Accordingly, method M1 proceeds to step S5B, so that only set S1 is accessed. Method M1 proceeds through steps S6A and S6B with no net change in line address or flag. A fourth iteration is begun with a return to step S1A.
- In this fourth iteration, we assume a sequential read. Since the third read at the third iteration was of the fourth word in a four-word line, the fourth read is to the beginning of the next line (index 000011). Accordingly, in this fourth iteration, method M1 proceeds through steps S1A and S2A to step S2B. In step S2B, the word address bits 00 indicate that the requested data is at the start of a line. When the result of S2B is positive, method M1 proceeds to step S3B.
- Step S3B involves checking tag-match flag 29, which was set in the last iteration of step S6B. If the tag at set S1, index 000011, was the same as the tag at set S1, index 000010, the flag was set to "true". This means that the sequential read of this fourth iteration can validly cross the line boundary between indices 000010 and 000011, and method M1 proceeds to single-set read step S5C; if the flag is "false", a parallel read is performed instead.
- Both steps S5B and S5C are followed by step S6A. A new line address (corresponding to the new index 000011) is written at step S6A. Also, the tag-match flag is re-determined at step S6B. In this case, the flag indicates whether the tag at set S1, index 000100, matches the tag at index 000011.
- In a fifth iteration of method M1, a write operation is assumed. In this case, there is a two-cycle write. As flow charted in FIG. 2, method M1 terminates at step S1B. However, the invention provides for updating the line addresses, as in step S6A, and the tag-match flag, as in step S6B, during write operations. When this is done, it is possible for a same-set read to occur immediately after a write operation.
- An alternative computer system AP2 comprises a data processor 60, main memory 62, and a cache 70, as shown in FIG. 3. Data processor 60 issues requests along a processor address bus A2P, which includes address lines, a read-write control line, a memory-request line, and a sequential-address signal line. Data transfers between cache 70 and processor 60 take place along processor data bus D2P. Similarly, cache 70 can issue requests to main memory 62 via memory address bus A2M. Data transfers between cache 70 and memory 62 are along memory data bus D2M.
- Cache 70 comprises a processor interface 71, a memory interface 73, a cache controller 75, a read-output multiplexer 77, and cache memory 80. Cache memory 80 includes four sets SE, SE2, SE3, and SE4. Set SE includes 64 memory locations, each with an associated six-bit index. Each memory location stores a line of data and an associated 22-bit tag. Each line of data holds four 32-bit words of data. Cache sets SE2, SE3, and SE4 are similar and use the same six-bit indices.
- Computer system AP2 differs from computer system AP1 primarily in the arrangement of the respective controllers. Controller 75 comprises tag-matching function 79, a current-address register 81, a sequential-detect function 83, a beginning-of-line detect function 85, an end-of-line detect function 87, and last-address-type flags 89. Tag-matching function 79 has four flags F1, F2, F3, and F4, which correspond respectively to sets SE, SE2, SE3, and SE4. Each flag indicates whether or not there is a tag match of interest for the respective set. Last-address-type flags 89 include a flag F5 that indicates whether or not the last address was sequential and a flag F6 that indicates whether or not the last address pointed to the end of a cache line.
function 83, the word position bits to beginning-of-line detectfunction 85, and word position and transfer width data to end-of-line detectfunction 87. Each of the detectfunctions matching function 79. In addition, tag-matching function 79 can read last-address-type flags F5 (sequential?) and F6 (end-of-line). Finally, tag-matching function 79 can accesscache storage 80 to identify tag matches. - An iterated method M2 practiced in the context of
cache controller 75 is indicated in the flow chart of FIG. 5. A read request is received at step T1. A determination is made whether the read is sequential or non-sequential at step T2. If the read is sequential, the word position within the selected cache line is checked at step T3. - If the word position of a sequential transfer is at the beginning of a cache line, last-address type flags F5 and F6 are checked at step T4A. If from step T5, the previous read request was both sequential and end-of-line, tag match flags F1-F4 are checked at step T6A. If there is no match between the tag of the previous address at the cache location with an index one greater than that indicated by the previous address, a parallel read is performed at step T7A. If a flag F1-F4 indicates such a match, a one-set read is performed, at step T7B, at the incremented index in the set corresponding to the affirmative flag. In an alternative embodiment, there is only one flag that indicates whether there is a match within the same set as in the previous read request.
- If the word position is at the end of a cache line, as determined at step T3, end-of-line flag F6 is set. If the end-of-line read is sequential, sequential-type flag F5 is set. In the next iteration of method M2, these flags can be used at step T4A. If the word position at step T3 is neither the beginning nor the end of a line, a same-set read is performed at step T7C. If, at step T2, the read is non-sequential, match flags F1-F4 and sequential flag F5 are reset to negative at step T6B. In this case, method M2 proceeds to a parallel read at step T7A.
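As a sketch, the T-steps of method M2 can be condensed into a single decision function. The names are invented, and flags F1-F4 are collapsed into one `F_match` value assumed to be maintained by the tag-matching logic:

```python
def m2_read_type(state, sequential, word, words_per_line=4):
    """Decide the read type for method M2 and update the flags.

    state: dict with 'F_match' (tag match at the next index in the hit
           set, standing in for flags F1-F4), 'F5' (last read was
           sequential), and 'F6' (last read was end-of-line).
    """
    if not sequential:                      # step T2: non-sequential
        state.update(F_match=False, F5=False, F6=False)
        return "parallel"                   # step T7A
    if word == 0:                           # step T3: beginning of line
        if state["F5"] and state["F6"] and state["F_match"]:
            read = "single-set"             # step T7B
        else:
            read = "parallel"               # step T7A
    else:
        read = "same-set"                   # step T7C
    state["F6"] = (word == words_per_line - 1)   # end-of-line flag
    state["F5"] = True                           # this read was sequential
    return read
```

For example, starting from a state in which the previous read was a sequential end-of-line access with a tag match, a sequential read of word 0 yields a single-set read, a mid-line sequential read yields a same-set read, and a non-sequential read falls back to a parallel read while resetting the flags.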
- In system AP2, tags at a successor index location are only checked when the present read is to the end of a line. This reduces the frequency of such tag checks. On the other hand, the asserted word location must be checked to determine whether or not a tag comparison should be made. Where, as in the present case, the processor provides for different transfer widths, e.g., byte, doublet, and quadlet (word, in this case), the bits to be checked to recognize an end-of-line data request are a function of this width. Thus, this embodiment requires additional complexity to avoid superfluous tag matches.
- In another alternative embodiment of the invention, instead of a single flag 29, there is a flag associated with each index. During each read operation, all tags in the set from which a read is provided are compared to the tag portion of the read request. The flags are set according to the results. In a subsequent read with an arbitrary index portion, the associated flag can be checked. If the flag indicates true, a single-set read can be implemented. Otherwise, a parallel read operation is implemented. This approach reduces the number of parallel reads, but incurs a cost in cache complexity. These and other variations upon and modifications to the described embodiments are provided for by the present invention, the scope of which is defined by the following claims.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/797,644 US6338118B2 (en) | 1999-06-21 | 2001-03-01 | Set-associative cache-management method with parallel and single-set sequential reads |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/337,089 US6321321B1 (en) | 1999-06-21 | 1999-06-21 | Set-associative cache-management method with parallel and single-set sequential reads |
US09/797,644 US6338118B2 (en) | 1999-06-21 | 2001-03-01 | Set-associative cache-management method with parallel and single-set sequential reads |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/337,089 Division US6321321B1 (en) | 1999-06-21 | 1999-06-21 | Set-associative cache-management method with parallel and single-set sequential reads |
Publications (2)
Publication Number | Publication Date |
---|---|
US20010008009A1 true US20010008009A1 (en) | 2001-07-12 |
US6338118B2 US6338118B2 (en) | 2002-01-08 |
Family
ID=23319077
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/337,089 Expired - Lifetime US6321321B1 (en) | 1999-06-21 | 1999-06-21 | Set-associative cache-management method with parallel and single-set sequential reads |
US09/797,644 Expired - Lifetime US6338118B2 (en) | 1999-06-21 | 2001-03-01 | Set-associative cache-management method with parallel and single-set sequential reads |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/337,089 Expired - Lifetime US6321321B1 (en) | 1999-06-21 | 1999-06-21 | Set-associative cache-management method with parallel and single-set sequential reads |
Country Status (1)
Country | Link |
---|---|
US (2) | US6321321B1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001075864A (en) * | 1999-09-02 | 2001-03-23 | Fujitsu Ltd | Cache controller |
US6895498B2 (en) * | 2001-05-04 | 2005-05-17 | Ip-First, Llc | Apparatus and method for target address replacement in speculative branch target address cache |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US7165168B2 (en) * | 2003-01-14 | 2007-01-16 | Ip-First, Llc | Microprocessor with branch target address cache update queue |
US7707397B2 (en) * | 2001-05-04 | 2010-04-27 | Via Technologies, Inc. | Variable group associativity branch target address cache delivering multiple target addresses per cache line |
US7134005B2 (en) * | 2001-05-04 | 2006-11-07 | Ip-First, Llc | Microprocessor that detects erroneous speculative prediction of branch instruction opcode byte |
US7165169B2 (en) | 2001-05-04 | 2007-01-16 | Ip-First, Llc | Speculative branch target address cache with selective override by secondary predictor based on branch instruction type |
US6886093B2 (en) * | 2001-05-04 | 2005-04-26 | Ip-First, Llc | Speculative hybrid branch direction predictor |
US7203824B2 (en) * | 2001-07-03 | 2007-04-10 | Ip-First, Llc | Apparatus and method for handling BTAC branches that wrap across instruction cache lines |
US6823444B1 (en) * | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
US7162619B2 (en) * | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
US7234045B2 (en) * | 2001-07-03 | 2007-06-19 | Ip-First, Llc | Apparatus and method for handling BTAC branches that wrap across instruction cache lines |
US7159097B2 (en) * | 2002-04-26 | 2007-01-02 | Ip-First, Llc | Apparatus and method for buffering instructions and late-generated related information using history of previous load/shifts |
US7185186B2 (en) * | 2003-01-14 | 2007-02-27 | Ip-First, Llc | Apparatus and method for resolving deadlock fetch conditions involving branch target address cache |
US7152154B2 (en) * | 2003-01-16 | 2006-12-19 | Ip-First, Llc. | Apparatus and method for invalidation of redundant branch target address cache entries |
US7143269B2 (en) * | 2003-01-14 | 2006-11-28 | Ip-First, Llc | Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor |
US7178010B2 (en) * | 2003-01-16 | 2007-02-13 | Ip-First, Llc | Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack |
US7237098B2 (en) | 2003-09-08 | 2007-06-26 | Ip-First, Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US7081897B2 (en) * | 2003-12-24 | 2006-07-25 | Intel Corporation | Unified memory organization for power savings |
US7796137B1 (en) * | 2006-10-24 | 2010-09-14 | Nvidia Corporation | Enhanced tag-based structures, systems and methods for implementing a pool of independent tags in cache memories |
CN104346404B (en) * | 2013-08-08 | 2018-05-18 | 华为技术有限公司 | A kind of method, equipment and system for accessing data |
US10592414B2 (en) * | 2017-07-14 | 2020-03-17 | International Business Machines Corporation | Filtering of redundantly scheduled write passes |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5287482A (en) * | 1989-01-13 | 1994-02-15 | International Business Machines Corporation | Input/output cache |
US5265236A (en) * | 1990-11-29 | 1993-11-23 | Sun Microsystems, Inc. | Method and apparatus for increasing the speed of memory access in a virtual memory system having fast page mode |
US5353424A (en) * | 1991-11-19 | 1994-10-04 | Digital Equipment Corporation | Fast tag compare and bank select in set associative cache |
US5367653A (en) * | 1991-12-26 | 1994-11-22 | International Business Machines Corporation | Reconfigurable multi-way associative cache memory |
US5371870A (en) * | 1992-04-24 | 1994-12-06 | Digital Equipment Corporation | Stream buffer memory having a multiple-entry address history buffer for detecting sequential reads to initiate prefetching |
US5461718A (en) * | 1992-04-24 | 1995-10-24 | Digital Equipment Corporation | System for sequential read of memory stream buffer detecting page mode cycles availability fetching data into a selected FIFO, and sending data without aceessing memory |
US5509119A (en) * | 1994-09-23 | 1996-04-16 | Hewlett-Packard Company | Fast comparison method and apparatus for error corrected cache tags |
US5854911A (en) * | 1996-07-01 | 1998-12-29 | Sun Microsystems, Inc. | Data buffer prefetch apparatus and method |
US5860097A (en) * | 1996-09-23 | 1999-01-12 | Hewlett-Packard Company | Associative cache memory with improved hit time |
KR19990017082A (en) * | 1997-08-21 | 1999-03-15 | 유기범 | Serial Parallel Cache Device |
-
1999
- 1999-06-21 US US09/337,089 patent/US6321321B1/en not_active Expired - Lifetime
-
2001
- 2001-03-01 US US09/797,644 patent/US6338118B2/en not_active Expired - Lifetime
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080196036A1 (en) * | 2002-02-20 | 2008-08-14 | Agere Systems Inc. | Method and Apparatus for Establishing a Bound on the Effect of Task Interference in a Cache Memory |
US8191067B2 (en) * | 2002-02-20 | 2012-05-29 | Agere Systems Inc. | Method and apparatus for establishing a bound on the effect of task interference in a cache memory |
US20040264284A1 (en) * | 2003-06-27 | 2004-12-30 | Priborsky Anthony L | Assignment of queue execution modes using tag values |
US7480754B2 (en) * | 2003-06-27 | 2009-01-20 | Seagate Technology, Llc | Assignment of queue execution modes using tag values |
US9268700B2 (en) * | 2012-04-19 | 2016-02-23 | Nec Corporation | Cache control device, cache control method, and program thereof |
US20130282977A1 (en) * | 2012-04-19 | 2013-10-24 | Nec Corporation | Cache control device, cache control method, and program thereof |
US20140173224A1 (en) * | 2012-12-14 | 2014-06-19 | International Business Machines Corporation | Sequential location accesses in an active memory device |
WO2014090092A1 (en) * | 2012-12-14 | 2014-06-19 | International Business Machines Corporation | Sequential location accesses in an active memory device |
US9104532B2 (en) * | 2012-12-14 | 2015-08-11 | International Business Machines Corporation | Sequential location accesses in an active memory device |
US20190272122A1 (en) * | 2016-11-16 | 2019-09-05 | Huawei Technologies Co., Ltd. | Memory Access Technology |
EP3534265A4 (en) * | 2016-11-16 | 2019-10-30 | Huawei Technologies Co., Ltd. | Memory access technique |
US11210020B2 (en) * | 2016-11-16 | 2021-12-28 | Huawei Technologies Co., Ltd. | Methods and systems for accessing a memory |
US11327768B2 (en) * | 2018-12-10 | 2022-05-10 | Fujitsu Limited | Arithmetic processing apparatus and memory apparatus |
JP2020144856A (en) * | 2019-03-01 | 2020-09-10 | キヤノン株式会社 | Interface device, data processing device, cache control method, and program |
US11409655B2 (en) * | 2019-03-01 | 2022-08-09 | Canon Kabushiki Kaisha | Interface apparatus, data processing apparatus, cache control method, and medium |
JP7474061B2 (en) | 2019-03-01 | 2024-04-24 | キヤノン株式会社 | Interface device, data processing device, cache control method, and program |
Also Published As
Publication number | Publication date |
---|---|
US6321321B1 (en) | 2001-11-20 |
US6338118B2 (en) | 2002-01-08 |
Similar Documents
Publication | Title
---|---
US6321321B1 (en) | Set-associative cache-management method with parallel and single-set sequential reads
US7430642B2 (en) | System and method for unified cache access using sequential instruction information
US7165144B2 (en) | Managing input/output (I/O) requests in a cache memory system
USRE45078E1 (en) | Highly efficient design of storage array utilizing multiple pointers to indicate valid and invalid lines for use in first and second cache spaces and memory subsystems
JP4486750B2 (en) | Shared cache structure for temporary and non-temporary instructions
US5465342A (en) | Dynamically adaptive set associativity for cache memories
KR100339904B1 (en) | System and method for cache process
JP3016575B2 (en) | Multiple cache memory access methods
US6976126B2 (en) | Accessing data values in a cache
JP2735781B2 (en) | Cache memory control system and method
EP2017738A1 (en) | Hierarchical cache tag architecture
US6671779B2 (en) | Management of caches in a data processing apparatus
JP2008502069A (en) | Memory cache controller and method for performing coherency operations therefor
US20100011165A1 (en) | Cache management systems and methods
GB2468007A (en) | Data processing apparatus and method dependent on streaming preload instruction
US7761665B2 (en) | Handling of cache accesses in a data processing apparatus
US6629206B1 (en) | Set-associative cache-management using parallel reads and serial reads initiated during a wait state
US5367657A (en) | Method and apparatus for efficient read prefetching of instruction code data in computer memory subsystems
US6434670B1 (en) | Method and apparatus for efficiently managing caches with non-power-of-two congruence classes
US5926841A (en) | Segment descriptor cache for a processor
US20010029573A1 (en) | Set-associative cache-management method with parallel read and serial read pipelined with serial write
US5619673A (en) | Virtual access cache protection bits handling method and apparatus
JPH1055276A (en) | Multi-level branching prediction method and device
US7325101B1 (en) | Techniques for reducing off-chip cache memory accesses
JP3295728B2 (en) | Update circuit of pipeline cache memory
Legal Events

Code | Title | Description
---|---|---
STCF | Information on status: patent grant | Free format text: PATENTED CASE
FPAY | Fee payment | Year of fee payment: 4
AS | Assignment | Owner name: PHILIPS SEMICONDUCTORS VLSI INC., NEW YORK; Free format text: CHANGE OF NAME; ASSIGNOR: VLSI TECHNOLOGY, INC.; REEL/FRAME: 018635/0570; Effective date: 19990702. Owner name: NXP B.V., NETHERLANDS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PHILIPS SEMICONDUCTORS INC.; REEL/FRAME: 018645/0779; Effective date: 20061130
AS | Assignment | Owner name: PHILIPS SEMICONDUCTORS INC., NEW YORK; Free format text: CHANGE OF NAME; ASSIGNOR: PHILIPS SEMICONDUCTORS VLSI INC.; REEL/FRAME: 018668/0255; Effective date: 19991220
FPAY | Fee payment | Year of fee payment: 8
CC | Certificate of correction |
AS | Assignment | Owner name: VLSI TECHNOLOGY, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: JOHNSON, MARK W.; REEL/FRAME: 025436/0285; Effective date: 19990608
FEPP | Fee payment procedure | Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
AS | Assignment | Owner name: SINO MATRIX TECHNOLOGY INC., TAIWAN; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: NXP B.V.; REEL/FRAME: 026160/0111; Effective date: 20110225. Owner name: LIYUE CAPITAL, LLC, DELAWARE; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: SINO MATRIX TECHNOLOGY INC.; REEL/FRAME: 026160/0125; Effective date: 20110407
FPAY | Fee payment | Year of fee payment: 12
AS | Assignment | Owner name: INTELLECTUAL VENTURES II LLC, DELAWARE; Free format text: MERGER; ASSIGNOR: LIYUE CAPITAL, LLC; REEL/FRAME: 031369/0368; Effective date: 20131007
AS | Assignment | Owner name: INTELLECTUAL VENTURES ASSETS 32 LLC, DELAWARE; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: INTELLECTUAL VENTURES II LLC; REEL/FRAME: 041668/0322; Effective date: 20170313
AS | Assignment | Owner name: ROVI TECHNOLOGIES CORPORATION, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: ROVI CORPORATION; REEL/FRAME: 042178/0324; Effective date: 20170406. Owner name: ROVI CORPORATION, CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: INTELLECTUAL VENTURES ASSETS 32 LLC; REEL/FRAME: 041887/0336; Effective date: 20170317