WO2009145888A1 - Dynamically partitionable cache - Google Patents

Dynamically partitionable cache

Info

Publication number
WO2009145888A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
memory
type
memory request
cache line
Prior art date
Application number
PCT/US2009/003214
Other languages
French (fr)
Inventor
Michael J. Mantor
Brian A. Buchner
John P. McCardle
Original Assignee
Advanced Micro Devices, Inc.
Priority date
Filing date
Publication date
Priority claimed from US 12/165,741 (US20090300293A1)
Application filed by Advanced Micro Devices, Inc.
Publication of WO2009145888A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0848Partitioned cache, e.g. separate instruction and operand caches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols

Abstract

Methods and systems for dynamically partitioning a cache and maintaining cache coherency are provided. In an embodiment, a system for processing memory requests includes a cache and a cache controller configured to compare a memory address and a type of a received memory request to a memory address and a type, respectively, corresponding to a cache line of the cache to determine whether the memory request hits on the cache line. In another embodiment, a method for processing fetch memory requests includes receiving a memory request and determining if the memory request hits on a cache line of a cache by determining if a memory address and a type of the memory request match a memory address and a type, respectively, corresponding to a cache line of the cache.

Description

DYNAMICALLY PARTITIONABLE CACHE
BACKGROUND OF THE INVENTION
Field of the Invention
[0001] The present invention relates to servicing memory requests. Specifically, the present invention relates to cache resource allocation and cache coherency.
Background Art
[0002] Memory requests solicit values held in a memory of a system. The requested values can be used in instructions executed by a processor unit. However, the time required to execute the memory request, the memory latency, can often hamper the operation of the processor unit. A cache can be used to decrease the average memory latency. The cache holds a subset of the memory that is likely to be requested by the processor unit. Memory lookup requests that can be serviced by the cache have shorter latency than memory lookup requests that require the memory to be accessed.
[0003] Multiple processing units can access the same cache. To prevent one processing unit from inadvertently accessing data intended for another processing unit, the cache can be partitioned. Specifically, a fixed partition can be used to separate the cache. However, the fixed partition can result in the cache being used inefficiently. For example, if the cache is heavily used by one processing unit and rarely used by another, portions of the cache will be underutilized.
[0004] Also, values held in the cache can become out-dated or stale when the corresponding value in the memory is changed. If stale data is provided in response to a memory request, the outcome of an instruction executed by the processing unit may be incorrect. To prevent stale data from being provided in response to a memory request, portions of the cache are invalidated when it is determined that they may have become stale according to one of many different cache coherency algorithms. However, such an invalidation process is often costly in terms of processing time.
[0005] Thus, what is needed is a system and method for dynamically partitioning a cache and efficiently maintaining cache coherence.
BRIEF SUMMARY OF THE INVENTION
[0006] Embodiments described herein relate to methods and systems for dynamically partitioning a cache and maintaining cache coherency. A type is associated with portions of the cache. The cache can be dynamically partitioned based on the type. The type can also be used to identify portions of the cache that might have stale data. By using the type to identify possibly stale portions of the cache, invalidation can be done automatically by suitable inspection of the type of each portion of the cache.
[0007] In an embodiment, a system for processing fetch memory requests includes a cache and a cache controller configured to compare a memory address and a type of a received memory request to a memory address and a type, respectively, corresponding to a cache line of the cache to determine whether the memory request hits on the cache line.
[0008] In another embodiment, a method for processing fetch memory requests includes receiving a memory request and determining if the memory request hits on a cache line of a cache by determining if a memory address and a type of the memory request match a memory address and a type, respectively, corresponding to a cache line of the cache.
[0009] Further embodiments, features, and advantages of the present invention, as well as the structure and operation of the various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES
[0010] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
[0011] FIG. 1 is a block diagram illustration of a system for servicing memory requests.
[0012] FIG. 2 is a block diagram illustration of a system for servicing memory lookup requests, according to an embodiment of the present invention.
[0013] FIG. 3 is an exemplary diagram of a type field, according to an embodiment of the present invention.
[0014] FIG. 4 is a block diagram illustration of a system for servicing memory lookup requests, according to another embodiment of the present invention.
[0015] FIG. 5 is a flowchart of an exemplary method of servicing memory lookup requests, according to an embodiment of the present invention.
[0016] The present invention will be described with reference to the accompanying drawings. Generally, the drawing in which an element first appears is typically indicated by the leftmost digit(s) in the corresponding reference number.
DETAILED DESCRIPTION OF THE INVENTION
[0017] The following detailed description of the present invention refers to the accompanying drawings that illustrate exemplary embodiments consistent with this invention. Other embodiments are possible, and modifications may be made to the embodiments within the spirit and scope of the invention. Therefore, the detailed description is not meant to limit the invention. Rather, the scope of the invention is defined by the appended claims.
[0018] It would be apparent to one of skill in the art that the present invention, as described below, may be implemented in many different embodiments of software, hardware, firmware, and/or the entities illustrated in the figures. Any actual software code with the specialized control of hardware to implement the present invention is not limiting of the present invention. Thus, the operational behavior of the present invention will be described with the understanding that modifications and variations of the embodiments are possible, given the level of detail presented herein.
[0019] FIG. 1 is a block diagram illustration of a system 100 for servicing memory requests. System 100 includes a first resource 102, a second resource 104, a cache controller 106, a list of cache tags 108, a memory 110, and a cache 112. First and second resources 102 and 104 can be processor units such as a graphics processor unit (GPU). First resource 102 and/or second resource 104 can execute sets of instructions to perform a task, e.g., shaders used to perform rendering effects. These instructions can require values that are requested from memory 110.
[0020] Memory 110 can be the primary memory of system 100 and can be implemented as a random access memory (RAM). Since first and second resources 102 and 104 can typically execute instructions faster than memory requests can be serviced, the latency introduced by accessing memory 110 can hamper the performance of first and second resources 102 and 104.
[0021] To decrease the time required to service a memory request, cache 112 is provided. Cache 112 holds a subset of the values stored in memory 110. It is desired that cache 112 hold the subset of values of memory 110 that are most likely to be accessed by first and second resources 102 and 104. Because cache 112 is typically coupled to first and second resources 102 and 104 by a high-speed path, the memory latency of memory requests serviced by cache 112 is shorter than the memory latency of requests serviced by memory 110.
[0022] Cache 112 includes a plurality of cache lines. Each of the cache lines is configured to hold a value stored in memory 110. In alternate embodiments, each cache line of cache 112 can be configured to hold multiple values stored in memory 110. Cache 112 can be a multilevel cache. For example, cache 112 can include an L1 cache and an L2 cache. In an embodiment in which first resource 102 and/or second resource 104 is a GPU, cache 112 can hold graphics data. For example, cache 112 can be a vector or a texture cache.
[0023] Cache lines of cache 112 can become out-dated or stale if a value stored in memory 110 is changed and the corresponding cache line is not updated. Memory requests can be grouped into clauses formed such that cache 112 remains coherent with memory 110 within the clause. In particular, synchronization elements of system 100 (not shown) can be used to ensure that cache lines of cache 112 that are to be accessed in response to memory requests of a clause will not become stale as the clause is being serviced. However, this does not ensure that the cache lines of cache 112 will be coherent with memory 110, as the values stored in memory 110 corresponding to those cache lines may have been changed before the clause was serviced.
[0024] In an embodiment, a clause is defined as a contiguous burst of memory requests. In another embodiment, a clause can be defined as a group of instructions that will execute without interruption.
[0025] Cache controller 106 receives clauses of memory requests from first and second resources 102 and 104. Upon receiving a memory request, cache controller 106 determines if the memory request hits on a cache line of cache 112. For a memory request to hit on a cache line of cache 112, the requested value must be resident in cache 112 and the cache line that includes the requested value must be valid.
[0026] In determining whether a memory request hits on a cache line of cache 112, cache controller 106 accesses a list of cache tags 108. Each row of list of cache tags 108 represents a tag that is associated with a cache line of cache 112. As shown in FIG. 1, each tag includes an index specifying the cache line, a field specifying the address(es) in memory 110 where the held value is located, and a valid flag that specifies whether the associated cache line is valid. In an embodiment, the valid field is a single bit, e.g., 1 for valid and 0 for invalid. Each cache tag may also include a variety of other flags. For example, a flag may include the number of times the cache line has been accessed, a number of memory requests that are pending for the cache line, or other types of information that may be used to determine which cache line is selected when a new value from memory 110 must be written to cache 112. These flags can be updated as memory requests are serviced.
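For illustration only, the baseline tag check of paragraph [0026] can be modeled in software as follows; the struct layout and all identifiers here are hypothetical, and a hardware implementation would compare tags in parallel rather than in a loop:

    #include <cstdint>
    #include <vector>

    struct CacheTag {
        uint32_t index;     // which cache line this tag describes
        uint64_t address;   // address in memory 110 of the held value
        bool     valid;     // whether the line may be hit at all
        uint32_t pending;   // outstanding memory requests for the line
        uint32_t accesses;  // access count usable by the replacement policy
    };

    // Returns the cache line index hit by the request, or -1 on a miss. A hit
    // requires the requested address to be resident AND the line to be valid.
    int lookup(const std::vector<CacheTag>& tags, uint64_t requestAddress) {
        for (const CacheTag& tag : tags) {
            if (tag.valid && tag.address == requestAddress)
                return static_cast<int>(tag.index);
        }
        return -1;  // miss: the value must be fetched from memory 110
    }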
[0027] If the memory request does not hit on any of the cache lines of cache 112, the value is obtained from memory 110. The requested value replaces a value held in a cache line of cache 112. In determining which cache line of cache 112 can be used to hold the value obtained from memory 110, it is determined which cache lines of cache 112 are available. For example, cache controller 106 can access list of cache tags 108 to determine which cache lines do not have pending memory requests that have not been serviced. A variety of techniques known to those skilled in the relevant arts can be used to choose among the available cache lines to determine which one will hold the value obtained from memory 110. For example, cache controller 106 can use a first in, first out (FIFO) or least recently used (LRU) technique along with the values stored in the flag fields of list of cache tags 108 to determine which cache line is selected to have the value it is holding overwritten. Once the requested value is written into cache 112, the associated tag in list of cache tags 108 is updated, e.g., to include the memory address of the newly held value, to set the cache line as valid, and to update the other fields. The requested value is provided from cache 112 to the requester (first resource 102 or second resource 104). In an alternate embodiment, the requested value can be provided directly from memory 110 to the requester.
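Continuing the sketch above, the miss handling of paragraph [0027] might look like the following; treating lines with no pending requests as available and breaking ties by LRU is only one of the several policies the paragraph permits:

    // Hypothetical victim selection: pick an available line (no pending
    // requests), preferring the least recently used one.
    int chooseVictim(const std::vector<CacheTag>& tags,
                     const std::vector<uint64_t>& lastUse) {  // lastUse[i]: last access time of line i
        int victim = -1;
        uint64_t oldest = UINT64_MAX;
        for (const CacheTag& tag : tags) {
            if (tag.pending != 0) continue;      // line still has unserviced requests
            if (lastUse[tag.index] < oldest) {   // LRU among the available lines
                oldest = lastUse[tag.index];
                victim = static_cast<int>(tag.index);
            }
        }
        return victim;  // -1 if no cache line is currently available
    }

    void refill(CacheTag& tag, uint64_t newAddress) {
        tag.address  = newAddress;  // record where the new value came from
        tag.valid    = true;        // the line may be hit again
        tag.accesses = 0;
    }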
[0028] As shown in FIG. 1, list of cache tags 108 includes a fixed partition 114. Fixed partition 114 effectively partitions cache 112 so that a portion of cache 112 is allocated to first resource 102 and another portion is allocated to second resource 104. In such a manner, first resource 102 does not inadvertently access values intended for second resource 104 and vice versa. However, fixed partition 114 also can result in a portion of cache 112 being underutilized. For example, if first resource 102 outputs a relatively low number of memory requests, its portion of cache 112 is more likely to include a substantial number of stale cache lines. If second resource 104 outputs a relatively high number of memory requests, its portion of cache 112 will have a relatively high number of valid cache lines overwritten. Fixed partition 114 prevents part of the portion of cache 112 allocated to first resource 102 from being used to hold values for second resource 104 even if such a re-allocation would benefit second resource 104 and not hinder first resource 102.
[0029] Also, as described above, data held in cache 112 can become stale. If a cache line of cache 112 is determined to be stale, the cache line is invalidated. Invalidation occurs when the valid field of the associated tag in list of cache tags 108 is set to be invalid, for example, by setting the valid bit to 0. For example, a portion of cache controller 106, implemented in hardware, software, firmware, or a combination thereof, can invalidate cache lines of cache 112 based on a range of addresses in memory 110 that have been updated. However, such an invalidation process is often costly in the number of cycles required to complete the invalidation.
Exemplary Embodiments
[0030] In embodiments described herein, methods and systems are provided that allow for dynamically partitioning a cache and efficiently maintaining cache coherence. Specifically, a tag associated with a cache line additionally includes a type field. In order for a memory request to hit on a line of a cache, the address requested by the memory request and its type must match the memory address field and type field, respectively, of its associated tag, and the cache line must be valid.
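A minimal sketch of the type-extended tag and hit test, continuing the earlier model and again assuming hypothetical field names and widths:

    // The tag gains a type field; a hit now requires the address AND the type
    // of the request to match, plus a valid line.
    struct TypedCacheTag {
        uint32_t index;
        uint64_t address;
        uint32_t type;    // one or more bits; cf. type field 300 of FIG. 3
        bool     valid;
    };

    bool hits(const TypedCacheTag& tag, uint64_t reqAddress, uint32_t reqType) {
        return tag.valid && tag.address == reqAddress && tag.type == reqType;
    }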
[0031] The additional type field can be used to dynamically allocate the resources of the cache and to efficiently maintain cache coherency. Different resources can each correspond to unique types. Since a cache hit requires matching a type of the request to the type of the cache line, the type effectively partitions the cache. The type field can also be used to identify portions of the cache that are to be automatically invalidated. [0032] FIG. 2 is a block diagram illustration of a system 200 for servicing memory requests, according to an embodiment of the present invention. System 200 includes first and second resources 102 and 104, memory 110, cache 112, a cache controller 202, and a list of cache tags 210. First and second resources 102 and 104, memory 110, and cache 112 can be substantially similar to corresponding elements described with reference to FIG. 1.
[0033] Cache controller 202 receives clauses of memory requests from first and second resources 102 and 104. Cache controller 202 compares a requested memory address and the type of the memory request to a memory address and a type corresponding to a cache line of cache 112. This comparison determines if the memory request hits on the cache line.
[0034] Cache controller 202 includes an input module 204, a comparison module 206, and an invalidation module 208. Input module 204 extracts the requested memory address and the type from the received memory request. Comparison module 206 determines whether the memory request hits on a cache line of cache 112 based on the extracted memory address and type. Specifically, comparison module 206 compares the extracted memory address and type to the memory address and type fields of tags in list of cache tags 210. If the extracted memory address and type match the corresponding fields of a tag of list of cache tags 210 and the associated cache line is determined to be valid, the memory request is determined to hit on that cache line. Invalidation module 208 is configured to invalidate one or more cache lines of cache 112. For example, the type field can be used to identify cache lines that are to be automatically invalidated.
[0035] List of cache tags 210 is substantially similar to list of cache tags 108 described with reference to FIG. 1, except that tags of list of cache tags 210 include an additional type field. The type field may be one or more bits. Thus, based on the type field of their associated tags, each cache line effectively has a type. The type of a cache line can be changed by changing the value of the type field in its associated tag.
[0036] Each cache line of cache 112 has a dynamically adjusted type. The type of a cache line is determined by the type field of the tag of list of cache tags 210 that is associated with that cache line.
Dynamic Allocation of Cache Resources
[0037] The type field can be used to allocate portions of cache 112. For example, each of first and second resources 102 and 104 can have unique types that are included in their respective memory requests. Since a hit on a cache line requires the type of the memory request to match the type of the cache line, first resource 102 is prevented from inadvertently accessing a cache line that includes data intended for second resource 104, and vice versa. As would be apparent to those skilled in the relevant arts, additional types can also be provided for additional resources that are coupled to cache controller 202.
[0038] The type field can also be used to partition the cache 112 based on the type of data that is held. For example, in graphics processing applications, the type field can be used to distinguish between pixel and vertex data. Partitioning cache 112 based on types of data can be done in addition to partitioning cache 112 based on individual resources.
[0039] Thus, first resource 102 can have different types of data held in cache 112, each identified by its own unique type. Each of these types associated with data intended for first resource 102 can be different than the types used by second resource 104.
[0040] In the absence of a fixed partition that divides cache 112, the contents of cache 112 depend on memory requests received from first and second resources 102 and 104. Moreover, the type field of a tag of list of cache tags 210 associated with a cache line of cache 112 is updated when memory requests are received. In particular, the type field is updated to be the type of the received memory request. Thus, over time, the types of data in cache 112 (i.e., the values of the type fields of the tags associated with the cache lines) mimic received memory requests. For example, if first resource 102 generates more memory requests than second resource 104, cache 112 will tend to have proportionally more cache lines allocated to data for first resource 102 than for second resource 104. As the ratio of different types of memory requests changes, the ratio of cache lines allocated for each type adjusts accordingly. As would be appreciated by those skilled in the relevant arts, the same applies to types that differentiate between data types (e.g., between pixel and vertex data).
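The dynamic re-partitioning falls out of the fill path; a hedged sketch, continuing the types above:

    // Hypothetical fill path: on a miss the victim line's tag takes on the
    // type of the incoming request, so over time the mix of type values in
    // the tag list tracks the mix of received requests.
    void refillTyped(TypedCacheTag& tag, uint64_t reqAddress, uint32_t reqType) {
        tag.address = reqAddress;
        tag.type    = reqType;   // the line is re-allocated to the requester's type
        tag.valid   = true;
    }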
[0041] In another embodiment, the type field can be used in addition to a fixed partition similar to partition 114 described with reference to FIG. 1. For example, the fixed partition can be used to divide cache 112 based on different resources while the type field can be used to partition cache 112 based on data types. As would be appreciated by those skilled in the art, other combinations of the type field and a fixed partition can be used without departing from the scope and spirit of the present invention.
Automatic Invalidation
[0042] In addition to being used to dynamically allocate cache 112 based on resources or data types, the additional type field can be used to automatically invalidate cache lines of cache 112. As described above, synchronization elements can be used to ensure that a cache line does not become stale as a clause is being serviced. However, once the clause has been serviced, its continued freshness can no longer be guaranteed.
[0043] Based on the type field, cache lines of cache 112 can be designated for automatic invalidation. For example, when a clause has been serviced, invalidation module 208 inspects the type field of tags in list of cache tags 210 and invalidates all cache lines that are designated for automatic invalidation. Thus, instead of determining which cache lines should be invalidated based on a range of memory addresses in memory 110, cache lines can be invalidated based on the type field of their associated cache tag in list of cache tags 210. In such a manner, an invalidation process can be completed quickly compared to the multiple-cycle process required to invalidate cache lines based on a range of addresses in memory 110. For example, type-based automatic invalidation can be done in one cycle.
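A sketch of the type-based sweep, continuing the model above; the mask is an assumption, and the single-cycle claim holds for parallel hardware rather than for this software-model loop:

    // End-of-clause sweep: clear the valid bit of every line whose type
    // designates it for automatic invalidation. Hardware can inspect all
    // tags in parallel; the loop is the software-model equivalent.
    void invalidateByType(std::vector<TypedCacheTag>& tags, uint32_t autoInvMask) {
        for (TypedCacheTag& tag : tags) {
            if (tag.type & autoInvMask)  // designated for automatic invalidation
                tag.valid = false;
        }
    }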
[0044] In the illustration of FIG. 2, invalidation module 208 is completely implemented in hardware. Thus, cache lines of cache 112 can be automatically invalidated at specific points of operation without the intervention of software. In alternate embodiments, invalidation module 208 can be implemented as hardware, software, firmware, or a combination thereof.
[0045] In an embodiment, a single bit of the type field is used to designate a cache line for automatic invalidation, e.g., 1 for automatic invalidation and 0 for no automatic invalidation, or vice versa. Each cache line that is of the automatic invalidation type is invalidated by invalidation module 208 at the end of the servicing of every received clause of memory requests. In a further embodiment, the entire type field is a single bit. In such an embodiment, the contents of cache 112 are dynamically allocated based on automatic invalidation, e.g., as opposed to resource or data types, as described above.
[0046] In alternate embodiments, multiple bits can be used to specify different types of automatic invalidation. For example, automatic invalidation can be associated with a type used to specify a resource or data type. For example, a type field may have two bits. The first bit can specify to which resource the cache line corresponds, e.g., 1 for first resource 102 and 0 for second resource 104. The second bit can specify whether the cache line is to be automatically invalidated, e.g., 1 for automatic invalidation and 0 for no automatic invalidation. Invalidation module 208 can invalidate all cache lines based solely on the second bit of the type field once all clauses are complete by invalidating all cache lines that have a 1 in the second bit position of their associated tag field. Alternatively, invalidation module 208 can automatically invalidate cache lines based on the type of the clause being serviced. For example, invalidation module 208 can invalidate all cache lines that have a 1 in their first bit position (corresponding to first resource 102) and a 1 in their second bit position (corresponding to automatic invalidation) when a clause of memory requests received from first resource 102 has been serviced.
[0047] FIG. 3 is a diagram of exemplary type field 300, according to an embodiment of the present invention. Type field 300 includes portions 302, 304, and 306. Portion 302 can be used to identify different resources, e.g., different resources coupled to cache controller 202 such as first resource 102 and second resource 104. Portion 304 can be used to specify different types of data, e.g., texture data, vertex data, etc. Portion 306 can be used to designate the associated cache line for automatic invalidation. Each of portions 302, 304, and 306 can be one or more bits. As would be apparent to those skilled in the relevant arts, type field 300 is presented for illustration only and is not intended to be limiting. Alternative type fields may include additional or fewer portions.
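One hypothetical packing of portions 302, 304, and 306 into a small integer; the widths and bit positions are assumptions for illustration, not taken from the patent:

    constexpr uint32_t kAutoInvBit  = 1u << 0;  // portion 306: auto-invalidate flag
    constexpr uint32_t kDataTypeBit = 1u << 1;  // portion 304: e.g., pixel vs. vertex

    constexpr uint32_t makeType(uint32_t resourceId, bool vertexData, bool autoInv) {
        return (resourceId << 2)                      // portion 302: resource id
             | (vertexData ? kDataTypeBit : 0u)
             | (autoInv    ? kAutoInvBit  : 0u);
    }

    // Example: a vertex-data line for resource 1 that is swept after each clause.
    constexpr uint32_t kExampleType = makeType(1u, true, true);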
[0048] Although the embodiments described above have focused on systems with multiple resources, an additional type field can also be applied advantageously to a system that includes a single resource.
[0049] FIG. 4 is a block diagram illustration of a system 400 for servicing memory requests, according to an embodiment of the present invention. System 400 includes a processor 402, cache controller 202, list of cache tags 210, memory 110, and cache 112. The type field of tags associated with cache lines of cache 112 can be a single bit, e.g., a 1 indicating that the associated cache line is to be automatically invalidated and a 0 indicating that the associated cache line is not to be automatically invalidated.
[0050] Processor 402 can be a graphics processor or other type of processor. Invalidation module 208 can be configured to invalidate all cache lines of cache 112 that are designated for automatic invalidation when a clause has been serviced. Invalidation operations can be completed in a single cycle.
[0051] In another embodiment, cache lines of cache 112 that are designated for automatic invalidation can be invalidated when a clause has been suspended or serviced. Specifically, in systems that allow for one clause to preempt another clause, thereby suspending the first clause, cache 112 would be invalidated when the first clause is suspended. The automatic invalidation may result in some performance degradation due to redundant memory lookup requests required as a result of the automatic invalidation. For example, as a clause is being serviced, it can result in a set of values being added to cache 112. When the clause is suspended, all of the cache lines of cache 112 are invalidated. Then, once the clause is restarted, even if the set of values that were previously added remain coherent with corresponding values in memory 110, they still must be retrieved from memory 110 because the cache lines that were holding those values were invalidated as a result of the suspension.
[0052] The above-described invalidation procedure can be used in a variety of applications. For example, automatic invalidation can be used in the processing of ring buffers. A ring buffer is a fixed-size buffer conceptually represented as a ring; it allows data to be written to a fixed buffer and read out in first in-first out (FIFO) order without having to shuffle elements within the ring buffer. Typically, a ring buffer has a producer/consumer relationship: the producer fills the buffer with data and the consumer reads data from the buffer.
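A minimal single-producer/single-consumer ring buffer can be sketched in C as follows; the element type, buffer size, and function names are assumptions of this illustration:

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 256  /* fixed buffer size; an assumption of this sketch */

/* Single-producer/single-consumer ring buffer.  'head' counts writes
 * and 'tail' counts reads; both grow monotonically and are reduced
 * modulo RING_SIZE on access, so elements come out in FIFO order
 * without any shuffling.  Zero-initialize before use. */
typedef struct {
    int    data[RING_SIZE];
    size_t head;  /* producer index */
    size_t tail;  /* consumer index */
} ring_buffer_t;

bool ring_put(ring_buffer_t *rb, int value) {
    if (rb->head - rb->tail == RING_SIZE)
        return false;                      /* full: producer must wait */
    rb->data[rb->head++ % RING_SIZE] = value;
    return true;
}

bool ring_get(ring_buffer_t *rb, int *value) {
    if (rb->head == rb->tail)
        return false;                      /* empty: consumer must wait */
    *value = rb->data[rb->tail++ % RING_SIZE];
    return true;
}
```

In the graphics context described below, a producer such as an export shader would fill the buffer with ring_put and a consumer such as a geometry shader would drain it with ring_get.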
[0053] For example, in the embodiment in which processor 402 is a graphics processor, a geometry shader can be the consumer. Values are written to the ring buffer, e.g., by an export shader, and the geometry shader can read values from the ring buffer in FIFO order. The data can come from other shader stages or graphics hardware, and can be a variety of different types of data, e.g., tessellation data. Once the geometry shader has completed reading the values of the ring buffer, the geometry shader can become a producer. The geometry shader then writes values to a second ring buffer, and another shader (e.g., a vertex shader) would be a consumer that reads those values.
[0054] Ring buffers can be used as links between different stages of a graphics processing pipeline. Once the consumer has completed reading the values of the ring buffer, a producer writes values to it. Thus, elements of a ring buffer are valid in cache only while the consumer is reading them. Automatic invalidation may be used when a consumer has completed reading data from a ring buffer. For example, when one or more clauses that make up the consumer have been serviced, invalidation module 208 automatically invalidates all cache lines that are designated for automatic invalidation.
[0055] Automatic invalidation can also be used to efficiently process the virtualization of general purpose registers. As shown in FIG. 4, processor 402 includes general purpose registers (GPRs) 404. In an embodiment, GPRs 404 can be used to implement scratch memory. In the embodiment in which processor 402 is a graphics processor, scratch memory can be used as temporary storage for different shaders. Because the general purpose registers that make up scratch memory are expensive in terms of space on processor 402, they may be virtualized using memory 110. In particular, the values of certain general purpose registers of GPRs 404 can be sent to a ring buffer implemented in memory 110 and read back to GPRs 404 when they are required.
[0056] Thus, processor 402 can operate as if it has more GPRs than it actually has. In order to avoid having to access memory 110 to retrieve a value to be placed in a GPR of GPRs 404, the value may be retrieved from cache 112. When a clause that includes memory requests that read the data in the ring buffer has been serviced, cache lines that hold values of that ring buffer can be automatically invalidated by invalidation module 208.
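The following sketch illustrates the spill-and-fill idea, using a plain array as a stand-in for the memory-backed ring buffer; all names and sizes are hypothetical, and a real implementation would attempt to satisfy the fill from cache 112 before going to memory 110, as described above:

```c
#include <stdint.h>

#define NUM_PHYS_GPRS 16  /* physical registers actually present  */
#define NUM_VIRT_GPRS 64  /* registers the shader believes it has */

/* Stand-in for the ring buffer implemented in memory 110. */
static uint32_t spill_buffer[NUM_VIRT_GPRS];
static uint32_t phys_gpr[NUM_PHYS_GPRS];

/* Write a physical register's value out so the register can be
 * reused for a different virtual register. */
void gpr_spill(int phys, int virt) {
    spill_buffer[virt] = phys_gpr[phys];
}

/* Read a virtual register's value back into a physical register
 * when it is required. */
void gpr_fill(int phys, int virt) {
    phys_gpr[phys] = spill_buffer[virt];
}
```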
[0057] Automatic invalidation can be particularly useful in situations where it is known that a value will likely not be read after a clause has been serviced. For example, in the case of a ring buffer with a producer/consumer relationship, it is known that values of the ring buffer will probably not be accessed after the consumer has read them. Thus, automatic invalidation will probably not lead to erroneous cache misses, i.e., cache misses that occur because a cache line was invalidated even though it still held fresh data.
[0058] Although system 400 is shown as including a single resource (e.g., processor 402), those skilled in the art will appreciate that it can include multiple resources without departing from the scope and spirit of the present invention. In embodiments in which system 400 includes multiple resources, the type field can be larger than a single bit.
[0059] FIG. 5 is a flowchart of an exemplary method 500 for servicing memory requests, according to an embodiment of the present invention. Other structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the following discussion. Flowchart 500 is described with reference to the embodiment of FIG. 2; however, flowchart 500 is not limited to that embodiment. The steps shown in FIG. 5 do not necessarily have to occur in the order shown and are described in detail below.
[0060] In step 502, a memory request is received. For example, in FIG. 2, a memory request is received by cache controller 202 from first resource 102 or second resource 104. In an embodiment, the received memory request is a member of a clause of memory requests.
[0061] In step 504, it is determined whether the memory request hits on a cache line. For example, comparison module 206 of cache controller 202 determines whether the received memory request hits on a cache line of cache 112. Specifically, comparison module 206 determines if the type and address fields of tags associated with cache lines of cache 112 match the type and the memory address of the memory request, respectively, and analyzes the valid fields of the tags to determine if their associated cache lines are valid.
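The hit check of step 504 can be illustrated in C as follows; hardware would compare all tags in parallel, so the loop, structure, and names below are purely illustrative assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t address;  /* address field of the tag */
    uint8_t  type;     /* type field of the tag    */
    bool     valid;    /* valid field of the tag   */
} cache_tag_t;

/* Return the index of the cache line the request hits on, or -1 on
 * a miss.  A hit requires the address and the type both to match
 * and the line to be valid, as in step 504. */
int cache_lookup(const cache_tag_t tags[], int num_lines,
                 uint32_t req_address, uint8_t req_type) {
    for (int i = 0; i < num_lines; i++)
        if (tags[i].valid &&
            tags[i].address == req_address &&
            tags[i].type == req_type)
            return i;
    return -1;
}
```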
[0062] If the memory request does not hit on any cache line, step 506 is reached. In step 506, the requested value is retrieved from memory. For example, in FIG. 2, the requested value is retrieved from memory 110.
[0063] In step 508, the cache is updated with the retrieved value. For example, in FIG. 2, a cache line of cache 112 is selected to have its data overwritten by the value retrieved from memory 110. As described above, the selected cache line can be an available cache line of cache 112 chosen by any of a variety of cache replacement algorithms.
[0064] The tag associated with the cache line is updated when the memory request is received. Thus, the associated tag can be updated before the cache line has its value overwritten with the retrieved value.
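A sketch of this miss path (steps 506 and 508, with the tag updated up front as described in paragraph [0064]) follows; the first-invalid-line victim choice is an arbitrary stand-in for whatever replacement algorithm a given design uses, and all names are assumptions:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t address;
    uint8_t  type;
    bool     valid;
} cache_tag_t;

/* Choose a line for the incoming value and update its tag right
 * away, since the tag is updated when the memory request is
 * received.  The first invalid line is taken as the victim, with
 * line 0 as a fallback; any replacement algorithm could be
 * substituted.  A real controller would also track the in-flight
 * fill so the line is not consumed before the data retrieved from
 * memory is actually written. */
int cache_allocate(cache_tag_t tags[], int num_lines,
                   uint32_t req_address, uint8_t req_type) {
    int victim = 0;
    for (int i = 0; i < num_lines; i++)
        if (!tags[i].valid) { victim = i; break; }
    tags[victim].address = req_address;
    tags[victim].type    = req_type;
    tags[victim].valid   = true;
    return victim;
}
```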
[0065] In step 510, the value is provided from cache. For example, in FIG. 2, the requested value is provided to the requestor (e.g., first resource 102 or second resource 104) by cache 112.
[0066] In step 512, it is determined whether there are more requests in the clause. If there are more requests in the clause, flowchart 500 returns to step 502 and the next memory request in the clause is processed. If the received memory request is the last memory request of the clause, step 514 occurs. In step 514, all entries with a predetermined type are invalidated. For example, in FIG. 2, invalidation module 208 can invalidate all cache lines of cache 112 that have the type of the received memory request. In alternative embodiments, invalidation module 208 invalidates other types of cache lines.

[0067] Embodiments of the present invention may be used in any computing device where register resources are to be managed among a plurality of concurrently executing processes. For example and without limitation, embodiments may include computers, game platforms, entertainment platforms, personal digital assistants, and video platforms. Embodiments of the present invention may be encoded in many programming languages including hardware description languages (HDL), assembly language, and C language. For example, an HDL, e.g., Verilog, can be used to synthesize, simulate, and manufacture a device that implements the aspects of one or more embodiments of the present invention. For example, Verilog can be used to model, design, verify, and/or implement cache controller 202, described with reference to FIG. 2.
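For illustration, the following C sketch strings the steps of flowchart 500 together for a single clause of same-typed requests; the memory_read stand-in, the sizes, and the trivial replacement policy are assumptions of this sketch, not part of any claimed embodiment:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 64  /* an assumption of this sketch */

typedef struct {
    uint32_t address;
    uint8_t  type;
    bool     valid;
    uint32_t data;
} line_t;

static line_t cache_lines[NUM_LINES];

/* Deterministic stand-in for reading memory 110. */
static uint32_t memory_read(uint32_t addr) { return addr * 2654435761u; }

/* Steps 502-514 for one clause whose requests all share clause_type. */
void service_clause(const uint32_t *addrs, int n, uint8_t clause_type) {
    for (int r = 0; r < n; r++) {                    /* 502: receive request */
        int hit = -1;
        for (int i = 0; i < NUM_LINES; i++)          /* 504: hit check       */
            if (cache_lines[i].valid &&
                cache_lines[i].address == addrs[r] &&
                cache_lines[i].type == clause_type) { hit = i; break; }

        if (hit < 0) {                               /* 506-508: miss path   */
            int victim = 0;
            for (int i = 0; i < NUM_LINES; i++)
                if (!cache_lines[i].valid) { victim = i; break; }
            cache_lines[victim] = (line_t){ addrs[r], clause_type, true,
                                            memory_read(addrs[r]) };
            hit = victim;
        }
        (void)cache_lines[hit].data;                 /* 510: provide value   */
    }
    for (int i = 0; i < NUM_LINES; i++)              /* 514: invalidate type */
        if (cache_lines[i].valid && cache_lines[i].type == clause_type)
            cache_lines[i].valid = false;
}
```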
CONCLUSION
[0068] While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

WHAT IS CLAIMED IS:
1. A system for processing memory requests, comprising: a cache; and a cache controller configured to compare a memory address and a type of a received memory request to a memory address and a type, respectively, corresponding to a cache line of the cache to determine whether the memory request hits on the cache line.
2. The system of claim 1, wherein the cache controller is configured to provide a value held in the cache line if the memory request hits on the cache line.
3. The system of claim 1, wherein the memory request is a first memory request received from a first resource, wherein the cache controller is configured to receive a second memory request from a second resource, and wherein a type of the second memory request is different than the type of the first memory request.
4. The system of claim 1, wherein the cache controller comprises: a comparison module configured to compare the type of the memory request to the type corresponding to the cache line.
5. The system of claim 1, wherein the cache controller comprises: an input module configured to receive the memory request and extract the type and the memory address of the memory request from the memory request.
6. The system of claim 1, wherein the cache controller is configured to replace a value held in an available cache line with a value corresponding to the memory address of the memory request if the memory request does not hit on any cache line of the cache and wherein the cache controller is configured to update a memory address and a type of a tag associated with the available cache line to include the memory address and the type, respectively, of the memory request.
7. The system of claim 1, wherein the memory request is included in a clause of memory requests, wherein each memory request of the clause of memory requests has the same type, and wherein the cache controller is configured to invalidate a second cache line of the cache that has a corresponding type that matches the type of the memory request.
8. The system of claim 7, wherein the cache controller is configured to invalidate all cache lines of the cache that have a corresponding type that matches the type of the memory request if the memory request is the last memory request of the clause.
9. The system of claim 7, wherein the type of the memory request is specified by a one-bit field.
10. The system of claim 7, wherein the cache controller is configured to compare the memory address and the type of the memory request to the memory address and the type corresponding to the cache line in a clock cycle.
11. The system of claim 10, wherein the cache controller is configured to invalidate the second cache line in the clock cycle.
12. The system of claim 1, wherein the memory address of the memory request corresponds to a scratch register or an entry in a ring buffer.
13. A method for processing memory requests, comprising: receiving a memory request; and determining if the memory request hits on a cache line of a cache by determining if a memory address and a type of the memory request match a memory address and a type, respectively, corresponding to a cache line of the cache.
14. The method of claim 13, further comprising: providing a value held in the cache line in response to the memory request if the memory request hits on the cache line.
15. The method of claim 13, wherein the determining step further comprises: determining if the cache line is valid.
16. The method of claim 13, further comprising: receiving a second memory request; wherein the first memory request is received from a first resource and the second memory request is received from a second resource and wherein the type of the first memory request is different than a type of the second memory request.
17. The method of claim 13, further comprising: replacing a value held in a second cache line of the cache with a value corresponding to the memory address of the memory request if the memory request does not hit on any cache line of the cache; and updating a memory address and a type of a tag associated with the second cache line to include the memory address and the type, respectively, of the memory request.
18. The method of claim 13, wherein the memory request is included in a clause of memory requests and wherein each memory request of the clause of memory requests has the same type, further comprising: invalidating a second cache line of the cache that has a corresponding type that matches the type of the memory request.
19. The method of claim 18, wherein the invalidating step further comprises: invalidating all cache lines of the cache that have a corresponding type that matches the type of the memory request if the memory request is the final memory request of the clause.
20. A computer readable medium carrying one or more sequences of one or more instructions for execution by one or more processors to perform a method for processing memory requests, wherein the instructions, when executed by the one or more processors, cause the one or more processors to:
(a) receive a memory request; and
(b) determine if the memory request hits on a cache line of a cache by determining if a memory address and a type of the memory request match a memory address and a type, respectively, corresponding to a cache line of the cache.
21. The computer readable medium of claim 20, wherein the sequences of instructions are encoded using a hardware description language (HDL).
PCT/US2009/003214 2008-05-29 2009-05-27 Dynamically partitionable cache WO2009145888A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US5713008P 2008-05-29 2008-05-29
US61/057,130 2008-05-29
US5745208P 2008-05-30 2008-05-30
US61/057,452 2008-05-30
US12/165,741 US20090300293A1 (en) 2008-05-30 2008-07-01 Dynamically Partitionable Cache
US12/165,741 2008-07-01

Publications (1)

Publication Number Publication Date
WO2009145888A1 true WO2009145888A1 (en) 2009-12-03

Family

ID=41377427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2009/003214 WO2009145888A1 (en) 2008-05-29 2009-05-27 Dynamically partitionable cache

Country Status (1)

Country Link
WO (1) WO2009145888A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030023827A1 (en) * 2000-06-30 2003-01-30 Salvador Palanca Method and apparatus for cache replacement for a multiple variable-way associative cache
US20040022094A1 (en) * 2002-02-25 2004-02-05 Sivakumar Radhakrishnan Cache usage for concurrent multiple streams
US6901495B2 (en) * 1999-05-06 2005-05-31 Sun Microsystems, Inc. Cache memory system allowing concurrent reads and writes to cache lines to increase snoop bandwith
US20060179229A1 (en) * 2005-02-10 2006-08-10 Clark Leo J L2 cache controller with slice directory and unified cache structure
US7225300B1 (en) * 2004-09-15 2007-05-29 Azul Systems, Inc Duplicate snoop tags partitioned across multiple processor/cache chips in a multi-processor system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09755251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09755251

Country of ref document: EP

Kind code of ref document: A1