US20140143499A1 - Methods and apparatus for data cache way prediction based on classification as stack data


Info

Publication number
US20140143499A1
Authority
US
United States
Prior art keywords
ways
data
cache
data cache
recently accessed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/945,659
Other versions
US9734059B2
Inventor
Lena E. Olson
Yasuko ECKERT
Vilas K. Sridharan
James M. O'Connor
Mark D. Hill
Srilatha Manne
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Micro Devices Inc
Original Assignee
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US201261728843P
Application filed by Advanced Micro Devices Inc filed Critical Advanced Micro Devices Inc
Priority to US13/945,659
Assigned to ADVANCED MICRO DEVICES, INC. reassignment ADVANCED MICRO DEVICES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ECKERT, Yasuko, O'CONNOR, JAMES M., MANNE, SRILATHA, OLSON, LENA E., SRIDHARAN, Vilas K., HILL, MARK D.
Publication of US20140143499A1
Application granted
Publication of US9734059B2
Legal status: Active

Classifications

    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0804 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with main memory updating
    • G06F12/0811 Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F12/0815 Cache consistency protocols
    • G06F12/0848 Partitioned cache, e.g. separate instruction and operand caches
    • G06F12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G06F12/0875 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G06F12/1036 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB], for multiple virtual address spaces, e.g. segmentation
    • G06F12/123 Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list
    • G06F2212/1016 Performance improvement
    • G06F2212/1028 Power efficiency
    • G06F2212/451 Stack data
    • G06F2212/502 Control mechanisms for virtual memory, cache or TLB using adaptive policy
    • G06F2212/601 Reconfiguration of cache memory
    • G06F2212/6032 Way prediction in set-associative cache
    • G06F2212/6082 Way prediction in set-associative cache
    • G06F2212/683 Invalidation
    • G06F2212/684 TLB miss handling
    • Y02D10/13

Abstract

A method of way prediction for a data cache having a plurality of ways is provided. Responsive to an instruction to access a stack data block, the method accesses identifying information associated with a plurality of most recently accessed ways of a data cache to determine whether the stack data block resides in one of the plurality of most recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the plurality of most recently accessed ways; and when the stack data block resides in one of the plurality of most recently accessed ways of the data cache, the method accesses the stack data block from the data cache.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application claims the benefit of U.S. provisional patent application Ser. No. 61/728,843, filed Nov. 21, 2012.
  • TECHNICAL FIELD
  • Embodiments of the subject matter described herein relate generally to way prediction and manipulation in a cache memory environment. More particularly, embodiments of the subject matter relate to way prediction and soft-partitioning of a data cache based on stack versus non-stack classification of data.
  • BACKGROUND
  • A central processing unit (CPU) may include or cooperate with one or more levels of a cache hierarchy in order to facilitate quick access to data. This is accomplished by reducing the latency of CPU requests to read or write data in memory. Generally, a data cache is divided into sections of equal capacity, called cache “ways”, and the data cache may store one or more blocks within the cache ways. Each block is a copy of data stored at a corresponding address in the system memory.
  • Cache ways are accessed to locate a specific block of data, and the energy expenditure increases with the number of cache ways that must be accessed. For this reason, it is beneficial to utilize methods of operation that limit the number of ways accessed in the search for a particular block of data.
  • BRIEF SUMMARY OF EMBODIMENTS
  • Some embodiments provide a method of way prediction for a data cache having a plurality of ways. Responsive to an instruction to access a stack data block, the method accesses identifying information associated with a plurality of most recently accessed ways of a data cache to determine whether the stack data block resides in one of the plurality of most recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the plurality of most recently accessed ways; and when the stack data block resides in one of the plurality of most recently accessed ways of the data cache, the method accesses the stack data block from the data cache.
  • Some embodiments provide a set-associative cache memory system. The system includes a first array of ways of the cache memory system, wherein the ways are configured to store N recently accessed data blocks, wherein N is an integer greater than one; a second array of storage elements, configured to store identifying information associated with the N recently accessed data blocks; and cache control circuitry operatively associated with the first and second arrays, wherein the cache control circuitry is configured to respond to a request to access a stack data block by: accessing identifying information associated with a designated number of recently accessed ways of a data cache to determine whether the stack data block resides in one of the designated number of recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the designated number of most recently accessed ways; and when the stack data block resides in one of the designated number of most recently accessed ways of the data cache, accessing the stack data block from the data cache.
  • Some embodiments provide a method of way prediction for a data cache comprising a plurality of sets, each of the plurality of sets comprising a plurality of ways. The method receives an instruction, at a processor-based device, to access a block of stack data, wherein the instruction comprises a first tag associated with the block of stack data; performs a first tag lookup within a tag array, to determine whether a copy of the first tag resides in a subset of the tag array corresponding to a plurality of recently used ways of the data cache; and when a copy of the first tag resides in the subset of the tag array, accesses the block of stack data from one of the plurality of most recently used ways of the data cache corresponding to the copy of the first tag.
  • This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • A more complete understanding of the subject matter may be derived by referring to the detailed description and claims when considered in conjunction with the following figures, wherein like reference numbers refer to similar elements throughout the figures.
  • FIG. 1 is a simplified block diagram of an embodiment of a processor system;
  • FIG. 2 is a block diagram representation of an embodiment of a cache memory environment;
  • FIG. 3 is a block diagram representation of a data transfer relationship between a main memory and a data cache;
  • FIG. 4 is a flow chart that illustrates an embodiment of data cache way prediction based on the data being classified as stack data;
  • FIGS. 5A-5B are block diagram representations that depict an embodiment of data cache recency information before and after updating;
  • FIG. 6 is a flow chart that illustrates an embodiment of soft-partitioning of a data cache, based on stack or non-stack classification of data;
  • FIG. 7 is a block diagram representation of a data transfer relationship between a main memory element and a data cache, where the data cache includes designated ways for stack data; and
  • FIG. 8 is a flow chart that illustrates an embodiment of determining a hit or miss for a soft-partitioned data cache.
  • DETAILED DESCRIPTION
  • The following detailed description is merely illustrative in nature and is not intended to limit the embodiments of the subject matter or the application and uses of such embodiments. As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, there is no intention to be bound by any expressed or implied theory presented in the preceding technical field, background, brief summary or the following detailed description.
  • The subject matter presented herein relates to methods used to regulate the energy expended in the operation of a data cache within a computer system. In some embodiments, a request to manipulate a block of stack data is received, including an address for the location in main memory where the block of stack data is located. Once the request is received, the system will access cache memory to detect whether the requested block of stack data resides within the data cache, to accommodate faster and less resource-intensive access than if the system were required to access the block of stack data at the location in main memory in which the block of stack data resides.
  • Within the context of this application, two embodiments are described to address the previously described situation (i.e., storing and/or accessing stack data within a data cache). In accordance with a first embodiment, the system reads the data cache recency information to determine the most recently used (MRU) ways of the data cache and access their contents, predicting that stack data will always be located within a specified plurality of MRU ways of the data cache. In accordance with a second embodiment, the system routes all blocks of stack data to one or more specified ways of the data cache, and during all future accesses of that particular block of stack data, the system will only access the specified way(s) within the data cache.
  • Referring now to the drawings, FIG. 1 is a simplified block diagram of an embodiment of a processor system 100. In accordance with some embodiments, the processor system 100 may include, without limitation: a central processing unit (CPU) 102; a main memory element 104; and a cache memory architecture 108. These elements and features of the processor system 100 may be operatively associated with one another, coupled to one another, or otherwise configured to cooperate with one another as needed to support the desired functionality—in particular, the cache way prediction and cache management functionality described herein. For ease of illustration and clarity, the various physical, electrical, and logical couplings and interconnections for these elements and features are not depicted in FIG. 1. Moreover, it should be appreciated that embodiments of the processor system 100 will include other elements, modules, and features that cooperate to support the desired functionality. For simplicity, FIG. 1 only depicts certain elements that relate to the stack cache management techniques described in more detail below.
  • The CPU 102 may be implemented using any suitable processing system, such as one or more processors (e.g., multiple chips or multiple cores on a single chip), controllers, microprocessors, microcontrollers, processing cores and/or other computing resources spread across any number of distributed or integrated systems, including any number of “cloud-based” or other virtual systems. The CPU 102 represents a processing unit, or plurality of units, that are designed and configured to execute computer-readable instructions, which are stored in some type of accessible memory, such as main memory element 104.
  • Main memory element 104 represents any non-transitory short or long term storage or other computer-readable media capable of storing programming instructions for execution on the processor(s) 110, including any sort of random access memory (RAM), read only memory (ROM), flash memory, magnetic or optical mass storage, and/or the like. As will be recognized by those of ordinary skill in the art, a main memory element 104 is generally comprised of RAM, and, in some embodiments, the main memory element 104 is implemented using Dynamic Random Access Memory (DRAM) chips that are located near the CPU 102.
  • The stack resides within the main memory element 104, and may be defined as a region of memory in a computing architecture where data is added or removed in a last-in, first-out (LIFO) manner. Stack data may be defined as any data currently located in the stack. Generally, the stack is utilized to provide storage for local variables and other overhead data for a particular function within an execution thread, and in multi-threaded computing environments, each thread will have a separate stack for its own use. However, in some embodiments, a stack may be shared by multiple threads. The stack is allocated, and the size of the stack is determined, by the underlying operating system. When a function is called, a pre-defined number of cache lines are allocated within the program stack. One or more cache lines may be “pushed” onto the stack for storage purposes, and will be “popped” off of the stack when a function returns (i.e., when the data on the stack is no longer needed and may be discarded). In some embodiments, it is also possible that the stack may be popped before the function returns. Due to the nature of the LIFO storage mechanism, data at the top of the stack is the data that has been “pushed” onto the stack the most recently, and will be the data that is “popped” off of the stack first. The stack is often implemented as virtual memory that is mapped to physical memory on an as-needed basis.
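  • As a brief, purely illustrative sketch of the LIFO discipline described above (and not of any particular stack layout or calling convention), the following C example pushes and pops fixed-size entries; the capacity, entry type, and names are arbitrary assumptions.

```c
/* Minimal LIFO illustration: the entry pushed most recently is the
 * first entry popped. Sizes and names are hypothetical. */
#include <stdio.h>

#define STACK_CAPACITY 16

static int stack[STACK_CAPACITY];
static int top = 0;                 /* index of the next free slot */

static void push(int value) { stack[top++] = value; }
static int  pop(void)       { return stack[--top]; }

int main(void)
{
    push(10);                       /* e.g., data saved on function entry */
    push(20);
    push(30);
    printf("%d\n", pop());          /* 30: most recently pushed, popped first */
    printf("%d\n", pop());          /* 20 */
    return 0;
}
```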
  • The cache memory architecture 108 includes, without limitation, cache control circuitry 110, a cache memory array 112, and a tag memory array 114. For implementation purposes, these components may be implemented using multiple chips or all may be combined into a single chip.
  • The cache control circuitry 110 contains logic to manage and control certain functions of the cache memory architecture 108. For example, and without limitation, the cache control circuitry 110 may be configured to maintain consistency between the cache memory architecture 108 and the main memory element 104, to update the cache memory array 112 and tag memory array 114 when necessary, to implement a cache write policy, to determine if requested data located within the main memory element 104 is also located within the cache, and to determine whether a specific block of requested data located within the main memory element 104 is cacheable.
  • The cache memory array 112 is the portion of the cache that holds all data stored within the cache. The cache memory array 112 is most commonly implemented using static random access memory (SRAM), but may also be implemented using other forms of random access memory (RAM) or other computer-readable media capable of storing programming instructions. The size of the cache memory array 112 is determined by the size of the cache memory architecture 108, and will vary based upon individual implementation. A cache memory array 112 may be configured or arranged such that it contains “sets”, which may be further subdivided into “ways” of the data cache. Within the context of this application, sets and/or ways of a data cache may be collectively referred to as storage elements, cache memory storage, storage sub-elements, and the like.
  • The tag memory array 114 stores the addresses (or a portion thereof) of each block of data that is stored within the cache memory array 112. The addresses refer to specific locations in which data blocks reside in the main memory element 104, and may be implemented using physical memory addresses, virtual memory addresses, or a combination of both. The tag memory array 114 will generally consist of Random Access Memory (RAM), and in some embodiments, comprises Static Random Access Memory (SRAM). In some embodiments, the tag memory array 114 will be referred to as a tag array. A tag array may be further subdivided into storage elements for each tag stored.
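  • As a rough software analogy only (the described apparatus is hardware), the cache memory array 112 and tag memory array 114 might be modeled along the following lines; the set count, way count, line size, and all names are illustrative assumptions rather than parameters of any embodiment.

```c
/* Illustrative model of an M-set, N-way data cache in which each line
 * carries its tag alongside its data. All sizes and names are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS   4      /* M sets                      */
#define NUM_WAYS   4      /* N ways per set              */
#define LINE_BYTES 64     /* capacity of one cache line  */

typedef struct {
    bool     valid;               /* line holds a usable copy        */
    uint64_t tag;                 /* entry of the tag memory array   */
    uint8_t  data[LINE_BYTES];    /* cache line data                 */
} cache_line_t;

typedef struct {
    cache_line_t way[NUM_WAYS];   /* one cache line per way          */
} cache_set_t;

typedef struct {
    cache_set_t set[NUM_SETS];    /* the cache memory array          */
} data_cache_t;

int main(void)
{
    data_cache_t dc = {0};
    (void)dc;
    printf("sets=%d ways=%d line=%dB data capacity=%zuB\n",
           NUM_SETS, NUM_WAYS, LINE_BYTES,
           (size_t)NUM_SETS * NUM_WAYS * LINE_BYTES);
    return 0;
}
```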
  • FIG. 2 is a block diagram representation of an embodiment of a cache memory environment. An N-way, set-associative data cache 200, organized as an array of cache lines, is shown. The data cache 200 contains M sets 202. Each set 202 contains N ways 204, and consequently each set 202 is a collection of N cache lines 220, where each cache line 220 of a set 202 resides within a unique way of the data cache 200.
  • A cache line 220 may also be referred to as a cache entry or a cache block. Each cache line 220 is associated with a cache line tag 206 and cache line data 208. As shown, each cache line tag 206 resides within the same structure as the associated cache line data 208. However, in some embodiments, the cache line tags 206 associated with the contents of the ways 204 of the data cache 200 are stored in a separate structure. Cache line data 208 is a fixed-size copy of a block of data that resides in the main memory of the system, and a cache line tag 206 is a unique identifier associated with the memory address of the location within the main memory where the associated cache line data 208 resides.
  • The data cache 200 is depicted with the capability of performing comparison operations between cache line data 208, residing within a way (204-0, 204-1, 204-(n−1)) of the data cache 200, and a block of data residing in the main memory of the system. This situation presents itself when a core pipeline of a system initiates a request to access data for manipulation within the system, and a search of the data cache 200 for a copy of the relevant associated data block must be made. The request to manipulate a data block includes a physical address 210 associated with the data block.
  • All of the information needed to potentially locate a copy of the data block in the data cache 200 is given in the physical address 210. As shown, the physical address 210 includes a tag field 212, an index field 214, and an offset field 216. The offset field 216 is used to locate a subset of data within the data block in the data cache 200. The index field 214 identifies the set 202 of the data cache 200 that will be searched for the copy of the data block. The tag field 212 is compared against all cache line tags in the identified set 202, and contains a unique identifier for the data block.
  • The data cache 200 utilizes an N:1 multiplexer 218, which receives input from each way (204-0, 204-1, 204-(n−1)) within a selected set 202, and forwards a selected input to a single line output. As is commonly known in the art, each cache line data 208 within a set is a separate input to the multiplexer. As shown, the tag field 212 of the physical address 210 is compared to the cache line tag 206 for each cache line 220 of a selected set 202. When there is a match between the two tags, a “hit” has occurred, and this information would be applied to the appropriate select lines of the multiplexer 218 so that data from the corresponding input is passed to the multiplexer 218 output. In this example, a hit is shown to have occurred at Set 2, Way 1, and the associated cache line data 208 is selected and output from the multiplexer 218.
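  • The field extraction and tag comparison described above can be summarized in a short, illustrative C sketch; the field widths (six offset bits for 64-byte lines, two index bits for four sets), the example address, and all names are assumptions chosen only to make the example concrete, not a description of the claimed circuitry.

```c
/* Illustrative set-associative lookup: split a physical address into tag,
 * index, and offset fields, then compare the tag against every way of the
 * indexed set (the selection role played by the N:1 multiplexer). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS    4
#define NUM_WAYS    4
#define OFFSET_BITS 6              /* 64-byte cache lines */
#define INDEX_BITS  2              /* 4 sets              */

typedef struct { bool valid; uint64_t tag; } tag_entry_t;

static tag_entry_t tag_array[NUM_SETS][NUM_WAYS];

/* Returns the hitting way of the indexed set, or -1 on a miss. */
static int lookup(uint64_t paddr)
{
    uint64_t index = (paddr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint64_t tag   =  paddr >> (OFFSET_BITS + INDEX_BITS);

    for (int w = 0; w < NUM_WAYS; w++) {
        if (tag_array[index][w].valid && tag_array[index][w].tag == tag)
            return w;               /* hit: this way's data would be selected */
    }
    return -1;                      /* miss */
}

int main(void)
{
    uint64_t paddr = 0x12345A80;    /* arbitrary example address */
    uint64_t index = (paddr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);

    /* Pretend Way 1 of the indexed set already holds the block. */
    tag_array[index][1].valid = true;
    tag_array[index][1].tag   = paddr >> (OFFSET_BITS + INDEX_BITS);

    printf("hit in way %d of set %llu\n", lookup(paddr),
           (unsigned long long)index);
    return 0;
}
```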
  • FIG. 3 is a block diagram representation of a data transfer relationship between a main memory element and a data cache, as is well-known in the art. As shown, a partial memory hierarchy 300 contains a main memory element 302 (such as the main memory element 104 shown in FIG. 1) and a data cache 304. The data cache 304 contains four sets (Set 0, Set 1, Set 2, Set 3), which in turn are divided into four ways 310. The total number of sets within a data cache 304 is determined by the size of the data cache 304 and the number of ways 310, and the sets and ways 310 are numbered sequentially. For example, a four-way, set-associative data cache with four sets will contain sets numbered Set 0 through Set 3, and ways numbered Way 0 through Way 3 within each set.
  • The typical mapping scheme uses the “middle bits” (the “index” bits in FIG. 2) to map memory addresses to the cache sets, while the higher bits (the “tag” bits in FIG. 2) indicate whether there is a hit or a miss in a given way of the set. For a given Set i, every Mth data block can be stored in that set, where M is the number of sets. In the example shown, in which there are four sets, Set 0 could store Data Block 0, Data Block 4, Data Block 8, Data Block 12, and so on, and Set 1 could store Data Block 1, Data Block 5, Data Block 9, Data Block 13, and so on. Because the memory addresses associated with each data block 306 are linear, every Mth address maps into a particular set.
  • The main memory element 302 is divided into data blocks 306. As used herein, a “block” is a set of bytes stored in contiguous memory locations, which are treated as a unit for coherency purposes, and the terms “block” and “line” are interchangeable. Generally, each data block 306 stored in main memory and the capacity of each cache line are the same size. For example, a system including a main memory consisting of 64 byte data blocks 306 may also include cache lines that are configured to store 64 bytes. However, in some embodiments, a data block 306 may be twice the size of the capacity of each cache line. For example, a system including a main memory consisting of 128 byte data blocks 306 may also include cache lines that are configured to store 64 bytes.
  • Each data block 306 corresponds to a specific set 308 of the data cache 304. In other words, a data block 306 residing in a specific area (i.e., at a specific address) in the main memory element 302 will automatically be routed to a specific area, or set 308, when it is cached. For example, when a system receives a request to manipulate data that is not located within the data cache 304, the data can be imported from the main memory element 302 to the data cache 304. The data is imported into a specific, pre-defined set 308 within the data cache 304, based upon the address of the data block 306 in the main memory element 302.
  • In some embodiments, the imported data block 306 and the cache line into which the data block 306 is mapped are equivalent in size. However, in some embodiments, the data block 306 may be twice the size of the capacity of the cache line, including an amount of data that would fill the capacity of two cache lines. In this example, the large data block 306 may include multiple addresses, but only the first address (i.e., the address for the starting cache line) is used in mapping the data block 306 into the data cache 304. In this case, configuration information that is specific to the hardware involved is used by the processor to make the necessary calculations to map the second line of the data block 306 into the data cache 304.
  • The exemplary structures and relationships outlined above with reference to FIGS. 1-3 are not intended to restrict or otherwise limit the scope or application of the subject matter described herein. FIGS. 1-3, and their descriptions, are provided here to summarize and illustrate the general relationship between data blocks, sets, and ways, and to form a foundation for the techniques and methodologies presented below.
  • Data Cache Way Prediction
  • FIG. 4 is a flow chart that illustrates an embodiment of a data cache way prediction process 400, based on the data being classified as stack data. The various tasks performed in connection with a process described here may be performed by software, hardware, firmware, or any combination thereof. For illustrative purposes, the description of a process may refer to elements mentioned above in connection with FIGS. 1-3. In practice, portions of a described process may be performed by different elements of the described system, e.g., the CPU, the cache control circuitry, or other components of the system. It should be appreciated that a described process may include any number of additional or alternative tasks, that the tasks shown in the figures need not be performed in the illustrated order, and that a described process may be incorporated into a more comprehensive procedure or process having additional functionality not described in detail herein. Moreover, one or more of the tasks shown in the figures could be omitted from embodiments of a described process as long as the intended overall functionality remains intact.
  • For ease of description and clarity, this example assumes that the data cache way prediction process 400 begins when a process receives identifying information for a block of stack data (402). In certain embodiments, the identifying information is extracted from an instruction to manipulate a block of stack data, sent by a CPU (such as the CPU 102 shown in FIG. 1). This identifying information is associated with the stack data block and is then available to the system for further use. In some embodiments, the identifying information may include main memory location information, detailing a location within main memory where the data block in question is stored. In some embodiments, this memory address may be a physical address, a virtual address, or a combination of both.
  • In response to receiving the instruction, the process 400 may continue by accessing data cache recency information (404), also known as data cache metadata, which details how recently the contents of each way of the data cache have been accessed. In some embodiments, the process 400 maintains a sequential list of data detailing a most recently used (MRU) way of a particular set within the data cache, a least recently used (LRU) way of the data cache, and a plurality of ways that have been accessed between the MRU and LRU ways of the data cache. This sequential list is maintained for each set within the data cache. The plurality of ways that have been accessed between the MRU and LRU ways of the data cache are generally labeled MRU+1, MRU+2, MRU+3, etc., depending upon how many ways are contained within the data cache. For example, in an eight-way, set-associative cache, there will be an MRU, an LRU, and i MRU ways listed between the MRU and LRU, where i=(N−2). In some embodiments, there may be as few as two ways within the data cache, corresponding to one MRU way and one LRU way on the sequential list, without any ways listed between the MRU and LRU. In practice, the “entries” in such a sequential list are dynamically updated and shifted (if needed) in an ongoing manner in response to accesses to the ways of the data cache.
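  • A minimal sketch of such a per-set recency list, with the MRU entry at position zero and the LRU entry at the end, is shown below; the array representation and function names are illustrative assumptions. The update routine reproduces the transition described below in connection with FIGS. 5A-5B (Way 0 initially the MRU, then Way 4 accessed).

```c
/* Illustrative per-set recency list: recency[0] names the MRU way and
 * recency[NUM_WAYS-1] the LRU way. On an access, the touched way moves to
 * the front and the intervening entries shift down one position. */
#include <stdio.h>

#define NUM_WAYS 8

/* Initially Way 0 is the MRU and Way 7 the LRU (as in FIG. 5A). */
static int recency[NUM_WAYS] = {0, 1, 2, 3, 4, 5, 6, 7};

static void update_recency(int accessed_way)
{
    int pos = 0;
    while (recency[pos] != accessed_way)   /* find its current position   */
        pos++;
    for (; pos > 0; pos--)                 /* shift newer entries down     */
        recency[pos] = recency[pos - 1];
    recency[0] = accessed_way;             /* accessed way becomes the MRU */
}

int main(void)
{
    update_recency(4);                     /* access Way 4, as in FIG. 5B  */
    for (int i = 0; i < NUM_WAYS; i++)
        printf("MRU+%d -> Way %d\n", i, recency[i]);
    /* prints: MRU+0 -> Way 4, MRU+1 -> Way 0, MRU+2 -> Way 1, ... */
    return 0;
}
```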
  • The process 400 may then continue by obtaining identifying information for a designated number of recently accessed data blocks (406). The recently accessed data blocks may include one or more data blocks in the data cache, and in some embodiments, will include at least the contents of the most recently used (MRU) way of the data cache. In some embodiments, the recently accessed data blocks may include the contents of one or more additional ways of the data cache, such as the MRU+1, MRU+2, etc. ways of the data cache. The number of recently accessed data blocks will depend upon the size of the data cache and possibly user and/or designer preference.
  • The designated number of recently accessed data blocks that will be used in further processing is determined using the recency information. For example, the recency information details which ways of the data cache have been accessed the most recently (MRU), the next most recently (MRU+1), and so on down the sequential list of data cache ways to the least recently accessed entry (LRU). A designated, or pre-defined, number of those most recently accessed ways are predicted to contain all stack data blocks that are located within the data cache. Accordingly, the list of designated ways begins with the MRU and continues in sequence to include the designated number of ways. Depending upon the size of the data cache, the designated number of ways may include any subset of the total number of ways included in the sequential list.
  • In certain implementations, the designated number of most recently accessed ways, which are predicted to contain all stack data blocks within the data cache, is a pre-determined number based on the size of the data cache. For example, in an eight-way, set-associative data cache, the designated number may be two, indicating that all stack data blocks will be located within the two most recently accessed ways of the data cache. In other words, all stack data blocks will be located within the ways of the data cache identified as the MRU and the MRU+1. In data caches containing a larger number of ways, the designated number of most recently accessed ways may be increased, and in data caches having a smaller number of ways, the designated number of most recently accessed ways may be decreased. The designated number of most recently accessed ways is a smaller subset of all of the ways contained within the data cache, and the objective in searching a smaller subset of ways to locate stack data is to conserve some of the energy that is usually expended when searching all of the ways of the data cache.
  • In some embodiments, the prediction that the designated number of most recently accessed ways will contain all stack data blocks within the data cache is based on empirical test results and/or statistical analyses that indicate a very high percentage of occurrences. Specific ways of the data cache are only associated with the recency positions (i.e., MRU, MRU+1, etc.) temporarily, because the recency information changes with every operation. The only exception to this situation occurs when subsequent operations utilize the same data, and the contents of the MRU remain the same. Otherwise, upon each operation the designated subset of ways of the data cache changes, and therefore the ways of the data cache that are predicted to hold all stack data within the data cache change.
  • In some embodiments, the identifying information obtained at block 406 of FIG. 4 includes main memory location data for the designated number of recently accessed data blocks. In this regard, the process 400 may read a specified number of tags to obtain the identifying information for the designated number of recently accessed data blocks.
  • In some embodiments, the process 400 obtains the contents of each of the designated number, or subset, of recently accessed data blocks concurrently with obtaining their identifying information. The contents for each of the designated number of recently accessed data blocks are obtained from within a way of the data cache where an identical copy of the data resides, while the original data resides within the system main memory. For example, the system accesses a group of designated tags in a first clock cycle, and rather than waiting for a second clock cycle to access the data associated with the tags and identified as a hit, the contents are accessed at the same time that the tags are accessed. When the determination that there has been a hit in the data cache has been made, the data will already have been accessed. This limits the access time to one clock cycle and is also a power-saving strategy. In some embodiments, the cache control circuitry obtains the contents of only the most recently used (MRU) way of the data cache concurrently with obtaining the identifying information for the MRU. In other words, in some embodiments, this concurrent “identify and obtain” methodology is utilized for the MRU way only, and is not performed for any of the other ways of the data cache.
  • The process 400 may then proceed to determine if a hit has occurred (408) within the data cache, based upon a comparison of the obtained identifying information for the designated plurality of most recently accessed data blocks to the identifying information for the requested block of stack data (i.e., the block of stack data that is the subject of the instruction received at 402). In this regard, each of the designated plurality of most recently accessed data blocks is associated with separate and distinct identifying information, and each is compared to the identifying information associated with the requested block of stack data. The objective of this comparison is to locate a match, or in other words, to determine whether the identifying information (address) for any of the designated recently accessed data blocks is identical to the identifying information (address) of the requested stack data block.
  • In accordance with well-established principles, a “hit” occurs when a segment of data that is stored in the main memory of a computer system is requested by the computer system for manipulation, and that segment of data has a more quickly accessible copy located in a data cache of the computer system. Otherwise, the process 400 does not indicate that a hit has occurred. Thus, if the comparison results in a match between the identifying information for the requested block of stack data and the identifying information for one of the designated number of most recently accessed data blocks (i.e., both sets of identifying information are the same), then the process 400 can indicate that both sets of data are the same. Accordingly, if the data being requested from memory (in this case, the stack data block) and the data located within one of the recently accessed ways of the data cache (in this case, a copy of the stack data block) are determined to be the same, then the process 400 will follow the “Yes” branch of the decision block 408. Otherwise, the process 400 follows the “No” branch of the decision block 408.
  • When a hit has been confirmed (the “Yes” branch of 408), the process 400 retrieves the requested block of stack data for use (410). In some embodiments, the process retrieves the stack data according to a previously received instruction. Because there has been a hit, it is known that one of the designated number of recently accessed ways of the data cache contains a copy of the requested block of stack data. Accordingly, the requested block of stack data can be accessed in the data cache, which has the advantage of occurring more quickly than attempting to access the requested block of stack data at its original location within the system main memory.
  • Following the retrieval of the block of stack data, the process 400 updates the data cache metadata, which includes recency information. For example, accessing and manipulating the block of stack data could alter which way of the data cache has been the most recently accessed. If necessary, therefore, the system will update the list of the MRU ways of the data cache, by labeling the data cache way holding the requested block of stack data as the “new” MRU. Moreover, the “old” MRU data cache way is shifted down to the MRU+1 space, the old MRU+1 is shifted down to MRU+2, and so on. In other embodiments, however, the process may access and manipulate the same block of stack data, and in this scenario, the list of MRU ways of the data cache would not change following the operation.
  • FIGS. 5A-5B are block diagram representations that depict data cache recency information before and after updating. FIG. 5A and FIG. 5B show charts listing all of the ways in an exemplary eight-way, set-associative data cache. The entry for each way in the chart has a corresponding entry detailing recency information for that particular way, or how recently each way has been accessed in relation to the other ways in the data cache.
  • As shown in FIG. 5A, the data contained within Way 0 has been accessed more recently than the contents of any of the other ways of the data cache. In other words, Way 0 is the most recently used (MRU) way of the data cache. In FIG. 5B, Way 4 of the data cache is now marked as the MRU, shifting the rest of the ways of the data cache down one value towards the least recently used (LRU) way of the data cache. This figure shows that, after Way 0 was utilized by the CPU, the contents of Way 4 were then accessed by the CPU. Way 4 is the most recently accessed way of the data cache, otherwise known as the MRU, and Way 0 is now identified as the MRU+1.
  • Referring back to FIG. 4, when a hit has not been confirmed (the “No” branch of 408), the process 400 may continue substantially as described above, while omitting the filtering of the most recently accessed ways of the data cache. In other words, when the process only checks the pre-specified number of recently accessed data blocks for a copy of a requested stack data block, the pre-specified number of recently accessed data blocks have been filtered out of the total number of data blocks contained within a corresponding set of the data cache. In some embodiments, rather than filtering the most recently accessed ways, the process 400 takes into account the contents of all of the ways in the set. In some embodiments, the “No” branch of 408 causes the process 400 to obtain identifying information associated with all of the ways (412) in the data cache. In some embodiments, the identifying information is found within the tags associated with each way of the data cache.
  • Next, the process 400 checks whether a hit has occurred (414) within any of the ways of the data cache, by comparing the identifying information for the contents of each of the data cache ways, individually, to the identifying information for the requested block of stack data and seeking a match between the two. Thus, the process 400 may need to repeat some or all of the comparison performed previously.
  • When a match between the identifying information for the contents of one of the data cache ways and the identifying information for the requested block of stack data is found, a hit is confirmed (the “Yes” branch of 414) within the data cache. The system will then retrieve the requested block of stack data for use (416) from the data cache. When a hit has not been confirmed (the “No” branch of 414), the process 400 reverts to conventional methods of processing a cache miss that are well known in the art (418). At this point, it may be necessary to access or search for the requested stack data block in main memory, or a lower level cache within the computer system, for manipulation. If found in the main memory, the requested stack data block is imported into the data cache for future use, according to conventional cache memory techniques.
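  • The overall decision flow of the process 400 (predicted-subset check, full-set fallback, then conventional miss handling) can be sketched as follows; the prediction depth of two ways, the tag values, and all names are assumptions for illustration only, not a description of the claimed circuitry.

```c
/* Illustrative sketch of the way-prediction flow of FIG. 4 for one set:
 * check the tags of the designated most recently used ways first, then
 * fall back to checking every way, and otherwise report a miss. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_WAYS       8
#define PREDICTED_WAYS 2          /* e.g., the MRU and MRU+1 ways */

typedef struct { bool valid; uint64_t tag; } tag_entry_t;

static tag_entry_t tags[NUM_WAYS];                 /* one set's tag entries   */
static int recency[NUM_WAYS] = {0,1,2,3,4,5,6,7};  /* MRU ... LRU way numbers */

/* Returns the hitting way, or -1 if conventional miss handling is needed. */
static int stack_lookup(uint64_t req_tag)
{
    /* Steps 406/408: check only the designated most recently used ways. */
    for (int i = 0; i < PREDICTED_WAYS; i++) {
        int w = recency[i];
        if (tags[w].valid && tags[w].tag == req_tag)
            return w;
    }
    /* Steps 412/414: prediction failed, so check every way of the set. */
    for (int w = 0; w < NUM_WAYS; w++) {
        if (tags[w].valid && tags[w].tag == req_tag)
            return w;
    }
    return -1;    /* step 418: conventional cache miss handling */
}

int main(void)
{
    tags[3].valid = true;
    tags[3].tag   = 0xBEEF;       /* block cached in Way 3, outside the prediction */
    printf("hit in way %d\n", stack_lookup(0xBEEF));   /* falls back, finds way 3 */
    printf("way %d\n", stack_lookup(0xCAFE));          /* -1: miss                */
    return 0;
}
```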
  • Soft-Partitioning of the Data Cache
  • FIG. 6 is a flow chart that illustrates an embodiment of a process 600 for soft-partitioning a data cache, based on stack or non-stack classification of data. As used here, “soft-partitioning” means storing all stack data within one or more designated ways of the data cache. The designated ways of the data cache will not necessarily always contain stack data only, but any stack data will always be routed to the designated way(s).
  • For ease of description and clarity, this example assumes that the soft-partitioning of a data cache process 600 begins when a block of stack data is required for use by a computer system, but is not currently accessible from the data cache of the system. The process 600 selects one of a plurality of designated ways of a data cache (602). This is accomplished by first checking each of the plurality of designated ways within the applicable set for an invalid cache line. If there is an invalid line, this particular designated way is then selected. If there is no invalid line, the system will select one of the plurality of designated ways, according to the applicable replacement policy. The contents of the selected, designated way are then evicted to accommodate the new data that will replace them. In accordance with conventional methodologies, the evicted data is removed from the way of the data cache, but continues to reside in its original place within main memory. The plurality of designated ways is a subset of the total ways of the data cache, and will be fewer than or equal in number to the total number of ways contained within the data cache. The designated ways are ways in the data cache that have been chosen to hold all stack data that comes into the data cache. In some embodiments, there will be at least one designated way, and the number of designated ways may increase based upon the associativity of the cache. In some embodiments, the ways within the subset of designated ways may be chosen arbitrarily, or may be chosen based upon user and/or designer preference.
  • Once one of the plurality of designated ways of the data cache has been evicted, the process 600 retrieves a copy of the contents of a block of stack data from system memory (604). In some embodiments, this copy is retrieved from the location in which the block of stack data resides in main system memory. In some embodiments, this copy is retrieved from a lower level cache element within the memory hierarchy. In some embodiments, it is also possible for the copy of the block of stack data to be retrieved from another location in the memory hierarchy of the computer system.
  • Next, the process 600 imports the copy of the block of stack data into the selected, designated way of the data cache (606), where it will reside until the contents of the designated way are again evicted so that new data may be stored within the designated way. The designated ways are the only ways that are permitted to contain stack data, however, the stack data may be evicted so that it might be replaced within the designated way by non-stack data, depending upon the needs of the computer system. After the copy of the block of stack data is imported into the designated way, the process 600 may retrieve it from the data cache for use by the system.
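  • A simple sketch of this fill path (steps 602, 604, and 606) is given below; the single designated way, the round-robin stand-in for the replacement policy, and all names are illustrative assumptions, and write-back of dirty data is omitted for brevity.

```c
/* Illustrative sketch of the soft-partitioning fill path of FIG. 6:
 * stack blocks are only ever placed into the designated ways of a set.
 * An invalid designated line is preferred; otherwise one designated way
 * is chosen by a (here, trivial) replacement policy and evicted. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_WAYS        4
#define DESIGNATED_WAYS 1          /* Ways 0..DESIGNATED_WAYS-1 hold stack data */

typedef struct { bool valid; uint64_t tag; } line_t;

static line_t set_lines[NUM_WAYS];
static int    next_victim = 0;     /* trivial stand-in for a replacement policy */

/* Steps 602-606: select a designated way, evict it if needed, fill it. */
static int fill_stack_block(uint64_t tag)
{
    int victim = -1;
    for (int w = 0; w < DESIGNATED_WAYS; w++) {
        if (!set_lines[w].valid) { victim = w; break; }  /* prefer an invalid line */
    }
    if (victim < 0) {
        victim = next_victim;                            /* replacement policy     */
        next_victim = (next_victim + 1) % DESIGNATED_WAYS;
        /* Eviction: the old contents are dropped here; they still reside
         * in their original place within main memory. */
    }
    set_lines[victim].valid = true;    /* import the copy fetched from memory */
    set_lines[victim].tag   = tag;
    return victim;
}

int main(void)
{
    printf("stack block placed in way %d\n", fill_stack_block(0x10));
    printf("stack block placed in way %d\n", fill_stack_block(0x20)); /* evicts */
    return 0;
}
```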
  • This concept of storing stack data within specific way(s) of a data cache is illustrated in FIG. 7. FIG. 7 is a block diagram representation of a data transfer relationship between a main memory element and a data cache, where the data cache includes designated ways for stack data. As shown, a partial memory hierarchy 700 contains a main memory element 702 (such as the main memory element 104 shown in FIG. 1) and a data cache 704. The data cache 704 has four sets (Set 0, Set 1, Set 2, Set 3), which in turn are divided into four ways 710. Here, the sets and the ways 710 are numbered sequentially. For example, a four-way, set-associative data cache with four sets will contain sets numbered Set 0 through Set 3 and ways numbered Way 0 through Way 3 within each set.
  • The main memory element 702 is divided into data blocks 706, and each data block 706 corresponds to a specific set of the data cache 704, as is well-known in the art. In this example, three data blocks 706 within the main memory element 702 are designated as stack data blocks 712. However, a certain number of stack data blocks 712 is not required, and will vary based on use of the stack. As shown, stack data blocks 712 are directed into designated ways 714 of the data cache 704. Stack data blocks 712 are not found within any of the other ways 710 of the data cache 704.
  • Before stack data can be stored within one or more designated ways of the data cache, as described in the context of FIG. 6 and as shown in FIG. 7, the system will determine whether the particular block of stack data already resides within the data cache. FIG. 8 is a flow chart that illustrates an embodiment of a process 800 of determining a hit or a miss for a soft-partitioned data cache, based on stack or non-stack classification of data. For ease of description and clarity, this example assumes that the process 800 begins when a process receives identifying information for a block of stack data (802). The operation performed at 802 is identical to the operation performed at 402 of FIG. 4, which was described in detail above in the context of the process 400.
  • The process 800 obtains identifying information associated with a plurality of designated ways (804) of the data cache. In some embodiments, the plurality of designated ways of the data cache may be selected arbitrarily at design time, based upon user and/or designer preference. In some embodiments, the identifying information includes main memory location data for the recently accessed data blocks. In certain embodiments, the process 800 reads a specified number of tags to obtain identifying information for the designated number of recently accessed data blocks.
  • The process 800 may continue by determining whether or not a hit has occurred (806) by comparing the obtained identifying information associated with each of the plurality of designated ways to the identifying information for the requested block of stack data, as described above for the operation 408 of the process 400.
  • When a hit has been confirmed (the “Yes” branch of 806), the process 800 retrieves the requested block of stack data for use (808), as described previously in the context of the process 400. When a hit has not been confirmed (the “No” branch of 806), the process 800 may continue substantially as described above, while omitting the filtering of the designated ways of the data cache, and instead takes into account the contents of the entire data cache. To do this, the process 800 obtains identifying information associated with all ways (810) of the data cache. In some embodiments, the identifying information includes tags, which contain the address information required to identify whether the associated block in the hierarchy corresponds to a block of data requested by the processor. For example, the identifying information may include unique information associated with the contents of each way of the data cache which correspond to unique information associated with contents of various locations within main memory.
  • Next, the process 800 may continue by determining whether or not a hit has occurred (812) by comparing the obtained identifying information associated with each of the data cache ways, individually, to the identifying information for the requested block of stack data, as described above for the operation 414 of the process 400.
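The fall-back probe of steps 810 and 812, which considers every way of the selected set rather than only the designated ways, might look like the sketch below; again the names continue the illustrative sketch rather than quoting the specification.

/* Sketch of the full-set hit check (steps 810-812): compare the requested
 * tag against all ways of the set, designated or not. */
static int lookup_all_ways(const data_cache_t *c, uint64_t addr)
{
    const cache_set_t *s   = &c->set[set_index(addr)];
    uint64_t           tag = tag_of(addr);

    for (int w = 0; w < NUM_WAYS; w++)
        if (s->way[w].valid && s->way[w].tag == tag)
            return w;                                  /* hit somewhere in the set */
    return -1;                                         /* miss in the entire set   */
}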
  • When a hit has been confirmed (the “Yes” branch of 812) within the data cache, the system will retrieve the requested block of stack data for use (814). When a hit has not been confirmed (the “No” branch of 812), the process 800 exits and the Soft-Partitioning of a Data Cache process 600 begins, as shown in FIG. 6 and described in detail above.
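Putting the pieces of the sketch together, the overall flow of process 800 (probe the designated ways, then the whole set, then hand off to the soft-partitioning fill of FIG. 6 on a miss) might be arranged as below. The access function and its NULL-on-miss convention are illustrative assumptions, and the fill path is left as a stub.

/* Sketch of process 800 end to end.  On a full miss the block would be
 * fetched from the main memory element and installed in a designated way,
 * e.g. the one returned by choose_fill_way(c, true); that hand-off
 * corresponds to the soft-partitioning process 600 of FIG. 6 (not shown). */
const uint8_t *access_stack_block(data_cache_t *c, uint64_t addr)
{
    int w = lookup_designated_ways(c, addr);            /* steps 804-806 */
    if (w < 0)
        w = lookup_all_ways(c, addr);                    /* steps 810-812 */
    if (w >= 0)
        return c->set[set_index(addr)].way[w].data;      /* steps 808/814: hit */

    return NULL;                                         /* miss: defer to process 600 */
}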
  • Techniques and technologies may be described herein in terms of functional and/or logical block components, and with reference to symbolic representations of operations, processing tasks, and functions that may be performed by various computing components or devices. Such operations, tasks, and functions are sometimes referred to as being computer-executed, computerized, software-implemented, or computer-implemented. In practice, one or more processor devices can carry out the described operations, tasks, and functions by manipulating electrical signals representing data bits at memory locations in the system memory, as well as other processing of signals. The memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to the data bits. It should be appreciated that the various block components shown in the figures may be realized by any number of hardware, software, and/or firmware components configured to perform the specified functions. For example, an embodiment of a system or a component may employ various integrated circuit components, e.g., memory elements, digital signal processing elements, logic elements, look-up tables, or the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices.
  • While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or embodiments described herein are not intended to limit the scope, applicability, or configuration of the claimed subject matter in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing the described embodiment or embodiments. It should be understood that various changes can be made in the function and arrangement of elements without departing from the scope defined by the claims, which includes known equivalents and foreseeable equivalents at the time of filing this patent application.

Claims (20)

What is claimed is:
1. A method of way prediction for a data cache having a plurality of ways, comprising:
responsive to an instruction to access a stack data block,
accessing identifying information associated with a plurality of most recently accessed ways of a data cache to determine whether the stack data block resides in one of the plurality of most recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the plurality of most recently accessed ways; and
when the stack data block resides in one of the plurality of most recently accessed ways of the data cache, accessing the stack data block from the data cache.
2. The method of claim 1, wherein the identifying information associated with each of the plurality of most recently accessed ways of the data cache references associated contents of each of the plurality of most recently accessed ways, and corresponds to identifying information for a copy of the associated contents of each of the plurality of most recently accessed ways, and wherein the copy of the associated contents of each of the plurality of most recently accessed ways is stored in a second location in a memory hierarchy.
3. The method of claim 2, wherein the plurality of the most recently accessed ways comprises two most recently used ways, identified as a most recently used (MRU) way of the data cache and an MRU plus one way of the data cache.
4. The method of claim 1, wherein accessing the identifying information associated with a plurality of most recently accessed ways comprises reading a plurality of tags, and wherein each of the plurality of tags is associated with an individual one of the plurality of the most recently accessed ways.
5. The method of claim 1, further comprising:
obtaining contents of each of the plurality of most recently accessed ways concurrently with accessing the identifying information for each of the plurality of most recently accessed ways.
6. The method of claim 1, wherein the plurality of most recently accessed ways are located in a sequence of most recently used (MRU) ways of the data cache, and wherein the sequence of MRU ways of the data cache comprises an MRU way, a least recently used (LRU) way, and a number of ways between the MRU way and the LRU way.
7. The method of claim 6, further comprising:
accessing contents of the MRU way of the data cache, concurrently with accessing the identifying information for each of the plurality of the most recently accessed ways.
8. The method of claim 1, further comprising:
when the stack data block does not reside in one of the most recently accessed ways of the data cache,
accessing second identifying information for each of the plurality of ways of the data cache to determine whether the stack data block resides in the data cache;
wherein the second identifying information is accessed from the array of identifying information corresponding to all of the ways of a specified set of the data cache.
9. A set-associative cache memory system, comprising:
a first array of ways of the cache memory system, wherein the ways are configured to store N recently accessed data blocks, wherein N is an integer greater than one;
a second array of storage elements, configured to store identifying information associated with the N recently accessed data blocks; and
cache control circuitry operatively associated with the first and second arrays, wherein the cache control circuitry is configured to respond to a request to access a stack data block by:
accessing identifying information associated with a designated number of recently accessed ways of a data cache to determine whether the stack data block resides in one of the designated number of recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the designated number of most recently accessed ways; and
when the stack data block resides in one of the designated number of most recently accessed ways of the data cache, accessing the stack data block from the data cache.
10. The set-associative cache memory system of claim 9, wherein, prior to accessing the identifying information associated with a designated number of recently accessed ways, the cache control circuitry is further configured to:
obtain identifying information associated with the designated number of recently accessed ways;
compare the identifying information for the block of stack data to the identifying information associated with the designated number of recently accessed ways to determine whether a hit has occurred within the cache memory system, wherein the hit occurs when the comparison results in a match; and
when a hit has occurred, access the block of stack data for manipulation from the cache memory system.
11. The set-associative cache memory system of claim 10, wherein the cache control circuitry is further configured to obtain contents of each of the designated number of recently accessed ways concurrently with obtaining the identifying information associated with the designated number of recently accessed ways.
12. The set-associative cache memory system of claim 10, wherein the cache control circuitry is further configured to obtain contents of only one of the designated number of recently accessed ways, concurrently with obtaining the identifying information associated with the designated number of recently accessed ways.
13. The set-associative cache memory system of claim 10, wherein the cache control circuitry is further configured to:
determine a plurality of recently accessed ways of a data cache by accessing metadata associated with the set-associative cache memory system;
wherein the metadata comprises recency information detailing an ordered sequence of a plurality of most recently used (MRU) ways of the cache memory system.
14. The set-associative cache memory system of claim 13, wherein the cache control circuitry is further configured to update the metadata associated with the cache memory system when a hit has occurred within the cache memory system.
15. The set-associative cache memory system of claim 10, wherein when a hit has not occurred within the cache memory system, the cache control circuitry is further configured to:
examine second identifying information associated with the N recently accessed ways;
compare the identifying information for the block of stack data to the second identifying information associated with the N recently accessed ways to determine whether a hit has occurred within the cache memory system, wherein a hit occurs when the comparison results in a second match; and
access the block of stack data for manipulation when a hit has occurred within the cache memory system.
16. A method of way prediction for a data cache comprising a plurality of sets, each of the plurality of sets comprising a plurality of ways, wherein the method comprises:
receiving an instruction, at a processor-based device, to access a block of stack data, wherein the instruction comprises a first tag associated with the block of stack data;
performing a first tag lookup within a tag array, to determine whether a copy of the first tag resides in a subset of the tag array corresponding to a plurality of recently used ways of the data cache; and
when a copy of the first tag resides in the subset of the tag array,
accessing the block of stack data from one of the plurality of most recently used ways of the data cache corresponding to the copy of the first tag.
17. The method of claim 16, wherein the plurality of recently used ways of the data cache comprises a subset of the ways of the data cache, wherein the data cache comprises:
a most recently used (MRU) way of the data cache, a least recently used (LRU) way of the data cache, and all ways between the MRU and LRU ways of the data cache, wherein all the ways between the MRU and LRU ways in the data cache are ordered sequentially; and
wherein the subset comprises at least the MRU way and a specified number of additional sequential ways.
18. The method of claim 17, wherein performing a first tag lookup within a tag array comprises reading a plurality of tags, and wherein each of the plurality of tags is associated with an individual one of the plurality of recently used ways of the data cache.
19. The method of claim 18, further comprising:
reading contents associated with the plurality of tags concurrently with reading the plurality of tags.
20. The method of claim 17, further comprising:
reading contents associated with the MRU way of the data cache concurrently with reading the plurality of tags.
US13/945,659 2012-11-21 2013-07-18 Methods and apparatus for data cache way prediction based on classification as stack data Active 2034-06-29 US9734059B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US201261728843P 2012-11-21 2012-11-21
US13/945,659 US9734059B2 (en) 2012-11-21 2013-07-18 Methods and apparatus for data cache way prediction based on classification as stack data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/945,659 US9734059B2 (en) 2012-11-21 2013-07-18 Methods and apparatus for data cache way prediction based on classification as stack data
US13/946,112 US9298615B2 (en) 2012-11-21 2013-07-19 Methods and apparatus for soft-partitioning of a data cache for stack data

Publications (2)

Publication Number Publication Date
US20140143499A1 true US20140143499A1 (en) 2014-05-22
US9734059B2 US9734059B2 (en) 2017-08-15

Family

ID=60808564

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/945,659 Active 2034-06-29 US9734059B2 (en) 2012-11-21 2013-07-18 Methods and apparatus for data cache way prediction based on classification as stack data
US13/946,112 Active 2034-04-27 US9298615B2 (en) 2012-11-21 2013-07-19 Methods and apparatus for soft-partitioning of a data cache for stack data

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/946,112 Active 2034-04-27 US9298615B2 (en) 2012-11-21 2013-07-19 Methods and apparatus for soft-partitioning of a data cache for stack data

Country Status (1)

Country Link
US (2) US9734059B2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016097806A1 (en) * 2014-12-14 2016-06-23 Via Alliance Semiconductor Co., Ltd. Fully associative cache memory budgeted by memory access type
US9652398B2 (en) * 2014-12-14 2017-05-16 Via Alliance Semiconductor Co., Ltd. Cache replacement policy that considers memory access type
EP3066572B1 (en) * 2014-12-14 2020-02-19 VIA Alliance Semiconductor Co., Ltd. Cache memory budgeted by chunks based on memory access type
US9910785B2 (en) 2014-12-14 2018-03-06 Via Alliance Semiconductor Co., Ltd Cache memory budgeted by ways based on memory access type
JP6207766B2 (en) * 2014-12-14 2017-10-04 Via Alliance Semiconductor Co., Ltd. Set associative cache memory with heterogeneous replacement policy
US10089233B2 (en) * 2016-05-11 2018-10-02 Ge Aviation Systems, Llc Method of partitioning a set-associative cache in a computing platform
US10452598B2 (en) * 2016-10-18 2019-10-22 Micron Technology, Inc. Apparatuses and methods for an operating system cache in a solid state device

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5434990A (en) 1990-08-06 1995-07-18 Ncr Corporation Method for serially or concurrently addressing n individually addressable memories each having an address latch and data latch
DE69130967T2 (en) 1990-08-06 1999-10-21 Ncr Int Inc Computer memory arrangement
US5764946A (en) 1995-04-12 1998-06-09 Advanced Micro Devices Superscalar microprocessor employing a way prediction unit to predict the way of an instruction fetch address and to concurrently provide a branch prediction address corresponding to the fetch address
US5835744A (en) 1995-11-20 1998-11-10 Advanced Micro Devices, Inc. Microprocessor configured to swap operands in order to minimize dependency checking logic
KR100529416B1 (en) 1996-01-24 2006-01-27 Sun Microsystems, Inc. Method and apparatus of instruction folding for a stack-based machine
US6038643A (en) 1996-01-24 2000-03-14 Sun Microsystems, Inc. Stack management unit and method for a processor having a stack
JP3634379B2 (en) 1996-01-24 2005-03-30 Sun Microsystems, Inc. Method and apparatus for stack caching
US5787469A (en) 1996-09-06 1998-07-28 Intel Corporation System and method for exclusively writing tag during write allocate requests
US6167488A (en) 1997-03-31 2000-12-26 Sun Microsystems, Inc. Stack caching circuit with overflow/underflow unit
US6058457A (en) 1997-06-23 2000-05-02 Sun Microsystems, Inc. Method for storing method frames in multiple stacks
US6067602A (en) 1997-06-23 2000-05-23 Sun Microsystems, Inc. Multi-stack-caching memory architecture
US6138210A (en) 1997-06-23 2000-10-24 Sun Microsystems, Inc. Multi-stack memory architecture
US6092152A (en) 1997-06-23 2000-07-18 Sun Microsystems, Inc. Method for stack-caching method frames
US6742112B1 (en) 1999-12-29 2004-05-25 Intel Corporation Lookahead register value tracking
US7222270B2 (en) 2003-01-10 2007-05-22 International Business Machines Corporation Method for tagging uncorrectable errors for symmetric multiprocessors
US20050050278A1 (en) 2003-09-03 2005-03-03 Advanced Micro Devices, Inc. Low power way-predicted cache
US7702855B2 (en) 2005-08-11 2010-04-20 Cisco Technology, Inc. Optimizing cached access to stack storage

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5903910A (en) * 1995-11-20 1999-05-11 Advanced Micro Devices, Inc. Method for transferring data between a pair of caches configured to be accessed from different stages of an instruction processing pipeline
WO2002045385A2 (en) * 2000-11-20 2002-06-06 Zucotto Wireless, Inc. Methods and devices for caching method frame segments in a low-power stack-based processor
US7065613B1 (en) * 2002-06-06 2006-06-20 Maxtor Corporation Method for reducing access to main memory using a stack cache
US20040148467A1 (en) * 2003-01-16 2004-07-29 Ip-First, Llc. Microprocessor and apparatus for performing fast speculative pop operation from a stack memory cache
US7536510B1 (en) * 2005-10-03 2009-05-19 Advanced Micro Devices, Inc. Hierarchical MRU policy for data cache
US20080235453A1 (en) * 2007-03-22 2008-09-25 International Business Machines Corporation System, method and computer program product for executing a cache replacement algorithm
US20090328022A1 (en) * 2008-06-26 2009-12-31 International Business Machines Corporation Systems and methods for maintaining crtm code
JP2012027652A (en) * 2010-07-22 2012-02-09 Fujitsu Ltd Cache memory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Calder, B et al. Predictive Sequential Associative Cache. HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture, 1996, pp. 244-253 [online], [retrieved 2015-06-17]. Retrieved from the Internet *
Machine translation of Mori, JP 2012027652 A. Published February 9, 2012. *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150125085A1 (en) * 2013-11-04 2015-05-07 Niraj Gupta Integral image coding
US9398297B2 (en) * 2013-11-04 2016-07-19 Intel Corporation Integral image coding
WO2016097795A1 (en) * 2014-12-14 2016-06-23 Via Alliance Semiconductor Co., Ltd. Multi-mode set associative cache memory dynamically configurable to selectively allocate into all or a subset of its ways depending on mode
US20160350229A1 (en) * 2014-12-14 2016-12-01 Via Alliance Semiconductor Co., Ltd. Dynamic cache replacement way selection based on address tag bits
US20160357681A1 (en) * 2014-12-14 2016-12-08 VIA Alliance Semiconductor Co., Ltd. Multi-mode set associative cache memory dynamically configurable to selectively select one or a plurality of its sets depending upon the mode
US20160357664A1 (en) * 2014-12-14 2016-12-08 Via Alliance Semiconductor Co., Ltd. Multi-mode set associative cache memory dynamically configurable to selectively allocate into all or a subset of its ways depending on the mode
US9798668B2 (en) * 2014-12-14 2017-10-24 Via Alliance Semiconductor Co., Ltd. Multi-mode set associative cache memory dynamically configurable to selectively select one or a plurality of its sets depending upon the mode
EP3055774B1 (en) * 2014-12-14 2019-07-17 VIA Alliance Semiconductor Co., Ltd. Multi-mode set associative cache memory dynamically configurable to selectively allocate into all or a subset of its ways depending on the mode
US10698827B2 (en) * 2014-12-14 2020-06-30 Via Alliance Semiconductor Co., Ltd. Dynamic cache replacement way selection based on address tag bits

Also Published As

Publication number Publication date
US9298615B2 (en) 2016-03-29
US20140143495A1 (en) 2014-05-22
US9734059B2 (en) 2017-08-15

Similar Documents

Publication Publication Date Title
US9274959B2 (en) Handling virtual memory address synonyms in a multi-level cache hierarchy structure
CN104272279B (en) Data processing equipment with caching and translation lookaside buffer
US9229873B2 (en) Systems and methods for supporting a plurality of load and store accesses of a cache
US9235514B2 (en) Predicting outcomes for memory requests in a cache memory
US20140208038A1 (en) Sectored cache replacement algorithm for reducing memory writebacks
JP5328748B2 (en) Cache line replacement in cache memory
KR100278328B1 (en) Cache miss buffer
US10019368B2 (en) Placement policy for memory hierarchies
US4332010A (en) Cache synonym detection and handling mechanism
US6877067B2 (en) Shared cache memory replacement control method and apparatus
KR100962058B1 (en) Method and apparatus for multithreaded cache with cache eviction based on thread identifier
TW554267B (en) Cache way prediction based on instruction base register
JP2833062B2 (en) Cache memory control method, processor and information processing apparatus using the cache memory control method
CN102498477B (en) TLB prefetching
US7739477B2 (en) Multiple page size address translation incorporating page size prediction
JP5373173B2 (en) Address translation method and apparatus
EP0856798B1 (en) A cache system
US7707397B2 (en) Variable group associativity branch target address cache delivering multiple target addresses per cache line
JP2839060B2 (en) Data processing system and data processing method
US7136967B2 (en) Multi-level cache having overlapping congruence groups of associativity sets in different cache levels
CN102640124B (en) For computing system, method and pre-fetch unit that the storage perception of data stream is looked ahead
TWI234709B (en) Weighted cache line replacement
US6014732A (en) Cache memory with reduced access time
US9361236B2 (en) Handling write requests for a data array
US10248572B2 (en) Apparatus and method for operating a virtually indexed physically tagged cache

Legal Events

Date Code Title Description
AS Assignment

Owner name: ADVANCED MICRO DEVICES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:OLSON, LENA E.;ECKERT, YASUKO;SRIDHARAN, VILAS K.;AND OTHERS;SIGNING DATES FROM 20130621 TO 20130713;REEL/FRAME:030963/0485

STCF Information on status: patent grant

Free format text: PATENTED CASE