US20230169011A1 - Adaptive Cache Partitioning - Google Patents
- Publication number
- US20230169011A1 (Application No. US18/057,628)
- Authority
- US
- United States
- Prior art keywords
- cache
- memory
- metadata
- cache memory
- logic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
- G06F12/0895—Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0615—Address space extension
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0864—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0866—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
- G06F12/0871—Allocation or management of cache space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
- G06F2212/1021—Hit rate improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/28—Using a specific disk cache architecture
- G06F2212/282—Partitioned cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/601—Reconfiguration of cache memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/602—Details relating to cache prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6026—Prefetching based on access pattern detection, e.g. stride based prefetch
Definitions
- Some computing systems include a hierarchical memory system, which may include multiple levels of memory.
- Efficient operation of such a system can entail both cost efficiency and speed efficiency.
- Faster memories are typically more expensive than relatively slower memories, so designers attempt to balance their relative costs and benefits.
- One approach is to use a smaller amount of faster memory with a larger amount of slower memory.
- The faster memory is deployed at a higher level in the hierarchical memory system than the slower memory, such that the faster memory is preferably accessed first.
- One example of a relatively faster memory is a cache memory.
- One example of a relatively slower memory is a backing memory, which can include primary memory, main memory, backing storage, or the like.
- A cache memory can accelerate data operations by storing and retrieving data of the backing memory using, for example, high-performance memory cells.
- The high-performance memory cells enable the cache memory to respond to memory requests more quickly than the backing memory.
- A cache memory can therefore enable faster responses from a memory system when the desired data is present in the cache.
- One approach to increasing the likelihood that desired data is present in the cache is to prefetch data before the data is requested. To do so, a prefetching system attempts to predict what data will be requested by a processor and then loads this predicted data into the cache. Although a prefetching system can make a cache memory more likely to accelerate memory access operations, data prefetching can introduce operational complexity that engineers and other computer designers strive to overcome.
- FIGS. 1-1 through 1-3 illustrate example environments in which techniques for adaptive cache partitioning can be implemented.
- FIG. 2 illustrates an example of an apparatus that can implement aspects of adaptive cache partitioning.
- FIG. 3 illustrates another example of an apparatus that can implement aspects of adaptive cache partitioning.
- FIGS. 4-1 through 4-3 illustrate example operational implementations of adaptive cache partitioning.
- FIGS. 5-1 through 5-3 illustrate further example operational implementations of adaptive cache partitioning.
- FIG. 6 illustrates an example flow diagram depicting operations for adaptive cache partitioning.
- FIG. 7 illustrates an example flow diagram depicting operations for adaptive cache partitioning.
- FIG. 8 illustrates an example flow diagram depicting operations for adaptive cache partitioning based, at least in part, on metrics pertaining to prefetch performance.
- FIG. 9 illustrates an example flow diagram depicting operations for adaptive cache partitioning based, at least in part, on metrics pertaining to cache and/or prefetch performance.
- FIG. 10 illustrates an example of a system for implementing adaptive cache partitioning.
- Cache memory, which can store data of a backing memory, may be capable of servicing requests much more quickly than the backing memory.
- Cache memory can be deployed “above” or “in front of” a backing memory in a memory hierarchy so that the cache memory is preferably accessed before the slower backing memory.
- The cache memory may have a lower capacity than the backing or main memory.
- The cache memory may, therefore, load only a selected subset of the address space of the backing memory.
- Data can be selectively admitted to and/or evicted from the cache memory in accordance with suitable criteria, such as cache admission policies, eviction policies, replacement policies, and/or the like.
- A cache miss refers to a request pertaining to an address that has not been loaded into the cache and/or is not included in the working set of the cache.
- Servicing a cache miss may involve fetching data from the slower backing memory, which can significantly degrade performance.
- By contrast, servicing requests that result in “cache hits” may involve accessing the relatively higher-performance cache memory without incurring latencies for accessing the relatively lower-performance backing memory.
- Prefetching typically involves loading data of addresses into cache memory before the addresses are requested.
- A prefetcher can predict addresses of upcoming requests and preload the addresses into the cache memory in the background so that, when requests pertaining to the predicted addresses are subsequently received, the requests can be serviced from the cache memory rather than triggering cache misses.
- Requests pertaining to the prefetched addresses may, therefore, be serviced using the relatively higher-performance cache memory without incurring the latency of the relatively lower-performance backing memory.
- A “useful” prefetch refers to a prefetch that results in a subsequent cache hit, which is termed a “prefetch hit.”
- A useful prefetch is achieved when data associated with an address is prefetched and that address is subsequently requested and/or otherwise accessed from the cache memory.
- A “bad” prefetch or “prefetch miss” refers to a prefetch of data that is not subsequently requested and, as such, does not produce a cache or prefetch hit.
- Bad prefetches can adversely impact performance. Bad prefetches can consume limited cache memory resources with data that is unlikely to be requested (e.g., poison the cache), resulting in an increased cache miss rate, a lower hit rate, increased thrashing, higher bandwidth consumption, and so on.
- A prefetcher can try to avoid these problems by attempting to detect patterns in which a memory is accessed and then prefetching data in accordance with the detected patterns.
- The prefetcher may utilize metadata to detect, predict, derive, and/or exploit memory access patterns to determine accurate prefetch predictions (e.g., predict addresses of upcoming requests).
- The metadata utilized by the prefetcher may be referred to as “prefetcher metadata,” “prefetch metadata,” “request metadata,” “access metadata,” “memory metadata,” “memory access metadata,” or the like.
- This metadata may include any suitable information pertaining to an address space including, but not limited to: a sequence of previously requested addresses or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses, access counts (e.g., accesses within respective windows), access time(s), last access time(s), and so on.
- The prefetcher may implement prefetch operations for workloads that are suitable for prefetching.
- A “suitable” workload, or workload that is “suitable for prefetching,” refers to a “predictable” workload that produces memory accesses according to patterns that are detectable (and/or exploitable) by the prefetcher.
- A suitable workload may, therefore, refer to a workload associated with metadata from which the prefetcher is capable of deriving a predictable access pattern. Examples of suitable workloads include workloads in which memory requests are offset by a consistent offset or stride. These types of workloads may be produced by programs that access structured data repeatedly and/or in regular patterns.
- For example, a program may repeatedly access data structures of size $D$, resulting in a predictable workload in which memory accesses are offset by a relatively constant offset delta $\Delta$, or stride, where $\Delta \approx D$.
- Stride and other types of access patterns may be derived from metadata pertaining to previous memory accesses of the workload.
- The prefetcher can utilize the memory access patterns derived from such metadata to prefetch data that is likely to be requested in the future.
- For example, in response to a cache miss for address $a$, the prefetcher can load data of addresses $a+\Delta, a+2\Delta, \ldots, a+d\Delta$ into the cache memory, where $d$ is a configurable prefetch degree.
- The data prefetched from addresses $a+\Delta$ through $a+d\Delta$ will likely result in subsequent prefetch hits, thereby preventing cache misses and improving performance.
- An “unsuitable” workload, or workload that is “unsuitable for prefetching,” refers to a workload that accesses memory in a manner that the prefetcher is incapable of predicting, modeling, and/or otherwise exploiting to produce accurate prefetch predictions.
- An unsuitable workload may refer to a workload associated with metadata from which the prefetcher is incapable of deriving address predictions, patterns, models, and/or the like.
- Examples of unsuitable workloads include workloads produced by programs that do not access memory in repeated and/or regular patterns, programs that access memory at seemingly random addresses and/or address offsets, programs that access memory according to patterns that are too complex or varied to be detected by the prefetcher (and/or captured in the prefetcher metadata), and/or the like. Attempting to prefetch data for unsuitable workloads may result in poor prefetch performance. Since prefetch decisions for unsuitable workloads are not guided by discernible access patterns, little, if any, of the prefetched data is likely to be requested before being evicted from the cache.
- Inaccurate prefetch predictions may result in bad prefetches that fill the relatively limited capacity of the cache memory with data that is unlikely to be subsequently accessed, to the exclusion of other data that may be accessed more frequently. Attempting to prefetch for unsuitable workloads may therefore decrease cache performance (e.g., result in a lower hit rate, an increased miss rate, thrashing, increased bandwidth consumption, and so on). To avoid these and other problems, prefetching may not be implemented for unsuitable workloads (and/or within address regions associated with unsuitable workloads).
- A cache may service a plurality of different workloads, each having respective workload characteristics (e.g., respective memory access characteristics, patterns, and/or the like).
- Workload characteristics within respective regions of the address space may depend on a number of factors, which may vary over time. Programs operating within different regions of the address space may, therefore, produce workloads having different characteristics (e.g., different memory access patterns).
- For example, a first program operating within a first region of the address space may access memory per a first stride pattern ($\Delta_1$); a second program operating within a second region may access memory per a second, different stride pattern ($\Delta_2$); a third program operating within a third region may access memory according to a more complex pattern, such as a correlation pattern; a fourth program operating within a fourth region may access memory unpredictably; and so on.
- Although the first stride pattern may produce accurate prefetches within the first region, it will likely produce poor results if used in the other regions (and vice versa).
- Prefetching performance can therefore be improved by maintaining metadata pertaining to respective regions of the address space.
- The metadata utilized by the prefetcher may include a plurality of entries, with each entry including information pertaining to memory accesses within a respective region of the address space.
- The prefetcher may utilize metadata pertaining to respective regions to inform prefetch operations within those regions.
- The prefetcher can utilize metadata pertaining to respective regions of the address space to determine characteristics of the workloads within the respective regions, determine whether the workloads are suitable for prefetching (e.g., distinguish workloads and/or regions that are suitable for prefetching from those that are unsuitable), determine access patterns within the respective regions, implement prefetch operations within the respective regions per the determined access patterns, and so on.
- In some cases, the prefetcher metadata covers a plurality of fixed-size address regions.
- Alternatively, the prefetcher metadata may be configured to cover adaptively sized address regions in which workload characteristics, such as access patterns, are consistent.
- The size of the address ranges covered by respective entries of the prefetcher metadata may vary within respective regions of the address space depending on, inter alia, workload characteristics and/or prefetch performance within the respective regions.
- Metadata pertaining to memory accesses is often tracked at and/or within performance-sensitive functionality of the hierarchical memory system, such as memory I/O paths or the like. Moreover, prefetch operations that utilize such metadata may be performance-sensitive (e.g., prefetched data must be available before that data is requested). Therefore, it can be advantageous to maintain metadata pertaining to memory accesses (prefetcher metadata) within high-performance memory resources.
- Prefetch metadata may be maintained within high-performance cache memory. For example, a fixed portion of the high-performance memory resources of the cache may be allocated for the storage of prefetcher metadata (and/or be allocated to the prefetcher and/or prefetch logic of the cache).
- The size and/or configuration of the fixed portion may be determined at design, manufacturing, and/or fabrication of the cache and/or of the component in which the cache is deployed, such as a processor, System-on-Chip (SoC), or the like.
- The fixed portion of cache memory allocated for prefetch metadata may be set in hardware, in a Register-Transfer Level (RTL) implementation, and/or the like.
- The fixed allocation of cache memory may improve prefetch performance by, inter alia, decreasing the latency of metadata updates, address predictions, prefetch operations, and so on. Since the size of the cache memory is finite, however, allocating the fixed portion of the cache memory for metadata storage may adversely impact other aspects of cache performance. For example, allocating the fixed portion may reduce the amount of data that can be loaded into the cache, which can decrease cache performance (e.g., lead to an increased miss rate, a decreased hit rate, an increased replacement rate, and/or the like). In some circumstances, these disadvantages may be outweighed by the benefits of improved prefetch performance.
- When servicing suitable workloads having access patterns that can be accurately predicted and/or exploited, the prefetcher can utilize metadata maintained within the fixed allocation of high-performance cache memory to implement accurate, low-latency prefetch operations that result in better overall cache performance despite the reduced cache capacity.
- Under other conditions, however, the benefits of improved prefetch performance may not outweigh the disadvantages of decreased cache capacity.
- The fixed portion of the cache memory resources allocated for prefetcher metadata may even be effectively wasted. More specifically, when servicing workloads having access patterns that cannot be accurately predicted and/or exploited by the prefetcher, the fixed portion of the cache memory allocated for storage of prefetcher metadata may not yield useful prefetches and, as such, may not improve cache performance, much less outweigh the performance penalties incurred due to reduced cache capacity.
- In such cases, the fixed allocation of the cache memory would be better utilized to increase the available capacity of the cache than to store prefetcher metadata.
- The amount of cache memory allocated for prefetch metadata may be predetermined.
- The fixed prefetch metadata capacity may be configured to provide acceptable performance under a range of different operating conditions.
- The fixed prefetch metadata capacity may be determined by testing, experience, simulation, machine learning, and/or the like. Although the fixed prefetch metadata capacity may yield acceptable performance under some conditions, performance may suffer under other conditions. Moreover, the cache may be incapable of adapting to changes in workload conditions.
- Consider, for example, a cache having a fixed prefetch metadata capacity that services predominantly suitable workloads, such as a large number of workloads having different respective access patterns (tracked in respective prefetcher metadata), workloads having more complex access patterns, workloads having access patterns that involve larger amounts of prefetcher metadata, and/or the like.
- In this case, the fixed prefetch metadata capacity may not be sufficient to accurately capture the access patterns of the workloads, resulting in decreased prefetch accuracy and decreased cache performance. Under these conditions, cache performance could be improved by increasing the metadata capacity available to the prefetcher, even at the cost of further reducing the available cache capacity.
- Workload characteristics may vary from address region to address region. Moreover, the characteristics of respective workloads, and/or corresponding address regions, may vary over time. Workload characteristics within respective regions of the address space may depend on a number of factors, including, but not limited to: the programs utilizing the respective regions, the state of the programs, the processing task(s) being performed by the programs, the execution phase of the programs, characteristics of the data structure(s) being accessed by the programs, the manner in which the data structure(s) are accessed, and/or the like.
- The prefetcher may utilize metadata pertaining to workload characteristics within respective address regions to determine accurate prefetch predictions within those regions.
- The amount of prefetcher metadata needed to produce accurate prefetch predictions may, therefore, depend on a number of factors, which may vary over time, including, but not limited to: the quantity of workloads (and/or corresponding address regions), the amount of metadata needed to track access patterns within respective address regions, the prefetch technique(s) implemented by the prefetcher within the respective address regions, the complexity of the access patterns, and so on.
- The prefetch metadata capacity needed to produce accurate prefetch predictions under first operating conditions (and/or during a first time interval) may differ from the capacity needed under second operating conditions (and/or during a second time interval).
- This document describes adaptive cache partitioning techniques that enable the amount of cache memory allocated for storage of prefetcher metadata to be dynamically adjusted.
- The cache memory capacity allocated to prefetch operations may therefore be tuned to improve cache performance.
- For example, suppose the prefetcher implements a stride prefetch technique for a plurality of workloads, with each workload corresponding to a respective region of the address space.
- The prefetcher may detect stride patterns for the respective workloads using metadata pertaining to accesses within the respective regions. Detecting the stride access patterns for Y workloads may involve maintaining metadata pertaining to accesses within Y different address regions.
- A fixed prefetch metadata capacity may be incapable of holding enough metadata to capture all Y patterns, which may reduce the accuracy of the prefetch predictions and decrease cache performance.
- For example, the fixed prefetch metadata capacity may only be capable of tracking a subset of the Y patterns, leaving X address regions uncovered.
- The disclosed adaptive cache partitioning techniques may be capable of improving cache performance by, inter alia, increasing the amount of cache memory allocated to the prefetcher such that the prefetcher is capable of storing metadata pertaining to the stride patterns of each of the Y workloads and/or regions.
- The disclosed adaptive cache partitioning techniques may also be capable of modifying the prefetch metadata capacity in response to changing workload conditions. For example, one or more of the Y workloads may transition from suitable to unsuitable over time, resulting in decreased prefetch performance. In response to the decrease in prefetch performance, the amount of cache memory allocated for the prefetcher metadata may be decreased, producing a corresponding increase in available cache capacity and thereby improving overall cache performance.
- a prefetcher may implement a correlation prefetch technique that learns access patterns that may repeat but are not as consistent as simple stride or delta address patterns (correlation patterns).
- the correlation patterns may include delta sequences including a plurality of elements and, as such, may be derived from larger amounts of metadata than simple stride patterns.
- a correlation prefetch for a delta sequence that includes two elements ( ⁇ 1 , ⁇ 2 ) may include prefetching addresses a+ ⁇ 1 , a+ ⁇ 1 + ⁇ 2 , a+2 ⁇ 1 + ⁇ 2 , a+ ⁇ 1 +2 ⁇ 2 , and so on, depending on the degree of the correlation prefetch operation.
- because correlation prefetch techniques attempt to extract more complex patterns, these techniques may involve larger amounts of metadata.
- a cache having a fixed prefetch metadata capacity may be unable to hold this metadata, resulting in decreased performance.
- the adaptive cache partitioning techniques disclosed herein can increase the amount of cache memory allocated to the prefetcher, resulting in improved prefetch accuracy and better overall performance, despite corresponding reductions to the available cache capacity.
- the disclosed adaptive cache partitioning techniques may adjust cache memory allocations responsive to changing workload conditions, such as workloads with simpler, single-stride access patterns, a smaller number of concurrent workloads, and/or the like.
- the disclosed adaptive cache partitioning techniques can also improve the performance of machine-learning and/or machine-learned (ML) prefetch implementations, such as classification-based prefetchers, artificial neural network (NN) prefetchers, Deep Neural Network (DNN) prefetchers, Recurrent NN (RNN) prefetchers, Long Short-Term Memory (LSTM) prefetchers, and/or the like.
- ML prefetch techniques may attempt to leverage local context since, as disclosed herein, data accessed by programs running within respective local contexts tends to be stored in contiguous data structures or blocks that are accessed repeatedly and/or in regular patterns.
- An ML prefetcher can be trained to develop and/or refine ML models within respective local contexts and can use the ML models to implement prefetch operations.
- Local context can vary significantly across the address space due to differences in workload produced by programs operating within various regions of the address space.
- An ML model trained to learn the local context within one region of the address space (and/or that is produced by one program) may not be capable of accurately modeling the local context within other regions of the address space (and/or that is produced by another program).
- the ML models may, therefore, rely on metadata covering respective local contexts.
- a fixed allocation of cache memory may be insufficient to maintain ML models for the workloads being serviced by the cache, leading to poor prefetch performance.
- the disclosed adaptive cache partitioning techniques may be capable of adjusting the amount of prefetch metadata capacity allocated to the prefetcher in accordance with the quantity and/or complexity of ML models being tracked thereby.
- logic coupled to a cache memory is configured to balance performance improvements enabled by allocation of cache memory capacity for prefetch metadata against the impacts of corresponding decreases to available cache capacity.
- the logic can be configured to allocate a first portion of the cache memory for metadata pertaining to an address space (e.g., prefetch metadata), allocate cache data to a second portion of the cache memory that is different from the first portion, and/or modify a size of the first portion of the cache memory allocated for the metadata based, at least in part, on a metric pertaining to data prefetched into the second portion of the cache memory.
- the metric may be configured to quantify any suitable aspect of cache and/or prefetch performance including, but not limited to: prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, cache hit rate, cache miss rate, request latency, average request latency, and/or the like.
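Several of the metrics listed above can be derived from simple counters. The sketch below assumes one common set of definitions (prefetch accuracy as useful prefetches divided by all resolved prefetches, where a bad prefetch is one evicted without being accessed); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrefetchCounters:
    useful_prefetches: int = 0  # prefetched data later hit by a demand request
    bad_prefetches: int = 0     # prefetched data evicted without being accessed
    demand_hits: int = 0
    demand_misses: int = 0

    def prefetch_accuracy(self) -> float:
        resolved = self.useful_prefetches + self.bad_prefetches
        return self.useful_prefetches / resolved if resolved else 0.0

    def useful_to_bad_ratio(self) -> float:
        if self.bad_prefetches == 0:
            return float("inf") if self.useful_prefetches else 0.0
        return self.useful_prefetches / self.bad_prefetches

    def cache_hit_rate(self) -> float:
        total = self.demand_hits + self.demand_misses
        return self.demand_hits / total if total else 0.0
```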
- the amount of cache memory allocated for prefetcher metadata may be increased when the metric exceeds a first threshold and may be decreased when the metric falls below a second threshold.
- the metadata maintained within the first portion of the cache memory may be updated in response to requests pertaining to the address space, such as read requests, write requests, transfer requests, cache hits, cache misses, prefetch hits, prefetch misses, and/or the like.
- a prefetcher (and/or prefetch logic) of the cache may be configured to select data to prefetch into the second portion of the cache memory based, at least in part, on the metadata pertaining to the address space maintained within the first portion of the cache memory.
- the metadata may include any suitable information pertaining to addresses and/or ranges of the address space including, but not limited to: address sequence, address history, index table, delta sequence, stride pattern, correlation pattern, feature vectors, ML features, ML feature vectors, ML model, ML modeling data, and/or the like.
- the size of the first portion of the cache memory allocated for the metadata may be modified in response to monitoring one or more metrics pertaining to data prefetched into the second portion of the cache memory.
- the metrics may be configured to quantify prefetch performance and may include, but are not limited to: prefetch hit rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and so on.
- the size of the first portion may be increased when one or more of the metrics exceeds a first threshold or may be decreased when the metrics are below a second threshold.
- the amount of cache memory allocated for storage of metadata pertaining to the address space may be incrementally and/or periodically increased while prefetch performance remains above the first threshold.
- the amount of cache memory allocated for the metadata may be increased until a maximum or upper bound is reached. Conversely, the amount of cache memory allocated for the metadata may be incrementally and/or periodically reduced while prefetch performance remains below the second threshold. The amount of cache memory allocated for the metadata may be decreased until a minimum or lower bound is reached. In some aspects, at the lower bound, no cache resources are allocated for metadata storage, and substantially all of cache memory is available as cache capacity.
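The increase/decrease policy with upper and lower bounds can be sketched as a small controller. This is an illustrative sketch only: the threshold values, step size, and the `MetadataPartitionController` name are assumptions, and the metric passed to `update` could be any of the metrics listed above.

```python
from dataclasses import dataclass


@dataclass
class MetadataPartitionController:
    """Grows or shrinks the metadata (first) portion in fixed steps between bounds."""

    total_units: int               # X: cache units in the cache memory
    max_units: int                 # upper bound on the metadata portion
    metadata_units: int = 0        # current size of the metadata portion
    min_units: int = 0             # lower bound; 0 leaves all units for cache data
    step: int = 1                  # units added or removed per adjustment
    grow_threshold: float = 0.6    # first threshold (hypothetical value)
    shrink_threshold: float = 0.3  # second threshold (hypothetical value)

    def __post_init__(self) -> None:
        if not 0 <= self.min_units <= self.max_units <= self.total_units:
            raise ValueError("require 0 <= min_units <= max_units <= total_units")

    def update(self, metric: float) -> int:
        """Apply one periodic adjustment based on a prefetch performance metric."""
        if metric > self.grow_threshold:
            self.metadata_units = min(self.metadata_units + self.step, self.max_units)
        elif metric < self.shrink_threshold:
            self.metadata_units = max(self.metadata_units - self.step, self.min_units)
        return self.metadata_units

    @property
    def cache_units(self) -> int:
        """Units remaining in the second portion for cache data."""
        return self.total_units - self.metadata_units
```

Placing the first threshold above the second leaves a band of metric values in which the allocation is left unchanged, which helps avoid oscillating between sizes when the metric hovers near a single cut-off.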
- a cache memory can include a first portion allocated for metadata pertaining to an address space and a second portion allocated for caching data of the address space.
- relative sizes of the first and second portions are adapted based, at least in part, on current processing workloads. If a current processing workload is suitable for prefetching, the first portion can be sized appropriately. For example, logic can increase a size of the first portion and decrease a size of the second portion. The logic can shift some memory storage from being used for caching data to being used for storing metadata to increase prefetch capabilities for the workload that is suitable for prefetching.
- if the current processing workload is not suitable for prefetching, the first portion can be down-sized to provide greater resources for caching data.
- the logic can decrease the size of the first portion and increase the size of the second portion.
- logic can shift some cache memory storage from being used for maintaining the metadata to being used for storing cache data to decrease the resources consumed by the prefetcher for the workload that is not suitable for prefetching.
- the described cache partitioning can therefore adapt to efficiently provide more prefetch functionality or more cache storage depending on the current processing workload.
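The shift of storage between the two portions can be modeled as moving a single boundary within an array of X cache units. In this hypothetical sketch, the first portion occupies the low-numbered units and any unit that changes role is simply emptied; a real cache would also need to write back dirty cache data before reusing a unit for metadata.

```python
class PartitionedCache:
    """Hypothetical cache of X units: units [0, m) hold metadata, units [m, X) hold cache data."""

    def __init__(self, total_units: int, metadata_units: int):
        if not 0 <= metadata_units <= total_units:
            raise ValueError("metadata portion out of range")
        self.total_units = total_units
        self.metadata_units = metadata_units
        self.units = [None] * total_units  # contents of each cache unit (None = empty)

    def data_units(self) -> range:
        return range(self.metadata_units, self.total_units)

    def resize_metadata(self, new_metadata_units: int) -> list[int]:
        """Move the partition boundary; return indices of occupied units that changed role."""
        if not 0 <= new_metadata_units <= self.total_units:
            raise ValueError("metadata portion out of range")
        lo, hi = sorted((self.metadata_units, new_metadata_units))
        invalidated = [i for i in range(lo, hi) if self.units[i] is not None]
        for i in range(lo, hi):
            self.units[i] = None  # a unit that switches portions starts out empty
        self.metadata_units = new_metadata_units
        return invalidated
```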
- FIG. 1 - 1 illustrates an example apparatus 100 that can implement aspects of adaptive cache partitioning.
- the apparatus 100 can be realized as, for example, at least one electronic device.
- Example electronic-device implementations include an Internet-of-Things (IoT) device 100 - 1 , a tablet device 100 - 2 , a smartphone 100 - 3 , a notebook computer 100 - 4 , a desktop computer 100 - 5 , a server computer 100 - 6 , a server cluster 100 - 7 , and/or the like.
- other examples include a wearable device such as a smartwatch or intelligent glasses, an entertainment device such as a set-top box or a smart television, a motherboard or server blade, a consumer appliance, vehicles, industrial equipment, and network-attached storage (NAS).
- Each type of electronic device includes one or more components to provide some computing functionality or feature.
- the apparatus 100 includes at least one host 102 , at least one processor 103 , at least one memory controller 104 , interconnect 105 , memory 108 , and at least one cache 110 .
- the memory 108 may represent main memory, system memory, backing memory, backing storage, a combination thereof, and/or the like.
- the memory 108 may be realized with any suitable memory and/or storage facility including, but not limited to: a memory array, semiconductor memory, read-only memory (ROM), random-access memory (RAM), Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), thyristor random access memory (TRAM), ferroelectric RAM (FeRAM), magnetic RAM (MRAM), resistive RAM (RRAM), magnetoresistive RAM (MRAM), spin-torque transfer RAM (STT RAM), phase-change memory (PCM), three-dimensional (3D) stacked DRAM, Double Data Rate (DDR) memory, high bandwidth memory (HBM), a hybrid memory cube (HMC), solid-state memory, Flash memory, NAND Flash memory, NOR Flash memory, 3D XPoint™ memory, and/or the like.
- the host 102 can further include and/or be coupled to non-transitory storage, which may be realized with a device or module including any suitable non-transitory, persistent, solid-state, and/or non-volatile memory.
- the host 102 can include the processor 103 , memory controller 104 , and/or other components (e.g., cache 110 - 1 ).
- the processor 103 can be coupled to the cache 110 - 1 , and the cache 110 - 1 can be coupled to the memory controller 104 .
- the processor 103 can also be coupled, directly or indirectly, to the memory controller 104 .
- the host 102 can be coupled to the cache 110 - 2 through the interconnect 105 .
- the cache 110 - 2 can be coupled to the memory 108 .
- the depicted components of the apparatus 100 represent an example computing architecture with a memory hierarchy (or hierarchical memory system).
- the cache 110 - 1 can be logically coupled between the processor 103 and the cache 110 - 2 .
- the cache 110 - 2 can be logically coupled between the processor 103 (and/or cache 110 - 1 ) and the memory 108 .
- the cache 110 - 1 is at a higher level of the memory hierarchy than is the cache 110 - 2 .
- the cache 110 - 2 is at a higher level of memory hierarchy than is the memory 108 .
- the indicated interconnect 105, as well as the other interconnects that couple various components, can enable data to be transferred between or among the various components. Interconnect examples include a bus, a switching fabric, one or more wires that carry voltage or current signals, and/or the like.
- the host 102 may include additional caches, including multiple levels of cache memory (e.g., multiple cache layers).
- the processor 103 may include one or more internal memory and/or cache layers, such as instruction registers, data registers, an L1 cache, an L2 cache, an L3 cache, and/or the like. Further, at least one other cache and memory pair may be coupled “below” the illustrated cache 110 - 2 and/or memory 108 .
- the cache 110 - 2 and the memory 108 may be realized in various manners.
- the cache 110 - 2 and the memory 108 are both disposed on, or physically supported by, a motherboard with the memory 108 comprising “main memory.”
- the cache 110 - 2 includes and/or is realized by DRAM
- the memory 108 includes and/or is realized by a non-transitory memory device or module.
- the components may be implemented in alternative ways, including in distributed or shared memory systems. Further, a given apparatus 100 may include more, fewer, or different components.
- the cache 110 - 2 can be configured to improve memory performance by storing data of the relatively lower-performance memory 108 within a relatively higher-performance cache memory 120 .
- the cache memory 120 can be provided and/or be embodied by cache hardware, which can include, but is not limited to: semiconductor integrated circuitry, memory cells, memory arrays, memory banks, memory chips, and/or the like.
- the cache memory 120 includes a memory array.
- the memory array may be configured as cache memory 120 including a plurality of cache units, such as cache lines or the like.
- the memory array may be a collection (e.g., a grid) of memory cells, with each memory cell being configured to store at least one bit of digital data.
- the cache memory 120 may be formed on a semiconductor substrate, such as silicon, germanium, silicon-germanium alloy, gallium arsenide, gallium nitride, etc.
- the substrate is a semiconductor wafer.
- the substrate may be a silicon-on-insulator (SOI) substrate, such as silicon-on-glass (SOG) or silicon-on-sapphire (SOS), or epitaxial layers of semiconductor materials on another substrate.
- the conductivity of the substrate, or sub-regions of the substrate, may be controlled through doping using various chemical species including, but not limited to, phosphorus, boron, or arsenic.
- Doping may be performed during the initial formation or growth of the substrate, by ion-implantation, or by any other doping mechanism.
- the cache memory 120 may include any suitable memory and/or memory mechanism including, but not limited to: a memory, a memory array, semiconductor memory, volatile memory, RAM, SRAM, DRAM, SDRAM, non-volatile memory, solid-state memory, Flash memory, and/or the like.
- Data may be loaded into the cache memory 120 in response to cache misses so that subsequent requests for the data can be serviced more quickly. Further performance improvements can be realized by prefetching data into the cache memory 120 , which may include predicting addresses that are likely to be requested in the future and prefetching the predicted addresses into the cache memory 120 . When requests pertaining to the prefetched addresses are subsequently received at the cache 110 - 2 , the requests can be serviced from the relatively higher-performance cache memory 120 , without triggering cache misses (and without accessing the relatively lower-performance memory 108 ).
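The miss-then-fill and prefetch behavior can be illustrated with a toy cache. The sketch assumes an LRU cache of 64-byte lines and a simple next-line prefetcher standing in for the address prediction; neither is specified by the text, and `PrefetchingCache` is a hypothetical name.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Toy LRU cache with a hypothetical next-line prefetcher (degree 1)."""

    def __init__(self, capacity: int, line_size: int = 64):
        self.capacity = capacity
        self.line_size = line_size
        self.lines = OrderedDict()  # line address -> True while a prefetched line is unused
        self.hits = self.misses = self.prefetch_hits = 0

    def _fill(self, line: int, prefetched: bool) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least-recently-used line
        self.lines[line] = prefetched

    def access(self, addr: int) -> bool:
        line = addr - addr % self.line_size
        hit = line in self.lines
        if hit:
            self.hits += 1
            if self.lines[line]:
                self.prefetch_hits += 1  # first demand access to a prefetched line
                self.lines[line] = False
            self.lines.move_to_end(line)
        else:
            self.misses += 1  # would trigger a transfer from the backing memory
            self._fill(line, prefetched=False)
        # Predict the next line and prefetch it before it is requested.
        self._fill(line + self.line_size, prefetched=True)
        return hit
```

Scanning ten consecutive lines with `for i in range(10): cache.access(i * 64)` on a four-line cache produces one miss followed by nine hits, all of which are prefetch hits.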
- Addresses may be selected for prefetching based on, inter alia, metadata 122 pertaining to the address space of the memory 108 .
- access patterns within respective regions of the address space can be derived from the metadata 122 , and the access patterns can be used to prefetch data into the cache memory 120 .
- at least some of the metadata 122 is maintained within the cache memory 120 .
- a portion or partition of the cache memory 120 may be allocated for storage of the metadata 122 .
- the amount of cache memory 120 allocated for the metadata 122 may be adjusted, tuned, modified, varied, and/or otherwise managed based, at least in part, on one or more metrics.
- the metrics may pertain to one or more aspects of prefetch performance (may include one or more prefetch performance metrics), such as quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, prefetch hit rate, prefetch miss rate, and/or the like.
- the metrics may pertain to one or more aspects of cache performance (may include one or more cache performance metrics), such as cache hit rate, cache miss rate, and/or the like.
- the amount of cache memory 120 allocated to the metadata 122 may be increased when one or more of the metrics exceeds a first threshold, may be decreased when the metrics fall below a second threshold, and so on.
- aspects of adaptive cache partitioning are implemented by cache 110 - 2 .
- the disclosure is not limited in this regard.
- the disclosed techniques for adaptive cache partitioning may be implemented in any cache 110 (e.g., cache 110 - 1 ) and/or cache layer, including across multiple caches 110 and/or cache layers.
- the cache 110 - 2 may be configured to allocate cache memory for metadata 122 pertaining to the address space.
- one or more internal cache(s) of the processor 103 may be configured to implement adaptive cache partitioning as disclosed herein (e.g., an L3 cache of the processor 103 may allocate cache memory to store metadata 122 pertaining to the address space).
- FIG. 1 - 2 illustrates further examples of apparatuses that can implement adaptive cache partitioning.
- the apparatus 100 can include a cache 110 configured to cache data associated with an address space.
- the cache 110 can be configured to cache data pertaining to any suitable address space including, but not limited to: a memory address space, a storage address space, a host address space, an input/output (I/O) address space, a main memory address space, a physical address space, a virtual address space, a virtual memory address space, an address space managed by, inter alia, the processor 103 , memory controller 104 , memory management unit (MMU), and/or the like.
- the cache 110 is configured to cache data pertaining to an address space of the memory 108 .
- the memory 108 may, therefore, represent a backing memory of the cache 110 within the memory hierarchy.
- the cache 110 can load addresses and/or corresponding data of the relatively slower memory 108 into the relatively faster cache memory 120 .
- Data may be loaded in response to cache misses (e.g., in response to requests pertaining to addresses and/or data that are not available within the cache 110 ).
- Servicing a cache miss may involve transferring data from the relatively slower memory 108 to the relatively faster cache memory 120 .
- Cache misses may, therefore, lead to increased request latency and poor performance.
- the cache 110 can address these and other issues by prefetching addresses into the cache memory 120 before requests pertaining to the addresses are received.
- Accurate prefetches may result in prefetch hits that can be serviced using the cache memory 120 , without incurring the latencies involved in cache misses. Inaccurate prefetches, however, may consume cache memory resources with data that are not subsequently accessed, which can adversely impact performance (e.g., increase miss rate, decrease cache hit rate, increase bandwidth consumption, and so on).
- Metadata 122 pertaining to the address space can be used to, inter alia, inform prefetch operations.
- address access patterns are derived from the metadata 122 , and the address access patterns are leveraged to accurately predict addresses of upcoming requests.
- the metadata 122 may include any information pertaining to the address space and/or data associated with the backing memory of the cache 110 (e.g., the memory 108 ) including, but not limited to: a sequence of previously requested addresses or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses, access counts (e.g., accesses within respective windows), access time(s), last access time(s), ML features or parameters, ANN features or parameters (e.g., weight and/or bias parameters), DNN features or parameters, LSTM features or parameters, and so on.
- the metadata 122 includes a plurality of entries, each entry including information pertaining to a respective region of the address space.
- the metadata 122 pertaining to respective regions of the address space may be used to, inter alia, determine address access patterns within the respective regions, which may be used to inform prefetch operations within the respective regions.
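A per-region metadata table of the kind described above might look like the following sketch. The region size, history length, and `RegionMetadata` name are assumptions; the table keeps recent offsets and an access count per region and derives the delta sequence from which a prefetcher could infer the access pattern within that region.

```python
from collections import deque
from dataclasses import dataclass, field

REGION_SIZE = 4096  # hypothetical region granularity (bytes)
HISTORY = 4         # hypothetical number of recent offsets kept per region


@dataclass
class RegionEntry:
    offsets: deque = field(default_factory=lambda: deque(maxlen=HISTORY))
    access_count: int = 0


class RegionMetadata:
    """Metadata table with one entry per address region, keyed by region number."""

    def __init__(self):
        self.entries = {}

    def record(self, addr: int) -> None:
        entry = self.entries.setdefault(addr // REGION_SIZE, RegionEntry())
        entry.offsets.append(addr % REGION_SIZE)  # oldest offset drops off when full
        entry.access_count += 1

    def deltas(self, addr: int) -> list[int]:
        """Delta sequence between recent accesses in the region containing addr."""
        entry = self.entries.get(addr // REGION_SIZE)
        if entry is None:
            return []
        offs = list(entry.offsets)
        return [b - a for a, b in zip(offs, offs[1:])]
```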
- the metadata 122 may be performance sensitive.
- the metadata 122 pertaining to the address space may be retrieved, updated, and/or otherwise accessed in performance-sensitive operations, such as operations to service requests, cache operations, prefetch operations, and so on. It may be advantageous, therefore, to maintain the metadata 122 in high-performance memory resources.
- at least some of the metadata 122 pertaining to the address space are maintained within the cache memory 120 .
- a portion of the cache memory 120 may be allocated for storage of the metadata 122 .
- a first portion 124 (or first partition) of the cache memory 120 is reserved for the metadata 122 .
- Data corresponding to addresses of the address space associated with the memory 108 may be cached within a second portion 126 of the cache memory 120 (or second partition), which may be different and/or separate from the first portion 124 .
- the first portion 124 may include any suitable resources of the cache memory 120 including, but not limited to, zero or more: cache units, cache blocks, cache lines, hardware cache lines, sets, ways, rows, columns, banks, and/or the like.
- the first portion 124 (or first partition) may be referred to as a metadata portion, a metadata partition, a prefetch portion, a prefetch partition, or the like.
- the second portion 126 (or second partition) may be referred to as a cache portion, cache partition, or the like.
- the size of the first portion 124 may be adjusted based, at least in part, on one or more metrics.
- the metrics may pertain to any suitable aspect(s) of the cache 110 and/or memory hierarchy including, but not limited to: request latency, average request latency, throughput, cache performance, cache hit rate, cache miss rate, prefetch performance, prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like.
- the size of the first portion 124 may be increased in response to metrics indicating that cache and/or prefetch performance satisfies one or more first thresholds and may be reduced in response to metrics that fail to satisfy one or more second thresholds.
- the metrics may be configured to quantify the degree to which the workload on the cache 110 is suitable for prefetching.
- the amount of cache memory 120 allocated to storage of the metadata 122 pertaining to prefetch operations may, therefore, correspond to a degree to which the workload is suitable for prefetching (as quantified by the one or more metrics).
- when the metrics indicate that the workload is suitable for prefetching, the size of the first portion 124 may be increased (and the size of the second portion 126 may be decreased), which may enable more accurate prefetching and further improve overall cache performance, despite the decrease to available cache capacity.
- when the metrics indicate that the workload is unsuitable for prefetching, the size of the first portion 124 may be decreased (and the size of the second portion 126 may be increased), which may increase the available capacity of the cache 110 .
- the increased availability of cache capacity may result in improved performance (e.g., reduced cache miss rate, lower replacement rate, and so on).
- FIG. 1 - 3 illustrates further examples of apparatuses that can implement adaptive cache partitioning.
- the cache 110 may be an internal cache and/or cache layer of the processor 103 (and/or a processor core thereof), such as an L1 cache, L2 cache, L3 cache, or the like.
- the memory hierarchy may further include a cache 110 disposed between the processor 103 and memory 108 (a cache 110 - 2 as illustrated in FIG. 1 - 1 ).
- the cache 110 may be configured to cache data associated with addresses of an address space.
- the cache 110 is configured to cache data pertaining to a virtual address space managed by an MMU, such as the memory controller 104 , an operating system, or the like.
- the address space may be larger than the physical address space of the memory resources of the host 102 (e.g., the address space may be larger than the physical address space of the memory 108 ).
- the address space may be a 32-bit address space, a 64-bit address space, a 128-bit address space, or the like.
- the cache 110 can allocate a first portion 124 of the cache memory 120 for storage of metadata 122 pertaining to the address space.
- the metadata 122 may include information pertaining to accesses to respective addresses and/or address regions of the address space, which may be used to, inter alia, prefetch data into the second portion 126 of the cache memory 120 (prefetch cache data 128 pertaining to respective addresses of the address space).
- the cache 110 can adjust the amount of cache memory 120 allocated to storage of the metadata 122 based, at least in part, on one or more metrics, as disclosed herein.
- the cache 110 can increase the amount of cache memory 120 allocated to the first portion 124 under workload conditions that are suitable for prefetching and can decrease the amount allocated to the first portion 124 under workload conditions that are unsuitable for prefetching.
- FIG. 2 illustrates an example 200 of an apparatus for implementing adaptive cache partitioning.
- the illustrated apparatus includes a cache 110 configured to accelerate memory storage operations pertaining to a memory 108 (a backing memory).
- the memory 108 may be any suitable memory and/or storage facility, as disclosed herein.
- the cache 110 may include and/or be coupled to an interface 215 , which may be configured to receive requests 202 pertaining to an address space associated with the memory 108 from at least one requestor 201 .
- the requestor 201 can be a host 102 , processor 103 , processor core, client, computing device, communication device (e.g., smartphone), Personal Digital Assistant (PDA), tablet computer, Internet of Things (IoT) device, camera, memory card reader, digital display, personal computer, server computer, data management system, Database Management System (DBMS), embedded system, system-on-chip (SoC) device, or the like.
- the requestor 201 can include a system motherboard and/or backplane and can include processing resources (e.g., one or more processors, microprocessors, control circuitry, and/or the like).
- the interface 215 can be configured to couple the cache 110 to one or more interconnects, such as an interconnect 105 of a host 102 or the like.
- the cache 110 may be configured to service requests 202 pertaining to the memory 108 by use of high-performance memory resources, such as cache memory 120 .
- the cache memory 120 may include cache memory resources.
- a “cache memory resource” refers to any suitable data and/or memory storage resource.
- the cache memory resources of the cache memory 120 include a plurality of cache units 220 , each cache unit 220 capable of storing a respective quantity of data.
- the cache units 220 may include and/or correspond to any suitable type and/or arrangement of memory resource(s) including, but not limited to: a unit, a memory unit, a block, a memory block, a cache block, a cache memory block, a page, a memory page, a cache page, a cache memory page, a cache line, a hardware cache line, a set, a way, a memory array, a row, a column, a bank, a memory bank, and/or the like.
- the cache memory 120 includes X cache units 220 (cache units 220 - 1 through 220 -X).
- the cache 110 may be logically disposed between the requestor 201 and the memory 108 (e.g., may be interposed between the requestor 201 and the memory 108 ).
- the requestor 201 , cache 110 , and memory 108 may be communicatively coupled to an interconnect 105 .
- the cache 110 may include and/or be coupled to logic (cache logic 210 ) that is configured to receive requests 202 pertaining to an address space associated with the memory 108 by, inter alia, monitoring, filtering, sniffing, extracting, intercepting, identifying, and/or otherwise retrieving requests 202 pertaining to the address space on the interconnect 105 .
- the cache logic 210 can be further configured to map addresses 204 to cache units 220 .
- Requests 202 pertaining to addresses 204 that map to cache units 220 containing valid data associated with those addresses 204 result in cache hits, which can be serviced by use of the relatively higher-performance cache memory 120 .
- Requests 202 pertaining to addresses 204 that do not map to valid data stored within the cache memory 120 result in cache misses.
- Servicing a request 202 pertaining to an address 204 resulting in a cache miss may involve implementing a transfer operation 203 to fetch data associated with the address 204 from the relatively slower memory 108 , which may increase the latency of the request 202 .
- the cache logic 210 includes and/or is coupled to prefetch logic 230 .
- the prefetch logic 230 may be configured to predict the addresses 204 of upcoming requests 202 .
- the prefetch logic 230 can be further configured to implement transfer operations 203 (or cause the cache logic 210 to implement transfer operations 203 ) to prefetch data corresponding to the predicted addresses 204 into the cache memory 120 .
- the prefetch logic 230 may cause the transfer operations 203 to be implemented before requests 202 pertaining to the predicted addresses 204 are received at the cache 110 .
- Subsequent requests 202 pertaining to the predicted addresses 204 may, therefore, result in prefetch hits that can be serviced using the relatively higher-performance cache memory 120 , without incurring latencies involved with servicing cache misses (or accessing the relatively lower-performance memory 108 ).
- the latency of the requests 202 pertaining to prefetched addresses 204 may not include latencies involved in loading data of the predicted addresses 204 into the cache 110 .
- the prefetch logic 230 may determine address predictions based, at least in part, on metadata 122 pertaining to the address space (e.g., prefetcher metadata).
- the metadata 122 may include any suitable address access characteristics including, but not limited to: a sequence of previously requested addresses or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses, access counts (e.g., accesses within respective windows), access time(s), last access time(s), and so on.
- the prefetch logic 230 may be configured to maintain and/or update the metadata 122 in response to events pertaining to respective addresses 204 , which may include, but are not limited to: data access requests, read requests, write requests, copy requests, clone requests, trim requests, erase requests, delete requests, cache misses, cache hits, and/or the like.
- the prefetch logic 230 may utilize the metadata 122 to determine address access patterns and can use the determined address access patterns to predict the addresses 204 of upcoming requests 202 .
- the metadata 122 may include a plurality of entries, each entry including information pertaining to a respective region of the address space.
- the prefetch logic 230 may utilize the metadata 122 to determine access patterns within respective regions of the address space and use the determined access patterns to predict addresses 204 of upcoming requests 202 within the respective regions.
- the cache logic 210 may allocate a first portion 124 of the cache memory 120 for storage of the metadata 122 and/or use by the prefetch logic 230 .
- the cache logic 210 may maintain data pertaining to addresses of the address space (cache data 128 ) within a second portion of the cache memory 120 , which may be separate and/or distinct from the first portion 124 of the cache memory 120 .
- the cache logic 210 allocates M cache units 220 to storage of the metadata 122 .
- the first portion 124 may include cache units 220 - 1 through 220 - M.
- although FIG. 1 - 3 illustrates one example of adaptive cache partitioning, the disclosure is not limited in this regard, and the cache memory 120 could be partitioned according to any suitable partitioning scheme.
- the cache logic 210 may allocate cache units 220 - 1 through 220 -C to the second portion 126 and allocate cache units 220 -C+1 through 220 -X to the first portion 124 .
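The two contiguous layouts (metadata units at the start or at the end of the cache units) can be expressed as a simple split of unit numbers 1 through X. The helper below is a hypothetical illustration; `partition_units` and its `metadata_at_end` flag are not from the source.

```python
def partition_units(total_units: int, metadata_units: int, metadata_at_end: bool = False):
    """Return (metadata_unit_numbers, cache_unit_numbers) for units numbered 1..X."""
    if not 0 <= metadata_units <= total_units:
        raise ValueError("metadata_units must be between 0 and total_units")
    ids = range(1, total_units + 1)
    if metadata_at_end:
        c = total_units - metadata_units  # units 1..C hold cache data, C+1..X hold metadata
        return list(ids[c:]), list(ids[:c])
    # Units 1..M hold metadata; units M+1..X hold cache data.
    return list(ids[:metadata_units]), list(ids[metadata_units:])
```

With X = 8 and M = 3, the first layout assigns units 1-3 to metadata, while the second assigns units 6-8 (C = 5).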
- the cache logic 210 may allocate other groupings of cache units 220 , such as sets, ways, rows, columns, banks, and/or the like.
- the cache logic 210 may be configured to implement a first mapping scheme (a metadata scheme) to map the metadata 122 , and/or entries thereof, to cache units 220 within the first portion 124 .
- the cache logic 210 may be further configured to implement a second mapping scheme (a cache or address mapping scheme 316 ) to map addresses 204 to cache units 220 allocated to the second portion 126 .
- the cache logic 210 may be configured to modify the first mapping scheme and/or second mapping scheme in response to modifying the size and/or configuration of the cache memory 120 allocated to one or more of the first portion 124 and the second portion 126 .
- the cache logic 210 may adjust the quantity of cache memory 120 allocated to storage of the metadata 122 based, at least in part, on one or more metrics 212 .
- the metrics 212 may be configured to quantify a degree to which a workload on the cache 110 is suitable for prefetching. More specifically, the metrics 212 may be configured to quantify aspects of prefetch performance (e.g., may include one or more prefetch performance metrics 212 and/or metrics 212 pertaining to prefetch performance), such as a prefetch hit rate, quantity of useful prefetches, ratio of useful prefetches to bad prefetches, and/or the like.
- the metrics 212 may be configured to quantify other performance characteristics, including cache performance (e.g., may include one or more cache performance metrics 212 and/or metrics 212 pertaining to cache performance), such as cache hit rate, cache miss rate, and/or the like.
- the cache logic 210 may use the metrics 212 to determine the degree to which workload(s) on the cache 110 are suitable for prefetching and dynamically partition the cache memory 120 accordingly. More specifically, the cache logic 210 can adjust the amount of cache memory 120 allocated to the first portion 124 and/or second portion 126 based, at least in part, on one or more of the metrics 212 . The cache logic 210 may periodically monitor the metrics 212 and may determine whether to modify the size of the first portion 124 in response to the monitoring.
- the cache logic 210 may increase the amount of cache memory 120 allocated to the first portion 124 when one or more of the metrics 212 are above a first threshold (when prefetch performance exceeds the first threshold) and decrease the amount when the one or more metrics 212 are below a second threshold (when prefetch performance falls below the second threshold).
- the cache logic 210 may monitor the one or more metrics 212 in background operations, during idle periods (when not actively servicing requests 202 , implementing prefetch operations, or the like), on a determined schedule, and/or the like.
- Increasing the size of the first portion 124 may include decreasing the size of the second portion 126 .
- decreasing the size of the first portion 124 may include increasing the size of the second portion 126 .
- increasing the size of the first portion 124 may include reallocating one or more cache units 220 of the second portion 126 to the first portion 124 .
- decreasing the size of the first portion 124 may include reallocating one or more cache units 220 of the first portion 124 to the second portion 126 .
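A minimal sketch of the threshold policy, assuming a single prefetch hit-rate metric, hypothetical threshold values, and a fixed step of cache units per adjustment. Growing the first portion 124 shrinks the second portion 126 by the same number of units, and vice versa. `Partition` and `adjust_partition` are illustrative names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    total_units: int      # X: cache units in the cache memory 120
    metadata_units: int   # M: units in the first portion 124

    @property
    def cache_units(self) -> int:
        # Units left in the second portion 126 for cache data 128.
        return self.total_units - self.metadata_units


def adjust_partition(p: Partition, prefetch_hit_rate: float,
                     grow_threshold: float = 0.6,
                     shrink_threshold: float = 0.2,
                     step: int = 1,
                     max_metadata_units: Optional[int] = None) -> Partition:
    """Return a resized partition based on one prefetch-performance metric."""
    # By default, keep at least one unit in the second portion.
    limit = p.total_units - 1 if max_metadata_units is None else max_metadata_units
    m = p.metadata_units
    if prefetch_hit_rate > grow_threshold:
        m = min(m + step, limit)   # grow metadata, shrink cache capacity
    elif prefetch_hit_rate < shrink_threshold:
        m = max(m - step, 0)       # shrink (or eliminate) the metadata allocation
    return Partition(p.total_units, m)
```

Using two separate thresholds leaves a band in which the partition is left unchanged, which avoids resizing on every small fluctuation of the metric.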
- Resizing the amount of cache memory 120 allocated to the metadata 122 may include manipulating the metadata 122 and/or cache data 128 . Reducing the amount of cache memory 120 allocated to the metadata 122 may include evicting portions of the metadata 122 .
- the metadata 122 may be evicted according to a policy (a metadata eviction policy).
- the metadata eviction policy may specify that the oldest and/or least recently used entries of the metadata 122 are to be evicted when the size of the first portion 124 is reduced.
- reducing the amount of cache memory 120 allocated to the second portion 126 may include evicting cache data 128 from one or more cache units 220 .
- the cache data 128 may be evicted according to a policy (a replacement or eviction policy), which may include, but is not limited to: First In First Out (FIFO), Last In First Out (LIFO), Least Recently Used (LRU), Time Aware LRU (TLRU), Most Recently Used (MRU), Least-Frequently Used (LFU), random replacement, and/or the like.
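When the first portion 124 shrinks, entries of the metadata 122 that no longer fit must be evicted. The sketch below assumes the least-recently-used metadata eviction policy mentioned above and measures capacity in entries rather than cache units; `MetadataStore` and its methods are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, List


class MetadataStore:
    """Hypothetical metadata 122 store with LRU eviction on shrink."""

    def __init__(self, capacity_entries: int) -> None:
        self.capacity = capacity_entries
        # Ordered from least recently used to most recently used.
        self.entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def put(self, region: Hashable, value: Any) -> List[Hashable]:
        self.entries[region] = value
        self.entries.move_to_end(region)
        return self._evict()

    def get(self, region: Hashable) -> Any:
        if region not in self.entries:
            return None
        self.entries.move_to_end(region)  # mark as most recently used
        return self.entries[region]

    def resize(self, new_capacity: int) -> List[Hashable]:
        """Change capacity (e.g., when the first portion 124 is resized)."""
        self.capacity = new_capacity
        return self._evict()

    def _evict(self) -> List[Hashable]:
        evicted = []
        while len(self.entries) > self.capacity:
            region, _ = self.entries.popitem(last=False)  # drop the LRU entry
            evicted.append(region)
        return evicted
```

For example, after inserting regions 0 through 3 and then reading region 0, shrinking to two entries evicts regions 1 and 2 and keeps 3 and 0.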
- the cache logic 210 , prefetch logic 230 , and/or components and functionality thereof may include, but are not limited to: circuitry, logic circuitry, control circuitry, interface circuitry, input/output (I/O) circuitry, fuse logic, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, arithmetic logic units (ALU), state machines, microprocessors, processor-in-memory (PIM) circuitry, and/or the like.
- the cache logic 210 may be configured as a controller of the cache 110 (or cache controller).
- the prefetch logic 230 may be configured as a prefetcher (or cache prefetcher) of the cache 110 .
- FIG. 3 illustrates another example 300 of an apparatus for implementing adaptive cache partitioning.
- the cache 110 may be configured to cache data pertaining to an address space associated with a memory 108 , as disclosed herein.
- the apparatus includes a cache 110 coupled between the requestor 201 and memory 108 .
- the cache 110 is interposed between the requestor 201 and memory 108 .
- the cache 110 can include and/or be coupled to a first interface 215 A and/or a second interface 215 B.
- the first interface 215 A may be configured to receive requests 202 pertaining to addresses 204 of the address space through a first interconnect 105 A and, as such, may be referred to as a front-end interface.
- the requests 202 may correspond to one or more requestors 201 , as disclosed herein.
- the second interface 215 B may be configured to, inter alia, couple the cache 110 (and/or cache logic 210 ) to a backing memory, such as the memory 108 and, as such, may be referred to as a back-end interface.
- Cache data 128 may be loaded into the cache 110 in transfer operations 203 implemented by and/or through the second interface 215 B.
- alternatively, the requestor 201 , cache 110 , and memory 108 may be coupled to the same interconnect, as illustrated in FIG. 2 .
- the cache memory 120 may include a plurality of cache units 220 (e.g., X cache units 220 - 1 through 220 -X).
- the cache units 220 may include memory cells, memory rows, memory columns, memory pages, cache lines, hardware cache lines, cache memory units 320 , cache tags 326 , and/or the like. In some implementations, the cache units 220 are organized into a plurality of sets, each set including a plurality of ways, each way including and/or corresponding to a respective cache unit 220 .
- each cache unit 220 includes and/or is associated with a respective cache memory unit (CMU) 320 and/or cache tag 326 .
- the cache tag 326 may be configured to identify the data stored within the CMU 320 .
- the CMU 320 may be capable of storing cache data 128 associated with one or more addresses 204 of the address space (one or more addressable data units).
- each CMU 320 (and each cache unit 220 ) is capable of storing data of U addresses 204 (or U data units).
- each CMU 320 (and corresponding cache unit 220 ) may have a capacity of U bytes.
- the cache units 220 may further include and/or be associated with cache metadata 322 .
- the cache metadata 322 of a cache unit 220 may include information pertaining to the cache data 128 stored within the CMU 320 of the cache unit 220 (e.g., cache metadata 322 - 1 through 322 -X pertaining to cache data 128 stored within CMU 320 - 1 through 320 -X of cache units 220 - 1 through 220 -X, respectively).
- the cache metadata 322 may include any suitable information pertaining to the contents of a cache unit 220 including, but not limited to: validity information indicating whether cache data 128 stored within the CMU 320 of the cache unit 220 is valid, a “dirty” flag indicating whether the cache data 128 has been modified since being loaded from the memory 108 (should be written to the memory 108 before eviction), access count, last access time, access frequency, a prefetch flag indicating whether the cache data 128 was loaded in a prefetch operation, and so on.
- the cache metadata 322 of a cache unit 220 may be maintained within the CMU 320 of the cache unit 220 .
- cache metadata 322 may be maintained within separate cache memory resources.
- the cache logic 210 may implement, include and/or be coupled to partition logic 310 , which may be configured to, inter alia, partition the cache memory 120 into a first portion 124 and second portion 126 (e.g., divide the cache memory 120 into a first partition and second partition).
- the cache logic 210 may be configured to map, assign, and/or otherwise associate addresses 204 with cache units 220 allocated to the second portion 126 .
- the cache logic 210 may associate addresses 204 with cache units 220 according to an address-cache mapping scheme (an address mapping scheme 316 or address mapping logic). In the FIG. 3 example, the address mapping scheme 316 may logically divide addresses 204 into a tag region (an address tag 206 ) and an offset region 205 .
- the offset region 205 may be defined within a least significant bit (LSB) address region.
- the address tag 206 may be defined within the remaining most significant bit (MSB) address region.
- Although example addresses 204 , offset regions 205 , and address tags 206 are illustrated and described herein in reference to big-endian format, the disclosure is not limited in this regard and could be adapted for use with addresses 204 in any suitable format, encoding, or endianness.
- the cache logic 210 can look up cache units 220 for addresses 204 and/or determine whether the cache memory 120 includes valid data corresponding to the addresses 204 (e.g., determine whether addresses 204 are cache hits or cache misses).
- the cache logic 210 can look up cache units 220 for respective addresses 204 by, inter alia, matching address tags 206 of the addresses 204 to cache tags 326 of the cache units 220 . Addresses 204 that match cache tags 326 may be identified as cache hits, whereas addresses 204 that do not match cache tags 326 may be identified as cache misses.
- the cache logic 210 implements a hierarchical or set-based address mapping scheme 316 in which address tags 206 are first mapped to one of a plurality of sets and then are compared to cache tags 326 of a plurality of ways of the set, each way corresponding to a respective cache unit 220 .
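The set-based lookup can be sketched as follows, assuming cache units of U = 2^`offset_bits` bytes, an address tag 206 formed from all bits above the offset region 205, and set selection by taking the address tag modulo the number of sets. These choices and the `SetAssociativeTags` class are illustrative assumptions rather than the disclosed hardware.

```python
from typing import List, Optional, Tuple


class SetAssociativeTags:
    """Tag array for a set-based address mapping scheme (illustrative)."""

    def __init__(self, num_sets: int, num_ways: int, offset_bits: int) -> None:
        self.num_sets = num_sets
        self.offset_bits = offset_bits  # log2(U), where U is bytes per cache unit
        # cache_tags[set][way]: stored address tag, or None if the way is empty.
        self.cache_tags: List[List[Optional[int]]] = [
            [None] * num_ways for _ in range(num_sets)
        ]

    def split(self, addr: int) -> Tuple[int, int]:
        """Return (address tag, offset) for an address."""
        offset = addr & ((1 << self.offset_bits) - 1)  # LSB offset region
        return addr >> self.offset_bits, offset         # remaining MSBs form the tag

    def lookup(self, addr: int) -> Optional[Tuple[int, int]]:
        """Return (set, way) on a cache hit, or None on a cache miss."""
        tag, _ = self.split(addr)
        s = tag % self.num_sets  # first map the tag to a set
        for way, stored in enumerate(self.cache_tags[s]):
            if stored == tag:    # then compare against each way's cache tag
                return s, way
        return None

    def fill(self, addr: int, way: int) -> None:
        tag, _ = self.split(addr)
        self.cache_tags[tag % self.num_sets][way] = tag
```

With 64-byte units, addresses `0x1040` and `0x107F` share the address tag `0x41` and therefore hit the same cache unit, while `0x1080` maps to a different set.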
- the cache logic 210 may include and/or be coupled to prefetch logic 230 , which may utilize metadata 122 pertaining to the address space to predict addresses of upcoming requests 202 and prefetch data corresponding to the predicted addresses into the cache memory 120 , as disclosed herein. At least a portion of the metadata 122 may be maintained within the cache memory 120 .
- the cache logic 210 may include, implement, and/or be coupled to partition logic 310 , which may be configured to divide the cache memory 120 into a first portion 124 and a second portion 126 (partition the cache memory 120 ). The first portion 124 may be allocated for metadata 122 pertaining to the address space.
- the partition logic 310 may utilize a remaining available capacity of the cache memory 120 (the second portion 126 ) as available cache capacity.
- the cache logic 210 can use the second portion 126 of the cache memory 120 to maintain cache data 128 , as disclosed herein.
- the partition logic 310 may be configured to partition the cache memory 120 into a first partition comprising a first portion 124 of the cache memory resources of the cache memory 120 (e.g., a first quantity of cache units 220 ) and a second partition comprising a second portion 126 of the cache memory resources (e.g., a second quantity of cache units 220 ).
- the first portion 124 of the cache memory 120 may be allocated to store the metadata 122 pertaining to the address space, and the second portion 126 may be allocated as available cache capacity of the cache 110 (e.g., allocated to store cache data 128 ).
- the partition logic 310 may be configured to adjust the quantity of cache memory resources allocated to the first portion 124 and/or second portion 126 based, at least in part, on metrics 212 that are indicative of prefetch performance and/or a degree to which workload(s) being serviced by the cache 110 are suitable for prefetching.
- the partition logic 310 can be configured to allocate zero or more cache units 220 to the first portion 124 and allocate one or more cache units 220 to the second portion 126 .
- the first portion 124 of the cache memory 120 may include cache units 220 - 1 through 220 -M.
- the second portion 126 may include cache units 220 -(M+1) through 220 -X.
- the disclosure is not limited in this regard, however, and could partition the cache memory 120 and/or allocate cache units 220 in any suitable pattern or in accordance with any suitable scheme or arrangement.
- the cache logic 210 may implement, include, and/or be coupled to a metadata mapping scheme 314 (and/or metadata mapping logic), which may be configured to map, address, associate, reference, and/or otherwise provide access to cache units 220 allocated to the first portion 124 .
- the metadata mapping scheme 314 may enable the prefetch logic 230 (or external prefetcher) to access metadata 122 maintained within the first portion 124 of the cache memory 120 .
- the metadata mapping scheme 314 implemented by the cache logic 210 (and/or partition logic 310 ) maps metadata addresses to cache units 220 allocated to the first portion 124 (and/or offsets within the respective cache units 220 ).
- the metadata mapping scheme 314 may define a metadata address space ($M_A$).
- the metadata address space ($M_A$) may define a range of cache unit indexes $M_I$, each corresponding to a respective one of the M cache units 220 allocated to the first portion 124 , $M_A \in \{0, \ldots, M-1\}$.
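One way to read the metadata mapping scheme 314 is as a translation from a metadata address to a cache unit index $M_I$ plus an offset within that unit. The sketch below assumes byte-granular metadata addresses and cache units of U bytes; `map_metadata_address` is a hypothetical helper, and the list of physical unit indexes stands in for whatever structure the metadata mapping logic actually uses.

```python
from typing import Sequence, Tuple


def map_metadata_address(meta_addr: int, metadata_units: Sequence[int],
                         unit_size: int) -> Tuple[int, int]:
    """Translate a metadata address to (physical cache unit index, byte offset).

    metadata_units lists the physical indexes of the M cache units allocated
    to the first portion 124, ordered by cache unit index M_I = 0 .. M-1.
    """
    limit = len(metadata_units) * unit_size
    if not 0 <= meta_addr < limit:
        raise ValueError(f"metadata address {meta_addr} outside [0, {limit})")
    m_i, offset = divmod(meta_addr, unit_size)
    return metadata_units[m_i], offset
```

Because the prefetch logic 230 only sees metadata addresses, the first portion 124 can be backed by any set of physical cache units without changing how the metadata 122 is addressed.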
- Partitioning the cache memory 120 into a plurality of portions may include configuring mapping logic and/or mapping schemes of the portions to allocate, include, and/or incorporate designated cache memory resources of the cache memory 120 .
- “allocating,” “partitioning,” or “assigning” a portion of the cache memory 120 may include configuring mapping logic and/or a mapping scheme of the portion (or partition) to “include” or “reference” the cache memory resources.
- Configuring mapping logic and/or a mapping scheme to “include” or “reference” cache memory resources allocated to the portion or partition of the cache memory 120 may include configuring the mapping logic and/or mapping scheme to reference, allocate, include, incorporate, add, map, address, associate and/or otherwise access (or provide access to) the cache memory resources.
- Allocating cache memory resources to a portion or partition of the cache memory 120 e.g., the first portion 124
- “deallocating,” “removing,” or “excluding” cache memory resources from a portion or partition of the cache memory 120 may include configuring mapping logic and/or a mapping scheme of the portion (or partition) to “remove,” “exclude,” or “dereference” the cache memory resources.
- Configuring mapping logic and/or a mapping scheme to “remove,” “exclude,” or “dereference” cache memory resources may include configuring the mapping logic and/or mapping scheme to remove, disable, ignore, deallocate, dereference, demap, bypass, and/or otherwise exclude the cache memory resources from the partition or portion (e.g., prevent the cache memory resources from being accessed by and/or through the mapping logic and/or mapping scheme).
- the cache logic 210 may be configured to allocate M cache units 220 to the first portion 124 of the cache memory 120 (e.g., cache units 220 - 1 through 220 -M).
- Allocating the M cache units 220 to the first portion 124 may include configuring the metadata mapping scheme 314 (and/or metadata mapping logic) to include and/or reference cache units 220 - 1 through 220 -M, as disclosed herein.
- Allocating the M cache units to the first portion 124 may further include deallocating and/or excluding the M cache units 220 from the second portion 126 .
- Deallocating or excluding cache units 220 - 1 through 220 -M from the second portion 126 of the cache memory 120 may include configuring the address mapping scheme 316 to remove, exclude, and/or dereference the cache units 220 - 1 through 220 -M.
- the address mapping scheme 316 may be configured such that addresses 204 (and/or address tags 206 ) do not map to cache units 220 allocated to the first portion 124 .
- a cache unit 220 may be excluded from the address mapping scheme 316 by, inter alia, disabling the cache tag 326 associated with the cache unit 220 .
- Allocating the cache units 220 - 1 through 220 -M to the first portion 124 of the cache memory 120 may, therefore, include disabling cache tags 326 - 1 through 326 -M.
- the cache tags 326 of the cache units 220 that are allocated to the first portion 124 of the cache memory 120 (and are excluded from the second portion 126 and/or address mapping scheme 316 ) are highlighted with crosshatching.
- Cache tags 326 -(M+1) through 326 -X corresponding to the cache units 220 -(M+1) through 220 -X included in the second portion 126 of the cache memory 120 may remain enabled and/or be indexable by address tags 206 in the address mapping scheme 316 .
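The effect of disabling cache tags 326 can be sketched with a cache-unit-based layout in which units 0 through M-1 (zero-based here, corresponding to 220-1 through 220-M) hold metadata. The class name and the fully associative tag search are assumptions made to keep the example short; note that a stale tag left in a disabled unit can no longer produce a hit.

```python
from typing import List, Optional


class CacheUnitPartition:
    """Units 0..M-1 form the first portion 124; units M..X-1 form the second portion 126."""

    def __init__(self, total_units: int, metadata_units: int) -> None:
        if not 0 <= metadata_units < total_units:
            raise ValueError("need 0 <= M < X so the second portion is non-empty")
        self.cache_tags: List[Optional[int]] = [None] * total_units
        # Disabling a unit's cache tag 326 excludes it from address mapping.
        self.tag_enabled = [i >= metadata_units for i in range(total_units)]

    def lookup(self, address_tag: int) -> Optional[int]:
        """Return the matching cache unit index, or None on a miss."""
        for i, (tag, enabled) in enumerate(zip(self.cache_tags, self.tag_enabled)):
            if enabled and tag == address_tag:
                return i
        return None
```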
- the cache logic 210 (and/or partition logic 310 ) divides the cache memory 120 in accordance with a partition scheme 312 .
- the partition scheme 312 may logically define how cache memory resources are divided between the first portion 124 and the second portion 126 of the cache memory 120 .
- the partition scheme 312 may also logically define how cache resources are allocated between the partitions.
- the partition scheme 312 may define rules, schemes, logic, and/or criteria by which the cache memory 120 may be dynamically allocated and/or partitioned between the first portion 124 and the second portion 126 .
- the partition scheme 312 may be further configured to specify the amount, quantity, and/or capacity of cache memory resources to allocate to the first portion 124 and/or second portion 126 , respectively.
- Adapting the partition scheme 312 may include modifying the amount, quantity, and/or capacity of the cache memory resources allocated to the first portion 124 and/or second portion 126 .
- the partition scheme 312 allocates M cache units 220 for metadata storage (e.g., allocates M cache units 220 to the first portion 124 ) and allocates X-M cache units 220 as available cache capacity (e.g., allocates the remaining X-M cache units 220 to the second portion 126 ).
- the partition scheme 312 configures the cache logic 210 (and/or partition logic 310 ) to allocate cache units 220 to the first portion 124 by cache unit 220 (may partition the cache memory 120 in accordance with a cache-unit or cache-unit-based scheme).
- the partition scheme 312 may allocate cache units 220 sequentially by cache unit address or index.
- allocating M cache units 220 to the first portion 124 may include allocating cache units 220 - 1 through 220 -M to the first portion 124 such that cache units 220 -(M+1) through 220 -X are allocated to the second portion 126 , as illustrated in FIG. 3 .
- Increasing the size of the first portion 124 may include allocating additional cache units 220 to the first portion 124 sequentially.
- increasing the number of cache units 220 allocated to the first portion 124 from M cache units 220 to M+R cache units 220 may include allocating cache units 220 -(M+1) through 220 -(M+R) from the second portion 126 to the first portion 124 .
- the first portion 124 may include cache units 220 - 1 through 220 -(M+R).
- the second portion 126 may include cache units 220 -(M+R+1) through 220 -X.
- decreasing the size of the first portion 124 from M cache units 220 to M-R cache units 220 may include allocating cache units 220 -(M-R+1) through 220 -M from the first portion 124 to the second portion 126 .
- the first portion 124 may include cache units 220 - 1 through 220 -(M-R).
- the second portion 126 may include cache units 220 -(M-R+1) through 220 -X.
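The index arithmetic for the sequential cache-unit scheme can be checked with a small sketch that uses the same 1-based numbering as the text (cache units 220-1 through 220-X). The helper names are hypothetical.

```python
from typing import Optional, Tuple

Range = Optional[Tuple[int, int]]  # inclusive 1-based range, or None if empty


def sequential_partition(total_units: int, m: int) -> Tuple[Range, Range]:
    """First portion: units 1..M; second portion: units M+1..X."""
    first = (1, m) if m > 0 else None
    second = (m + 1, total_units) if m < total_units else None
    return first, second


def units_moved(old_m: int, new_m: int) -> Optional[Tuple[str, int, int]]:
    """Which units change portion when M changes (inclusive, 1-based)."""
    if new_m > old_m:   # grow by R = new_m - old_m: units M+1..M+R
        return ("to_first", old_m + 1, new_m)
    if new_m < old_m:   # shrink by R = old_m - new_m: units M-R+1..M
        return ("to_second", new_m + 1, old_m)
    return None
```

For instance, shrinking from M = 4 to M - R = 1 moves units 2 through 4, exactly R = 3 units, back to the second portion.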
- in other examples, the partition scheme 312 may configure the cache logic 210 (and/or partition logic 310 ) to allocate cache units 220 in other patterns, sequences, and/or schemes.
- the partition scheme 312 may define an interleaved allocation pattern, a modulo pattern, a hash pattern, may allocate cache units 220 in accordance with the hardware structure of the cache memory 120 and/or manner in which cache units 220 of the cache memory 120 are organized, and/or the like.
- the cache memory 120 includes a plurality of sets, each set including a plurality of ways, each way including and/or corresponding to a respective cache unit 220 .
- the partition scheme 312 may allocate cache memory resources by way, set, or the like.
- the cache logic 210 (and/or partition logic 310 ) may partition the cache memory 120 by way.
- the cache logic 210 may allocate a first quantity of zero or more ways within one or more sets to the first portion 124 and may allocate a second quantity of one or more ways within one or more sets to the second portion 126 .
- the first portion 124 includes a first quantity of zero or more ways within each set of the cache memory 120
- the second portion 126 includes a second quantity of one or more ways within each set.
- the first portion 124 may include a first group of ways within each set and the second portion 126 may include a second group of ways within each set (e.g., may include ways not allocated to the first portion 124 ).
- Allocating M cache units 220 to the first portion 124 of the cache memory 120 by way may include allocating $W_1$ ways to the first portion 124 within each set, where $W_1 = M / S$ for a cache memory 120 having $S$ sets of $N$ ways each ($X = S \cdot N$).
- the first portion 124 may include ways 1 through $W_1$ within each set, and the second portion 126 may include ways $W_1 + 1$ through $N$ within each set.
- the disclosure is not limited in this regard, however, and could distribute ways between the first portion 124 and second portion 126 in any suitable manner, scheme, and/or pattern.
- increasing the amount of cache memory 120 allocated to the first portion 124 from M to M+R cache units 220 may include allocating an additional $W_{1A}$ ways of each set to the first portion 124 (and deallocating the $W_{1A}$ ways of each set from the second portion 126 ), where $W_{1A} = R / S$.
- the first portion 124 may include ways 1 through $W_1 + W_{1A}$ within each set, and the second portion 126 may include ways $W_1 + W_{1A} + 1$ through $N$ within each set.
- decreasing the amount of cache memory 120 allocated to the first portion 124 from M to M-R cache units 220 may include allocating $W_{2A}$ ways of each set from the first portion 124 to the second portion 126 , where $W_{2A} = R / S$.
- the first portion 124 may include ways 1 through $W_1 - W_{2A}$ within each set, and the second portion 126 may include ways $W_1 - W_{2A} + 1$ through $N$ within each set.
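A sketch of way-based partitioning, assuming S sets of N ways each. Because every set gives up the same number of ways, allocating M units by way means allocating $W_1 = M / S$ ways per set, so the sketch requires M to be a multiple of S. Ways are numbered from 0 in the code (way 0 corresponds to way 1 in the text); `way_partition` is a hypothetical helper.

```python
from typing import Set, Tuple


def way_partition(num_sets: int, num_ways: int,
                  m: int) -> Tuple[int, Set[Tuple[int, int]], Set[Tuple[int, int]]]:
    """Split M cache units by way: W_1 = M / S ways of every set go to metadata."""
    if m % num_sets:
        raise ValueError("M must be a multiple of the number of sets S")
    w1 = m // num_sets
    if w1 >= num_ways:
        raise ValueError("each set must keep at least one way for cache data")
    first = {(s, w) for s in range(num_sets) for w in range(w1)}
    second = {(s, w) for s in range(num_sets) for w in range(w1, num_ways)}
    return w1, first, second
```

Resizing is then just a new call: with S = 4 and N = 8, moving from M = 8 to M = 16 raises $W_1$ from 2 to 4, that is, R = 8 units added as R / S = 2 extra ways per set.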
- the cache logic (and/or partition logic 310 ) may be configured to partition the cache memory 120 by set.
- Allocating a set may include allocating each way (and/or corresponding cache unit 220 ) of the set.
- the first portion 124 may include a first group of zero or more sets of the cache memory 120 and the second portion 126 may include a second group of one or more of the sets (may include each set of the cache memory 120 not allocated to the first portion 124 ).
- Allocating M cache units 220 of the cache memory 120 to the first portion 124 by set may include allocating $E_1$ sets to the first portion 124 , where $E_1 = M / N$ for a cache memory 120 having $S$ sets of $N$ ways each.
- the first portion 124 may include sets 1 through $E_1$ of the cache memory 120 and the second portion 126 may include sets $E_1 + 1$ through $S$.
- the disclosure is not limited in this regard, however, and could distribute sets between the first portion 124 and second portion 126 in any suitable manner, scheme, and/or pattern.
- increasing the amount of cache memory 120 allocated to the first portion 124 from M to M+R cache units 220 may include allocating an additional $E_{1A}$ sets of the cache memory 120 to the first portion 124 (and deallocating the $E_{1A}$ sets from the second portion 126 ), where $E_{1A} = R / N$.
- the first portion 124 may include sets 1 through $E_1 + E_{1A}$ and the second portion 126 may include sets $E_1 + E_{1A} + 1$ through $S$.
- decreasing the amount of cache memory 120 allocated to the first portion 124 from M to M-R cache units 220 may include allocating $E_{2A}$ sets from the first portion 124 to the second portion 126 , where $E_{2A} = R / N$.
- the first portion 124 may include sets 1 through $E_1 - E_{2A}$ , and the second portion 126 may include sets $E_1 - E_{2A} + 1$ through $S$.
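As a worked example under assumed dimensions (S = 64 sets of N = 8 ways), each set contributes N cache units, so allocating M units by set uses $E_1 = M / N$ sets and an adjustment of R units moves $R / N$ sets. Starting from M = 64, with R = 16 for an increase and R = 24 for a decrease:

$$
\begin{aligned}
X &= S \cdot N = 64 \cdot 8 = 512 \\
E_1 &= M / N = 64 / 8 = 8 &&\text{first portion: sets 1 through 8; second portion: sets 9 through 64} \\
E_{1A} &= R / N = 16 / 8 = 2 &&\text{growing to } M + R = 80\text{: first portion is sets 1 through 10} \\
E_{2A} &= R / N = 24 / 8 = 3 &&\text{shrinking to } M - R = 40\text{: first portion is sets 1 through 5}
\end{aligned}
$$

Since each adjustment moves whole sets, this scheme assumes M and R are multiples of N.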
- the cache logic 210 (and/or partition logic 310 ) can adjust the amount of cache memory 120 allocated to the first portion 124 (and/or second portion 126 ) based, at least in part, on one or more metrics 212 .
- the metrics 212 may be configured to quantify the degree to which the workload on the cache 110 is suitable for prefetching.
- the metrics 212 may be configured to quantify aspects of prefetch performance.
- the cache logic 210 (and/or prefetch logic 230 ) can determine and/or monitor any suitable aspect of prefetch performance, such as prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like.
- Prefetch hit rate may be determined by tracking accesses to prefetched cache data 128 within the cache memory 120 .
- prefetched cache data 128 refers to cache data 128 that was loaded into the cache memory 120 before being requested (e.g., by the prefetch logic 230 and/or in a prefetch operation).
- non-prefetched cache data 128 refers to cache data 128 that was loaded in response to a request 202 , a cache miss, or the like.
- the cache logic 210 tracks prefetched cache data 128 by use of cache metadata 322 .
- the cache logic 210 may record a prefetch flag or other indicator in the cache metadata 322 to distinguish prefetched cache data 128 from non-prefetched cache data 128 .
- a prefetch hit rate may be determined based on access metrics of prefetched cache data 128 maintained within the cache metadata 322 , such as access count, access frequency, last access time, and/or the like.
- prefetch miss rate may be determined by identifying prefetched cache data 128 having no accesses or accesses below a threshold quantity or frequency.
- the metrics 212 are further configured to quantify other aspects of cache performance, such as cache hit rate, cache miss rate, request latency, and so on.
- the cache logic 210 may be configured to determine and/or monitor aspects of cache performance.
- the cache logic 210 may be configured to determine a cache hit rate by, inter alia, monitoring a quantity of requests 202 that result in cache hits, monitoring a quantity of requests 202 that result in cache misses, and/or the like.
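A sketch of the bookkeeping behind these metrics, assuming each cache unit's cache metadata 322 carries a validity flag, a prefetch flag, and an access count, and treating a prefetched unit as useful once it has been accessed at least `min_accesses` times. `UnitMeta`, `RequestCounters`, and the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UnitMeta:
    """Hypothetical subset of the cache metadata 322 kept per cache unit."""
    valid: bool = False
    prefetched: bool = False  # prefetch flag: loaded by a prefetch operation
    access_count: int = 0


def prefetch_hit_rate(units: Iterable[UnitMeta], min_accesses: int = 1) -> float:
    """Fraction of valid prefetched units accessed at least min_accesses times."""
    prefetched = [u for u in units if u.valid and u.prefetched]
    if not prefetched:
        return 0.0
    useful = sum(1 for u in prefetched if u.access_count >= min_accesses)
    return useful / len(prefetched)


class RequestCounters:
    """Counts requests 202 that hit or miss to derive a cache hit rate."""

    def __init__(self) -> None:
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Under this definition, the prefetch miss rate is simply one minus the prefetch hit rate over the same set of prefetched units.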
- the one or more metrics 212 are configured to quantify aspects of cache and/or prefetch performance for respective regions of the address space.
- the cache logic 210 (and/or prefetch logic 230 ) can determine and/or monitor prefetch performance within regions of the address space covered by respective entries of the metadata 122 .
- the metrics 212 may, therefore, quantify the degree to which the workloads within respective regions of the address space are suitable for prefetching.
- the prefetch logic 230 may utilize the metrics 212 to determine whether to implement prefetching within the respective address regions, the prefetch degree for respective address regions, the amount of metadata 122 to maintain for the respective address regions, and/or the like.
- the cache logic 210 (and/or partition logic 310 ) can utilize the one or more metrics 212 to dynamically partition the cache memory 120 . More specifically, the cache logic 210 may utilize the metrics 212 to determine, tune, adapt, and/or otherwise manage the amount of cache memory 120 allocated for storage of the metadata 122 pertaining to the address space (the amount of cache memory 120 allocated to the first portion 124 ) and/or the amount of cache memory 120 allocated for storage of cache data 128 (the amount of cache memory 120 allocated to the second portion 126 ).
- the cache logic 210 (and/or partition logic 310 ) may be configured to: a) increase the quantity of cache units 220 allocated to the first portion 124 when one or more of the metrics 212 exceed a first threshold (thereby decreasing the quantity of cache units 220 allocated for storage of cache data 128 within the second portion 126 ), or b) decrease the quantity of cache units 220 allocated to the first portion 124 when one or more of the metrics 212 is below a second threshold (thereby increasing the quantity of cache units 220 allocated for storage of cache data 128 within the second portion 126 ).
- the adjustments implemented by the cache logic 210 can dynamically allocate cache memory resources between the first portion 124 (metadata 122 ) and the second portion 126 (cache data 128 ) based on the degree to which workload(s) being serviced by the cache 110 are suitable for prefetching.
- the cache logic 210 can increase the amount of cache memory 120 allocated for the metadata 122 under workload conditions that are suitable for prefetching and can decrease (or eliminate) the allocation under workload conditions that are not suitable for prefetching, thereby increasing the amount of available cache capacity when servicing unsuitable workloads.
- Increasing the quantity of cache units 220 allocated to the first portion 124 may include assigning or allocating one or more cache units 220 of the second portion 126 to the first portion 124 .
- allocating a cache unit 220 to the first portion 124 may include configuring the metadata mapping scheme 314 (and/or metadata mapping logic) to include the cache unit 220 , providing the prefetch logic 230 with access to the cache unit 220 (or CMU 320 thereof), and/or otherwise making the CMU 320 of the cache unit 220 available for storage of metadata 122 pertaining to the address space.
- Allocating a cache unit 220 to the first portion 124 of the cache memory 120 may further include deallocating the cache unit 220 from the second portion 126 .
- Deallocating a cache unit 220 from the second portion 126 of the cache memory 120 may include configuring the address mapping scheme 316 (and/or address mapping logic) to remove or exclude the cache unit 220 .
- the address mapping scheme 316 may be configured to dereference the cache unit 220 such that the cache unit 220 is excluded from the C cache units 220 included in the second portion 126 .
- the address mapping scheme 316 may be modified to remove the cache unit 220 from an index or other mechanism by which addresses 204 and/or address tags 206 are associated with cache units 220 (e.g., by disabling the cache tag 326 of the cache unit 220 ).
- Deallocating a cache unit 220 from the second portion 126 may further include evicting cache data 128 from the cache unit 220 , setting a validity flag of the cache metadata 322 to “false,” and/or the like.
- deallocating a cache unit 220 from the second portion 126 further includes identifying “dirty” cache data 128 within the cache unit 220 (based, at least in part, on cache metadata 322 associated with the cache data 128 , such as “dirty” indicators) and flushing and/or destaging the identified cache data 128 (if any) to a backing memory, such as the memory 108 (e.g., writing the identified cache data 128 back to the memory 108 ).
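Deallocating a cache unit 220 from the second portion 126 might look like the following sketch, which writes dirty data back before invalidating the unit and disabling its tag. The `CacheUnit` fields and the dict standing in for the memory 108 (keyed by address tag) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheUnit:
    tag: Optional[int] = None   # cache tag 326
    tag_enabled: bool = True    # False once excluded from the address mapping scheme 316
    valid: bool = False         # from cache metadata 322
    dirty: bool = False         # from cache metadata 322
    data: bytes = b""           # contents of the CMU 320


def deallocate_from_second_portion(unit: CacheUnit,
                                   backing: Dict[int, bytes]) -> bool:
    """Flush dirty data, invalidate, and disable the tag. Returns True if flushed."""
    flushed = False
    if unit.valid and unit.dirty and unit.tag is not None:
        backing[unit.tag] = unit.data  # write back to memory 108 before eviction
        flushed = True
    unit.valid = False
    unit.dirty = False
    unit.tag_enabled = False  # addresses can no longer map to this unit
    return flushed
```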
- the cache logic 210 (and/or partition logic 310 ) may be configured to preserve cache state when repartitioning the cache memory 120 to increase the size of the first portion 124 and/or decrease the size of the second portion 126 .
- the cache logic 210 may preserve cache state when reducing the amount of cache memory 120 allocated to the second portion 126 by, inter alia, compacting cache data 128 stored within the second portion 126 for storage within fewer cache units 220 .
- Compacting the cache data 128 may include evicting a first subset of the cache data 128 currently stored within the second portion 126 of the cache memory 120 , the first subset including an amount of cache data 128 equivalent to R cache units 220 (and/or cache data 128 stored within R cache units 220 currently allocated to the second portion 126 ).
- the first subset of the cache data 128 may be selected for eviction based on a suitable eviction or replacement policy and/or criteria, such as FIFO, LIFO, LRU, TLRU, MRU, LFU, random replacement, or the like. Evicting cache data 128 from a cache unit 220 may make the cache unit 220 available to store other cache data 128 (transitioning the cache unit 220 from “occupied” to “available,” or empty).
- the cache logic 210 (and/or partition logic 310 ) may be further configured to move remaining cache data 128 stored within cache units 220 that are to be allocated to the first portion 124 (if any) to available cache units 220 that are to remain allocated to the second portion 126 .
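The compaction step can be sketched as follows. The sketch assumes an LRU replacement policy, which is one of the policies listed above. It also uses a hypothetical representation in which each cache unit of the second portion holds either `None` or a `(line_address, last_access_time)` pair. The partition scheme chooses the units to reallocate separately from the eviction policy, as the text notes. When every unit is occupied, the sketch evicts exactly as many entries as there are units being reallocated. When some units are already empty, it evicts only the entries that no longer fit, which is an assumption.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (line_address, last_access_time), a hypothetical encoding


def compact_cache_data(
    units: List[Optional[Entry]], reallocate: List[int]
) -> Tuple[List[Optional[Entry]], List[Entry]]:
    """Shrink the cache-data portion by emptying the units listed in `reallocate`.

    Returns the new unit contents and the entries evicted (least recently used first).
    """
    leaving = set(reallocate)
    keep = [i for i in range(len(units)) if i not in leaving]
    occupied = [i for i, e in enumerate(units) if e is not None]
    n_evict = max(0, len(occupied) - len(keep))

    # The eviction choice (LRU here) is independent of which units are reallocated.
    victims = sorted(occupied, key=lambda i: units[i][1])[:n_evict]
    new = list(units)
    evicted = [new[i] for i in victims]
    for i in victims:
        new[i] = None

    # Move surviving data out of the departing units into free units that remain.
    free = [i for i in keep if new[i] is None]
    for i in reallocate:
        if new[i] is not None:
            dst = free.pop(0)
            new[dst], new[i] = new[i], None
    return new, evicted
```

After eviction, the number of surviving entries is at most the number of kept units. As a result, `free` always has room for the data that must move.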
- the cache units 220 selected for eviction may be different from the cache units 220 selected for reallocation.
- the cache logic 210 may select cache units 220 for eviction based on an eviction or replacement policy.
- the cache logic 210 (and/or partition logic 310 ) may select cache units 220 to reallocate from the second portion 126 to the first portion 124 (or vice versa) based on separate, independent criteria.
- the cache units 220 to reallocate from the second portion 126 to the first portion 124 (or vice versa) may be selected in accordance with a partition scheme 312 , as disclosed herein.
- the partition scheme 312 may define rules, schemes, logic, and/or other criteria by which cache units 220 are divided (and/or dynamically allocated) between the first portion 124 and the second portion 126 .
- the partition scheme 312 may divide the cache memory 120 in any suitable pattern or scheme including, but not limited to: a sequential scheme, a way-based scheme, a set-based scheme, and/or the like. Decreasing the quantity of cache units 220 allocated to the first portion 124 may include increasing the quantity of cache units 220 allocated to the second portion 126 (e.g., increasing the available cache capacity). Decreasing the size of the first portion 124 may include assigning or allocating one or more cache units 220 from the first portion 124 to the second portion 126 .
- Allocating a cache unit 220 to the second portion 126 may include removing the cache unit 220 from the metadata mapping scheme 314 , such that the cache unit 220 is no longer included in the group of M cache units 220 available for storage of the metadata 122 (e.g., modifying the metadata address scheme M_A ). Allocating the cache unit 220 to the second portion 126 may further include modifying the address mapping scheme 316 to reference the cache unit 220 (e.g., including the cache unit 220 in the group of C cache units 220 available for storage of cache data 128 ).
- the address mapping scheme 316 may be modified to enable addresses 204 and/or address tags 206 to map and/or be assigned to the cache unit 220 by, inter alia, enabling the cache tag 326 of the cache unit 220 . Decreasing the quantity of cache units 220 allocated to the first portion 124 may decrease the amount of cache memory 120 available for storage of the metadata 122 . Decreasing the size of the first portion 124 may, therefore, include compacting the metadata 122 for storage within a smaller amount of cache memory 120 . The metadata 122 may be compacted for storage within a smaller memory range (e.g., from a first size M_1 to a second, smaller size M_2).
- Compacting the metadata 122 may include removing a portion of the metadata 122 , such as one or more entries of the metadata 122 .
- the portion of the metadata 122 may be selected based on a removal criterion, such as an age criterion (oldest removed first, youngest removed first, or the like), least recently accessed criterion, least frequently accessed criterion, and/or the like.
- portions of the metadata 122 may be selected for removal based, at least in part, on one or more metrics 212 .
- the metadata 122 may include a plurality of entries, each entry including access information pertaining to a respective region of the address space.
- the prefetch logic 230 may utilize respective entries of the metadata 122 to implement prefetch operations within the address regions covered by the respective entries.
- the one or more metrics 212 may be configured to quantify prefetch performance within the address regions covered by the respective entries of the metadata 122 .
- Compacting the metadata 122 may include selecting entries of the metadata 122 for removal based, at least in part, on prefetch performance within the address regions covered by the entries, as quantified by the metrics 212 .
- entries of the metadata 122 in which prefetch performance is below a threshold may be removed (and/or the amount of memory capacity allocated to the entries may be reduced).
- entries of the metadata 122 exhibiting higher prefetch performance may be retained, whereas entries exhibiting lower prefetch performance may be removed (e.g., the R lowest-performing entries of the metadata 122 may be selected for removal).
- Compacting the metadata may, therefore, include removing metadata 122 from one or more cache units 220 and/or moving metadata 122 (and/or entries of the metadata 122 ) from cache units 220 being reallocated to the second portion 126 to the remaining cache units 220 allocated to the first portion 124 .
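The performance-based selection above can be expressed as a short sketch. It assumes that each metadata entry is keyed by a hypothetical address-region id. It also assumes that a single per-entry score, such as prefetch accuracy, stands in for the metrics 212. The function name, the score, and the threshold are illustrative. The source also permits age-based or access-based criteria.

```python
from typing import Dict


def compact_metadata(scores: Dict[int, float], capacity: int, threshold: float = 0.0) -> Dict[int, float]:
    """Return the metadata entries to retain after the first portion shrinks.

    scores: hypothetical per-region prefetch performance (higher is better).
    capacity: number of entries the smaller first portion can still hold.
    """
    # Drop entries whose prefetch performance falls below the threshold.
    eligible = [(region, s) for region, s in scores.items() if s >= threshold]
    # Keep the highest-performing entries; the lowest performers are removed.
    eligible.sort(key=lambda kv: kv[1], reverse=True)
    return dict(eligible[:capacity])
```

For example, with scores `{0: 0.9, 1: 0.2, 2: 0.6, 3: 0.05}` and room for two entries, the sketch retains regions 0 and 2.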
- FIG. 4 - 1 illustrates another example 400 of an apparatus for implementing adaptive cache partitioning.
- the apparatus 400 includes a cache 110 that is configured to cache data pertaining to an address space associated with a memory 108 .
- the cache 110 may include and/or be coupled to one or more interconnects.
- the cache 110 includes and/or is coupled to a first interface 215 A configured to couple the cache 110 to a first interconnect 105 A and a second interface 215 B configured to couple the cache 110 to a second interconnect 105 B.
- the cache 110 may be configured to service requests 202 pertaining to addresses 204 of the address space from one or more requestors 201 .
- the cache 110 may service the requests 202 by use of cache memory 120 , which may include loading data associated with addresses 204 of the address space in transfer operations 203 .
- the transfer operations may be implemented in response to cache misses, prefetch operations, and/or the like.
- the cache 110 may include and/or be coupled to an interface 215 , which may be configured to couple the cache 110 (and/or cache logic 210 ) to one or more interconnects, such as interconnects 105 A and/or 105 B.
- the cache memory 120 includes a plurality of cache units 220 , each cache unit 220 including and/or corresponding to a respective cache line.
- the cache units 220 may be arranged into a plurality of sets 430 (e.g., sets 430 - 1 through 430 -S).
- the sets 430 may be N-way associative; each set 430 may include N ways 420 , each way 420 including and/or corresponding to a respective cache unit 220 (a respective cache line).
- each set 430 may include N ways 420 - 1 through 420 -N, each way 420 including and/or corresponding to a respective cache unit 220 .
- the address mapping scheme 316 (or address mapping logic) implemented by the cache logic 210 may be configured to divide addresses 204 into an offset 205 , set region (set tag 406 ), and address tag 206 .
- the offset 205 may correspond to a capacity of the cache units 220 (e.g., a capacity of the CMU 320 ), as disclosed herein.
- the address mapping scheme 316 may utilize set tags 406 to associate addresses 204 with respective sets 430 .
- the address mapping scheme 316 includes a set mapping scheme by which set tags 406 are mapped to one of a group of available sets (S_C), such as sets 430 - 1 through 430 -S.
- the address mapping scheme 316 may further include a way mapping scheme by which address tags 206 are mapped to one of the N ways 420 of the selected set 430 (e.g., by comparing the address tag 206 to cache tags 326 of the ways 420 ).
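The division of an address 204 into address tag, set tag, and offset can be sketched as bit slicing. This assumes, as is common though not stated here, that the cache-unit capacity and the number of sets are both powers of two. `split_address` is a hypothetical helper.

```python
from typing import Tuple


def split_address(addr: int, unit_bytes: int, num_sets: int) -> Tuple[int, int, int]:
    """Return (address_tag, set_index, offset) for a set-associative lookup."""
    for v in (unit_bytes, num_sets):
        if v <= 0 or v & (v - 1):
            raise ValueError("sizes must be powers of two")
    offset_bits = unit_bytes.bit_length() - 1
    set_bits = num_sets.bit_length() - 1
    offset = addr & (unit_bytes - 1)                    # offset 205
    set_index = (addr >> offset_bits) & (num_sets - 1)  # set tag 406
    tag = addr >> (offset_bits + set_bits)              # address tag 206
    return tag, set_index, offset
```

With 64-byte cache units and 256 sets, `split_address(0x12345, 64, 256)` yields tag 4, set 141, and offset 5. The way mapping then compares tag 4 against the cache tags 326 of set 141.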
- the cache logic 210 can include, implement, and/or be coupled to partition logic 310 , which may be configured to partition the cache memory 120 into a first portion 124 and second portion 126 .
- the first portion 124 may be allocated for storage of metadata 122 pertaining to the address space.
- the second portion 126 may be allocated for storage of cache data 128 (may be allocated as available cache capacity).
- the cache logic 210 may allocate cache memory 120 between the first portion 124 and the second portion 126 in accordance with a partition scheme 312 .
- the partition scheme 312 may specify an amount of cache memory 120 to be allocated to the metadata 122 (the first portion 124 ).
- the partition scheme 312 may also specify a manner in which cache units 220 are allocated to the first portion 124 and/or second portion 126 .
- the cache logic 210 is configured to partition the cache memory 120 by way 420 (e.g., by implementing a way-based partition scheme 312 - 1 ).
- the way partition scheme 312 - 1 may specify that the first portion 124 is allocated zero or more ways 420 within zero or more sets 430 of the cache memory 120 .
- the way partition scheme 312 - 1 specifies that the first portion 124 is allocated zero or more ways 420 within each set 430 of the cache memory 120 .
- the cache memory 120 includes a plurality of banks (e.g., SRAM banks).
- the ways 420 of the cache memory 120 may be organized within respective banks. More specifically, the ways 420 of each set 430 may be split across multiple banks of the cache memory 120 .
- each way 420 may be implemented by a respective one of the banks: way 420 - 1 of each set 430 - 1 through 430 -S may be implemented by a first bank, way 420 - 2 of each set 430 - 1 through 430 -S may be implemented by a second bank, and so on, with way 420 -N of each set 430 - 1 through 430 -S being implemented by an Nth bank of the cache memory 120 .
- the banks of the cache memory 120 may include separate memory blocks.
- the first portion 124 allocated to the metadata 122 may, therefore, include zero or more banks (or blocks) of the cache memory 120 .
- the metadata mapping scheme 314 may address banks allocated to the first portion 124 as a linear (or flat) chunk of memory.
- the metadata mapping scheme 314 may, therefore, enable the metadata 122 to be arranged and/or organized in any suitable manner (e.g., as specified by a prefetcher, prefetch logic 230 , or the like).
- the partition scheme 312 - 1 allocates two ways 420 within each set 430 to the prefetch logic 230 .
- ways 420 (and/or cache units 220 ) that are allocated to the first portion 124 are illustrated with crosshatching to distinguish them from ways 420 that are allocated to the second portion 126 .
- the first portion 124 may include first portions 124 - 1 through 124 -S within each set 430 - 1 through 430 -S of the cache memory 120 .
- the second portion 126 may include second portions 126 - 1 through 126 -S within sets 430 - 1 through 430 -S.
- Allocating a cache unit 220 (or way 420 ) to the first portion 124 may include configuring the address mapping scheme 316 to disable or ignore the cache unit 220 .
- the address mapping scheme 316 is adapted to disable or ignore ways 420 - 1 and 420 - 2 of each set 430 (e.g., by disabling cache tags 326 - 1 and 326 - 2 of the corresponding cache units 220 - 1 and 220 - 2 ).
- the quantity of sets 430 available for storage of cache data 128 (S C ) may be substantially unchanged.
- the second portion 126 of the cache memory 120 may include S sets 430 , each including N-2 ways 420 (ways 420 - 3 through 420 -N).
- the address mapping scheme 316 may, therefore, distribute addresses 204 between the S sets 430 of the cache memory 120 (by set tag 406 or the like).
- the way mapping scheme implemented by the cache logic 210 (and/or address mapping scheme 316 ) may be adapted to modify the associativity of the sets 430 .
- the address mapping scheme 316 manages the sets 430 as [N-2]-way associative rather than N-way associative. More specifically, the address mapping scheme 316 maps up to N-2 address tags 206 to each set 430 rather than N address tags 206 .
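A way lookup that respects the reduced associativity might look like the following sketch. It compares the address tag 206 only against ways whose cache tags 326 are enabled, so a way holding metadata 122 can never produce a hit. The function name and the list-based representation are assumptions.

```python
from typing import List, Optional


def lookup_way(cache_tags: List[Optional[int]], tag_enabled: List[bool], tag: int) -> Optional[int]:
    """Return the index of the matching way, or None on a miss."""
    for way, (cached, enabled) in enumerate(zip(cache_tags, tag_enabled)):
        # Ways allocated to the first portion have their cache tags disabled.
        if enabled and cached == tag:
            return way
    return None
```

For example, with `cache_tags = [7, 9, 9, 3]` and ways 0 and 1 disabled because they hold metadata, looking up tag 9 returns way 2. Looking up tag 7 misses, even though the metadata bits in way 0 happen to resemble that tag.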
- allocating a way 420 to the first portion 124 may include evicting cache data 128 from the way 420 .
- Cache data 128 may be selected for eviction from respective sets 430 in accordance with an eviction or replacement policy, as disclosed herein.
- Allocating R ways 420 of a set 430 to the first portion 124 may include compacting the cache data 128 stored within the set 430 from a capacity of N cache units 220 to N-R cache units 220 .
- the cache logic 210 can select cache data 128 to retain within respective sets 430 and move the selected cache data 128 to the N-R ways 420 of the respective sets 430 that are to remain allocated to the second portion 126 .
- allocating ways 420 - 1 and 420 - 2 of each set 430 to the first portion 124 may include compacting the cache data 128 within each set 430 from a capacity of N cache units 220 to a capacity of N-2 cache units 220 by, inter alia, evicting cache data 128 from a first group of two ways 420 of the set 430 (such that the cache data 128 of a second group of N-2 ways 420 of the set 430 is retained), moving cache data 128 stored within the second group of ways 420 to ways 420 - 3 through 420 -N of the set 430 (if necessary), and assigning ways 420 - 1 and 420 - 2 to the first portion 124 .
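The per-set compaction can be sketched as follows. The sketch assumes LRU replacement and a hypothetical encoding in which each way is either `None` or `(address_tag, last_access_time)`. Ways `0..r-1` stand in for ways 420-1 through 420-R. For simplicity, the sketch rewrites the whole set and packs the most recently used survivors into the remaining ways. The text only requires data to move "if necessary", so a hardware design would likely minimize moves.

```python
from typing import List, Optional, Tuple

Way = Optional[Tuple[int, int]]  # (address_tag, last_access_time) or None


def compact_set(ways: List[Way], r: int) -> Tuple[List[Way], List[Tuple[int, int]]]:
    """Give the first r ways of a set to metadata and keep at most N-r cache entries."""
    n = len(ways)
    if not 0 <= r <= n:
        raise ValueError("r must be between 0 and N")
    # Most recently used entries first; the tail of this list is evicted.
    survivors = sorted((w for w in ways if w is not None), key=lambda w: w[1], reverse=True)
    kept, evicted = survivors[: n - r], survivors[n - r:]
    # Ways 0..r-1 become metadata storage; retained data fills ways r..N-1.
    new: List[Way] = [None] * r + kept + [None] * (n - r - len(kept))
    return new, evicted
```

The cache logic would apply this step to each of the S sets and then disable the cache tags of ways `0..r-1`.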
- the metadata 122 maintained within the first portion 124 of the cache memory 120 may be accessed in accordance with a metadata mapping scheme 314 .
- the metadata mapping scheme 314 may define metadata address space (M_A) that includes ways 420 - 1 and 420 - 2 of each set 430 - 1 through 430 -S.
- the metadata address space (M_A) may include an address range {0, . . . , (R×U×S)-1}, where R is the number of ways 420 allocated to the first portion 124 within each of the S sets 430 and U is the capacity of each way 420 (in terms of addressable data units).
- the metadata address space (M_A) may define addresses corresponding to indexes and/or offsets of respective ways 420 (or cache units 220 ) of the first portion 124 , as follows: {0, . . . , (R×S)-1}.
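The following sketch translates a metadata address in M_A to a physical location. It assumes a bank-linear layout, which the description of addressing banks as a flat chunk suggests: way 420-1 across all S sets comes first, then way 420-2, and so on. Set and way numbers are 0-based. The layout and the function name are assumptions, because the source does not fix an ordering.

```python
from typing import Tuple


def metadata_location_way_based(addr: int, r: int, s: int, u: int) -> Tuple[int, int, int]:
    """Map a metadata address to (set, way, offset) under a way-based partition.

    r: ways per set allocated to metadata; s: number of sets; u: capacity of a way.
    """
    if not 0 <= addr < r * s * u:
        raise ValueError("metadata address outside M_A")
    index, offset = divmod(addr, u)  # way index in {0, ..., R*S - 1}
    way, set_index = divmod(index, s)  # fill one way (bank) across all sets, then the next
    return set_index, way, offset
```

With R = 2, S = 4, and U = 64, address 259 falls in way index 4. That index is way 1 of set 0, at offset 3.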
- the cache logic 210 (and/or prefetch logic 230 ) can be configured to determine and/or monitor one or more metrics 212 .
- the metrics 212 may be configured to quantify cache and/or prefetch performance, as disclosed herein.
- the cache logic 210 (and/or partition logic 310 ) can adapt the way partition scheme 312 - 1 based, at least in part, on one or more of the metrics 212 .
- the cache logic 210 can adapt the way partition scheme 312 - 1 to: increase the size of the first portion 124 (and decrease the size of the second portion 126 ) when one or more of the metrics 212 exceeds a first threshold or decrease the size of the first portion 124 (and increase the size of the second portion 126 ) when one or more of the metrics 212 is below a second threshold.
- Increasing the size of the first portion 124 may include increasing the number of ways 420 allocated to the first portion 124 within each set 430 of the cache memory 120 .
- Decreasing the size of the first portion 124 may include decreasing the number of ways 420 allocated to the first portion 124 within each set 430 of the cache memory 120 .
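The threshold rule can be sketched as a single adaptation step. The one-way step size, the `max_ways` bound, and the parameter names are assumptions. Setting the raise threshold above the lower threshold provides hysteresis, which keeps the partition from oscillating when the metric hovers near a single value.

```python
def adapt_metadata_ways(current: int, metric: float, raise_above: float,
                        lower_below: float, max_ways: int) -> int:
    """Return the new number of ways per set allocated to metadata 122."""
    if lower_below > raise_above:
        raise ValueError("lower threshold must not exceed the raise threshold")
    if metric > raise_above and current < max_ways:
        return current + 1  # grow the first portion (shrinks cache capacity)
    if metric < lower_below and current > 0:
        return current - 1  # shrink the first portion, returning a way to cache data 128
    return current
```

In this sketch, moving from two ways to three corresponds to the growth shown next. A larger reduction, such as the three-to-one change illustrated later, would take two steps unless the step size were also adaptive.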
- FIG. 4 - 2 illustrates an example 401 in which the number of ways 420 allocated for storage of metadata 122 pertaining to the address space is increased (e.g., from two ways 420 within each set 430 to three ways 420 within each set 430 ).
- the amount of cache memory 120 allocated to the first portion 124 may be increased in response to determining and/or monitoring the metrics 212 (e.g., in response to prefetch performance quantified by the metrics 212 exceeding a first threshold).
- the first portion 124 allocated for storage of the metadata 122 may include ways 420 - 1 through 420 - 3 of each set 430 - 1 through 430 -S.
- Allocating the way 420 - 3 to the first portion 124 may include modifying the metadata mapping scheme 314 to reference way 420 - 3 within each set 430 (e.g., define a metadata address scheme including addresses 0 through (3×S×U)-1, way indexes 0 through (3×S)-1, or the like).
- Allocating the way 420 - 3 of each set 430 to the first portion 124 may include compacting cache data 128 stored within each set 430 into N-3 ways 420 , as disclosed herein (by selecting cache data 128 within each set 430 for eviction and moving the data to be retained within each [N-3]-way associative set 430 to ways 420 - 4 through 420 -N of each set 430 ).
- Allocating the way 420 - 3 may further include modifying the address mapping scheme 316 to associate addresses 204 with [N-3]-way associative sets 430 rather than [N-2]-way or N-way associative sets 430 .
- Allocating the way 420 - 3 of each set 430 to the first portion 124 may include disabling the cache tag 326 - 3 of way 420 - 3 within each set 430 .
- FIG. 4 - 3 illustrates another example 402 in which the number of ways 420 allocated for storage of metadata 122 pertaining to the address space is decreased (e.g., to one way 420 within each set 430 ).
- the amount of cache memory 120 allocated to the first portion 124 may be reduced in response to determining and/or monitoring the one or more metrics 212 (e.g., in response to prefetch performance quantified by the metrics 212 falling below a second threshold).
- the first portion 124 allocated for storage of the metadata 122 may include a single way 420 - 1 within each set 430 - 1 through 430 -S.
- Reducing the size of the first portion 124 may include compacting the metadata 122 for storage within a reduced number of ways 420 , as disclosed herein (e.g., by removing portions of the metadata 122 , such as one or more entries of the metadata 122 , and/or the like).
- Reducing the size of the first portion 124 may further include modifying the metadata mapping scheme 314 to reference the smaller number of ways 420 allocated to the first portion 124 .
- the metadata mapping scheme 314 may reference ways 420 - 1 within each set 430 and/or define a metadata address scheme including addresses 0 through (S×U)-1, way indexes 0 through S-1, or the like.
- decreasing the amount of cache memory 120 allocated to the first portion 124 may result in increasing the amount of cache memory 120 allocated for storage of cache data 128 within the second portion 126 .
- ways 420 - 2 and 420 - 3 of each set are allocated to the second portion 126 .
- Allocating ways 420 - 2 and 420 - 3 to the second portion 126 may include modifying the address mapping scheme 316 to include ways 420 - 2 and 420 - 3 of each set 430 (e.g., by enabling cache tags 326 - 2 and 326 - 3 of the ways 420 - 2 and 420 - 3 ).
- FIG. 5 - 1 illustrates another example 500 of an apparatus for implementing adaptive cache partitioning.
- the apparatus 500 includes a cache 110 that is configured to cache data pertaining to an address space associated with a memory 108 .
- the cache 110 may include and/or be coupled to an interface 215 , which may be configured to couple the cache 110 (and/or cache logic 210 ) to an interconnect, such as the interconnect 105 for a host 102 .
- the cache 110 may be configured to service requests 202 pertaining to addresses 204 of the address space from a requestor 201 .
- the cache 110 may service the requests 202 by use of cache memory 120 , which may include loading data associated with addresses 204 of the address space in respective transfer operations 203 .
- the transfer operations may be implemented in response to cache misses, prefetch operations, and/or the like.
- the memory 108 , cache 110 , and requestor 201 may be communicatively coupled through an interconnect 105 .
- the cache memory 120 may include a plurality of cache units 220 , which may be organized into a plurality of N-way associative sets 430 (e.g., S sets 430 - 1 through 430 -S, each including N ways 420 - 1 through 420 -N).
- the cache logic 210 can implement, include, and/or be coupled to partition logic 310 configured to partition the cache memory 120 into a first portion 124 and a second portion 126 .
- the cache logic 210 can partition and/or divide the cache memory 120 in accordance with a partition scheme 312 , which may specify an amount of cache memory 120 to allocate to storage of the metadata 122 (the first portion 124 ), cache data 128 (the second portion 126 ), and/or the like.
- the amount of cache memory 120 allocated to the first portion 124 may be based, at least in part, on one or more metrics 212 that, inter alia, quantify prefetch performance, as disclosed herein.
- In the FIG. 5 - 1 example, the cache logic 210 (and/or partition logic 310 ) partitions the cache memory 120 by set and/or in accordance with a set or set-based partition scheme 312 - 2 . More specifically, the cache logic 210 can allocate zero or more of the sets 430 of the cache memory 120 for storage of metadata 122 pertaining to the address space (the first portion 124 ) and one or more of the sets 430 for storage of cache data 128 (the second portion 126 ).
- the first portion 124 of the cache memory 120 allocated to the prefetch logic 230 includes two sets 430 (e.g., sets 430 - 1 and 430 - 2 ) and the second portion 126 of the cache memory 120 allocated for use as available cache capacity includes S-2 sets 430 (e.g., sets 430 - 3 through 430 -S).
- sets 430 allocated to the first portion 124 are highlighted with a crosshatch fill pattern.
- the address mapping scheme 316 (or address mapping logic) implemented by the cache logic 210 may be configured to map addresses 204 to cache units 220 by, inter alia, associating the addresses 204 with respective sets 430 , and matching address tags 206 of the addresses 204 to cache tags 326 of the associated sets 430 .
- Allocating one or more sets 430 to the first portion 124 may reduce the number of sets 430 included in the second portion 126 (reduce the number of sets 430 to which addresses 204 may be mapped).
- Allocating R sets 430 for metadata storage may reduce the number of available sets 430 to S-R (or S-2 in the FIG. 5 - 1 example).
- the address mapping scheme 316 modifies the manner in which addresses 204 are divided (and/or the size of respective address regions).
- the address mapping scheme 316 may adapt the number of bits included in set tags 406 - 1 based, at least in part, on the quantity of sets 430 allocated to the first portion 124 .
- the address mapping scheme 316 may, for example, reduce the number of bits included in set tags 406 - 1 by log2(R), where R is the number of sets 430 allocated to store the metadata 122 (by one bit in the FIG. 5 - 1 example).
- the metadata mapping scheme 314 may be configured to associate metadata 122 (and/or metadata addresses) with cache memory 120 allocated to the first portion 124 .
- the metadata mapping scheme 314 may define a range of metadata addresses 0 through (R×N×U)-1 or indexes 0 through (R×N)-1, where R is the number of sets 430 allocated to the metadata 122 , N is the number of ways 420 included in each set 430 , and U is the capacity of each way 420 (and/or corresponding cache unit 220 ).
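For the set-based scheme, a matching sketch maps a metadata address in the range above to a (set, way, offset) location. It assumes that the R metadata sets are the first R sets (430-1 through 430-R, numbered from 0 here) and that metadata fills each set one way at a time. Both assumptions and the helper name are illustrative.

```python
from typing import Tuple


def metadata_location_set_based(addr: int, r: int, n: int, u: int) -> Tuple[int, int, int]:
    """Map a metadata address to (set, way, offset) under a set-based partition.

    r: sets allocated to metadata; n: ways per set; u: capacity of a way.
    """
    if not 0 <= addr < r * n * u:
        raise ValueError("metadata address outside the allocated range")
    index, offset = divmod(addr, u)  # cache-unit index in {0, ..., R*N - 1}
    set_index, way = divmod(index, n)  # fill all N ways of a set before the next set
    return set_index, way, offset
```

Unlike the way-based layout, each set allocated here is filled completely before the next one, so growing or shrinking the first portion adds or removes whole sets at the end of the range.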
- the cache logic 210 can be further configured to adapt the set partition scheme 312 - 2 based, at least in part, on one or more metrics 212 pertaining to prefetch performance.
- the cache logic 210 can increase the number of sets 430 allocated for the metadata 122 when one or more of the metrics 212 exceeds a first threshold and can decrease the number of sets 430 allocated for the metadata 122 (and increase the number of sets 430 available to store cache data 128 ) when one or more of the metrics 212 falls below a second threshold.
- FIG. 5 - 2 illustrates an example 501 in which the amount of cache memory 120 allocated to the metadata 122 is increased as compared to the example 500 illustrated in FIG. 5 - 1 .
- the size of the first portion 124 may be increased based on prefetch performance within one or more regions of the address space.
- the quantity of sets 430 included in the first portion 124 may be increased to four (e.g., increased from sets 430 - 1 through 430 - 2 to sets 430 - 1 through 430 - 4 ).
- Allocating additional sets 430 - 3 and 430 - 4 for metadata storage may include adapting the address mapping scheme 316 to distribute addresses between S-4 sets 430 (as opposed to S-2 or S sets 430 ).
- the address mapping scheme 316 may be modified to reduce the number of bits included in set tags 406 - 2 by two bits (or a single bit as compared to the set tags 406 - 1 of the FIG. 5 - 1 example).
- Allocating a set 430 to the first portion 124 may include evicting cache data from the set 430 , disabling cache tags 326 - 1 through 326 -N of each way 420 of the set 430 , and so on.
- Allocating an additional set 430 to the first portion 124 may further include adapting the metadata mapping scheme 314 to include the additional set 430 .
- the metadata mapping scheme 314 may be adapted to define a range of metadata addresses 0 through (4×N×U)-1 or indexes 0 through (4×N)-1.
- FIG. 5 - 3 illustrates an example 502 in which the amount of cache memory 120 allocated to the metadata 122 is decreased as compared to example 501 of FIG. 5 - 2 (and example 500 of FIG. 5 - 1 ).
- the size of the first portion 124 may be decreased based on prefetch performance within one or more regions of the address space, as disclosed herein.
- the quantity of sets 430 included in the first portion 124 may be decreased to one (e.g., decreased to a single set 430 - 1 ). Reducing the size of the first portion 124 may, therefore, include allocating sets 430 - 2 through 430 - 4 to the second portion 126 for storage of cache data 128 .
- Allocating one or more sets 430 to the second portion 126 may include compacting the metadata 122 and storing compacted metadata within a reduced number of cache units 220 .
- the metadata 122 may be compacted for storage within N cache units 220 .
- Compacting the metadata 122 may include removing portions of the metadata 122 , such as one or more metadata entries. The entries may be selected based on any suitable criteria including, but not limited to: age criteria (oldest removed first, youngest removed first, or the like), least recently accessed criteria, least frequently accessed criteria, prefetch performance criteria (e.g., prefetch performance within address regions covered by respective entries of the metadata 122 ), and/or the like.
- the metadata mapping scheme 314 may be modified to decrease the number of cache units 220 referenced thereby (reduce the metadata address range to N×U or metadata index range to N), and so on.
- the address mapping scheme 316 may be modified to increase the quantity of available sets 430 to S-1.
- the address mapping scheme 316 may be configured to distribute addresses 204 between a larger number of sets 430 by, inter alia, increasing the number of bits included in set tags 406 - 2 of the addresses 204 .
- Allocating a set 430 for cache data storage may further include enabling cache tags 326 of each way 420 of the set 430 (e.g., enabling cache tags 326 - 1 through 326 -N of each way 420 - 1 through 420 -N of the set 430 being allocated for storage of cache data 128 ).
- FIG. 6 illustrates with a flow diagram 600 example methods for an apparatus to implement adaptive cache partitioning.
- the flow diagram 600 includes blocks 602 through 606 .
- a host device 102 (and/or component thereof) can perform one or more operations of the flow diagram 600 (and/or operations of the other flow diagrams described herein) to realize at least one method for adaptive cache partitioning.
- one or more of the operations may be performed by a memory, memory controller, PIM logic, cache 110 , cache memory 120 , cache logic 210 , prefetch logic 230 , an embedded processor, and/or the like.
- a first portion of the cache memory 120 of a cache 110 is allocated for storage of metadata 122 pertaining to an address space associated with a backing memory of the cache 110 (e.g., the address space associated with the memory 108 ).
- allocating the first portion may include partitioning the cache memory 120 into a first portion 124 and a second portion 126 .
- the first portion 124 may be allocated for storage of the metadata 122 .
- the second portion 126 may be allocated for storage of cache data 128 (may be allocated as available cache capacity).
- the metadata 122 maintained within the first portion of the cache memory 120 may include information pertaining to accesses to respective addresses and/or regions of the address space.
- a prefetcher and/or prefetch logic 230 of the cache 110 may utilize the metadata 122 to predict addresses 204 of upcoming requests 202 and prefetch data associated with the predicted addresses 204 into the second portion 126 of the cache memory 120 .
- the metadata 122 can include any suitable information pertaining to the address space including, but not limited to: a sequence of previously requested addresses 204 or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses 204 , access counts (e.g., accesses within respective windows), access time(s), last access time(s), and/or the like.
- the metadata 122 includes a plurality of entries, each entry including information pertaining to a respective region of the address space.
- the metadata 122 pertaining to respective regions of the address space may be used to, inter alia, determine address access patterns within the respective regions, which may be used to inform prefetch operations within the respective regions.
- the cache memory 120 may be partitioned into the first portion 124 (e.g., a first partition) and the second portion 126 (e.g., second partition) according to any suitable partition scheme 312 , such as a sequential scheme, a way-based partition scheme 312 - 1 , a set-based partition scheme 312 - 2 , and/or the like.
- the first portion 124 may include any suitable portion, quantity, and/or amount of the cache memory resources of the cache memory 120 including, but not limited to, zero or more: cache units 220 , CMU 320 , cache blocks, cache lines, hardware cache lines, ways 420 (and/or corresponding cache units 220 ), sets 430 , rows, columns, banks, and/or the like.
- the cache logic 210 allocates M cache units 220 to the first portion 124 and allocates X-M cache units 220 to the second portion 126 as available cache capacity (where X is the number of available cache units 220 included in the cache memory 120 ).
- Allocating the M cache units 220 may include allocating cache units 220 - 1 through 220 -M to the first portion 124 (e.g., according to a sequential scheme); allocating cache units 220 within ways 1 through W_1 of each set 430 of the cache memory 120 , where W_1 = M/S and S is the number of sets 430 included in the cache memory 120 (e.g., according to a way-based partition scheme 312 - 1 ); allocating cache units 220 within sets 1 through E_1, where E_1 = M/N and N is the number of ways 420 included in each set 430 of the cache memory 120 (e.g., according to a set-based partition scheme 312 - 2 ); and/or the like.
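The three allocation patterns can be compared with a small sketch that lists the (set, way) positions of the M allocated cache units, using W_1 = M/S and E_1 = M/N. The sketch makes several assumptions: 0-based numbering, a set-major unit order for the sequential scheme, a requirement that M divide evenly, and the scheme labels and function name.

```python
from typing import List, Tuple


def metadata_units(m: int, s: int, n: int, scheme: str) -> List[Tuple[int, int]]:
    """Return (set, way) pairs of the M cache units allocated to the first portion."""
    if scheme == "sequential":
        # Units numbered set-major: unit = set * n + way (an assumed numbering).
        return [divmod(unit, n) for unit in range(m)]
    if scheme == "way":
        if m % s:
            raise ValueError("way-based scheme needs M to be a multiple of S")
        w1 = m // s  # W_1 ways in every set
        return [(st, w) for st in range(s) for w in range(w1)]
    if scheme == "set":
        if m % n:
            raise ValueError("set-based scheme needs M to be a multiple of N")
        e1 = m // n  # E_1 whole sets
        return [(st, w) for st in range(e1) for w in range(n)]
    raise ValueError(f"unknown scheme {scheme!r}")
```

With M = 8 in a 4-set, 4-way cache, the way-based scheme takes two ways from every set, while the set-based scheme takes every way of the first two sets.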
- Allocating the M cache units 220 may include flushing and/or destaging cache data 128 from the M cache units 220 , which may include writing dirty cache data 128 stored within the cache units 220 to the memory 108 , and/or the like. Allocating the M cache units 220 may further include configuring an address mapping scheme 316 by which addresses 204 are mapped to respective cache units 220 , sets 430 , and/or ways 420 to disable, remove, and/or ignore the M cache units 220 , such that the addresses 204 do not map to the M cache units 220 (and the M cache units 220 are not available for storage of cache data 128 ). In some implementations, the cache logic 210 disables cache tags 326 of the M cache units 220 allocated to the first portion at 602 .
- the cache logic 210 partitions the cache memory 120 by way 420 (e.g., by allocating ways 420 within respective sets 430 of the cache memory 120 ).
- allocating the M cache units 220 to the first portion 124 may include allocating $W_1$ ways 420 within each of the S sets 430 - 1 through 430 - S of the cache memory 120 to the first portion 124, where $W_1 = M/S$.
- the cache logic 210 may implement a set-based partition scheme 312 - 2 by which the cache memory 120 is divided by set 430 .
- Allocating M cache units 220 to the first portion 124 per a set-based partition scheme 312 - 2 may include allocating $E_1$ sets 430 to the first portion 124, where $E_1 = M/N$ and N is the number of ways 420 included in each set 430.
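The three partition schemes above differ only in which physical cache units end up in the first portion 124. The following is a minimal sketch, not the disclosed implementation: it assumes cache units are identified by hypothetical (set, way) pairs and that the lowest-numbered units, ways, or sets are the ones allocated, and it uses $W_1 = M/S$ and $E_1 = M/N$ directly.

```python
from typing import List, Tuple


def first_portion_units(m: int, num_sets: int, num_ways: int,
                        scheme: str) -> List[Tuple[int, int]]:
    """Return the (set, way) pairs allocated to the first (metadata) portion.

    Hypothetical helper: the lowest-numbered units/ways/sets are assumed to be
    the ones allocated to the first portion.
    """
    if not 0 <= m <= num_sets * num_ways:
        raise ValueError("M must lie between 0 and the number of cache units")
    if scheme == "sequential":
        # Cache units 0 .. M-1 in set-major order (unit = set * num_ways + way).
        return [divmod(i, num_ways) for i in range(m)]
    if scheme == "way":
        if m % num_sets:
            raise ValueError("way-based scheme assumes M is a multiple of S")
        w1 = m // num_sets  # W_1 = M / S ways taken from every set
        return [(s, w) for s in range(num_sets) for w in range(w1)]
    if scheme == "set":
        if m % num_ways:
            raise ValueError("set-based scheme assumes M is a multiple of N")
        e1 = m // num_ways  # E_1 = M / N whole sets
        return [(s, w) for s in range(e1) for w in range(num_ways)]
    raise ValueError(f"unknown partition scheme: {scheme}")
```

For the same M, the way-based scheme spreads the metadata across every set (each set loses $W_1$ ways), whereas the set-based scheme removes $E_1$ whole sets from the address mapping.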
- Allocating the M cache units 220 for metadata storage may further include configuring a metadata mapping scheme 314 to provide access to memory storage capacity of the M cache units 220 .
- the metadata mapping scheme 314 implemented by the cache logic 210 may provide access to memory storage capacity of the M cache units 220 included in the first portion 124 of the cache memory 120 .
- the metadata mapping scheme 314 may define a metadata address space ($M_A$), $M_A \in \{0, \ldots, (M \cdot U) - 1\}$, where U is the capacity of a cache unit 220 (the capacity of a CMU 320).
- the metadata address space ($M_A$) may define a range of cache unit indexes $M_I$, each corresponding to a respective one of the M cache units 220 allocated to the first portion 124, $M_I \in \{0, \ldots, M - 1\}$.
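To make the metadata address space concrete, here is a minimal sketch of one possible metadata mapping scheme 314. It assumes, hypothetically, a way-based layout in which cache-unit index $M_I = i$ lives in set $i \,//\, W_1$ and way $i \bmod W_1$; the function name and layout are illustrative only.

```python
from typing import Tuple


def metadata_location(addr: int, m: int, unit_size: int,
                      num_sets: int) -> Tuple[int, int, int]:
    """Translate a metadata address into (set, way, offset).

    Hypothetical way-based layout: the first portion holds W_1 = M / S ways in
    every set, and cache-unit index M_I = i lives in set i // W_1, way i % W_1.
    """
    if m % num_sets:
        raise ValueError("way-based layout assumes M is a multiple of S")
    if not 0 <= addr < m * unit_size:
        raise IndexError("address outside metadata address space M_A")
    index, offset = divmod(addr, unit_size)  # M_I and offset within the unit
    w1 = m // num_sets
    set_idx, way = divmod(index, w1)
    return set_idx, way, offset
```

With M = 8, U = 64, and S = 4, metadata address 130 falls in unit $M_I = 2$ at offset 2, which this layout places in set 1, way 0.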
- Although examples of metadata mapping schemes 314 and/or metadata addressing and/or access schemes are described herein, the disclosure is not limited in this regard and could be adapted to provide access to cache memory 120 allocated to the first portion 124 through any suitable mechanism or technique.
- data associated with the address space is written to the second portion 126 of the cache memory 120 .
- the cache logic 210 may load the data into the cache memory 120 in response to requests 202 pertaining to addresses 204 that trigger cache misses (e.g., addresses 204 that have not yet been loaded into the second portion 126 of the cache memory 120 ).
- the cache logic 210 (and/or prefetch logic 230 ) may prefetch cache data 128 into the second portion 126 of the cache memory 120 at 604 .
- the prefetch logic 230 may utilize the metadata 122 pertaining to the address space to predict addresses 204 of upcoming requests 202 and configure the cache logic 210 to prefetch cache data 128 corresponding to the predicted addresses 204 before requests 202 pertaining to the predicted addresses 204 are received.
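The disclosure leaves the prediction technique open. As one illustrative stand-in (a per-region stride detector, which is a swapped-in technique and not necessarily what the prefetch logic 230 uses), the sketch below keeps one metadata entry per address region and predicts addresses once the same stride repeats. The 4 KiB region size and the prefetch degree are hypothetical parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

REGION_SHIFT = 12  # hypothetical 4 KiB address regions


@dataclass
class RegionEntry:
    """One metadata entry: recent access history for one address region."""
    last_addr: Optional[int] = None
    stride: Optional[int] = None


class StridePrefetcher:
    """Per-region stride predictor (illustrative stand-in for prefetch logic)."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree                     # addresses predicted per trigger
        self.table: Dict[int, RegionEntry] = {}  # metadata, keyed by region

    def access(self, addr: int) -> List[int]:
        """Record a demand access and return predicted addresses to prefetch."""
        entry = self.table.setdefault(addr >> REGION_SHIFT, RegionEntry())
        predictions: List[int] = []
        if entry.last_addr is not None:
            stride = addr - entry.last_addr
            # Predict only once the same non-zero stride is seen twice in a row.
            if stride != 0 and stride == entry.stride:
                predictions = [addr + stride * k for k in range(1, self.degree + 1)]
            entry.stride = stride
        entry.last_addr = addr
        return predictions
```

Each entry of `table` plays the role of one region's metadata entry; the more regions the first portion 124 can hold, the more concurrent access streams such a predictor can track.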
- Prefetched cache data 128 may be transferred into the relatively faster cache memory 120 from the relatively slower memory 108 in transfer operations 203 .
- Transfer operations 203 to prefetch cache data 128 may be implemented as background operations (e.g., during idle periods in which the cache 110 is not servicing requests 202).
- the cache logic 210 (and/or prefetch logic 230 ) may be further configured to determine and/or monitor one or more metrics 212 pertaining to the cache 110 at 604 .
- the metrics 212 may be configured to quantify any suitable aspect of cache and/or prefetch performance including, but not limited to: request latency, average request latency, cache performance, cache hit rate, cache miss rate, prefetch performance, prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like.
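Several of the cache-side metrics 212 listed above reduce to simple counters. The following is a minimal sketch, assuming counts are accumulated over one monitoring window; the class and field names are hypothetical, and latency is kept in arbitrary time units.

```python
from dataclasses import dataclass


@dataclass
class CacheMetrics:
    """Counters for a few cache-performance metrics over one window."""
    hits: int = 0
    misses: int = 0
    total_latency: int = 0  # sum of request latencies, arbitrary time units

    def record(self, hit: bool, latency: int) -> None:
        """Account for one serviced request."""
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        self.total_latency += latency

    @property
    def requests(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0

    @property
    def miss_rate(self) -> float:
        return self.misses / self.requests if self.requests else 0.0

    @property
    def average_latency(self) -> float:
        return self.total_latency / self.requests if self.requests else 0.0
```

Three hits at latency 10 and one miss at latency 110 give a hit rate of 0.75 and an average latency of 35.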
- the cache logic 210 (and/or prefetch logic 230) may be further configured to record, update, and/or otherwise maintain metadata 122 pertaining to the address space within the first portion 124 of the cache memory 120 allocated at 602.
- the metadata 122 may be accessed by and/or through the metadata mapping scheme 314 , as disclosed herein.
- the size of the first portion of the cache memory 120 allocated for the metadata 122 pertaining to the address space is modified based, at least in part, on one or more metrics 212 pertaining to cache data 128 prefetched into the second portion of the cache memory.
- the amount of cache memory 120 allocated to the first portion 124 may be increased when one or more of the metrics 212 exceeds a first threshold.
- the size of the first portion 124 may be incrementally and/or periodically increased while prefetch performance remains above the first threshold and/or until a maximum or upper bound is reached.
- the amount of cache memory allocated to the first portion 124 may be decreased when one or more of the metrics 212 is below a second threshold.
- the size of the first portion 124 may be incrementally and/or periodically decreased when prefetch performance remains below the second threshold and/or until a lower bound is reached.
- In some implementations (e.g., when the lower bound is zero), no cache resources are allocated for storage of the metadata 122 and substantially all of the cache memory 120 is available as cache capacity.
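The incremental growth and shrinkage between bounds described above can be sketched as a single step function, invoked once per evaluation period. This is an assumption-laden illustration: the threshold values, step size, and bounds are hypothetical parameters, and sizes are counted in cache units 220.

```python
def next_metadata_size(current: int, prefetch_perf: float, *,
                       first_threshold: float, second_threshold: float,
                       step: int, lower_bound: int, upper_bound: int) -> int:
    """Return the new number of cache units for the first (metadata) portion.

    Grows by `step` while prefetch performance exceeds the first threshold,
    shrinks by `step` while it is below the second threshold, and clamps the
    result to [lower_bound, upper_bound].
    """
    if prefetch_perf > first_threshold and current < upper_bound:
        return min(current + step, upper_bound)
    if prefetch_perf < second_threshold and current > lower_bound:
        return max(current - step, lower_bound)
    return current
```

Calling this periodically yields the incremental behavior described: a `lower_bound` of 0 corresponds to the case in which no cache resources are allocated to the metadata 122, and performance between the two thresholds leaves the partition unchanged.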
- the amount of cache memory 120 allocated to the metadata 122 may be increased when the workload on the cache 110 is suitable for prefetching and may be decreased when the workload is not suitable for prefetching (as indicated by the one or more metrics 212 ).
- the cache 110 may, therefore, be capable of adapting to different workload conditions. For example, increasing the amount of cache memory 120 allocated to prefetch metadata 122 when servicing workloads that are suitable for prefetching may result in improved performance despite decreases in available cache capacity, whereas decreasing the amount of cache memory 120 allocated for the metadata 122 may enable the available capacity of the cache 110 to be increased, resulting in improved performance under workloads that are not suitable for prefetching.
- modifying the size of the first portion 124 of the cache memory 120 allocated for the metadata 122 may include completing pending requests 202 (e.g., draining a pipeline of the cache 110 ), flushing the cache 110 , resetting the prefetch logic 230 (and/or prefetcher), repartitioning the cache memory 120 to modify the amount of cache memory 120 allocated to the first portion 124 and/or second portion 126 , and resuming operation using the repartitioned cache memory 120 (e.g., using the resized first portion 124 and/or second portion 126 of the cache memory 120 ).
- modifying the size of the first portion 124 of the cache memory 120 may include preserving cache and/or prefetcher state.
- the cache logic 210 can preserve cache state by, inter alia, compacting the cache data 128 maintained within the second portion 126 (e.g., selecting cache data 128 of R cache units 220 for eviction), moving cache data 128 from cache units 220 that are designated for allocation to the first portion 124 to cache units 220 that are to remain allocated to the second portion 126 , and so on.
- the cache logic 210 can preserve prefetcher state by, inter alia, compacting the metadata 122 maintained within the first portion 124 for storage within a smaller number of cache units 220 (e.g., by removing portions of the metadata 122 , such as entries associated with address regions exhibiting poor prefetch performance), moving the compacted metadata 122 to cache units 220 that are to remain allocated to the first portion 124 , and so on.
- Increasing the amount of cache memory 120 allocated to the first portion (e.g., first portion 124 ) at 606 may include allocating one or more cache units 220 from the second portion (e.g., second portion 126 ) to the first portion 124 .
- Allocating the one or more cache units 220 to the first portion 124 may include flushing and/or destaging the cache units 220 , modifying the address mapping scheme 316 to disable, remove, and/or ignore the cache units 220 (e.g., disable cache tags 326 of the cache units 220 ), modifying the metadata mapping scheme 314 to include and/or reference the cache units 220 , and so on, as disclosed herein.
- Increasing the amount of cache memory 120 allocated to the first portion may, therefore, include decreasing the amount of cache memory 120 allocated to the second portion (and/or decreasing the amount of cache memory 120 available for storage of cache data 128 ).
- Decreasing the size of the second portion 126 may include compacting the cache data 128 stored within the second portion 126 of the cache memory 120 , which may include selecting cache data 128 to remove and/or evict from the cache 110 .
- the cache data 128 may be selected according to any suitable replacement or eviction policy, such as FIFO, LIFO, LRU, TLRU, MRU, LFU, random replacement, or the like.
- Compacting the cache data 128 may include reducing the amount of cache memory 120 consumed by the cache data 128 by R cache units 220, where R is the number of cache units 220 being allocated from the second portion 126 to the first portion 124 (or by $R \cdot U$, where U is the capacity of a cache unit 220, CMU 320, or way 420).
- the cache logic 210 selects a first group of cache units 220 to reallocate to the first portion 124 and selects a second group of cache units 220 for eviction.
- the first group and the second group may each include R cache units 220 , where R is the quantity of cache units 220 to be reallocated to the first portion 124 .
- the first group and second group may be selected independently and/or in accordance with respective selection criteria.
- the first group of cache units 220 may be selected in accordance with the address mapping scheme 316 , metadata mapping scheme 314 , partition scheme 312 , or the like (which may allocate cache units 220 for storage of the metadata 122 per a predetermined pattern or scheme, such as a sequential scheme, way-based partition scheme 312 - 1 , set-based partition scheme 312 - 2 , or the like).
- the second group of cache units 220 may be selected in accordance with an eviction or replacement policy, as disclosed herein.
- Reallocating the R cache units 220 may include: a) flushing the second group of cache units 220 , and b) moving cache data 128 from cache units 220 that are included in the first group (and are not included in the second group) to the second group of cache units 220 .
- the cache logic 210 may, therefore, retain more frequently accessed data within the cache memory 120 when reducing the available cache data 128 capacity of the cache 110 .
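The two-group procedure above (a first group fixed by the partition scheme, a second group chosen by the replacement policy) can be sketched as follows. This assumes LRU replacement with empty units preferred as victims, and models each unit as a hypothetical `(tag, last_access_time)` record; write-back of dirty data is only noted in a comment.

```python
from typing import Dict, List, Optional, Tuple

Line = Tuple[int, int]  # hypothetical (tag, last_access_time) record for one unit


def reallocate_units(units: Dict[int, Optional[Line]],
                     designated: List[int]) -> Tuple[Dict[int, Optional[Line]], List[int]]:
    """Hand the `designated` units (first group) to the metadata portion.

    Returns the remaining second-portion units and the tags that were evicted.
    """
    r = len(designated)
    if len(set(designated)) != r or not set(designated) <= set(units):
        raise ValueError("designated units must be R distinct cache units")
    # Second group: R victims, empty units first, then least recently used.
    order = sorted(units, key=lambda u: (units[u] is not None,
                                         units[u][1] if units[u] is not None else 0))
    victims = order[:r]
    evicted = [units[u][0] for u in victims if units[u] is not None]
    remaining = dict(units)
    for u in victims:
        remaining[u] = None  # a) flush the victims (dirty data would be written back)
    # b) move data out of designated units that were not themselves victims.
    free_slots = [u for u in victims if u not in designated]
    movers = [u for u in designated if u not in victims]
    for src, dst in zip(movers, free_slots):
        remaining[dst] = remaining[src]
    for u in designated:
        del remaining[u]  # these units now belong to the first portion
    return remaining, evicted
```

Because both groups contain R units, the designated units that survive eviction always match the victim slots outside the designated set one-for-one, so the most recently used data stays cached.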
- the cache logic 210 partitions the cache memory 120 according to a way-based partition scheme 312 - 1.
- Allocating R cache units 220 from the second portion 126 to the first portion 124 may include allocating one or more ways 420 within each set 430 of the cache to the first portion 124 .
- Allocating R cache units 220 from the second portion 126 to the first portion 124 may include allocating an additional $W_{1A}$ ways 420 within each of the S sets 430 - 1 through 430 - S from the second portion 126 to the first portion 124, where $W_{1A} = R/S$.
- the cache logic 210 may partition the cache memory 120 according to a set-based partition scheme 312 - 2.
- Allocating R cache units 220 from the second portion 126 to the first portion 124 may include allocating an additional $E_{1A}$ sets 430 of the cache memory 120 from the second portion 126 to the first portion 124, where $E_{1A} = R/N$ and N is the number of cache units 220 (or ways 420) included in each set 430.
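As a purely illustrative worked example (the cache geometry here is hypothetical and not taken from the disclosure), consider a cache memory 120 with S = 1024 sets 430 and N = 16 ways 420 per set, from which R = 2048 cache units 220 are reallocated to the first portion 124:

```latex
\begin{aligned}
W_{1A} &= \frac{R}{S} = \frac{2048}{1024} = 2 \quad \text{additional ways per set (way-based scheme)},\\
E_{1A} &= \frac{R}{N} = \frac{2048}{16} = 128 \quad \text{additional whole sets (set-based scheme)}.
\end{aligned}
```

Both schemes move the same 2048 cache units; they differ only in which physical units change hands.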
- Decreasing the amount of cache memory 120 allocated to the first portion (e.g., first portion 124 ) at 606 may include allocating one or more cache units 220 from the first portion to the second portion (e.g., second portion 126 ). Allocating the one or more cache units 220 to the second portion 126 may include modifying the address mapping scheme 316 to enable, include, and/or otherwise reference the cache units 220 (e.g., enable cache tags 326 of the cache units 220 ), modifying the metadata mapping scheme 314 to remove the cache units 220 , and so on, as disclosed herein.
- Decreasing the amount of cache memory 120 allocated to the first portion may further include compacting the metadata 122 .
- the metadata 122 may be compacted for storage within R fewer cache units 220 , where R is the quantity of cache units 220 to be allocated from the first portion 124 to the second portion 126 .
- Compacting the metadata 122 at 606 may include removing a portion of the metadata 122 , such as one or more entries of the metadata 122 .
- the portion of the metadata 122 may be selected based on a removal criterion, such as an age criterion (oldest removed first, youngest removed first, or the like), least recently accessed criterion, least frequently accessed criterion, and/or the like.
- portions of the metadata 122 may be selected for removal based, at least in part, on one or more metrics 212 .
- the metadata 122 may include a plurality of entries, each entry including access information pertaining to a respective region of the address space.
- the prefetch logic 230 may utilize respective entries of the metadata 122 to implement prefetch operations within the address regions covered by the respective entries.
- the one or more metrics 212 may be configured to quantify prefetch performance within the address regions covered by the respective entries of the metadata 122 .
- Compacting the metadata 122 may include selecting entries of the metadata 122 for removal based, at least in part, on prefetch performance within the address regions covered by the entries, as quantified by the metrics 212 .
- entries of the metadata 122 in which prefetch performance is below a threshold may be removed (and/or the amount of memory capacity allocated to the entries may be reduced).
- entries of the metadata 122 exhibiting higher prefetch performance may be retained, whereas entries exhibiting lower prefetch performance may be removed (e.g., the R lowest-performing entries of the metadata 122 may be selected for removal).
- Compacting the metadata may, therefore, include removing metadata 122 from one or more cache units 220 and/or moving metadata 122 (and/or entries of the metadata 122 ) from cache units 220 being reallocated to the second portion 126 to the remaining cache units 220 allocated to the first portion 124 .
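A minimal sketch of performance-based metadata compaction follows. It assumes each metadata entry covers one address region, that each cache unit 220 holds a fixed, hypothetical number of entries, and that per-region prefetch hit rate is the selection metric; the lowest-performing entries that no longer fit are dropped, and the survivors are packed into the remaining units.

```python
from typing import Dict, List


def compact_metadata(entries: Dict[int, float], remaining_units: int,
                     entries_per_unit: int) -> List[List[int]]:
    """Keep the best-performing metadata entries that fit in the remaining units.

    `entries` maps a region id to the prefetch hit rate measured for that
    region; the return value lists the region ids packed into each unit.
    """
    capacity = remaining_units * entries_per_unit
    # Highest prefetch performance first; the lowest-performing entries are dropped.
    ranked = sorted(entries, key=lambda region: entries[region], reverse=True)
    kept = ranked[:capacity]
    return [kept[i:i + entries_per_unit] for i in range(0, len(kept), entries_per_unit)]
```

Other removal criteria mentioned above (age, least recently accessed, least frequently accessed) would only change the sort key.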
- FIG. 7 illustrates with a flow diagram 700 further examples of methods for an apparatus to implement adaptive cache partitioning.
- the flow diagram 700 includes blocks 702 through 708 .
- At 702, logic of a cache 110 (e.g., cache logic 210) partitions the cache memory 120 into a first portion 124 and a second portion 126.
- the first portion 124 may include a first portion of the cache memory 120.
- the second portion 126 may include a second portion of the cache memory 120 , different from the first portion 124 .
- the first portion 124 may be allocated for storage of metadata 122 pertaining to an address space, such as an address space associated with a backing memory of the cache 110 (a memory 108 ).
- the second portion 126 may be allocated for storage of cache data 128 pertaining to the address space (e.g., may be available cache capacity of the cache 110 ). Partitioning the cache memory 120 may include implementing a metadata mapping scheme 314 to access cache units 220 allocated to the first portion 124 and an address mapping scheme 316 to map addresses 204 of the address space to cache units 220 allocated to the second portion 126 .
- the cache 110 services requests pertaining to the address space, which may include maintaining metadata pertaining to the address space within the first portion 124 (e.g., within metadata 122 maintained within the first portion 124 ) and loading data associated with addresses of the address space into the second portion 126 .
- Data may be loaded into the cache memory 120 in response to cache misses, such as requests 202 pertaining to addresses 204 that are not available within the cache 110 .
- data may be prefetched into the cache memory 120 at 704 .
- a prefetcher (and/or prefetch logic 230 of the cache 110 ) may utilize the metadata 122 maintained within the first portion 124 to predict addresses 204 of upcoming requests 202 , and data corresponding to the predicted addresses 204 may be prefetched into the second portion 126 before requests 202 pertaining to the predicted addresses 204 are received at the cache 110 .
- the cache logic 210 may determine whether to adapt the partition scheme 312 of the cache memory 120 . More specifically, the cache logic 210 (and/or prefetch logic 230 ) may determine whether to modify the size of the first portion 124 allocated for the metadata 122 (and/or modify the size of the second portion 126 allocated for storage of cache data 128 ) at 706 . The determination may be based, at least in part, on one or more metrics 212 , which may be configured to quantify prefetch performance, as disclosed herein.
- Determining whether to adapt the partition scheme 312 may include determining and/or monitoring one or more metrics 212 pertaining to data prefetched into the second portion 126 and comparing the metrics 212 to one or more thresholds.
- the partition scheme 312 may be adapted at 708 responsive to one or more of the metrics 212 being greater than a first threshold and/or being lower than a second threshold; otherwise, the flow may continue at 704 where the cache 110 may continue to service requests pertaining to the address space.
- the cache logic 210 adapts the partitioning scheme to, inter alia, modify the amount of cache memory 120 allocated to the first portion 124 and/or second portion 126 .
- the size of the first portion 124 allocated for the metadata 122 may be increased (and the size of the second portion 126 allocated for cache data 128 may be decreased) when the metrics 212 exceed one or more first thresholds (e.g., when prefetch performance exceeds one or more first thresholds).
- the size of the first portion 124 may be decreased (and the size of the second portion 126 may be increased) when the metrics 212 are below one or more second thresholds (e.g., when prefetch performance is below one or more second thresholds).
- Increasing the size of the first portion 124 may include allocating cache resources from the second portion 126 to the first portion 124 (e.g., one or more cache units 220 , ways 420 , sets 430 , and/or the like). Increasing the size of the first portion 124 may include reducing the size of the second portion 126 . Reducing the size of the second portion 126 may include compacting cache data 128 stored within the second portion 126 , as disclosed herein (e.g., by selecting cache data 128 for eviction, moving cache data 128 to remaining cache units 220 allocated to the second portion 126 , and so on). Conversely, decreasing the size of the first portion 124 may include allocating cache resources from the first portion 124 to the second portion 126 .
- Decreasing the size of the first portion 124 may include compacting metadata 122 stored within the first portion 124 of the cache memory 120 , as disclosed herein (e.g., by selecting portions of the metadata 122 for removal, moving portions of the metadata 122 to remaining cache units 220 allocated to the first portion 124 , and so on).
- the flow may continue at 704 where the cache 110 may service requests pertaining to the address space, as disclosed herein.
- FIG. 8 illustrates another example flow diagram 800 depicting operations for adaptive cache partitioning based, at least in part, on metrics 212 pertaining to prefetch performance.
- the flow diagram 800 includes blocks 802 through 816 .
- cache logic 210 (and/or prefetch logic 230 ) divides a cache memory 120 into a first portion 124 and a second portion 126 .
- the first portion 124 includes a first partition of the cache memory 120 allocated for metadata 122 pertaining to an address space.
- the second portion 126 may include a second partition of the cache memory 120 allocated for cache data 128 (the first portion 124 being separate from the second portion 126).
- the cache 110 services requests pertaining to the address space, which may include loading data into the second portion 126 of the cache memory 120 , retrieving data associated with respective addresses 204 of the address space in response to requests 202 pertaining to the addresses 204 from the second portion 126 of the cache memory 120 , maintaining metadata 122 pertaining to accesses to respective addresses and/or regions of the address space within the first portion 124 of the cache memory 120 , utilizing the metadata 122 maintained within the first portion 124 of the cache memory 120 to prefetch cache data 128 into the second portion 126 of the cache memory 120 , and so on.
- the cache logic 210 determines whether to evaluate the partition scheme 312 of the cache memory 120 .
- the partition scheme 312 may be evaluated in background operations and/or by use of idle resources of the cache 110 .
- the determination of 806 may be based, at least in part, on whether the cache 110 is idle (e.g., is not servicing any requests 202), whether idle resources are available, and/or the like.
- the determination of 806 may be based on one or more time-based criteria (e.g., may evaluate the partitioning scheme periodically and/or at a determined interval), a predetermined schedule, and/or the like.
- the determination of 806 may be triggered by workload conditions and/or prefetch performance metrics (e.g., one or more metrics 212 ).
- the cache logic 210 may be configured to determine and/or monitor metrics 212 pertaining to prefetch performance periodically and/or continuously, and evaluation of the partition scheme 312 may be triggered at 806 in response to metrics 212 that exceed and/or are below one or more thresholds.
- If the partition scheme 312 is to be evaluated, the flow continues at 808; otherwise, the flow continues to service requests pertaining to the address space at 804.
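The triggers for block 806 are listed as alternatives. The sketch below combines them in one hypothetical way, as an assumption: evaluation runs only while the cache 110 is idle, and then either when a fixed interval has elapsed or when prefetch performance falls outside the band between two thresholds. All parameter names are illustrative.

```python
def should_evaluate(cache_idle: bool, now: float, last_eval: float,
                    interval: float, prefetch_perf: float,
                    first_threshold: float, second_threshold: float) -> bool:
    """Decide whether to evaluate the partition scheme (block 806)."""
    if not cache_idle:
        return False  # evaluate only as a background operation
    if now - last_eval >= interval:
        return True   # periodic evaluation
    # Metric-triggered evaluation: performance outside the threshold band.
    return prefetch_perf > first_threshold or prefetch_perf < second_threshold
```

An implementation relying only on a schedule would drop the final line; one relying only on metrics would drop the interval check.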
- the cache logic 210 determines and/or monitors one or more aspects of prefetch performance, such as prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like.
- the cache logic 210 may determine and/or monitor one or more metrics 212 pertaining to prefetch performance, as disclosed herein.
- Prefetch hit rate may be based on access metrics of prefetched cache data 128 maintained within cache metadata 322 associated with the prefetched cache data 128.
- the cache data 128 that was prefetched into the cache memory 120 may be identified by use of prefetch indicators, such as prefetch flags associated with the cache data 128, which may be maintained within cache metadata 322 associated with the cache units 220 in which the cache data 128 are stored.
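One way prefetch flags could yield a prefetch hit rate is sketched below, under stated assumptions: a line filled by a prefetch carries a flag, the first demand hit counts it as a useful prefetch and clears the flag, and eviction of a still-flagged line counts as a bad prefetch. The class and callback names are hypothetical.

```python
from typing import Dict, Hashable


class PrefetchTracker:
    """Tracks a per-line prefetch flag, as could be kept in per-unit cache metadata."""

    def __init__(self) -> None:
        self.flag: Dict[Hashable, bool] = {}  # tag -> prefetched and not yet demanded
        self.useful = 0  # prefetched lines later hit by a demand request
        self.bad = 0     # prefetched lines evicted without ever being demanded

    def on_fill(self, tag: Hashable, prefetched: bool) -> None:
        self.flag[tag] = prefetched

    def on_demand_hit(self, tag: Hashable) -> None:
        if self.flag.get(tag):
            self.useful += 1
            self.flag[tag] = False  # count each prefetch as useful at most once

    def on_evict(self, tag: Hashable) -> None:
        if self.flag.pop(tag, False):
            self.bad += 1

    def prefetch_hit_rate(self) -> float:
        resolved = self.useful + self.bad
        return self.useful / resolved if resolved else 0.0
```

This definition counts only prefetches whose outcome is known (demanded or evicted); prefetched lines still resident and unused are not yet included.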
- At 810, the prefetch performance determined at 808 is compared to a first threshold. If the prefetch performance exceeds the first threshold, the flow continues at 812; otherwise, the flow continues at 814.
- In some implementations, the determination of 810 is based on whether the prefetch performance determined at 808 exceeds the first threshold and the amount of cache memory 120 currently allocated to the first portion 124 is below a maximum amount, threshold, or upper bound. If so, the flow continues at 812; otherwise, the flow continues at 814.
- the cache logic 210 modifies the partition scheme 312 to increase the amount of cache memory 120 allocated for storage of the metadata 122 pertaining to the address space (e.g., increase the size of the first portion 124 and/or first portion of the cache memory 120 ). Increasing the amount of cache memory 120 allocated to the first portion 124 may include decreasing the amount of cache memory 120 allocated to the second portion 126 (e.g., reducing the available capacity of the cache 110 ). At 812 , the cache logic 210 may reassign designated cache memory resources from the second portion 126 to the first portion 124 , such as one or more cache units 220 , ways 420 , sets 430 , and/or the like.
- the cache memory 120 may be partitioned into a first portion 124 comprising a first group of cache units 220 and a second portion 126 comprising a second group of cache units, different from the first group.
- Increasing the amount of cache memory allocated to the first partition may include allocating one or more cache units 220 of the second group to the first group by, inter alia, evicting and/or moving cache data 128 from the one or more cache units 220, removing the one or more cache units 220 from the address mapping scheme 316 (e.g., disabling cache tags 326 of the one or more cache units 220), adding the one or more cache units 220 to the metadata mapping scheme 314, and so on.
- the cache logic 210 may be further configured to compact cache data 128 stored within the second portion 126 for storage within a smaller amount of the cache memory 120 , configure the address mapping scheme 316 to remove, disable, and/or dereference the designated cache memory resources, configure the metadata mapping scheme 314 to include, reference, and/or otherwise provide access to the designated cache resources for use in storing the metadata 122 , and so on, as disclosed herein.
- the flow may continue at 804 .
- At 814, the prefetch performance determined and/or monitored at 808 is compared to a second threshold. If the prefetch performance is below the second threshold, the flow continues at 816; otherwise, the flow continues at 804. In some implementations, the determination of 814 is based on whether the prefetch performance determined at 808 is below the second threshold and the amount of cache memory 120 currently allocated to the first portion 124 is above a minimum amount, threshold, or lower bound. If so, the flow continues at 816; otherwise, the flow continues at 804.
- the cache logic 210 modifies the partition scheme 312 to decrease the amount of cache memory 120 allocated for storage of the metadata 122 pertaining to the address space (e.g., decrease the size of the first portion 124 and/or first portion of the cache memory 120 ). Decreasing the amount of cache memory 120 allocated to the first portion 124 may include increasing the amount of cache memory 120 allocated to the second portion 126 (e.g., increasing the available capacity of the cache 110 ).
- the cache logic 210 may reassign designated cache memory resources from the first portion 124 to the second portion 126 , such as one or more cache units 220 , ways 420 , sets 430 , and/or the like.
- the cache logic 210 may be further configured to compact metadata 122 stored within the first portion 124 for storage within a smaller amount of the cache memory 120 , configure the address mapping scheme 316 to enable, reference, and/or otherwise utilize the designated cache memory resources for cache data 128 , configure the metadata mapping scheme 314 to remove, exclude, and/or dereference the designated cache resources, and so on, as disclosed herein.
- the flow may continue at 804 .
- FIG. 9 illustrates an example flow diagram 900 depicting operations for adaptive cache partitioning based, at least in part, on metrics pertaining to cache and/or prefetch performance.
- Flow diagram 900 includes blocks 902 through 916 .
- a cache 110 partitions a cache memory 120 thereof into a first portion 124 and a second portion 126 .
- the first portion 124 may include a first portion of the cache memory 120 (e.g., zero or more cache units 220 , cache lines, hardware cache lines, ways 420 , sets 430 , and/or the like).
- the second portion 126 may include a second portion of the cache memory 120 different from the first portion 124 .
- the first portion 124 may be allocated for metadata 122 pertaining to an address space, and the second portion 126 may be allocated for storage of cache data 128 (may be available cache capacity).
- the cache 110 services requests pertaining to the address space, which may include, inter alia, receiving requests 202 , loading cache data 128 into the second portion 126 of the cache memory 120 in response to cache misses, servicing the requests 202 by use of cache data 128 stored within the second portion 126 of the cache memory 120 , and so on.
- the cache 110 maintains metadata 122 pertaining to address access characteristics within the first portion 124 of the cache memory 120 .
- the metadata 122 may include any suitable information pertaining to accesses to respective addresses and/or address regions of the address space, as disclosed herein.
- cache data 128 are prefetched into the second portion 126 of the cache memory 120 based, at least in part, on the metadata 122 maintained within the first portion 124 of the cache memory 120 .
- At 910, the cache 110, cache logic 210, prefetch logic 230, and/or a prefetcher coupled to the cache 110 determines and/or monitors one or more metrics 212.
- the metrics 212 may be configured to quantify cache and/or prefetch performance, as disclosed herein.
- the metrics 212 are evaluated to determine whether to adapt the partition scheme 312 of the cache memory 120 (e.g., determine whether to adapt the amount of cache memory 120 allocated to the first portion 124 or second portion 126 ).
- the determination of 912 may be based, at least in part, on the metrics 212 determined and/or monitored at 910 .
- the determination of 912 may adapt the partition scheme 312 based on cache performance and/or prefetch performance.
- the partition scheme 312 may be adapted at 914 in response to: a) metrics 212 that are outside of one or more thresholds, b) prefetch performance that is outside of one or more prefetch thresholds, c) cache performance that is outside of one or more cache thresholds, and/or the like.
- the determination of 912 may be based on whether prefetch performance (e.g., prefetch hit rate) is above an upper prefetch threshold or below a lower prefetch threshold, whether cache performance (e.g., cache hit rate) is above an upper cache threshold or below a lower cache threshold, and/or the like. In some implementations, the determination of 912 may be based on both prefetch and cache performance (e.g., may be configured to balance prefetch and cache performance).
- the determination of 912 may be based on whether: a) prefetch performance exceeds a first prefetch threshold and cache performance is below a first cache threshold, b) prefetch performance is below a second prefetch threshold and cache performance is above a second cache performance threshold, and/or the like.
- the determination of 912 may be based on, inter alia, an amount of cache memory 120 currently allocated to the first portion 124 for the metadata 122 (metadata capacity).
- the determination of 912 may be based on whether prefetch performance is above a first prefetch performance threshold and metadata capacity is below a first capacity threshold (e.g., a first prefetch or metadata capacity threshold), whether prefetch performance is below a second prefetch performance threshold and metadata capacity is above a second capacity threshold (e.g., a second prefetch or metadata capacity threshold), and/or the like.
- the determination of 912 is based on cache performance.
- the determination may be based on whether cache performance quantified by the metrics 212 (e.g., a cache performance metric 212 ) is below a cache performance threshold.
- at 914, the amount of cache memory 120 allocated for storage of metadata 122 pertaining to the address space may be iteratively and/or periodically adjusted (e.g., either increased or decreased) to improve cache performance.
- size adjustments for the first portion 124 and/or second portion 126 are determined.
- the size adjustments may be based, at least in part, on the metrics 212 determined and/or monitored at 910 (and/or the evaluation of the metrics 212 at 912 ).
- the size of the first portion 124 allocated for metadata 122 pertaining to the address space may be increased when prefetch performance quantified by the metrics 212 is at or above an upper prefetch threshold (and the metadata capacity is below a determined maximum).
- the size of the first portion 124 may be decreased when the prefetch performance quantified by the metrics 212 is at or below a lower prefetch threshold.
- the amount of cache memory 120 allocated to the first portion 124 a) may be increased when prefetch performance is above a first prefetch threshold and cache performance is below a first cache threshold, or b) may be decreased when prefetch performance is below a second prefetch threshold and cache performance is above a second cache threshold, and/or the like.
- the size adjustments may be based on, inter alia, an amount of cache memory 120 currently allocated to the first portion 124 for the metadata 122 (metadata capacity).
- the amount of cache memory 120 allocated to the first portion 124 may be increased when prefetch performance is above a first prefetch performance threshold and the amount of cache memory 120 currently allocated to the first portion 124 is below a first capacity threshold.
- the amount of cache memory 120 allocated to the first portion 124 may be decreased when prefetch performance is below a second prefetch performance threshold and the amount of cache memory 120 currently allocated to the first portion 124 is above a second capacity threshold, or the like.
- the size adjustments at 914 may be based on cache performance metrics, such as cache hit rate.
- the amount of cache memory 120 allocated for storage of metadata 122 pertaining to the address space may be iteratively and/or periodically adjusted to achieve improved cache hit rates (e.g., either increased or decreased).
- the determination of 912 and the size adjustments at 914 may be implemented in accordance with an optimization algorithm, which may be configured to converge to an optimal (or locally optimal) partition scheme 312 that yields optimal (or locally optimal) cache performance, as quantified by the metrics 212.
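The description leaves the optimization algorithm open. As a hypothetical illustration only, the sketch below uses a simple hill-climbing search: it tries growing or shrinking the metadata allocation by one cache unit and keeps a change only when a measured performance score improves, stopping at a local optimum. The `evaluate` callback, the one-unit step, and the size bounds are assumptions; in hardware, each evaluation would correspond to a monitoring interval rather than a function call.

```python
from typing import Callable


def hill_climb_partition(
    evaluate: Callable[[int], float],
    start: int,
    min_size: int,
    max_size: int,
    max_steps: int = 100,
) -> int:
    """Search for a locally optimal metadata-partition size.

    evaluate(size) is a hypothetical hook returning a cache performance
    score (e.g., hit rate over one monitoring interval) observed when
    `size` cache units are allocated to metadata.
    """
    size = start
    best = evaluate(size)
    for _ in range(max_steps):
        improved = False
        # Probe one step larger and one step smaller than the current size.
        for candidate in (size + 1, size - 1):
            if min_size <= candidate <= max_size:
                score = evaluate(candidate)
                if score > best:
                    size, best, improved = candidate, score, True
                    break
        if not improved:
            break  # neither neighbor scores higher: local optimum reached
    return size


# Toy score surface that peaks at 5 metadata units (illustrative only).
print(hill_climb_partition(lambda s: -(s - 5) ** 2, start=0, min_size=0, max_size=16))  # 5
```

Hill climbing only guarantees a local optimum, which matches the "(or locally optimal)" qualification above; a real controller would also need to tolerate noisy measurements between intervals.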
- FIG. 10 illustrates an example system 1000 for adaptive cache partitioning.
- the system 1000 may include a cache apparatus 1001 , which may include a cache 110 and/or means for implementing a cache 110 , as disclosed herein.
- the description of FIG. 10 refers to aspects described above, such as the cache 110 , which is depicted in multiple other figures (e.g., FIGS. 1 - 1 to 5 - 3 ).
- the system 1000 may further include an interface 1015 for coupling the cache apparatus 1001 to an interconnect 1005, receiving requests 202 pertaining to addresses 204 of an address space associated with a memory 108 (e.g., from a requestor 201), implementing transfer operations 203 to fetch cache data 128 from the memory 108, and so on.
- the interface 1015 may be configured to couple the cache apparatus 1001 to any suitable interconnect including, but not limited to: an interconnect, a physical interconnect, a bus, an interconnect 105 for a host device 102 , a front-end interconnect 105 A, a back-end interconnect 105 B, and/or the like.
- the interface 1015 may include, but is not limited to: circuitry, logic circuitry, interface circuitry, interface logic, switch circuitry, switch logic, routing circuitry, routing logic, interconnect circuitry, interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry, logic 220 , an interface 215 , a first interface 215 A, a second interface 215 B, or the like.
- the cache apparatus 1001 may include and/or be coupled to a cache memory 120 , which may include, but is not limited to: a memory, a memory array, semiconductor memory, volatile memory, RAM, SRAM, DRAM, SDRAM, and/or the like.
- the cache memory 120 includes a plurality of cache units 220 (e.g., cache units 220-1 through 220-X), each cache unit 220 including and/or corresponding to a respective CMU 320 and/or cache tag 326.
- the cache units 220 are arranged into a plurality of sets 430 (e.g., sets 430-1 through 430-S), each set 430 including a plurality of ways 420 (e.g., ways 420-1 through 420-N), each way 420 including and/or corresponding to a respective cache unit 220.
- the system 1000 may include a component 1010 for allocating a first portion 124 of the cache memory 120 for metadata 122 pertaining to the address space, caching data within the second portion 126 of the cache memory 120 different from the first portion 124 of the cache memory 120 , and/or modifying a size of the first portion 124 of the cache memory 120 allocated for the metadata 122 based, at least in part, on a metric 212 pertaining to data prefetched into the second portion 126 of the cache memory 120 .
- the component 1010 may be configured to divide the cache memory 120 into a first partition 1024 that includes a first portion 124 of the cache memory 120 and a second partition 1026 that includes a second portion 126 of the cache memory 120 .
- the first partition 1024 may be allocated to store the metadata 122.
- the second partition 1026 may be allocated to store cache data 128 .
- the component 1010 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, memory interface logic, switch circuitry, switch logic, routing circuitry, routing logic, memory interconnect circuitry, memory interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry, cache logic 210 , partition logic 310 , a partition scheme 312 , a metadata mapping scheme 314 (and/or metadata logic 1014 ), an address mapping scheme 316 (and/or address logic 1016 ), and/or the like.
- the component 1010 may be configured to partition the cache memory 120 in accordance with a partition scheme 312 .
- the partition scheme 312 may define logic, rules, criteria, and/or other mechanisms for dividing cache memory resources of the cache memory 120 (e.g., cache units 220 ) between the first partition 1024 and the second partition 1026 .
- the partition scheme 312 may be further configured to specify an amount, quantity, capacity and/or size of the first partition 1024 and/or second partition 1026 (e.g., may specify the amount, quantity, capacity, and/or size of the first portion 124 and/or second portion 126 ).
- the partition scheme 312 may define logic, rules, criteria, and/or other mechanisms by which cache memory resources are dynamically reallocated and/or reassigned between the first partition 1024 and the second partition 1026, such as a cache-unit-based scheme, a way-based partition scheme 312-1, a set-based partition scheme 312-2, and/or the like.
- the partition scheme 312 configures the component 1010 to allocate M cache units 220 to the first partition 1024 (and X-M cache units 220 to the second partition 1026 ).
- the partition scheme 312 defines a cache-unit-based scheme.
- allocating M cache units 220 to the first partition 1024 may include allocating cache units 220-1 through 220-M to the first portion 124 and/or allocating cache units 220-(M+1) through 220-X to the second portion 126, as illustrated in FIG. 10.
- the partition scheme 312 defines a way-based scheme (e.g., a way partition scheme 312-1).
- allocating M cache units 220 to the first partition 1024 may include allocating $W_1$ ways 420 within each set 430 of the cache memory 120 to the first partition 1024, where $W_1 = M/S$ (i.e., each of the S sets 430 contributes $W_1$ of its N ways 420).
- the partition scheme 312 may define a set-based scheme (e.g., a set partition scheme 312-2).
- allocating M cache units 220 to the first partition 1024 may include allocating $E_1$ sets 430 to the first partition 1024, where $E_1 = M/N$ (i.e., all N ways 420 of each of the $E_1$ sets 430 are allocated to the first partition 1024).
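The way-based and set-based schemes differ only in which cache units make up the first partition 1024. Because the cache holds X = S × N units, allocating M units means giving up M/S ways in every set (way-based) or M/N whole sets (set-based). The following sketch enumerates those units, assuming (hypothetically) that cache unit indices are laid out as `set * N + way` and that the lowest-numbered ways or sets are the ones handed to metadata; the source specifies neither choice.

```python
def way_partition(num_sets: int, num_ways: int, m: int) -> set[int]:
    """Units given to metadata when W1 = M / S ways per set are allocated."""
    if m % num_sets or m // num_sets > num_ways:
        raise ValueError("M must be a multiple of S and at most S * N")
    w1 = m // num_sets
    # Assumed layout: unit index = set * N + way; the lowest W1 ways go to metadata.
    return {s * num_ways + w for s in range(num_sets) for w in range(w1)}


def set_partition(num_sets: int, num_ways: int, m: int) -> set[int]:
    """Units given to metadata when E1 = M / N whole sets are allocated."""
    if m % num_ways or m // num_ways > num_sets:
        raise ValueError("M must be a multiple of N and at most S * N")
    e1 = m // num_ways
    # Assumed layout: the lowest-numbered E1 sets go to metadata.
    return {s * num_ways + w for s in range(e1) for w in range(num_ways)}


print(sorted(way_partition(num_sets=4, num_ways=8, m=8)))  # [0, 1, 8, 9, 16, 17, 24, 25]
print(len(set_partition(num_sets=4, num_ways=8, m=16)))  # 16
```

With S = 4 and N = 8, a way-based allocation of M = 8 units takes W1 = 2 ways from every set, whereas a set-based allocation of M = 16 units takes E1 = 2 whole sets. The way-based form keeps every set available for cache data, at the cost of reduced associativity in each set.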
- the component 1010 may implement, include, and/or be coupled to metadata logic 1014 .
- the metadata logic 1014 may be configured for mapping, addressing, associating, referencing, and/or otherwise accessing (and/or providing access to) cache units 220 allocated to the first partition 1024 .
- the metadata logic 1014 may implement and/or include a metadata mapping scheme 314 , as disclosed herein.
- the metadata logic 1014 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, memory interface logic, switch circuitry, switch logic, routing circuitry, routing logic, memory interconnect circuitry, memory interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry, cache logic 210 , partition logic 310 , a partition scheme 312 , a metadata mapping scheme 314 , and/or the like.
- the component 1010 may implement, include, and/or be coupled to address logic 1016 .
- the address logic 1016 may be configured for mapping, addressing, associating, referencing, and/or otherwise accessing (and/or providing access to) cache units 220 allocated to the second partition 1026 .
- the address logic 1016 may be configured to map and/or associate addresses 204 of the address space with cache data 128 stored within cache units 220 allocated to the second partition 1026 .
- the address logic 1016 may implement and/or include an address mapping scheme 316 , as disclosed herein.
- the address logic 1016 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, memory interface logic, switch circuitry, switch logic, routing circuitry, routing logic, memory interconnect circuitry, memory interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry, cache logic 210 , partition logic 310 , a partition scheme 312 , an address mapping scheme 316 , and/or the like.
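Under a way-based scheme, the address logic 1016 can exclude metadata ways simply by restricting tag lookups and fills to the ways of each set that remain in the second partition 1026. The toy model below illustrates that idea; the class name, the modulo set-index function, and the "first free way, else first data way" replacement policy are assumptions chosen for brevity, not details from the source.

```python
from typing import List, Optional


class WayPartitionedCache:
    """Toy tag store: ways [0, w1) of each set hold metadata 122 and are
    never consulted by address lookups; ways [w1, N) hold cache data 128."""

    def __init__(self, num_sets: int, num_ways: int, line_size: int, w1: int):
        if not 0 <= w1 < num_ways:
            raise ValueError("at least one way per set must remain for data")
        self.num_sets, self.num_ways = num_sets, num_ways
        self.line_size, self.w1 = line_size, w1
        # tags[set][way] is None when that way holds no cache data.
        self.tags: List[List[Optional[int]]] = [
            [None] * num_ways for _ in range(num_sets)
        ]

    def _split(self, addr: int):
        line = addr // self.line_size
        return line % self.num_sets, line // self.num_sets  # (set index, tag)

    def lookup(self, addr: int) -> bool:
        s, tag = self._split(addr)
        return any(self.tags[s][w] == tag for w in range(self.w1, self.num_ways))

    def fill(self, addr: int) -> None:
        s, tag = self._split(addr)
        data_ways = range(self.w1, self.num_ways)
        free = [w for w in data_ways if self.tags[s][w] is None]
        # Placeholder replacement: use a free way, else overwrite the first data way.
        victim = free[0] if free else data_ways[0]
        self.tags[s][victim] = tag


cache = WayPartitionedCache(num_sets=4, num_ways=4, line_size=64, w1=2)
cache.fill(0x1000)
print(cache.lookup(0x1000), cache.lookup(0x2000))  # True False
```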
- the component 1010 may be further configured to adapt the partition scheme 312 based, at least in part, on one or more metrics 212.
- the metrics 212 may be configured to quantify prefetch performance, as disclosed herein. Alternatively, or in addition, the metrics 212 may be configured to quantify other aspects, such as cache performance (e.g., cache hit rate, cache miss rate, and/or the like).
- the component 1010 may be configured to determine and/or monitor the metrics 212 .
- the component 1010 can modify a size of the first partition 1024 (and/or first portion 124 ) of the cache memory 120 allocated for the metadata 122 based, at least in part, on one or more of the metrics 212 .
- the component 1010 may implement, include, and/or be coupled to a prefetcher 1030 for updating the metadata 122 maintained within the first portion 124 of the cache memory 120 in response to requests 202 pertaining to addresses 204 of the address space and/or selecting data to prefetch into the second portion 126 of the cache memory 120 based, at least in part, on the metadata 122 maintained within the first portion 124 of the cache memory 120 .
- the metadata 122 may include any suitable information pertaining to addresses of the address space, including, but not limited to: access characteristics, access statistics, an address sequence, address history, index table, delta sequence, stride pattern, correlation pattern, feature vector, ML feature, ML feature vector, ML model, ML modeling data, and/or the like.
- the prefetcher 1030 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, cache circuitry, switch circuitry, switch logic, routing circuitry, routing logic, interconnect circuitry, interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry, cache logic 210, prefetch logic 230, a stride prefetcher, a correlation prefetcher, an ML prefetcher, an LSTM prefetcher, and/or the like.
- the component 1010 is configured to determine and/or monitor the metric 212 pertaining to the data prefetched into the second portion 126 of the cache memory 120 , and to modify the size of the first portion 124 of the cache memory 120 in response to the monitoring.
- the component 1010 may be configured to increase the size of the first portion 124 of the cache memory 120 allocated for the metadata 122 (and decrease the size of the second portion 126 ) in response to the metric 212 being above a first threshold, or decrease the size of the first portion 124 (and increase the size of the second portion 126 ) in response to the metric 212 being below a second threshold.
- the component 1010 may be configured to increase the size of the first portion 124 in response to a current size of the first portion 124 being below a metadata capacity threshold and one or more of: a) a prefetch performance metric 212 that is above a prefetch performance threshold and/or b) a cache performance metric 212 that is below a cache performance threshold.
- the component 1010 may be configured to decrease the size of the first portion 124 of the cache memory 120 in response to the current size of the first portion 124 being above a metadata capacity threshold and one or more of: a) a prefetch performance metric 212 that is below a prefetch performance threshold and/or b) a cache performance metric 212 that is above a cache performance threshold.
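The increase and decrease conditions above can be combined into a single decision function. This is a minimal sketch assuming scalar metrics and a one-unit adjustment step; the threshold names are hypothetical, and giving the increase condition priority when both conditions hold is a design choice the source does not address.

```python
from dataclasses import dataclass


@dataclass
class PartitionThresholds:
    prefetch_high: float  # prefetch metric above this favors more metadata
    prefetch_low: float   # prefetch metric below this favors less metadata
    cache_low: float      # cache metric below this favors more metadata
    cache_high: float     # cache metric above this favors less metadata
    max_units: int        # capacity threshold for growing the first portion
    min_units: int        # capacity threshold for shrinking the first portion


def next_metadata_units(units: int, prefetch_metric: float, cache_metric: float,
                        t: PartitionThresholds, step: int = 1) -> int:
    """Return the new number of cache units allocated to metadata 122."""
    grow = prefetch_metric > t.prefetch_high or cache_metric < t.cache_low
    shrink = prefetch_metric < t.prefetch_low or cache_metric > t.cache_high
    if units < t.max_units and grow:
        return min(units + step, t.max_units)
    if units > t.min_units and shrink:
        return max(units - step, t.min_units)
    return units  # metrics within bounds: keep the current partition


t = PartitionThresholds(0.6, 0.2, 0.3, 0.9, max_units=8, min_units=1)
print(next_metadata_units(4, prefetch_metric=0.7, cache_metric=0.5, t=t))  # 5
```

Keeping separate high and low thresholds leaves a dead band between them, so the partition is not resized on every small fluctuation of the metrics.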
- the component 1010 may be configured to allocate one or more cache units 220 to the first partition 1024 .
- Allocating a cache unit 220 to the first partition 1024 (and/or first portion 124 ) may include configuring the metadata logic 1014 to address, reference, and/or otherwise provide access to the one or more cache units 220 for storage of the metadata 122 and/or removing, disabling, ignoring, and/or otherwise excluding the cache unit 220 from the address logic 1016 .
- allocating a cache unit 220 to the second partition 1026 and/or second portion 126 may include configuring the address logic 1016 to address, reference, and/or otherwise utilize the cache unit 220 as available cache capacity (e.g., for storage of cache data 128 ) and/or removing, disabling, ignoring, and/or otherwise excluding the cache unit 220 from the metadata logic 1014 .
- Allocating a cache unit 220 to the first portion 124 may include evicting cache data 128 from the cache unit 220 and disabling a cache tag 326 of the cache unit 220 .
- Allocating a cache unit 220 to the second portion 126 may include removing metadata 122 from the cache unit 220 and enabling the cache tag 326 of the cache unit 220 .
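The bookkeeping for moving a single cache unit 220 between portions might look like the following. This is an illustrative sketch: the `CacheUnit` fields are hypothetical, and writing back modified data before eviction is an assumption about a write-back cache rather than something the source states.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CacheUnit:
    tag: Optional[int] = None      # tag of the resident cache line, if any
    data: Optional[bytes] = None   # cache data 128 held in the unit
    dirty: bool = False            # modified since fill (assumed write-back cache)
    tag_enabled: bool = True       # False while the unit stores metadata 122
    metadata: List[object] = field(default_factory=list)


def assign_to_metadata(unit: CacheUnit,
                       write_back: Callable[[int, bytes], None]) -> None:
    """Move a unit into the first portion: evict its data, disable its tag."""
    if unit.tag is not None and unit.dirty:
        write_back(unit.tag, unit.data)  # hypothetical backing-memory hook
    unit.tag, unit.data, unit.dirty = None, None, False
    unit.tag_enabled = False  # address lookups now skip this unit


def assign_to_data(unit: CacheUnit) -> None:
    """Move a unit into the second portion: drop metadata, enable its tag."""
    unit.metadata.clear()
    unit.tag_enabled = True  # unit is available cache capacity again
```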
- the component 1010 may be configured to increase the size of the first portion 124 (e.g., in response to a metric 212 that is above a first threshold). Increasing the size of the first portion 124 may include compacting the cache data 128 stored within the second portion 126 .
- the component 1010 may be configured to preserve at least a portion of the cache data 128 maintained within the cache 110 when increasing the size of the first portion 124 (and decreasing the size of the second portion 126 ).
- the component 1010 may be configured to evict cache data 128 from a selected cache unit 220 that is to remain allocated to the second portion 126.
- the component 1010 may be further configured to move cache data 128 to the selected cache unit 220 .
- the cache data 128 may be moved from a cache unit 220 that is to be allocated from the second portion 126 to the first portion 124 .
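One way to preserve cache data while growing the first portion 124 is to relocate the line held by the reassigned (donor) unit into a surviving unit of the second portion 126 and evict that unit's line instead. The sketch below assumes a least-recently-used choice of victim, represents units as plain tuples, and drops the donor's line outright when it is colder than every survivor; all three choices are assumptions made for illustration. It also ignores set-associativity: in a set-associative cache, a relocated line would have to stay within its own set.

```python
from typing import Dict, List, Optional, Tuple

Line = Tuple[int, int]  # (tag, last_use timestamp) of a resident cache line


def release_unit_for_metadata(units: Dict[int, Optional[Line]], donor: int,
                              survivors: List[int]) -> Optional[int]:
    """Free `donor` for metadata 122 while trying to keep its cache line.

    units maps a unit id to its resident line (None if empty); survivors
    are the unit ids that stay in the second portion 126. Returns the tag
    of the line that was evicted, or None if nothing was lost.
    """
    moved = units[donor]
    units[donor] = None  # donor now holds no cache data
    if moved is None:
        return None
    empty = [u for u in survivors if units[u] is None]
    if empty:
        units[empty[0]] = moved  # relocate into a free surviving unit
        return None
    victim = min(survivors, key=lambda u: units[u][1])  # least recently used
    if units[victim][1] < moved[1]:
        evicted = units[victim][0]
        units[victim] = moved  # keep the more recently used donor line
        return evicted
    return moved[0]  # donor line is the coldest: just drop it
```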
- the component 1010 may be configured to decrease the size of the first portion 124 (e.g., in response to a metric 212 that is below a second threshold). Decreasing the size of the first portion 124 may include compacting the metadata 122 stored within the first portion 124 .
- the component 1010 may be configured to preserve at least a portion of the metadata 122 when decreasing the size of the first portion 124 .
- the component 1010 can be configured to reduce the amount of the cache memory 120 allocated for the metadata 122 from a first group of cache units 220 to a second group of cache units 220 , the second group smaller than the first group.
- the component 1010 can be further configured to compact the metadata 122 for storage within the second group of cache units 220 .
- the component 1010 may move metadata 122 stored within a cache unit 220 included in the first group of cache units 220 to a cache unit included in the second group of cache units 220 .
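Compacting metadata 122 into a smaller group of cache units 220 must discard some entries whenever the smaller group cannot hold them all. A hypothetical sketch, assuming each unit holds a fixed number of entries and that a caller-supplied `score` (for example, a confidence counter) decides which entries survive:

```python
from typing import Callable, List, TypeVar

Entry = TypeVar("Entry")


def compact_metadata(group: List[List[Entry]], new_count: int,
                     entries_per_unit: int,
                     score: Callable[[Entry], float]) -> List[List[Entry]]:
    """Repack metadata 122 from `group` (one entry list per cache unit) into
    `new_count` units, keeping the highest-scoring entries."""
    if not 0 <= new_count <= len(group):
        raise ValueError("the second group must not be larger than the first")
    entries = sorted((e for unit in group for e in unit), key=score, reverse=True)
    kept = entries[: new_count * entries_per_unit]
    return [kept[i * entries_per_unit:(i + 1) * entries_per_unit]
            for i in range(new_count)]


groups = [[("a", 3)], [("b", 1), ("c", 5)], [("d", 2)]]
print(compact_metadata(groups, 1, 2, score=lambda e: e[1]))  # [[('c', 5), ('a', 3)]]
```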
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Described apparatuses and methods partition a cache memory based, at least in part, on a metric indicative of prefetch performance. The amount of cache memory allocated for metadata related to prefetch operations versus cache storage can be adjusted based on operating conditions. Thus, the cache memory can be partitioned into a first portion allocated for metadata pertaining to an address space (prefetch metadata) and a second portion allocated for data associated with the address space (cache data). The amount of cache memory allocated to the first portion can be increased under workloads that are suitable for prefetching and decreased otherwise. The first portion may include one or more cache units, cache lines, cache ways, cache sets, or other resources of the cache memory.
Description
- This application is a continuation of, and claims priority to, U.S. Utility patent application Ser. No. 16/997,811, filed on Aug. 19, 2020, which is incorporated herein by reference in its entirety.
- To operate efficiently, some computing systems include a hierarchical memory system, which may include multiple levels of memory. Here, efficient operation can entail cost efficiency and speed efficiency. Faster memories are typically more expensive than relatively slower memories, so designers attempt to balance their relative costs and benefits. One approach is to use a smaller amount of faster memory with a larger amount of slower memory. The faster memory is deployed at a higher level in the hierarchical memory system than the slower memory such that the faster memory is preferably accessed first. An example of a relatively faster memory is called a cache memory. An example of a relatively slower memory is a backing memory, which can include primary memory, main memory, backing storage, or the like.
- A cache memory can accelerate data operations by storing and retrieving data of the backing memory using, for example, high-performance memory cells. The high-performance memory cells enable the cache memory to respond to memory requests more quickly than the backing memory. Thus, a cache memory can enable faster responses from a memory system based on desired data being present in the cache. One approach to increasing a likelihood that desired data is present in the cache is prefetching data before the data is requested. To do so, a prefetching system attempts to predict what data will be requested by a processor and then loads this predicted data into the cache. Although a prefetching system can make a cache memory more likely to accelerate memory access operations, data prefetching can introduce operational complexity that engineers and other computer designers strive to overcome.
- The details of one or more aspects of adaptive cache partitioning are described in this document with reference to the following drawings. The same numbers are used throughout the drawings to reference like features and components:
- FIGS. 1-1 through 1-3 illustrate example environments in which techniques for adaptive cache partitioning can be implemented.
- FIG. 2 illustrates an example of an apparatus that can implement aspects of adaptive cache partitioning.
- FIG. 3 illustrates another example of an apparatus that can implement aspects of adaptive cache partitioning.
- FIGS. 4-1 through 4-3 illustrate example operational implementations of adaptive cache partitioning.
- FIGS. 5-1 through 5-3 illustrate further example operational implementations of adaptive cache partitioning.
- FIG. 6 illustrates an example flow diagram depicting operations for adaptive cache partitioning.
- FIG. 7 illustrates an example flow diagram depicting operations for adaptive cache partitioning.
- FIG. 8 illustrates an example flow diagram depicting operations for adaptive cache partitioning based, at least in part, on metrics pertaining to prefetch performance.
- FIG. 9 illustrates an example flow diagram depicting operations for adaptive cache partitioning based, at least in part, on metrics pertaining to cache and/or prefetch performance.
- FIG. 10 illustrates an example of a system for implementing adaptive cache partitioning.
- Advances in semiconductor process technology and microarchitecture have led to significant reductions in processor cycle times and increased processor density. Meanwhile, advances in memory technology have led to increasing memory density but relatively minor reductions in memory access times. Consequently, memory latencies measured in processor clock cycles are continually increasing. Cache memory, however, can help to bridge the processor-memory latency gap. Cache memory, which can store data of a backing memory, may be capable of servicing requests much more quickly than the backing memory. In some aspects, cache memory can be deployed “above” or “in front of” a backing memory in a memory hierarchy so that the cache memory is preferably accessed before accessing the slower backing memory.
- Due to, inter alia, cost considerations, the cache memory may have a lower capacity than the backing or main memory. The cache memory may, therefore, load a selected subset of the address space of the backing memory. Data can be selectively admitted and/or evicted from the cache memory in accordance with suitable criteria, such as cache admission policies, eviction policies, replacement policies, and/or the like.
- During operations, data can be loaded into the cache in response to “cache misses.” A cache miss refers to a request pertaining to an address that has not been loaded into the cache and/or is not included in the working set of the cache. Servicing a cache miss may involve fetching data from the slower backing memory, which can significantly degrade performance. By contrast, servicing requests that result in “cache hits” may involve accessing the relatively higher-performance cache memory without incurring latencies for accessing the relatively lower-performance backing memory.
- In some circumstances, cache performance can be improved through prefetching. Prefetching typically involves loading addresses into cache memory before the addresses are requested. A prefetcher can predict addresses of upcoming requests and preload the addresses into the cache memory in the background so that, when requests pertaining to the predicted addresses are subsequently received, the requests can be serviced from the cache memory as opposed to triggering cache misses. In other words, requests pertaining to the prefetched addresses may be serviced using the relatively higher-performance cache memory without incurring the latency of the relatively lower-performance backing memory.
- The benefits of prefetching can be quantified in terms of hit rate, quantity of “useful” prefetches, a ratio of useful prefetches to “bad” prefetches, or the like. As used herein, a “useful” prefetch refers to a prefetch that results in a subsequent cache hit, which is termed a “prefetch hit.” In other words, a useful prefetch is achieved with the prefetching of data associated with an address that is subsequently requested and/or otherwise accessed from the cache memory. By contrast, a “bad” prefetch or “prefetch miss” refers to a prefetch for data that is not subsequently requested and, as such, does not produce a cache or prefetch hit. Bad prefetches can adversely impact performance. Bad prefetches can consume limited cache memory resources with data that is unlikely to be requested (e.g., poison the cache), resulting in increased cache miss rate, lower hit rate, increased thrashing, higher bandwidth consumption, and so on.
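The useful/bad distinction can be tracked by remembering which resident lines were brought in by the prefetcher and have not yet been requested. In the sketch below (class and method names are hypothetical), the first demand hit on such a line counts as a useful prefetch, and eviction before any request counts as a bad prefetch. Accuracy is reported as useful / (useful + bad), one common normalization of the useful-to-bad relationship described above.

```python
from typing import Hashable, Set


class PrefetchStats:
    """Classifies prefetches as useful (later requested) or bad (evicted unused)."""

    def __init__(self) -> None:
        self.pending: Set[Hashable] = set()  # prefetched lines not yet requested
        self.useful = 0
        self.bad = 0

    def on_prefetch(self, line: Hashable) -> None:
        self.pending.add(line)

    def on_demand_hit(self, line: Hashable) -> None:
        if line in self.pending:  # first request for a prefetched line
            self.pending.discard(line)
            self.useful += 1      # counts as a prefetch hit

    def on_evict(self, line: Hashable) -> None:
        if line in self.pending:  # evicted before anyone requested it
            self.pending.discard(line)
            self.bad += 1         # counts as a prefetch miss

    def accuracy(self) -> float:
        resolved = self.useful + self.bad
        return self.useful / resolved if resolved else 0.0
```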
- A prefetcher can try to avoid these problems by attempting to detect patterns in which a memory is accessed and then prefetching data in accordance with the detected patterns. The prefetcher may utilize metadata to detect, predict, derive, and/or exploit memory access patterns to determine accurate prefetch predictions (e.g., predict addresses of upcoming requests). The metadata utilized by the prefetcher may be referred to as “prefetcher metadata,” “prefetch metadata,” “request metadata,” “access metadata,” “memory metadata,” “memory access metadata,” or the like. This metadata may include any suitable information pertaining to an address space including, but not limited to: a sequence of previously requested addresses or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses, access counts (e.g., accesses within respective windows), access time(s), last access time(s), and so on.
- The prefetcher may implement prefetch operations for workloads that are suitable for prefetching. As used herein, a “suitable” workload or workload that is “suitable for prefetching” refers to a “predictable” workload that produces memory accesses according to patterns that are detectable (and/or exploitable) by the prefetcher. A suitable workload may, therefore, refer to a workload associated with metadata from which the prefetcher is capable of deriving a predictable access pattern. Examples of suitable workloads include workloads in which memory requests are offset by a consistent offset or stride. These types of workloads may be produced by programs that access structured data repeatedly and/or in regular patterns. By way of non-limiting example, a program may repeatedly access data structures of size D, resulting in a predictable workload in which memory accesses are offset by a relatively constant offset delta Δ or stride, where Δ≈D. Stride and other types of access patterns may be derived from metadata pertaining to previous memory accesses of the workload. The prefetcher can utilize the memory access patterns derived from such metadata to prefetch data that is likely to be requested in the future. In the stride example above, the prefetcher can load data of addresses a+Δ, a+2Δ, . . . , a+dΔ into the cache memory in response to a cache miss for address a (where d is a configurable prefetch degree). Given the predictable memory access pattern derived from metadata associated with the workload, the data prefetched from addresses a+Δ through a+dΔ will likely result in subsequent prefetch hits, thereby preventing cache misses and resulting in improved performance.
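The stride example can be sketched as follows. Assumptions: addresses are plain integers, a stride is accepted only when the two most recent deltas agree, and an empty result stands for "do not prefetch" when no consistent stride is present. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def detect_stride(history: Sequence[int], matches: int = 2) -> Optional[int]:
    """Return a stride if the last `matches` address deltas agree, else None."""
    if len(history) < matches + 1:
        return None
    recent = history[-(matches + 1):]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    if deltas[0] != 0 and all(d == deltas[0] for d in deltas):
        return deltas[0]
    return None


def stride_prefetch(miss_addr: int, history: Sequence[int], degree: int) -> List[int]:
    """Addresses a + Δ, a + 2Δ, ..., a + dΔ for a miss at a (empty if no stride)."""
    stride = detect_stride(history)
    if stride is None:
        return []  # no consistent pattern: treat as unsuitable, skip prefetch
    return [miss_addr + k * stride for k in range(1, degree + 1)]


print(stride_prefetch(256, [0, 64, 128, 192], degree=3))  # [320, 384, 448]
```

With a history of 0, 64, 128, 192 (Δ = 64) and d = 3, a miss at a = 256 yields prefetches for 320, 384, and 448; an irregular history such as 0, 64, 100 yields no prefetches.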
- Some types of workloads may not be suitable for prefetching. As used herein, an “unsuitable” workload or workload that is “unsuitable for prefetching” refers to a workload that accesses memory in a manner that the prefetcher is incapable of predicting, modeling, and/or otherwise exploiting to produce accurate prefetch predictions. An unsuitable workload may refer to a workload associated with metadata from which the prefetcher is incapable of deriving address predictions, patterns, models, and/or the like. Examples of unsuitable workloads include workloads produced by programs that do not access memory in repeated and/or regular patterns, programs that access memory at seemingly random addresses and/or address offsets, programs that access memory according to patterns that are too complex or varied to be detected by the prefetcher (and/or captured in the prefetcher metadata), and/or the like. Attempting to prefetch data for unsuitable workloads may result in poor prefetch performance. Since prefetch decisions for unsuitable workloads are not guided by discernible access patterns, little, if any, of the prefetched data is likely to be subsequently requested before being evicted from the cache. As disclosed herein, inaccurate prefetch predictions may result in bad prefetches that consume the relatively limited capacity of the cache memory with data that is unlikely to be subsequently accessed, to the exclusion of other data that may be accessed more frequently. Attempting to prefetch for unsuitable workloads may therefore decrease cache performance (e.g., result in lower hit rate, increased miss rate, thrashing, increased bandwidth consumption, and so on). To avoid these and other problems, prefetching may not be implemented for unsuitable workloads (and/or within address regions associated with unsuitable workloads).
- A cache may service a plurality of different workloads, each having respective workload characteristics (e.g., respective memory access characteristics, patterns, and/or the like). As disclosed in further detail herein, workload characteristics within respective regions of the address space may depend on a number of factors, which may vary over time. Programs operating within different regions of the address space may, therefore, produce workloads having different characteristics (e.g., different memory access patterns). For example, a first program operating within a first region of the address space may access memory per a first stride pattern (Δ1); a second program operating within a second region may access memory per a second, different stride pattern (Δ2); a third program operating within a third region of the address space may access memory according to a more complex pattern, such as a correlation pattern; a further program operating within a fourth region of the address space may access memory unpredictably; and so on. Although the first stride pattern may be capable of producing accurate prefetches within the first region, the first stride pattern will likely produce poor results if used in the other regions (and vice versa).
- In some implementations, prefetching performance can be improved by maintaining metadata pertaining to respective regions of the address space. The metadata utilized by the prefetcher may include a plurality of entries, with each entry including information pertaining to memory accesses within a respective region of the address space. The prefetcher may utilize metadata pertaining to respective regions to inform prefetch operations within the respective regions. More specifically, the prefetcher can utilize metadata pertaining to respective regions of the address space to determine characteristics of the workload within the respective regions, determine whether the workloads are suitable for prefetching (e.g., distinguish workloads and/or regions that are suitable for prefetching from workloads and/or regions that are unsuitable for prefetching), determine access patterns within the respective regions, implement prefetch operations within the respective regions per the determined access patterns, and so on. In some implementations, the prefetcher metadata covers a plurality of fixed-sized address regions. Alternatively, the prefetcher metadata may be configured to cover adaptively sized address regions in which workload characteristics, such as access patterns, are consistent. In these implementations, the size of the address ranges covered by respective entries of the prefetcher metadata may vary within respective regions of the address space depending on, inter alia, workload characteristics and/or prefetch performance within the respective regions.
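A per-region variant keeps a small entry (last address, stride, confidence) for each fixed-size region, so that interleaved workloads in different regions do not disturb each other's patterns. This sketch assumes fixed 4 KiB regions, a saturating confidence counter, and a confidence threshold of 2 before prefetches are issued; all of these are illustrative parameters, not values from the source. A region whose deltas never repeat keeps its confidence at zero and is effectively treated as unsuitable for prefetching.

```python
from typing import Dict, List, Tuple

REGION_BITS = 12  # hypothetical fixed region size: 4 KiB


class RegionStrideTable:
    """Per-region prefetcher metadata: region -> (last addr, stride, confidence)."""

    def __init__(self, threshold: int = 2, max_conf: int = 3):
        self.entries: Dict[int, Tuple[int, int, int]] = {}
        self.threshold, self.max_conf = threshold, max_conf

    def access(self, addr: int, degree: int = 2) -> List[int]:
        """Update the entry for addr's region and return addresses to prefetch."""
        region = addr >> REGION_BITS
        if region not in self.entries:
            self.entries[region] = (addr, 0, 0)  # first access: no history yet
            return []
        last, stride, conf = self.entries[region]
        delta = addr - last
        if delta != 0 and delta == stride:
            conf = min(conf + 1, self.max_conf)  # pattern confirmed again
        else:
            stride, conf = delta, 0  # pattern broken: retrain on the new delta
        self.entries[region] = (addr, stride, conf)
        if conf >= self.threshold:
            return [addr + k * stride for k in range(1, degree + 1)]
        return []


table = RegionStrideTable()
# Strided accesses in region 0x1 interleaved with irregular ones in region 0x5.
for a in (0x1000, 0x5000, 0x1040, 0x5123, 0x1080, 0x5007, 0x10C0):
    issued = table.access(a)
print([hex(x) for x in issued])  # ['0x1100', '0x1140']
```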
- Metadata pertaining to memory accesses are often tracked at and/or within performance-sensitive functionality of the hierarchical memory system, such as memory I/O paths or the like. Moreover, prefetch operations that utilize such metadata may be performance-sensitive (e.g., to ensure that prefetched data are available before they are requested). Therefore, it can be advantageous to maintain metadata pertaining to memory accesses (prefetcher metadata) within high-performance memory resources. In some implementations, prefetch metadata may be maintained within high-performance cache memory. For example, a fixed portion of the high-performance memory resources of the cache may be allocated for the storage of prefetcher metadata (and/or be allocated to the prefetcher and/or prefetch logic of the cache). The size and/or configuration of the fixed portion may be determined at design, manufacturing, and/or fabrication of the cache and/or component in which the cache is deployed, such as a processor, System-on-Chip (SoC), or the like. In some implementations, the fixed portion of cache memory allocated for prefetch metadata may be set in hardware, a Register Transfer Level (RTL) implementation, and/or the like.
- The fixed allocation of cache memory may improve prefetch performance by, inter alia, decreasing the latency of metadata updates, address predictions, prefetch operations, and so on. Since the size of the cache memory is finite, allocation of the fixed portion of the cache memory for metadata storage may adversely impact other aspects of cache performance. For example, allocation of the fixed portion may reduce the amount of data that can be loaded into the cache, which can result in decreased cache performance (e.g., lead to increased miss rate, decreased hit rate, increased replacement rate, and/or the like). These disadvantages may be outweighed by the benefits of improved prefetch performance in some circumstances. For example, when servicing suitable workloads having access patterns that can be accurately predicted and/or exploited, the prefetcher can utilize metadata maintained within the fixed allocation of high-performance cache memory to implement accurate, low-latency prefetch operations that result in better overall cache performance despite the reduced cache capacity.
- In other circumstances, however, the benefits of improved prefetch performance may not outweigh the disadvantages of decreased cache capacity. For example, when servicing unsuitable workloads, the fixed portion of the cache memory resources allocated for prefetcher metadata may be effectively wasted. More specifically, when servicing workloads having access patterns that cannot be accurately predicted and/or exploited by the prefetcher, the fixed portion of the cache memory allocated for storage of prefetcher metadata may not yield useful prefetches and, as such, may not improve cache performance, much less outweigh the performance penalties incurred due to reduced cache capacity. When servicing unsuitable workloads, the fixed allocation of the cache memory would be better utilized to increase the available capacity of the cache rather than storage of prefetcher metadata.
- One approach to these issues is to predetermine the amount of cache memory allocated for prefetch metadata (the fixed prefetch metadata capacity). The fixed prefetch metadata capacity may be configured to provide acceptable performance under a range of different operating conditions. In some implementations, the fixed prefetch metadata capacity may be determined by testing, experience, simulation, machine-learning, and/or the like. Although the fixed amount of prefetch metadata capacity may yield acceptable performance under some conditions, performance may suffer under other conditions. Moreover, a cache with a fixed prefetch metadata capacity cannot adapt to changes in workload conditions.
- Consider, for example, situations in which a cache having a fixed prefetch metadata capacity services or otherwise operates with predominantly unsuitable workloads (e.g., workloads that produce memory accesses that the prefetcher is incapable of modeling, predicting, and/or exploiting to determine accurate prefetch predictions). The fixed prefetch metadata capacity may, therefore, not yield performance improvements. In these situations, cache performance could be improved by reducing the fixed prefetch metadata capacity or removing the fixed allocation altogether.
- Consider other situations in which the cache having the fixed prefetch metadata capacity services predominantly suitable workloads, such as a large number of workloads having different respective access patterns (tracked in respective prefetcher metadata), workloads having more complex access patterns, workloads having access patterns that involve larger amounts of prefetcher metadata, and/or the like. The fixed prefetch metadata capacity may not be sufficient to accurately capture access patterns of the workloads, resulting in decreased prefetch accuracy and decreased cache performance. Under these conditions, cache performance could be improved by increasing the metadata capacity available to the prefetcher, even at the cost of further reducing available cache capacity.
- Workload characteristics, such as access patterns, may vary from address region to address region. Moreover, the characteristics of respective workloads, and/or corresponding address regions, may vary over time. Workload characteristics within respective regions of the address space may depend on a number of factors, including, but not limited to: the programs utilizing the respective regions, the state of the programs, the processing task(s) being performed by the programs, the execution phase of the programs, characteristics of the data structure(s) being accessed by the programs, the manner in which the data structure(s) are accessed, and/or the like. The prefetcher may utilize metadata pertaining to workload characteristics within respective address regions to determine accurate prefetch predictions within the respective address regions. The amount of prefetcher metadata needed to produce accurate prefetch predictions may, therefore, depend on a number of factors, which may vary over time, including, but not limited to: the quantity of workloads (and/or corresponding address regions), the amount of metadata needed to track access patterns within respective address regions, the prefetch technique(s) implemented by the prefetcher within the respective address regions, the complexity of the access patterns, and so on. The prefetch metadata capacity needed to produce accurate prefetch predictions under first operating conditions (and/or during a first time interval) may differ from the prefetch metadata capacity needed to produce accurate prefetch predictions under second operating conditions (and/or during a second time interval).
- To address these and other disadvantages, this document describes adaptive cache partitioning techniques that enable the amount of cache memory allocated for storage of prefetcher metadata to be dynamically adjusted. The cache memory capacity allocated to prefetch operations may therefore be tuned to improve cache performance.
- In one implementation, the prefetcher implements a stride prefetch technique for a plurality of workloads, with each workload corresponding to a respective region of the address space. The prefetcher may detect stride patterns for respective workloads using metadata pertaining to accesses within the respective regions. Detecting the stride access patterns for Y workloads may involve maintaining metadata pertaining to accesses within Y different address regions. A fixed prefetch metadata capacity, however, may be too small to hold the metadata needed to capture all Y patterns, which may reduce the accuracy of the prefetch predictions, resulting in decreased cache performance. For example, the fixed prefetch metadata capacity may only be capable of tracking a subset of the Y patterns, leaving X address regions uncovered. This can cause the prefetcher to implement inaccurate prefetches within the X address regions or prevent prefetching within the X address regions altogether. By contrast, the disclosed adaptive cache partitioning techniques may be capable of improving cache performance by, inter alia, increasing the amount of cache memory allocated to the prefetcher, such that the prefetcher is capable of storing metadata pertaining to stride patterns of each of the Y workloads and/or regions. The disclosed adaptive cache partitioning may be capable of modifying prefetch metadata capacity in response to changing workload conditions. For example, one or more of the Y workloads may transition from suitable to unsuitable over time, resulting in decreased prefetch performance. In response to the decrease in prefetch performance, the amount of cache memory allocated for the prefetcher metadata may be decreased, which may produce a corresponding increase to available cache capacity, thereby improving overall cache performance.
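The coverage problem for Y regions can be illustrated with a bounded metadata table. The sketch below is hypothetical: it assumes one metadata entry per tracked region, least-recently-used (LRU) replacement of entries when the table is full, and 4 KiB regions, and it counts how often an access finds its region's metadata still present ("metadata hits"). The class and function names are illustrative only.

```python
from collections import OrderedDict


class BoundedRegionTable:
    """Prefetcher metadata with a fixed number of entries, one per tracked region."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # region index -> last address seen in that region, kept in LRU order
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def touch(self, region: int, addr: int) -> None:
        """Record an access, evicting the LRU region's entry if the table is full."""
        if self.capacity == 0:
            return  # no cache memory allocated for metadata
        if region in self.entries:
            self.entries.move_to_end(region)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop metadata of the LRU region
        self.entries[region] = addr

    def resize(self, capacity: int) -> None:
        """Change the allocation; shrinking discards the least recently used entries."""
        self.capacity = capacity
        while len(self.entries) > capacity:
            self.entries.popitem(last=False)


def metadata_hits(capacity: int, num_regions: int, rounds: int = 3) -> int:
    """Count accesses that find their region's entry in a round-robin stream over Y regions."""
    table = BoundedRegionTable(capacity)
    hits = 0
    for _ in range(rounds):
        for region in range(num_regions):
            if region in table.entries:
                hits += 1
            table.touch(region, region * 4096)  # hypothetical 4 KiB regions
    return hits
```

Under these assumptions, a round-robin stream over six regions gets no metadata hits with four entries (LRU replacement thrashes), whereas with six entries every access after the first pass finds its metadata; `resize` models shrinking the allocation again when the workloads become unsuitable.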
- In another example, a prefetcher may implement a correlation prefetch technique that learns access patterns that may repeat but are not as consistent as simple stride or delta address patterns (correlation patterns). The correlation patterns may include delta sequences including a plurality of elements and, as such, may be derived from larger amounts of metadata than simple stride patterns. For example, a correlation prefetch for a delta sequence that includes two elements (Δ1, Δ2), starting from an address a, may include prefetching addresses a+Δ1, a+Δ1+Δ2, a+2Δ1+Δ2, a+2Δ1+2Δ2, and so on, depending on the degree of the correlation prefetch operation. Since correlation prefetch techniques attempt to extract more complex patterns, these techniques may involve larger amounts of metadata. In situations where correlation patterns are tracked for multiple workloads and/or regions, a cache having a fixed prefetch metadata capacity may be insufficient, resulting in decreased performance. By contrast, the adaptive cache partitioning techniques disclosed herein can increase the amount of cache memory allocated to the prefetcher, resulting in improved prefetch accuracy and better overall performance, despite corresponding reductions to the available cache capacity. The disclosed adaptive cache partitioning techniques may also adjust cache memory allocations in response to changing workload conditions, such as a shift toward workloads with simpler single-stride access patterns, a decrease in the number of workloads, and/or the like.
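The correlation-prefetch arithmetic above can be checked with a short function that repeats an already-learned delta sequence from a base address, up to the prefetch degree. It only generates candidate addresses; learning the delta sequence from an address history is not shown, and the name `correlation_prefetch` is hypothetical.

```python
from itertools import cycle, islice
from typing import List, Sequence


def correlation_prefetch(base: int, deltas: Sequence[int], degree: int) -> List[int]:
    """Return `degree` addresses predicted by repeatedly applying `deltas` from `base`."""
    addresses = []
    addr = base
    # cycle() repeats the delta sequence; islice() stops after `degree` steps.
    for delta in islice(cycle(deltas), degree):
        addr += delta
        addresses.append(addr)
    return addresses
```

For example, `correlation_prefetch(1000, [8, 24], 4)` returns `[1008, 1032, 1040, 1064]`, which corresponds to a+Δ1, a+Δ1+Δ2, a+2Δ1+Δ2, a+2Δ1+2Δ2 with a = 1000, Δ1 = 8, and Δ2 = 24.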
- The disclosed adaptive cache partitioning techniques can also improve the performance of machine-learning and/or machine-learned (ML) prefetch implementations, such as classification-based prefetchers, artificial neural network (NN) prefetchers, Deep Neural Network (DNN) prefetchers, Recurrent NN (RNN) prefetchers, Long Short-Term Memory (LSTM) prefetchers, and/or the like. For example, an LSTM prefetcher may be trained to model the “local context” of memory accesses within an address space, with each “local context” corresponding to a respective address range of the address space. These types of ML prefetch techniques may attempt to leverage local context since, as disclosed herein, data structures accessed by programs running within respective local contexts tend to be stored in contiguous data structures or blocks that are accessed repeatedly and/or in regular patterns. An ML prefetcher can be trained to develop and/or refine ML models within respective local contexts and can use the ML models to implement prefetch operations. Local context, however, can vary significantly across the address space due to differences in workload produced by programs operating within various regions of the address space. An ML model trained to learn the local context within one region of the address space (and/or that is produced by one program) may not be capable of accurately modeling the local context within other regions of the address space (and/or that is produced by another program). The ML models may, therefore, rely on metadata covering respective local contexts. A fixed allocation of cache memory may be insufficient to maintain ML models for the workloads being serviced by the cache, leading to poor prefetch performance. The disclosed adaptive cache partitioning techniques, however, may be capable of adjusting the amount of prefetch metadata capacity allocated to the prefetcher in accordance with the quantity and/or complexity of ML models being tracked thereby.
- The described techniques for adaptive cache partitioning can be used with caches, prefetchers, and/or other components of hierarchical memory systems. In some implementations, logic coupled to a cache memory is configured to balance performance improvements enabled by allocation of cache memory capacity for prefetch metadata against the impacts of corresponding decreases to available cache capacity. The logic can be configured to allocate a first portion of the cache memory for metadata pertaining to an address space (e.g., prefetch metadata), allocate cache data to a second portion of the cache memory that is different from the first portion, and/or modify a size of the first portion of the cache memory allocated for the metadata based, at least in part, on a metric pertaining to data prefetched into the second portion of the cache memory. The metric may be configured to quantify any suitable aspect of cache and/or prefetch performance including, but not limited to: prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, cache hit rate, cache miss rate, request latency, average request latency, and/or the like. The amount of cache memory allocated for prefetcher metadata may be increased when the metric exceeds a first threshold and may be decreased when the metric falls below a second threshold.
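Several of the metrics listed above can be derived from simple event counters collected over a monitoring window. The following sketch assumes hypothetical counter names and definitions: a useful prefetch is a prefetched line later hit by a demand request, and a bad prefetch is a prefetched line evicted without being used. How a real cache classifies prefetches and measures latency may differ.

```python
from dataclasses import dataclass


@dataclass
class CacheCounters:
    """Event counters gathered over one monitoring window (hypothetical names)."""
    prefetches_issued: int = 0
    useful_prefetches: int = 0     # prefetched lines later hit by a demand request
    bad_prefetches: int = 0        # prefetched lines evicted without being used
    demand_hits: int = 0
    demand_misses: int = 0
    total_latency_cycles: int = 0  # summed latency of demand requests

    def prefetch_hit_rate(self) -> float:
        if not self.prefetches_issued:
            return 0.0
        return self.useful_prefetches / self.prefetches_issued

    def useful_to_bad_ratio(self) -> float:
        # With no bad prefetches, report infinity only if some prefetch was useful.
        if self.bad_prefetches == 0:
            return float("inf") if self.useful_prefetches else 0.0
        return self.useful_prefetches / self.bad_prefetches

    def cache_hit_rate(self) -> float:
        requests = self.demand_hits + self.demand_misses
        return self.demand_hits / requests if requests else 0.0

    def average_latency(self) -> float:
        requests = self.demand_hits + self.demand_misses
        return self.total_latency_cycles / requests if requests else 0.0
```

Any one of these values could serve as the metric compared against the first and second thresholds; the choice of metric and threshold values is left open here, as in the text.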
- The metadata maintained within the first portion of the cache memory may be updated in response to requests pertaining to the address space, such as read requests, write requests, transfer requests, cache hits, cache misses, prefetch hits, prefetch misses, and/or the like. A prefetcher (and/or prefetch logic) of the cache may be configured to select data to prefetch into the second portion of the cache memory based, at least in part, on the metadata pertaining to the address space maintained within the first portion of the cache memory. The metadata may include any suitable information pertaining to addresses and/or ranges of the address space including, but not limited to: address sequence, address history, index table, delta sequence, stride pattern, correlation pattern, feature vectors, ML features, ML feature vectors, ML model, ML modeling data, and/or the like.
- The size of the first portion of the cache memory allocated for the metadata may be modified in response to monitoring one or more metrics pertaining to data prefetched into the second portion of the cache memory. The metrics may be configured to quantify prefetch performance and may include, but are not limited to: prefetch hit rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and so on. The size of the first portion may be increased when one or more of the metrics exceeds a first threshold or may be decreased when the metrics are below a second threshold. In some implementations, the amount of cache memory allocated for storage of metadata pertaining to the address space (prefetcher metadata) may be incrementally and/or periodically increased while prefetch performance remains above the first threshold. The amount of cache memory allocated for the metadata may be increased until a maximum or upper bound is reached. Conversely, the amount of cache memory allocated for the metadata may be incrementally and/or periodically reduced while prefetch performance remains below the second threshold. The amount of cache memory allocated for the metadata may be decreased until a minimum or lower bound is reached. In some aspects, at the lower bound, no cache resources are allocated for metadata storage, and substantially all of cache memory is available as cache capacity.
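The incremental, bounded adjustment can be sketched as a small controller invoked once per monitoring interval. All details are hypothetical assumptions: the metadata portion is sized in whole cache ways, changes by `step` ways per interval, starts at the lower bound, and is driven by a single scalar metric (e.g., prefetch hit rate) compared against a grow threshold and a lower shrink threshold. Metric values between the two thresholds leave the size unchanged.

```python
from typing import Iterable, List


class MetadataPartitionController:
    """Resizes the metadata portion (in cache ways) once per monitoring interval."""

    def __init__(self, total_ways: int, max_ways: int, min_ways: int = 0, step: int = 1,
                 grow_threshold: float = 0.6, shrink_threshold: float = 0.3) -> None:
        if not 0 <= min_ways <= max_ways <= total_ways:
            raise ValueError("require 0 <= min_ways <= max_ways <= total_ways")
        if shrink_threshold > grow_threshold:
            raise ValueError("shrink_threshold must not exceed grow_threshold")
        self.total_ways = total_ways
        self.min_ways, self.max_ways, self.step = min_ways, max_ways, step
        self.grow_threshold = grow_threshold
        self.shrink_threshold = shrink_threshold
        self.metadata_ways = min_ways  # start at the lower bound (e.g., no metadata)

    @property
    def data_ways(self) -> int:
        """Ways left for cache data (the second portion)."""
        return self.total_ways - self.metadata_ways

    def on_interval(self, metric: float) -> int:
        """Apply one interval's metric and return the new metadata size in ways."""
        if metric > self.grow_threshold:
            self.metadata_ways = min(self.metadata_ways + self.step, self.max_ways)
        elif metric < self.shrink_threshold:
            self.metadata_ways = max(self.metadata_ways - self.step, self.min_ways)
        return self.metadata_ways


def run_intervals(controller: MetadataPartitionController,
                  metrics: Iterable[float]) -> List[int]:
    """Feed a sequence of per-interval metrics and record the resulting sizes."""
    return [controller.on_interval(m) for m in metrics]
```

With 16 ways and an upper bound of 4 ways, six intervals with a metric of 0.9 followed by six with 0.1 produce the sizes `[1, 2, 3, 4, 4, 4, 3, 2, 1, 0, 0, 0]`, ending at the lower bound with all 16 ways available for cache data.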
- In these manners, adaptive cache partitioning provides flexible apparatuses and techniques for efficiently handling different prefetching environments. A cache memory can include a first portion allocated for metadata pertaining to an address space and a second portion allocated for caching data of the address space. In example implementations, relative sizes of the first and second portions are adapted based, at least in part, on current processing workloads. If a current processing workload is suitable for prefetching, the first portion can be sized appropriately. For example, logic can increase a size of the first portion and decrease a size of the second portion. The logic can shift some memory storage from being used for caching data to being used for storing metadata to increase prefetch capabilities for the workload that is suitable for prefetching. On the other hand, if a current processing workload is not suitable for prefetching, the first portion can be down-sized appropriately to provide greater resources for caching data. In this case, the logic can decrease the size of the first portion and increase the size of the second portion. Thus, logic can shift some cache memory storage from being used for maintaining the metadata to being used for storing cache data to decrease the resources consumed by the prefetcher for the workload that is not suitable for prefetching. The described cache partitioning can therefore adapt to efficiently provide more prefetch functionality or more cache storage depending on the current processing workload.
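Shifting storage between the two portions can be modeled at the granularity of cache ways within one set. In this hypothetical sketch, ways `[0, metadata_ways)` belong to the metadata portion and the remaining ways hold data lines. Growing the metadata portion evicts any data lines in the reassigned ways, and shrinking it discards metadata and returns those ways to the data portion. Writeback of dirty lines and the metadata contents themselves are not modeled.

```python
from typing import List, Optional


class PartitionedSet:
    """One cache set whose ways are split between prefetcher metadata and cache data."""

    def __init__(self, num_ways: int, metadata_ways: int = 0) -> None:
        self.num_ways = num_ways
        self.metadata_ways = metadata_ways
        # Tag of the data line held in each way; None means free (or used for metadata).
        self.ways: List[Optional[int]] = [None] * num_ways

    def data_way_range(self) -> range:
        return range(self.metadata_ways, self.num_ways)

    def insert_data(self, tag: int) -> bool:
        """Place a data line in a free data way; return False if the data ways are full."""
        for way in self.data_way_range():
            if self.ways[way] is None:
                self.ways[way] = tag
                return True
        return False

    def resize_metadata(self, new_metadata_ways: int) -> List[int]:
        """Move the partition boundary; return tags of data lines evicted to make room."""
        if not 0 <= new_metadata_ways <= self.num_ways:
            raise ValueError("metadata ways out of range")
        evicted: List[int] = []
        # Growing: ways [old, new) move from the data portion to the metadata portion.
        for way in range(self.metadata_ways, new_metadata_ways):
            tag = self.ways[way]
            if tag is not None:
                evicted.append(tag)
            self.ways[way] = None
        # Shrinking: ways [new, old) are freed for cache data.
        for way in range(new_metadata_ways, self.metadata_ways):
            self.ways[way] = None
        self.metadata_ways = new_metadata_ways
        return evicted
```

In this model, growing an 8-way set's metadata portion from 2 to 4 ways evicts the two data lines held in ways 2 and 3, and shrinking it to 1 way leaves 7 ways available for cache data.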
-
FIG. 1-1 illustrates an example apparatus 100 that can implement aspects of adaptive cache partitioning. The apparatus 100 can be realized as, for example, at least one electronic device. Example electronic-device implementations include an internet-of-things (IoTs) device 100-1, a tablet device 100-2, a smartphone 100-3, a notebook computer 100-4, a desktop computer 100-5, a server computer 100-6, a server cluster 100-7, and/or the like. Other apparatus examples include a wearable device, such as a smartwatch or intelligent glasses; an entertainment device, such as a set-top box or a smart television; a motherboard or server blade; a consumer appliance; vehicles; industrial equipment; a network-attached storage (NAS) device, and so forth. Each type of electronic device includes one or more components to provide some computing functionality or feature. - In example implementations, the apparatus 100 includes at least one
host 102, at least oneprocessor 103, at least onememory controller 104,interconnect 105,memory 108, and at least onecache 110. Thememory 108 may represent main memory, system memory, backing memory, backing storage, a combination thereof, and/or the like. Thememory 108 may be realized with any suitable memory and/or storage facility including, but not limited to: a memory array, semiconductor memory, read-only memory (ROM), random-access memory (RAM), Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), thyristor random access memory (TRAM), ferroelectric RAM (FeRAM), magnetic RAM (MRAM), resistive RAM (RRAM), magnetoresistive RAM (MRAM), spin-torque transfer RAM (STT RAM), phase-change memory (PCM), three-dimensional (3D) stacked DRAM, Double Data Rate (DDR) memory, high bandwidth memory (HBM), a hybrid memory cube (HMC), solid-state memory, Flash memory, NAND Flash memory, NOR Flash memory, 3D XPoint™ memory, and/or the like. Other examples of thememory 108 are described herein. In some aspects, thehost 102 can further include and/or be coupled to non-transitory storage, which may be realized with a device or module including any suitable non-transitory, persistent, solid-state, and/or non-volatile memory. - As shown, the
host 102, orhost device 102, can include theprocessor 103,memory controller 104, and/or other components (e.g., cache 110-1). Theprocessor 103 can be coupled to the cache 110-1, and the cache 110-1 can be coupled to thememory controller 104. Theprocessor 103 can also be coupled, directly or indirectly, to thememory controller 104. Thehost 102 can be coupled to the cache 110-2 through theinterconnect 105. The cache 110-2 can be coupled to thememory 108. - The depicted components of the apparatus 100 represent an example computing architecture with a memory hierarchy (or hierarchical memory system). For example, the cache 110-1 can be logically coupled between the
processor 103 and the cache 110-2. Further, the cache 110-2 can be logically coupled between the processor 103 (and/or cache 110-1) and thememory 108. In theFIG. 1-1 example, the cache 110-1 is at a higher level of the memory hierarchy than is the cache 110-2. Similarly, the cache 110-2 is at a higher level of memory hierarchy than is thememory 108. The indicatedinterconnect 105, as well as the other interconnects that couple various components, can enable data to be transferred between or among the various components. Interconnect examples include a bus, a switching fabric, one or more wires that carry voltage or current signals, and/or the like. - Although particular implementations of the apparatus 100 are depicted in
FIG. 1-1 and described herein, an apparatus 100 can be implemented in alternative manners. For example, thehost 102 may include additional caches, including multiple levels of cache memory (e.g., multiple cache layers). In some implementations, theprocessor 103 may include one or more internal memory and/or cache layers, such as instruction registers, data registers, an L1 cache, an L2 cache, an L3 cache, and/or the like. Further, at least one other cache and memory pair may be coupled “below” the illustrated cache 110-2 and/ormemory 108. The cache 110-2 and thememory 108 may be realized in various manners. In some implementations, the cache 110-2 and thememory 108 are both disposed on, or physically supported by, a motherboard with thememory 108 comprising “main memory.” In other implementations, the cache 110-2 includes and/or is realized by DRAM, and thememory 108 includes and/or is realized by a non-transitory memory device or module. Nonetheless, the components may be implemented in alternative ways, including in distributed or shared memory systems. Further, a given apparatus 100 may include more, fewer, or different components. - The cache 110-2 can be configured to improve memory performance by storing data of the relatively lower-
performance memory 108 within a relatively higher-performance cache memory 120. Thecache memory 120 can be provided and/or be embodied by cache hardware, which can include, but is not limited to: semiconductor integrated circuitry, memory cells, memory arrays, memory banks, memory chips, and/or the like. In some aspects, thecache memory 120 includes a memory array. The memory array may be configured ascache memory 120 including a plurality of cache units, such as cache lines or the like. The memory array may be a collection (e.g., a grid) of memory cells, with each memory cell being configured to store at least one bit of digital data. The cache memory 120 (and/or memory array thereof) may be formed on a semiconductor substrate, such as silicon, germanium, silicon-germanium alloy, gallium arsenide, gallium nitride, etc. In some cases, the substrate is a semiconductor wafer. In other cases, the substrate may be a silicon-on-insulator (SOI) substrate, such as silicon-on-glass (SOG) or silicon-on-sapphire (SOS), or epitaxial layers of semiconductor materials on another substrate. The conductivity of the substrate, or sub-regions of the substrate, may be controlled through doping using various chemical species including, but not limited to, phosphorous, boron, or arsenic. Doping may be performed during the initial formation or growth of the substrate, by ion-implantation, or by any other doping mechanism. The disclosure is not limited in this regard, however; thecache memory 120 may include any suitable memory and/or memory mechanism including, but not limited to: a memory, a memory array, semiconductor memory, volatile memory, RAM, SRAM, DRAM, SDRAM, non-volatile memory, solid-state memory, Flash memory, and/or the like. - Data may be loaded into the
cache memory 120 in response to cache misses so that subsequent requests for the data can be serviced more quickly. Further performance improvements can be realized by prefetching data into thecache memory 120, which may include predicting addresses that are likely to be requested in the future and prefetching the predicted addresses into thecache memory 120. When requests pertaining to the prefetched addresses are subsequently received at the cache 110-2, the requests can be serviced from the relatively higher-performance cache memory 120, without triggering cache misses (and without accessing the relatively lower-performance memory 108). - Addresses may be selected for prefetching based on, inter alia,
metadata 122 pertaining to the address space of thememory 108. In some implementations, access patterns within respective regions of the address space can be derived from themetadata 122, and the access patterns can be used to prefetch data into thecache memory 120. In some implementations, at least some of themetadata 122 is maintained within thecache memory 120. A portion or partition of thecache memory 120 may be allocated for storage of themetadata 122. The amount ofcache memory 120 allocated for themetadata 122 may be adjusted, tuned, modified, varied, and/or otherwise managed based, at least in part, on one or more metrics. The metrics may pertain to one or more aspects of prefetch performance (may include one or more prefetch performance metrics), such as quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, prefetch hit rate, prefetch miss rate, and/or the like. Alternatively, or in addition, the metrics may pertain to one or more aspects of cache performance (may include one or more cache performance metrics), such as cache hit rate, cache miss rate, and/or the like. The amount ofcache memory 120 allocated to themetadata 122 may be increased when one or more of the metrics exceeds a first threshold, may be decreased when the metrics fall below a second threshold, and so on. - In the
FIG. 1-1 example, aspects of adaptive cache partitioning are implemented by cache 110-2. The disclosure, however, is not limited in this regard. The disclosed techniques for adaptive cache partitioning may be implemented in any cache 110 (e.g., cache 110-1) and/or cache layer, including acrossmultiple caches 110 and/or cache layers. In some examples, the cache 110-2 may be configured to allocate cache memory formetadata 122 pertaining to the address space. Alternatively, or in addition, one or more internal cache(s) of theprocessor 103 may be configured to implement adaptive cache partitioning as disclosed herein (e.g., an L3 cache of theprocessor 103 may allocate cache memory to storemetadata 122 pertaining to the address space). -
FIG. 1-2 illustrates further examples of apparatuses that can implement adaptive cache partitioning. The apparatus 100 can include acache 110 configured to cache data associated with an address space. Thecache 110 can be configured to cache data pertaining to any suitable address space including, but not limited to: a memory address space, a storage address space, a host address space, an input/output (I/O) address space, a main memory address space, a physical address space, a virtual address space, a virtual memory address space, an address space managed by, inter alia, theprocessor 103,memory controller 104, memory management unit (MMU), and/or the like. In theFIG. 1-2 example, thecache 110 is configured to cache data pertaining to an address space of thememory 108. Thememory 108 may, therefore, represent a backing memory of thecache 110 within the memory hierarchy. - The
cache 110 can load addresses and/or corresponding data of the relativelyslower memory 108 into the relativelyfaster cache memory 120. Data may be loaded in response to cache misses (e.g., in response to requests pertaining to addresses and/or data that are not available within the cache 110). Servicing a cache miss may involve transferring data from the relativelyslower memory 108 to the relativelyfaster cache memory 120. Cache misses may, therefore, lead to increased request latency and poor performance. Thecache 110 can address these and other issues by prefetching addresses into thecache memory 120 before requests pertaining to the addresses are received. Accurate prefetches may result in prefetch hits that can be serviced using thecache memory 120, without incurring the latencies involved in cache misses. Inaccurate prefetches, however, may consume cache memory resources with data that are not subsequently accessed, which can adversely impact performance (e.g., increase miss rate, decrease cache hit rate, increase bandwidth consumption, and so on). -
Metadata 122 pertaining to the address space can be used to, inter alia, inform prefetch operations. In some implementations, address access patterns are derived from themetadata 122, and the address access patterns are leveraged accurately to predict addresses of upcoming requests. Themetadata 122 may include any information pertaining the address space and/or data associated with the backing memory of the cache 110 (e.g., the memory 108) including, but not limited to: a sequence of previously requested addresses or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses, access counts (e.g., accesses within respective windows), access time(s), last access time(s), ML features or parameters, ANN features or parameters (e.g., weight and/or bias parameters), DNN features or parameters, LSTM features or parameters, and so on. In some aspects, themetadata 122 includes a plurality of entries, each entry including information pertaining to a respective region of the address space. Themetadata 122 pertaining to respective regions of the address space may be used to, inter alia, determine address access patterns within the respective regions, which may be used to inform prefetch operations within the respective regions. - The
metadata 122 may be performance sensitive. Themetadata 122 pertaining to the address space may be retrieved, updated, and/or otherwise accessed in performance-sensitive operations, such as operations to service requests, cache operations, prefetch operations, and so on. It may be advantageous, therefore, to maintain themetadata 122 in high-performance memory resources. In theFIG. 1-2 example, at least some of themetadata 122 pertaining to the address space are maintained within thecache memory 120. A portion of thecache memory 120 may be allocated for storage of themetadata 122. As illustrated inFIG. 1-2 , a first portion 124 (or first partition) of thecache memory 120 is reserved for themetadata 122. Data corresponding to addresses of the address space associated with the memory 108 (cache data) may be cached within asecond portion 126 of the cache memory 120 (or second partition), which may be different and/or separate from thefirst portion 124. Thefirst portion 124 may include any suitable resources of thecache memory 120 including, but not limited to zero or more: cache units, cache blocks, cache lines, hardware cache lines, sets, ways, rows, columns, banks, and/or the like. The first portion 124 (or first partition) may be referred to as a metadata portion, a metadata partition, a prefetch portion, a prefetch partition, or the like. The second portion 126 (or second partition) may be referred to as a cache portion, cache partition, or the like. - The size of the
first portion 124 may be adjusted based, at least in part, on one or more metrics. The metrics may pertain to any suitable aspect(s) of the cache 110 and/or memory hierarchy including, but not limited to: request latency, average request latency, throughput, cache performance, cache hit rate, cache miss rate, prefetch performance, prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like. The size of the first portion 124 may be increased in response to metrics indicating that cache and/or prefetch performance satisfies one or more first thresholds and may be reduced in response to metrics that fail to satisfy one or more second thresholds. The metrics may be configured to quantify the degree to which the workload on the cache 110 is suitable for prefetching. The amount of cache memory 120 allocated to storage of the metadata 122 pertaining to prefetch operations may, therefore, correspond to the degree to which the workload is suitable for prefetching (as quantified by the one or more metrics). Under workload conditions that are suitable for prefetching, the size of the first portion 124 may be increased (and the size of the second portion 126 decreased). This may enable more accurate prefetching and improve overall cache performance despite the reduction in available cache capacity. Under workload conditions that are unsuitable for prefetching, the size of the first portion 124 may be decreased (and the size of the second portion 126 increased), which increases the available capacity of the cache 110. Under such workloads, the additional cache capacity may improve performance (e.g., reduced cache miss rate, lower replacement rate, and so on).
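The threshold rule above can be sketched as a small controller. This is a minimal illustration under stated assumptions, not the disclosed implementation. The following are all hypothetical:

- the class name `PartitionController`;
- the single scalar `prefetch_metric` standing in for the one or more metrics;
- the fixed `step`;
- the floor that always keeps some cache capacity.

Between the first (grow) and second (shrink) thresholds the split is left unchanged. This gives the controller some hysteresis, so the partition does not oscillate when the metric hovers near one boundary.

```python
from dataclasses import dataclass


@dataclass
class PartitionController:
    """Hypothetical controller for the metadata/cache-data split.

    total_units is X, metadata_units is M, and the cache-data portion
    therefore holds C = X - M units.
    """

    total_units: int
    metadata_units: int
    grow_threshold: float      # assumed "first threshold"
    shrink_threshold: float    # assumed "second threshold"
    step: int = 1              # assumed number of units moved per adjustment
    min_cache_units: int = 1   # always keep at least this much cache capacity

    @property
    def cache_units(self) -> int:
        return self.total_units - self.metadata_units

    def adjust(self, prefetch_metric: float) -> int:
        """Resize the metadata portion from one metric sample; return the new M."""
        if prefetch_metric >= self.grow_threshold:
            # Prefetch-friendly workload: move units from cache data to metadata.
            limit = self.total_units - self.min_cache_units
            self.metadata_units = min(self.metadata_units + self.step, limit)
        elif prefetch_metric < self.shrink_threshold:
            # Prefetching is not paying off: return units to cache capacity.
            self.metadata_units = max(self.metadata_units - self.step, 0)
        # Between the two thresholds the split is left unchanged (hysteresis band).
        return self.metadata_units


ctrl = PartitionController(total_units=64, metadata_units=8,
                           grow_threshold=0.6, shrink_threshold=0.3, step=4)
print(ctrl.adjust(0.75), ctrl.cache_units)  # 12 52
```

In practice the controller would be invoked on a monitoring schedule or during idle periods rather than on every request.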
FIG. 1-3 illustrates further examples of apparatuses that can implement adaptive cache partitioning. In the FIG. 1-3 example, the cache 110 may be an internal cache and/or cache layer of the processor 103 (and/or a processor core thereof), such as an L1 cache, L2 cache, L3 cache, or the like. In some aspects, the memory hierarchy may further include a cache 110 disposed between the processor 103 and the memory 108 (e.g., a cache 110-2 as illustrated in FIG. 1-1). - The
cache 110 may be configured to cache data associated with addresses of an address space. In the FIG. 1-3 example, the cache 110 is configured to cache data pertaining to a virtual address space managed by an MMU, such as the memory controller 104, an operating system, or the like. The address space may be larger than the physical address space of the memory resources of the host 102 (e.g., larger than the physical address space of the memory 108). The address space may be a 32-bit address space, a 64-bit address space, a 128-bit address space, or the like. - The
cache 110 can allocate a first portion 124 of the cache memory 120 for storage of metadata 122 pertaining to the address space. The metadata 122 may include information pertaining to accesses to respective addresses and/or address regions of the address space. This information may be used to, inter alia, prefetch data into the second portion 126 of the cache memory 120 (e.g., prefetch cache data 128 pertaining to respective addresses of the address space). The cache 110 can adjust the amount of cache memory 120 allocated to storage of the metadata 122 based, at least in part, on one or more metrics, as disclosed herein. The cache 110 can increase the amount of cache memory 120 allocated to the first portion 124 under workload conditions that are suitable for prefetching and can decrease that amount under workload conditions that are unsuitable for prefetching.
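One way to derive metrics of the kind mentioned above is to count two kinds of prefetches, alongside ordinary hit and miss counts:

- **useful:** prefetched data that is later requested;
- **bad:** prefetched data that is evicted before use.

The sketch below makes some assumptions:

- Each cache unit carries a prefetch flag, as the cache metadata 322 described later may.
- The caller reports `prefetched_unused=True` only on the first hit to prefetched data, so each prefetch is counted once.

`PrefetchMetrics` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PrefetchMetrics:
    """Hypothetical per-window counters for judging prefetch suitability."""

    hits: int = 0
    misses: int = 0
    useful_prefetches: int = 0  # prefetched data that was later requested
    bad_prefetches: int = 0     # prefetched data evicted without being requested

    def record_access(self, hit: bool, prefetched_unused: bool = False) -> None:
        """Count one request; prefetched_unused marks the first hit on prefetched data."""
        if hit:
            self.hits += 1
            if prefetched_unused:
                self.useful_prefetches += 1
        else:
            self.misses += 1

    def record_eviction(self, prefetched_unused: bool) -> None:
        """Count an eviction; prefetched data that was never requested is a bad prefetch."""
        if prefetched_unused:
            self.bad_prefetches += 1

    def cache_hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def prefetch_accuracy(self) -> float:
        """Fraction of resolved prefetches (used or evicted) that were useful."""
        resolved = self.useful_prefetches + self.bad_prefetches
        return self.useful_prefetches / resolved if resolved else 0.0

    def useful_to_bad_ratio(self) -> float:
        if self.bad_prefetches == 0:
            return float("inf") if self.useful_prefetches else 0.0
        return self.useful_prefetches / self.bad_prefetches

    def reset(self) -> None:
        """Begin a new monitoring window."""
        self.hits = self.misses = 0
        self.useful_prefetches = self.bad_prefetches = 0
```

Any of these values could serve as the quantity compared against the first and second thresholds. Resetting per window keeps the metrics responsive to workload changes.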
FIG. 2 illustrates an example 200 of an apparatus for implementing adaptive cache partitioning. The illustrated apparatus includes a cache 110 configured to accelerate memory storage operations pertaining to a memory 108 (a backing memory). The memory 108 may be any suitable memory and/or storage facility, as disclosed herein. - The
cache 110 may include and/or be coupled to an interface 215, which may be configured to receive requests 202 pertaining to an address space associated with the memory 108 from at least one requestor 201. The requestor 201 can be a host 102, processor 103, processor core, client, computing device, communication device (e.g., smartphone), Personal Digital Assistant (PDA), tablet computer, Internet of Things (IoT) device, camera, memory card reader, digital display, personal computer, server computer, data management system, Database Management System (DBMS), embedded system, system-on-chip (SoC) device, or the like. The requestor 201 can include a system motherboard and/or backplane and can include processing resources (e.g., one or more processors, microprocessors, control circuitry, and/or the like). The interface 215 can be configured to couple the cache 110 to one or more interconnects, such as an interconnect 105 of a host 102, or the like. - The
cache 110 may be configured to service requests 202 pertaining to the memory 108 by use of high-performance memory resources, such as the cache memory 120. The cache memory 120 may include cache memory resources. As used herein, a “cache memory resource” refers to any suitable data and/or memory storage resource. In the FIG. 2 example, the cache memory resources of the cache memory 120 include a plurality of cache units 220, each cache unit 220 capable of storing a respective quantity of data. The cache units 220 may include and/or correspond to any suitable type and/or arrangement of memory resource(s) including, but not limited to: a unit, a memory unit, a block, a memory block, a cache block, a cache memory block, a page, a memory page, a cache page, a cache memory page, a cache line, a hardware cache line, a set, a way, a memory array, a row, a column, a bank, a memory bank, and/or the like. In the FIG. 2 example, the cache memory 120 includes X cache units 220 (cache units 220-1 through 220-X). - In some implementations, the
cache 110 may be logically disposed between the requestor 201 and the memory 108 (e.g., may be interposed between the requestor 201 and the memory 108). In the FIG. 2 example, the requestor 201, cache 110, and memory 108 may be communicatively coupled to an interconnect 105. The cache 110 may include and/or be coupled to logic (cache logic 210) that is configured to receive requests 202 pertaining to an address space associated with the memory 108. The cache logic 210 may do so by, inter alia, monitoring, filtering, sniffing, extracting, intercepting, identifying, and/or otherwise retrieving requests 202 pertaining to the address space on the interconnect 105. The cache logic 210 can be further configured to map addresses 204 to cache units 220. Requests 202 pertaining to addresses 204 that map to cache units 220 containing valid data associated with the addresses 204 result in cache hits, which can be serviced by use of the relatively higher-performance cache memory 120. Requests 202 pertaining to addresses 204 that do not map to valid data stored within the cache memory 120 result in cache misses. Servicing a request 202 pertaining to an address 204 that results in a cache miss may involve implementing a transfer operation 203 to fetch data associated with the address 204 from the relatively slower memory 108, which may increase the latency of the request 202. - In some implementations, the
cache logic 210 includes and/or is coupled to prefetch logic 230. The prefetch logic 230 may be configured to predict the addresses 204 of upcoming requests 202. The prefetch logic 230 can be further configured to implement transfer operations 203 (or cause the cache logic 210 to implement transfer operations 203) to prefetch data corresponding to the predicted addresses 204 into the cache memory 120. The prefetch logic 230 may cause the transfer operations 203 to be implemented before requests 202 pertaining to the predicted addresses 204 are received at the cache 110. Subsequent requests 202 pertaining to the predicted addresses 204 may, therefore, result in prefetch hits. These can be serviced using the relatively higher-performance cache memory 120, without incurring the latencies involved in servicing cache misses (or accessing the relatively lower-performance memory 108). The latency of requests 202 pertaining to prefetched addresses 204 may thus exclude the latency of loading data of the predicted addresses 204 into the cache 110. - The
prefetch logic 230 may determine address predictions based, at least in part, on metadata 122 pertaining to the address space (e.g., prefetcher metadata). As disclosed herein, the metadata 122 may include any suitable address access characteristics including, but not limited to: a sequence of previously requested addresses or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses, access counts (e.g., accesses within respective windows), access time(s), last access time(s), and so on. The prefetch logic 230 may be configured to maintain and/or update the metadata 122 in response to events pertaining to respective addresses 204. These events may include, but are not limited to: data access requests, read requests, write requests, copy requests, clone requests, trim requests, erase requests, delete requests, cache misses, cache hits, and/or the like. The prefetch logic 230 may utilize the metadata 122 to determine address access patterns and can use the determined address access patterns to predict the addresses 204 of upcoming requests 202. In some implementations, the metadata 122 may include a plurality of entries, each entry including information pertaining to a respective region of the address space. The prefetch logic 230 may utilize the metadata 122 to determine access patterns within respective regions of the address space and use the determined access patterns to predict addresses 204 of upcoming requests 202 within the respective regions. - In some implementations, at least a portion of the
metadata 122 is maintained within the cache memory 120. The cache logic 210 may allocate a first portion 124 of the cache memory 120 for storage of the metadata 122 and/or use by the prefetch logic 230. The cache logic 210 may maintain data pertaining to addresses of the address space (cache data 128) within a second portion 126 of the cache memory 120, which may be separate and/or distinct from the first portion 124. In the FIG. 1-3 example, the cache logic 210 allocates M cache units 220 to storage of the metadata 122. The first portion 124 may include cache units 220-1 through 220-M. The second portion 126 used to store cache data 128 may include C cache units 220, where C = X − M (cache units 220-(M+1) through 220-X). Although FIG. 1-3 illustrates one example of adaptive cache partitioning, the disclosure is not limited in this regard and could be adapted to partition the cache memory 120 according to any suitable partitioning scheme. In another example, the cache logic 210 may allocate cache units 220-1 through 220-C to the second portion 126 and allocate cache units 220-(C+1) through 220-X to the first portion 124. In other examples, the cache logic 210 may allocate other groupings of cache units 220, such as sets, ways, rows, columns, banks, and/or the like. - The
cache logic 210 may be configured to implement a first mapping scheme (a metadata mapping scheme) to map the metadata 122, and/or entries thereof, to cache units 220 within the first portion 124. The cache logic 210 may be further configured to implement a second mapping scheme (a cache or address mapping scheme 316) to map addresses 204 to cache units 220 allocated to the second portion 126. The cache logic 210 may be configured to modify the first mapping scheme and/or the second mapping scheme in response to modifying the size and/or configuration of the cache memory 120 allocated to one or more of the first portion 124 and the second portion 126. - The
cache logic 210 may adjust the quantity of cache memory 120 allocated to storage of the metadata 122 based, at least in part, on one or more metrics 212. The metrics 212 may be configured to quantify the degree to which a workload on the cache 110 is suitable for prefetching. More specifically, the metrics 212 may be configured to quantify aspects of prefetch performance (e.g., may include one or more prefetch performance metrics 212 and/or metrics 212 pertaining to prefetch performance), such as a prefetch hit rate, quantity of useful prefetches, ratio of useful prefetches to bad prefetches, and/or the like. Alternatively, or in addition, the metrics 212 may be configured to quantify other performance characteristics, including cache performance (e.g., may include one or more cache performance metrics 212 and/or metrics 212 pertaining to cache performance), such as cache hit rate, cache miss rate, and/or the like. - The
cache logic 210 may use the metrics 212 to determine the degree to which workload(s) on the cache 110 are suitable for prefetching and dynamically partition the cache memory 120 accordingly. More specifically, the cache logic 210 can adjust the amount of cache memory 120 allocated to the first portion 124 and/or second portion 126 based, at least in part, on one or more of the metrics 212. The cache logic 210 may periodically monitor the metrics 212 and may determine whether to modify the size of the first portion 124 in response to the monitoring. The cache logic 210 may increase the amount of cache memory 120 allocated to the first portion 124 when one or more of the metrics 212 are above a first threshold (when prefetch performance exceeds the first threshold). It may decrease the amount when the one or more metrics 212 are below a second threshold (when prefetch performance falls below the second threshold). The cache logic 210 may monitor the one or more metrics 212 in background operations, during idle periods (when not actively servicing requests 202, implementing prefetch operations, or the like), on a determined schedule, and/or the like. Increasing the size of the first portion 124 may include decreasing the size of the second portion 126, whereas decreasing the size of the first portion 124 may include increasing the size of the second portion 126. More specifically, increasing the size of the first portion 124 may include reallocating one or more cache units 220 of the second portion 126 to the first portion 124. Conversely, decreasing the size of the first portion 124 may include reallocating one or more cache units 220 of the first portion 124 to the second portion 126. Modifying the size of the first portion 124 may include modifying the number of cache units 220 included in the first portion 124 (e.g., modifying M). This may result in modifying the number of cache units 220 included in the second portion 126 (e.g., modifying C, where C = X − M). - Resizing the amount of
cache memory 120 allocated to the metadata 122 may include manipulating the metadata 122 and/or cache data 128. Reducing the amount of cache memory 120 allocated to the metadata 122 may include evicting portions of the metadata 122. The metadata 122 may be evicted according to a policy (a metadata eviction policy). The metadata eviction policy may specify that the oldest and/or least recently used entries of the metadata 122 are to be evicted when the size of the first portion 124 is reduced. Similarly, reducing the amount of cache memory 120 allocated to the second portion 126 may include evicting cache data 128 from one or more cache units 220. The cache data 128 may be evicted according to a policy (a replacement or eviction policy), which may include, but is not limited to: First In First Out (FIFO), Last In First Out (LIFO), Least Recently Used (LRU), Time Aware LRU (TLRU), Most Recently Used (MRU), Least-Frequently Used (LFU), random replacement, and/or the like. - The
cache logic 210, prefetch logic 230, and/or components and functionality thereof may include, but are not limited to: circuitry, logic circuitry, control circuitry, interface circuitry, input/output (I/O) circuitry, fuse logic, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, arithmetic logic units (ALU), state machines, microprocessors, processor-in-memory (PIM) circuitry, and/or the like. The cache logic 210 may be configured as a controller of the cache 110 (or cache controller). The prefetch logic 230 may be configured as a prefetcher (or cache prefetcher) of the cache 110.
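The per-region metadata entries and the metadata eviction policy described above can be illustrated together. In this hypothetical sketch:

- `RegionMetadataTable` keeps one entry per 4 KiB region (an assumed `region_bits=12`).
- Its `capacity` stands in for the number of entries that fit in the first portion 124.
- Shrinking that capacity evicts the least recently used entries first.

The pattern detector is a simple stride detector, chosen only for illustration. The text above does not say which access-pattern technique the prefetch logic 230 uses.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List


@dataclass
class RegionEntry:
    last_address: int
    stride: int = 0
    confidence: int = 0  # saturating counter: how often the stride repeated


class RegionMetadataTable:
    """Hypothetical per-region prefetcher metadata held in the metadata portion."""

    def __init__(self, capacity: int, region_bits: int = 12) -> None:
        self.capacity = capacity          # entries that fit in the first portion
        self.region_bits = region_bits    # region size = 2**region_bits addresses
        self.entries: "OrderedDict[int, RegionEntry]" = OrderedDict()

    def _region(self, address: int) -> int:
        return address >> self.region_bits

    def observe(self, address: int) -> None:
        """Update the region's entry on an access (request, miss, etc.)."""
        region = self._region(address)
        entry = self.entries.get(region)
        if entry is None:
            self.entries[region] = RegionEntry(last_address=address)
            self._evict_to(self.capacity)
            return
        stride = address - entry.last_address
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            entry.stride, entry.confidence = stride, 0
        entry.last_address = address
        self.entries.move_to_end(region)  # mark as most recently used

    def predict(self, address: int, degree: int = 2) -> List[int]:
        """Predict the next `degree` addresses in this address's region, if confident."""
        entry = self.entries.get(self._region(address))
        if entry is None or entry.confidence < 1:
            return []
        return [entry.last_address + entry.stride * k for k in range(1, degree + 1)]

    def resize(self, capacity: int) -> None:
        """Apply a new first-portion size; shrinking evicts LRU entries."""
        self.capacity = capacity
        self._evict_to(capacity)

    def _evict_to(self, capacity: int) -> None:
        while len(self.entries) > capacity:
            self.entries.popitem(last=False)  # least recently used entry first
```

For example, accesses at 0x1000, 0x1040, and 0x1080 establish a 0x40 stride, after which `predict(0x1080)` returns `[0x10C0, 0x1100]`.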
FIG. 3 illustrates another example 300 of an apparatus for implementing adaptive cache partitioning. The cache 110 may be configured to cache data pertaining to an address space associated with a memory 108, as disclosed herein. In the FIG. 3 example, the apparatus includes a cache 110 coupled between the requestor 201 and the memory 108. In some implementations, the cache 110 is interposed between the requestor 201 and the memory 108. The cache 110 can include and/or be coupled to a first interface 215A and/or a second interface 215B. The first interface 215A may be configured to receive requests 202 pertaining to addresses 204 of the address space through a first interconnect 105A and, as such, may be referred to as a front-end interface. The requests 202 may correspond to one or more requestors 201, as disclosed herein. The second interface 215B may be configured to, inter alia, couple the cache 110 (and/or cache logic 210) to a backing memory, such as the memory 108, and, as such, may be referred to as a back-end interface. Cache data 128 may be loaded into the cache 110 in transfer operations 203 implemented by and/or through the second interface 215B. Alternatively, the requestor 201, cache 110, and memory 108 may be coupled to the same interconnect, as illustrated in FIG. 2. The cache memory 120 may include a plurality of cache units 220 (e.g., X cache units 220-1 through 220-X). The cache units 220 may include memory cells, memory rows, memory columns, memory pages, cache lines, hardware cache lines, cache memory units 320, cache tags 326, and/or the like. In some implementations, the cache units 220 are organized into a plurality of sets, each set including a plurality of ways, each way including and/or corresponding to a respective cache unit 220. - In the
FIG. 3 example, each cache unit 220 includes and/or is associated with a respective cache memory unit (CMU) 320 and/or cache tag 326. The cache tag 326 may be configured to identify the data stored within the CMU 320. The CMU 320 may be capable of storing cache data 128 associated with one or more addresses 204 of the address space (one or more addressable data units). In some aspects, each CMU 320 (and each cache unit 220) is capable of storing data of U addresses 204 (or U data units). In implementations where the address space references respective bytes (i.e., is byte-addressable), each CMU 320 (and corresponding cache unit 220) may have a capacity of U bytes. - The
cache units 220 may further include and/or be associated with cache metadata 322. The cache metadata 322 of a cache unit 220 may include information pertaining to the cache data 128 stored within the CMU 320 of the cache unit 220 (e.g., cache metadata 322-1 through 322-X pertaining to cache data 128 stored within CMUs 320-1 through 320-X of cache units 220-1 through 220-X, respectively). The cache metadata 322 may include any suitable information pertaining to the contents of a cache unit 220 including, but not limited to:

- validity information indicating whether cache data 128 stored within the CMU 320 of the cache unit 220 is valid;
- a “dirty” flag indicating whether the cache data 128 has been modified since being loaded from the memory 108 (and should be written to the memory 108 before eviction);
- access count, last access time, and access frequency;
- a prefetch flag indicating whether the cache data 128 was loaded in a prefetch operation;
- and so on.

In some implementations, the cache metadata 322 of a cache unit 220 may be maintained within the CMU 320 of the cache unit 220. Alternatively, the cache metadata 322 may be maintained within separate cache memory resources. - The
cache logic 210 may implement, include, and/or be coupled to partition logic 310, which may be configured to, inter alia, partition the cache memory 120 into a first portion 124 and a second portion 126 (e.g., divide the cache memory 120 into a first partition and a second partition). The cache logic 210 may be configured to map, assign, and/or otherwise associate addresses 204 with cache units 220 allocated to the second portion 126. The cache logic 210 may associate addresses 204 with cache units 220 according to an address-cache mapping scheme (an address mapping scheme 316 or address mapping logic). In the FIG. 3 example, the address mapping scheme 316 may logically divide addresses 204 into a tag region (an address tag 206) and an offset region 205. The offset region 205 may be defined within a least significant bit (LSB) address region. The number of bits included in the offset region 205 may correspond to the capacity of the cache units 220 (e.g., O = log2 U, where O is the number of bits included in the offset region 205, and U is the capacity of the CMU 320 of the cache units 220 in terms of addressable data units). The address tag 206 may be defined within the remaining most significant bit (MSB) address region. The number of bits included in the address tags 206 may be A_L − O, where A_L is the number of bits included in the address 204 (e.g., 64 bits). For example, with 64-byte cache units (U = 64) and 64-bit addresses, O = 6 and each address tag 206 spans 58 bits. Although example addresses 204, offset regions 205, and address tags 206 are illustrated and described herein in reference to big-endian format, the disclosure is not limited in this regard and could be adapted for use with addresses 204 in any suitable format, encoding, or endianness. - The
cache logic 210 can be configured to map address tags 206 to cache units 220 using any suitable address mapping scheme 316 (or map logic) including, but not limited to:

- a modulo scheme (I_U = T_A mod C, where I_U is the index of the cache unit 220 to which address tag (T_A) 206 maps within a set or group of C cache units 220);
- a mapping function (e.g., I_U = f_U(T_A, C), where f_U is a function that maps address tags (T_A) 206 to indexes I_U within a group of C available cache units 220);
- a hash function (e.g., I_U = f_h(T_A, C), where f_h is a function that maps address tags (T_A) 206 to indexes I_U by hash values derived from the address tags (T_A) 206);
- a direct mapping, a fully associative mapping, a set-associative mapping, and/or the like.

The cache logic 210 can look up cache units 220 for addresses 204 and/or determine whether the cache memory 120 includes valid data corresponding to the addresses 204 (e.g., determine whether addresses 204 are cache hits or cache misses). The cache logic 210 can look up cache units 220 for respective addresses 204 by, inter alia, matching address tags 206 of the addresses 204 to cache tags 326 of the cache units 220. Addresses 204 that match cache tags 326 may be identified as cache hits, whereas addresses 204 that do not match cache tags 326 may be identified as cache misses. In some implementations, the cache logic 210 implements a hierarchical or set-based address mapping scheme 316. In such a scheme, address tags 206 are first mapped to one of a plurality of sets and are then compared to cache tags 326 of a plurality of ways of the set, each way corresponding to a respective cache unit 220. - The
cache logic 210 may include and/or be coupled to prefetch logic 230, which may utilize metadata 122 pertaining to the address space to predict addresses of upcoming requests 202 and prefetch data corresponding to the predicted addresses into the cache memory 120, as disclosed herein. At least a portion of the metadata 122 may be maintained within the cache memory 120. The cache logic 210 may include, implement, and/or be coupled to partition logic 310, which may be configured to divide the cache memory 120 into a first portion 124 and a second portion 126 (partition the cache memory 120). The first portion 124 may be allocated for metadata 122 pertaining to the address space. The partition logic 310 may utilize the remaining capacity of the cache memory 120 (the second portion 126) as available cache capacity. The cache logic 210 can use the second portion 126 of the cache memory 120 to maintain cache data 128, as disclosed herein. - The
partition logic 310 may be configured to partition the cache memory 120 into a first partition comprising a first portion 124 of the cache memory resources of the cache memory 120 (e.g., a first quantity of cache units 220) and a second partition comprising a second portion 126 of the cache memory resources (e.g., a second quantity of cache units 220). The first portion 124 of the cache memory 120 may be allocated to store the metadata 122 pertaining to the address space, and the second portion 126 may be allocated as available cache capacity of the cache 110 (e.g., allocated to store cache data 128). The partition logic 310 may be configured to adjust the quantity of cache memory resources allocated to the first portion 124 and/or second portion 126 based, at least in part, on metrics 212. These metrics 212 may be indicative of prefetch performance and/or a degree to which workload(s) being serviced by the cache 110 are suitable for prefetching. The partition logic 310 can be configured to allocate zero or more cache units 220 to the first portion 124 and one or more cache units 220 to the second portion 126. - In the example 300 illustrated in
FIG. 3, the cache logic 210 (and/or partition logic 310) allocates M of the X available cache units 220 of the cache memory 120 to the first portion 124, such that C cache units 220 are allocated to the second portion 126, where C = X − M. The first portion 124 of the cache memory 120 may include cache units 220-1 through 220-M, and the second portion 126 may include cache units 220-(M+1) through 220-X. The disclosure is not limited in this regard, however, and could partition the cache memory 120 and/or allocate cache units 220 in any suitable pattern or in accordance with any suitable scheme or arrangement. - The
cache logic 210 may implement, include, and/or be coupled to a metadata mapping scheme 314 (and/or metadata mapping logic), which may be configured to map, address, associate, reference, and/or otherwise provide access to cache units 220 allocated to the first portion 124. The metadata mapping scheme 314 may enable the prefetch logic 230 (or an external prefetcher) to access metadata 122 maintained within the first portion 124 of the cache memory 120. In some implementations, the metadata mapping scheme 314 implemented by the cache logic 210 (and/or partition logic 310) maps metadata addresses to cache units 220 allocated to the first portion 124 (and/or offsets within the respective cache units 220). The metadata mapping scheme 314 may define a metadata address space M_A, where M_A ∈ {0, ..., (M·U) − 1}, U is the capacity of a cache unit 220 (the capacity of a CMU 320), and M is the quantity of cache units 220 allocated to the first portion 124. Alternatively, or in addition, the metadata address space may define a range of cache unit indexes M_I, each corresponding to a respective one of the M cache units 220 allocated to the first portion 124, where M_I ∈ {0, ..., M − 1}. Although examples of metadata mapping schemes 314 (and/or metadata addressing and/or access schemes) are described herein, the disclosure is not limited in this regard and could be adapted to provide access to cache memory 120 allocated to the first portion 124 through any suitable mechanism or technique. - Partitioning the
cache memory 120 into a plurality of portions (e.g., a first portion 124 and a second portion 126) may include configuring mapping logic and/or mapping schemes of the portions to allocate, include, and/or incorporate designated cache memory resources of the cache memory 120. As used herein, “allocating,” “partitioning,” or “assigning” a portion of the cache memory 120 may include configuring mapping logic and/or a mapping scheme of the portion (or partition) to “include” or “reference” the cache memory resources. The same applies to “allocating,” “partitioning,” or “assigning” cache memory resources to a portion or partition of the cache memory 120. Configuring mapping logic and/or a mapping scheme to “include” or “reference” cache memory resources allocated to the portion or partition of the cache memory 120 may include configuring the mapping logic and/or mapping scheme to reference, allocate, include, incorporate, add, map, address, associate, and/or otherwise access (or provide access to) the cache memory resources. Allocating cache memory resources to a portion or partition of the cache memory 120 (e.g., the first portion 124) may further include “deallocating,” “removing,” or “excluding” the memory resources from other partitions or portions of the cache memory 120 (e.g., the second portion 126). As used herein, “deallocating,” “removing,” or “excluding” cache memory resources from a portion or partition of the cache memory 120 may include configuring mapping logic and/or a mapping scheme of the portion (or partition) to “remove,” “exclude,” or “dereference” the cache memory resources. 
Configuring mapping logic and/or a mapping scheme to “remove,” “exclude,” or “dereference” cache memory resources may include configuring the mapping logic and/or mapping scheme to remove, disable, ignore, deallocate, dereference, demap, bypass, and/or otherwise exclude the cache memory resources from the partition or portion (e.g., prevent the cache memory resources from being accessed by and/or through the mapping logic and/or mapping scheme). - In the
FIG. 3 example, the cache logic 210 (and/or partition logic 310) may be configured to allocate M cache units 220 to the first portion 124 of the cache memory 120 (e.g., cache units 220-1 through 220-M). Allocating the M cache units 220 to the first portion 124 may include configuring the metadata mapping scheme 314 (and/or metadata mapping logic) to include and/or reference cache units 220-1 through 220-M, as disclosed herein. Allocating the M cache units 220 to the first portion 124 may further include deallocating and/or excluding the M cache units 220 from the second portion 126. Deallocating or excluding cache units 220-1 through 220-M from the second portion 126 of the cache memory 120 may include configuring the address mapping scheme 316 to remove, exclude, and/or dereference the cache units 220-1 through 220-M. The address mapping scheme 316 may be configured such that addresses 204 (and/or address tags 206) do not map to cache units 220 allocated to the first portion 124. In the FIG. 3 example, the address mapping scheme 316 may be configured to index only a subset of the X available cache units 220 of the cache memory 120. The subset includes C cache units 220, where C = X − M (e.g., cache units 220-(M+1) through 220-X), and excludes the M cache units 220 allocated to the first portion 124 (e.g., cache units 220-1 through 220-M). In some implementations, a cache unit 220 may be excluded from the address mapping scheme 316 by, inter alia, disabling the cache tag 326 associated with the cache unit 220. Allocating cache units 220-1 through 220-M to the first portion 124 of the cache memory 120 may, therefore, include disabling cache tags 326-1 through 326-M. In FIG. 3, crosshatching highlights the cache tags 326 of the cache units 220 that are allocated to the first portion 124 of the cache memory 120 (and are excluded from the second portion 126 and/or address mapping scheme 316). 
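The mapping arrangement described above can be modeled in a few lines. It combines:

- the address tag/offset split;
- a modulo scheme restricted to the C units of the second portion 126;
- disabled cache tags on the M metadata units;
- a metadata address space of M·U entries.

This is an illustrative sketch, not the disclosed hardware. Its assumptions:

- The class `PartitionedCache` is hypothetical.
- Cache units are indexed from 0, so index i corresponds to cache unit 220-(i+1).
- U is a power of two.
- Only tags are modeled, not stored data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CacheUnit:
    tag: Optional[int] = None  # models cache tag 326; None means no valid data
    tag_enabled: bool = True   # False while the unit belongs to the metadata portion


class PartitionedCache:
    """Hypothetical model: units 0..M-1 hold metadata, units M..X-1 hold cache data."""

    def __init__(self, x_units: int, unit_bytes: int, m_units: int) -> None:
        if unit_bytes <= 0 or unit_bytes & (unit_bytes - 1):
            raise ValueError("U is assumed to be a power of two")
        if not 0 <= m_units < x_units:
            raise ValueError("need 0 <= M < X so the cache portion is non-empty")
        self.U = unit_bytes
        self.offset_bits = unit_bytes.bit_length() - 1  # O = log2(U)
        self.M = m_units
        self.units: List[CacheUnit] = [CacheUnit() for _ in range(x_units)]
        for i, unit in enumerate(self.units):
            # Excluding a unit from the address mapping = disabling its tag.
            unit.tag_enabled = i >= m_units

    @property
    def C(self) -> int:
        return len(self.units) - self.M  # C = X - M

    def split(self, address: int) -> Tuple[int, int]:
        """Split a byte address into (address tag T_A, offset)."""
        return address >> self.offset_bits, address & (self.U - 1)

    def unit_for(self, address: int) -> int:
        """Modulo scheme I_U = T_A mod C, shifted past the M metadata units."""
        tag, _ = self.split(address)
        return self.M + tag % self.C

    def lookup(self, address: int) -> bool:
        """Cache hit if the selected unit's enabled tag equals the address tag."""
        tag, _ = self.split(address)
        unit = self.units[self.unit_for(address)]
        return unit.tag_enabled and unit.tag == tag

    def fill(self, address: int) -> None:
        """Record a fill (e.g., after a miss or prefetch); data is not modeled."""
        tag, _ = self.split(address)
        self.units[self.unit_for(address)].tag = tag

    def metadata_location(self, meta_address: int) -> Tuple[int, int]:
        """Map a metadata address M_A in [0, M*U) to (unit index, offset)."""
        if not 0 <= meta_address < self.M * self.U:
            raise IndexError("metadata address outside the first portion")
        return divmod(meta_address, self.U)
```

Because `unit_for` adds M to T_A mod C, no address can select a metadata unit. This mirrors excluding those units from the address mapping scheme 316. A resize would change M, so existing tags would need to be invalidated or remapped.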
Cache tags 326-(M+1) through 326-X, corresponding to the cache units 220-(M+1) through 220-X included in the second portion 126 of the cache memory 120, may remain enabled and/or be indexable by address tags 206 in the address mapping scheme 316. - In some aspects, the cache logic 210 (and/or partition logic 310) divides the
cache memory 120 in accordance with a partition scheme 312. The partition scheme 312 may logically define how cache memory resources are divided between, and allocated to, the first portion 124 and the second portion 126 of the cache memory 120. The partition scheme 312 may define rules, schemes, logic, and/or criteria by which the cache memory 120 may be dynamically allocated and/or partitioned between the first portion 124 and the second portion 126. The partition scheme 312 may be further configured to specify the amount, quantity, and/or capacity of cache memory resources to allocate to the first portion 124 and/or second portion 126, respectively. Adapting the partition scheme 312 may include modifying the amount, quantity, and/or capacity of the cache memory resources allocated to the first portion 124 and/or second portion 126. In the FIG. 3 example, the partition scheme 312 allocates M cache units 220 for metadata storage (e.g., to the first portion 124). It allocates the remaining X − M cache units 220 as available cache capacity (e.g., to the second portion 126). - In some implementations, the
partition scheme 312 configures the cache logic 210 (and/or partition logic 310) to allocate cache units 220 to the first portion 124 on a per-cache-unit basis (e.g., to partition the cache memory 120 in accordance with a cache-unit-based scheme). The partition scheme 312 may allocate cache units 220 sequentially by cache unit address or index. In a sequential partition scheme 312, allocating M cache units 220 to the first portion 124 may include allocating cache units 220-1 through 220-M to the first portion 124, such that cache units 220-(M+1) through 220-X are allocated to the second portion 126, as illustrated in FIG. 3. Increasing the size of the first portion 124 may include allocating additional cache units 220 to the first portion 124 sequentially. In a sequential scheme, the first portion 124 may be grown from M cache units 220 to M+R cache units 220 (e.g., increased in size by R cache units 220). This may include allocating cache units 220-(M+1) through 220-(M+R) from the second portion 126 to the first portion 124. As a result, the first portion 124 may include cache units 220-1 through 220-(M+R), and the second portion 126 may include cache units 220-(M+R+1) through 220-X. Conversely, the first portion 124 may be shrunk from M cache units 220 to M−R cache units 220 (e.g., decreased in size by R cache units 220). This may include allocating cache units 220-(M−R+1) through 220-M from the first portion 124 to the second portion 126. As a result, the first portion 124 may include cache units 220-1 through 220-(M−R), and the second portion 126 may include cache units 220-(M−R+1) through 220-X.

Although examples of
partitioning schemes 312 are described herein, the disclosure is not limited in this regard. In other implementations, the partition scheme 312 may configure the cache logic 210 (and/or partition logic 310) to allocate cache units 220 in other patterns, sequences, and/or schemes. For example, the partition scheme 312 may define an interleaved allocation pattern, a modulo pattern, or a hash pattern. It may also allocate cache units 220 in accordance with the hardware structure of the cache memory 120 and/or the manner in which cache units 220 of the cache memory 120 are organized, and/or the like.

In some implementations, the
cache memory 120 includes a plurality of sets, each set including a plurality of ways, each way including and/or corresponding to a respective cache unit 220. The partition scheme 312 may allocate cache memory resources by way, set, or the like. In a way-based scheme, the cache logic 210 (and/or partition logic 310) may partition the cache memory 120 by way. In one example, the cache logic 210 may allocate a first quantity of zero or more ways within one or more sets to the first portion 124 and a second quantity of one or more ways within one or more sets to the second portion 126. In another example, the first portion 124 includes a first quantity of zero or more ways within each set of the cache memory 120, and the second portion 126 includes a second quantity of one or more ways within each set. In a cache memory 120 that includes N-way sets (e.g., each set including N ways of cache units 220), the first portion 124 may include a first group of ways within each set. The second portion 126 may include a second group of ways within each set (e.g., the ways not allocated to the first portion 124). Allocating M cache units 220 to the first portion 124 of the cache memory 120 by way may include allocating $W_1$ ways to the first portion 124 within each set, where $W_1 = \frac{M}{S}$ and S is the quantity of sets included in the cache memory 120. The second portion 126 may be allocated $W_2$ ways within each set, where $W_2 = N - W_1$ or, equivalently (since $X = S \cdot N$), $W_2 = \frac{X - M}{S}$, and N is the number of ways within each set. The first portion 124 may include ways 1 through $W_1$ within each set, and the second portion 126 may include ways $W_1 + 1$ through N within each set. The disclosure is not limited in this regard, however, and could distribute ways between the first portion 124 and the second portion 126 in any suitable manner, scheme, and/or pattern.

In a way-based scheme, increasing the amount of
cache memory 120 allocated to the first portion 124 from M to M+R cache units 220 may include allocating an additional $W_{1A}$ ways of each set to the first portion 124 (and deallocating those $W_{1A}$ ways of each set from the second portion 126), where $W_{1A} = \frac{R}{S}$. As a result, the first portion 124 may include ways 1 through $W_1 + W_{1A}$ within each set, and the second portion 126 may include ways $W_1 + W_{1A} + 1$ through N within each set. Conversely, decreasing the amount of cache memory 120 allocated to the first portion 124 from M to M−R cache units 220 may include allocating $W_{2A}$ ways of each set from the first portion 124 to the second portion 126, where $W_{2A} = \frac{R}{S}$. As a result, the first portion 124 may include ways 1 through $W_1 - W_{2A}$ within each set, and the second portion 126 may include ways $W_1 - W_{2A} + 1$ through N within each set.

Alternatively, or in addition, the cache logic 210 (and/or partition logic 310) may be configured to partition the
cache memory 120 by set. Allocating a set may include allocating each way (and/or corresponding cache unit 220) of the set. In a set-based scheme, the first portion 124 may include a first group of zero or more sets of the cache memory 120. The second portion 126 may include a second group of one or more of the sets (e.g., each set of the cache memory 120 not allocated to the first portion 124). Allocating M cache units 220 of the cache memory 120 to the first portion 124 by set may include allocating $E_1$ sets to the first portion 124, where $E_1 = \frac{M}{N}$ and N is the number of ways included in each set. $E_2$ sets are then allocated to the second portion 126, where $E_2 = S - E_1$ and S is the number of sets included in the cache memory 120. The first portion 124 may include sets 1 through $E_1$ of the cache memory 120, and the second portion 126 may include sets $E_1 + 1$ through S. The disclosure is not limited in this regard, however, and could distribute sets between the first portion 124 and the second portion 126 in any suitable manner, scheme, and/or pattern.

In a set-based scheme, increasing the amount of
cache memory 120 allocated to the first portion 124 from M to M+R cache units 220 may include allocating an additional $E_{1A}$ sets of the cache memory 120 to the first portion 124 (and deallocating those $E_{1A}$ sets from the second portion 126), where $E_{1A} = \frac{R}{N}$. As a result, the first portion 124 may include sets 1 through $E_1 + E_{1A}$, and the second portion 126 may include sets $E_1 + E_{1A} + 1$ through S. Conversely, decreasing the amount of cache memory 120 allocated to the first portion 124 from M to M−R cache units 220 may include allocating $E_{2A}$ sets from the first portion 124 to the second portion 126, where $E_{2A} = \frac{R}{N}$. As a result, the first portion 124 may include sets 1 through $E_1 - E_{2A}$, and the second portion 126 may include sets $E_1 - E_{2A} + 1$ through S.

As disclosed herein, the cache logic 210 (and/or partition logic 310) can adjust the amount of
cache memory 120 allocated to the first portion 124 (and/or second portion 126) based, at least in part, on one or more metrics 212. The metrics 212 may be configured to quantify the degree to which the workload on the cache 110 is suitable for prefetching, including aspects of prefetch performance. The cache logic 210 (and/or prefetch logic 230) can determine and/or monitor any suitable aspect of prefetch performance, such as prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like. Prefetch hit rate may be determined by tracking accesses to prefetched cache data 128 within the cache memory 120. As used herein, "prefetched" cache data 128 refers to cache data 128 that was loaded into the cache memory 120 before being requested (e.g., by the prefetch logic 230 and/or in a prefetch operation). Conversely, non-prefetched cache data 128 refers to cache data 128 that was loaded in response to a request 202, a cache miss, or the like. In some implementations, the cache logic 210 tracks prefetched cache data 128 by use of cache metadata 322. The cache logic 210 may record a prefetch flag or other indicator in the cache metadata 322 to distinguish prefetched cache data 128 from non-prefetched cache data 128. A prefetch hit rate may be determined based on access metrics of prefetched cache data 128 maintained within the cache metadata 322, such as access count, access frequency, last access time, and/or the like. Alternatively, or in addition, prefetch miss rate may be determined by identifying prefetched cache data 128 that has no accesses, or accesses below a threshold quantity or frequency.

In some aspects, the
metrics 212 are further configured to quantify other aspects of cache performance, such as cache hit rate, cache miss rate, request latency, and so on. The cache logic 210 may be configured to determine and/or monitor aspects of cache performance. The cache logic 210 may be configured to determine a cache hit rate by, inter alia, monitoring a quantity of requests 202 that result in cache hits, monitoring a quantity of requests 202 that result in cache misses, and/or the like.

In some implementations, the one or
more metrics 212 are configured to quantify aspects of cache and/or prefetch performance for respective regions of the address space. The cache logic 210 (and/or prefetch logic 230) can determine and/or monitor prefetch performance within regions of the address space covered by respective entries of the metadata 122. The metrics 212 may, therefore, quantify the degree to which the workloads within respective regions of the address space are suitable for prefetching. The prefetch logic 230 may utilize the metrics 212 to determine whether to implement prefetching within the respective address regions, the prefetch degree for the respective address regions, the amount of metadata 122 to maintain for the respective address regions, and/or the like.

The cache logic 210 (and/or partition logic 310) can utilize the one or
more metrics 212 to dynamically partition the cache memory 120. More specifically, the cache logic 210 may utilize the metrics 212 to determine, tune, adapt, and/or otherwise manage two amounts of cache memory 120:

- the amount allocated for storage of the metadata 122 pertaining to the address space (the first portion 124); and
- the amount allocated for storage of cache data 128 (the second portion 126).

The cache logic 210 (and/or partition logic 310) may be configured to:

a) increase the quantity of cache units 220 allocated to the first portion 124 when one or more of the metrics 212 exceed a first threshold, thereby decreasing the quantity of cache units 220 allocated for storage of cache data 128 within the second portion 126; or
b) decrease the quantity of cache units 220 allocated to the first portion 124 when one or more of the metrics 212 fall below a second threshold, thereby increasing the quantity of cache units 220 allocated for storage of cache data 128 within the second portion 126.

The metrics 212 can be configured to quantify prefetch performance. The adjustments implemented by the cache logic 210 can therefore dynamically allocate cache memory resources between the first portion 124 (metadata 122) and the second portion 126 (cache data 128), based on the degree to which the workload(s) serviced by the cache 110 are suitable for prefetching. Under workload conditions suitable for prefetching, the cache logic 210 can increase the amount of cache memory 120 allocated for the metadata 122. Under conditions not suitable for prefetching, it can decrease (or eliminate) that allocation, increasing the available cache capacity for those workloads.

Increasing the quantity of
cache units 220 allocated to the first portion 124 may include assigning or allocating one or more cache units 220 of the second portion 126 to the first portion 124. As disclosed herein, allocating a cache unit 220 to the first portion 124 may include configuring the metadata mapping scheme 314 (and/or metadata mapping logic) to include the cache unit 220. It may also include providing the prefetch logic 230 with access to the cache unit 220 (or the CMU 320 thereof), and/or otherwise making the CMU 320 of the cache unit 220 available for storage of metadata 122 pertaining to the address space.

Allocating a
cache unit 220 to the first portion 124 of the cache memory 120 may further include deallocating the cache unit 220 from the second portion 126. Deallocating a cache unit 220 from the second portion 126 of the cache memory 120 may include configuring the address mapping scheme 316 (and/or address mapping logic) to remove or exclude the cache unit 220. The address mapping scheme 316 may be configured to dereference the cache unit 220, such that the cache unit 220 is excluded from the C cache units 220 included in the second portion 126. The address mapping scheme 316 may be modified to remove the cache unit 220 from an index or other mechanism by which addresses 204 and/or address tags 206 are associated with cache units 220 (e.g., by disabling the cache tag 326 of the cache unit 220). Deallocating a cache unit 220 from the second portion 126 may further include evicting cache data 128 from the cache unit 220, setting a validity flag of the cache metadata 322 to "false," and/or the like. In some implementations, deallocating a cache unit 220 from the second portion 126 further includes identifying "dirty" cache data 128 within the cache unit 220, based at least in part on cache metadata 322 associated with the cache data 128 (such as "dirty" indicators). The identified cache data 128 (if any) may then be flushed and/or destaged to a backing memory, such as the memory 108 (e.g., written back to the memory 108).

The cache logic 210 (and/or partition logic 310) may be configured to preserve cache state when repartitioning the
cache memory 120 to increase the size of the first portion 124 and/or decrease the size of the second portion 126. When reducing the amount of cache memory 120 allocated to the second portion 126, the cache logic 210 may preserve cache state by, inter alia, compacting the cache data 128 stored within the second portion 126 so that it fits within fewer cache units 220. Allocating R cache units 220 from the second portion 126 of the cache memory 120 to the first portion 124 may include compacting the cache data 128 stored within the second portion 126. The data is compacted from a current size corresponding to C1 cache units 220 to a compacted size corresponding to C2 cache units 220, where C2=C1−R. Compacting the cache data 128 may include evicting a first subset of the cache data 128 currently stored within the second portion 126 of the cache memory 120. The first subset includes an amount of cache data 128 equivalent to R cache units 220 (and/or the cache data 128 stored within R cache units 220 currently allocated to the second portion 126). The first subset of the cache data 128 may be selected for eviction based on a suitable eviction or replacement policy and/or criteria, such as FIFO, LIFO, LRU, TLRU, MRU, LFU, random replacement, or the like. Evicting cache data 128 from a cache unit 220 may make the cache unit 220 available to store other cache data 128 (e.g., transition the cache unit 220 from "occupied" to "available" or empty). The cache logic 210 (and/or partition logic 310) may be further configured to move any remaining cache data 128 out of the cache units 220 that are to be allocated to the first portion 124. That data is moved to available cache units 220 that are to remain allocated to the second portion 126.

In some implementations, increasing the size of the
first portion 124 of a cache memory 120 that includes X cache units 220 from M cache units 220 to M+R cache units 220 may include:

a) selecting cache units 220 of the second portion 126 to allocate to the first portion 124 (e.g., selecting R cache units 220 for reallocation);
b) evicting cache data 128 from R cache units 220 of the C cache units 220 currently allocated to the second portion 126 (where C=X−M);
c) moving cache data 128 stored within the selected cache units 220 (if any) to available cache units 220 of the C−R cache units 220 that are to remain allocated to the second portion 126; and
d) allocating the R selected cache units 220 from the second portion 126 to the first portion 124.

The cache units 220 selected for eviction may be different from the cache units 220 selected for reallocation. As disclosed herein, the cache logic 210 may select cache units 220 for eviction based on an eviction or replacement policy. By contrast, the cache logic 210 (and/or partition logic 310) may select cache units 220 to reallocate from the second portion 126 to the first portion 124 (or vice versa) based on separate, independent criteria. These cache units 220 may be selected in accordance with a partition scheme 312, as disclosed herein. The partition scheme 312 may define rules, schemes, logic, and/or other criteria by which cache units 220 are divided (and/or dynamically allocated) between the first portion 124 and the second portion 126. The partition scheme 312 may divide the cache memory 120 in any suitable pattern or scheme, including but not limited to a sequential scheme, a way-based scheme, a set-based scheme, and/or the like. Decreasing the quantity of cache units 220 allocated to the first portion 124 may include increasing the quantity of cache units 220 allocated to the second portion 126 (e.g., increasing the available cache capacity).
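Steps a) through d) for increasing the size of the first portion can be sketched as follows. The sketch assumes the sequential partition scheme of FIG. 3 and an LRU replacement policy; both choices are illustrative, since the text allows any partition scheme and any replacement policy. The function `grow_first_portion` and the `Line` record are hypothetical names. The eviction victims (chosen by LRU) need not be the units selected for reallocation (chosen by the partition scheme), which is why step c) relocates surviving lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    tag: int
    last_access: int  # larger value = more recently used
    dirty: bool = False


def grow_first_portion(units: Dict[int, Optional[Line]], x: int, m: int, r: int):
    """Grow the first portion from M to M+R units (sequential scheme, units 1..X).

    `units` maps each second-portion unit (M+1..X) to its cached Line, or None.
    Returns (new_m, units, writebacks)."""
    c = x - m
    if not 0 < r < c:
        raise ValueError("need 0 < R < C so the second portion stays non-empty")
    # a) Partition scheme: the next R units in index order move to the first portion.
    selected = list(range(m + 1, m + r + 1))
    # b) Replacement policy (LRU here): evict just enough lines to fit in C - R units,
    #    which is R lines when the second portion is full.
    occupied = sorted((u for u in units if units[u] is not None),
                      key=lambda u: units[u].last_access)
    writebacks: List[int] = []
    for u in occupied[:max(0, len(occupied) - (c - r))]:
        if units[u].dirty:
            writebacks.append(units[u].tag)  # flush dirty data to backing memory
        units[u] = None
    # c) Move surviving lines out of the selected units into free remaining units.
    free = [u for u in range(m + r + 1, x + 1) if units[u] is None]
    for u in selected:
        if units[u] is not None:
            units[free.pop()] = units[u]
            units[u] = None
    # d) Reallocate: remove the selected units from the address mapping.
    for u in selected:
        del units[u]
    return m + r, units, writebacks


units = {u: Line(tag=100 + u, last_access=10 - u, dirty=(u == 8)) for u in range(3, 9)}
new_m, units, writebacks = grow_first_portion(units, x=8, m=2, r=2)
print(new_m, sorted(units), writebacks)  # 4 [5, 6, 7, 8] [108]
```

In the example, the two least recently used lines (in units 7 and 8) are evicted and the dirty one is written back. The lines from units 3 and 4 are then relocated before those units join the first portion.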
Decreasing the size of the first portion 124 may include assigning or allocating one or more cache units 220 from the first portion 124 to the second portion 126. Allocating a cache unit 220 to the second portion 126 may include removing the cache unit 220 from the metadata mapping scheme 314, such that the cache unit 220 is no longer included in the group of M cache units 220 available for storage of the metadata 122 (e.g., modifying the metadata address space MA). Allocating the cache unit 220 to the second portion 126 may further include modifying the address mapping scheme 316 to reference the cache unit 220 (e.g., including the cache unit 220 in the group of C cache units 220 available for storage of cache data 128). The address mapping scheme 316 may be modified to enable addresses 204 and/or address tags 206 to map and/or be assigned to the cache unit 220 by, inter alia, enabling the cache tag 326 of the cache unit 220. Decreasing the quantity of cache units 220 allocated to the first portion 124 may decrease the amount of cache memory 120 available for storage of the metadata 122. Decreasing the size of the first portion 124 may, therefore, include compacting the metadata 122 for storage within a smaller amount of cache memory 120. The metadata 122 may be compacted for storage within a smaller memory range (e.g., from a first size M1 to a second, smaller size M2). Compacting the metadata 122 may include removing a portion of the metadata 122, such as one or more entries of the metadata 122. The portion of the metadata 122 may be selected based on a removal criterion, such as an age criterion (oldest removed first, youngest removed first, or the like), a least-recently-accessed criterion, a least-frequently-accessed criterion, and/or the like.

Alternatively, or in addition, portions of the
metadata 122 may be selected for removal based, at least in part, on one or more metrics 212. The metadata 122 may include a plurality of entries, each entry including access information pertaining to a respective region of the address space. The prefetch logic 230 may utilize respective entries of the metadata 122 to implement prefetch operations within the address regions covered by the respective entries. The one or more metrics 212 may be configured to quantify prefetch performance within the address regions covered by the respective entries of the metadata 122. Compacting the metadata 122 may include selecting entries of the metadata 122 for removal based, at least in part, on prefetch performance within the address regions covered by the entries, as quantified by the metrics 212. In some implementations, entries of the metadata 122 for which prefetch performance is below a threshold may be removed (and/or the amount of memory capacity allocated to those entries may be reduced). Alternatively, entries of the metadata 122 exhibiting higher prefetch performance may be retained, whereas entries exhibiting lower prefetch performance may be removed (e.g., the R lowest-performing entries of the metadata 122 may be selected for removal). Compacting the metadata 122 may, therefore, include removing metadata 122 from one or more cache units 220. It may also include moving metadata 122 (and/or entries of the metadata 122) from cache units 220 being reallocated to the second portion 126 to the remaining cache units 220 allocated to the first portion 124.
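The threshold policy and the metric-driven compaction described above can be combined in a short sketch. It assumes prefetch accuracy (useful prefetches divided by issued prefetches) as the metric 212 and a fixed number of metadata entries per cache unit. `MetadataEntry`, `overall_accuracy`, `next_metadata_units`, `compact_metadata`, `entries_per_unit`, and the threshold values are all hypothetical, because the disclosure leaves the specific metric and thresholds open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetadataEntry:
    region: int  # identifier of the address region covered by this entry
    useful: int  # prefetches from the region that were later accessed
    issued: int  # prefetches issued for the region

    @property
    def accuracy(self) -> float:
        return self.useful / self.issued if self.issued else 0.0


def overall_accuracy(entries: List[MetadataEntry]) -> float:
    issued = sum(e.issued for e in entries)
    return sum(e.useful for e in entries) / issued if issued else 0.0


def next_metadata_units(m: int, x: int, r: int, metric: float,
                        grow_above: float, shrink_below: float) -> int:
    """Grow the first portion by R units when the metric exceeds the first
    threshold; shrink it by R units when the metric falls below the second."""
    if metric > grow_above:
        return min(m + r, x - 1)  # keep at least one unit for cache data
    if metric < shrink_below:
        return max(m - r, 0)
    return m


def compact_metadata(entries: List[MetadataEntry], units: int,
                     entries_per_unit: int) -> List[MetadataEntry]:
    """Retain the best-performing entries that still fit and drop the rest."""
    ranked = sorted(entries, key=lambda e: e.accuracy, reverse=True)
    return ranked[:units * entries_per_unit]


entries = [MetadataEntry(0, 9, 10), MetadataEntry(1, 1, 10),
           MetadataEntry(2, 5, 10), MetadataEntry(3, 0, 0)]
metric = overall_accuracy(entries)  # 15 / 30 = 0.5
m = next_metadata_units(m=4, x=16, r=2, metric=metric, grow_above=0.7, shrink_below=0.6)
kept = compact_metadata(entries, units=m, entries_per_unit=1)
print(m, [e.region for e in kept])  # 2 [0, 2]
```

Keeping two separate thresholds (rather than one) leaves a band in which the partition stays unchanged. This is a common way to avoid repartitioning on every small fluctuation of the metric.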
FIG. 4-1 illustrates another example 400 of an apparatus for implementing adaptive cache partitioning. The apparatus 400 includes a cache 110 that is configured to cache data pertaining to an address space associated with a memory 108. The cache 110 may include and/or be coupled to one or more interconnects. In the FIG. 4-1 example, the cache 110 includes and/or is coupled to a first interface 215A configured to couple the cache 110 to a first interconnect 105A, and a second interface 215B configured to couple the cache 110 to a second interconnect 105B. The cache 110 may be configured to service requests 202 from one or more requestors 201 pertaining to addresses 204 of the address space. The cache 110 may service the requests 202 by use of the cache memory 120, which may include loading data associated with addresses 204 of the address space in transfer operations 203. The transfer operations 203 may be implemented in response to cache misses, prefetch operations, and/or the like. The cache 110 may include and/or be coupled to an interface 215, which may be configured to couple the cache 110 (and/or cache logic 210) to one or more interconnects, such as interconnects 105A and/or 105B.

In the
FIG. 4-1 example, the cache memory 120 includes a plurality of cache units 220, each cache unit 220 including and/or corresponding to a respective cache line. The cache units 220 may be arranged into a plurality of sets 430 (e.g., sets 430-1 through 430-S). The sets 430 may be N-way associative: each set 430 may include N ways 420, each way 420 including and/or corresponding to a respective cache unit 220 (a respective cache line). As illustrated, each set 430 may include N ways 420-1 through 420-N. The cache memory 120 may include X cache units 220 (or X cache lines), where X=S·N.

The address mapping scheme 316 (or address mapping logic) implemented by the
cache logic 210 may be configured to divide addresses 204 into an offset 205, a set region (set tag 406), and an address tag 206. The offset 205 may correspond to a capacity of the cache units 220 (e.g., a capacity of the CMU 320), as disclosed herein. The address mapping scheme 316 may utilize set tags 406 to associate addresses 204 with respective sets 430. In some aspects, the address mapping scheme 316 includes a set mapping scheme by which set tags 406 are mapped to one of a group of available sets $S_C = \{430\text{-}1, \ldots, 430\text{-}S\}$, as follows: $S_I = f_S(T_S, S_C)$. Here $f_S$ is a set mapping function, $T_S$ is a set tag 406, and $S_I$ is the index or other identifier of the selected set 430. The address mapping scheme 316 may further include a way mapping scheme by which address tags 206 are mapped to one of the N ways 420 of the selected set 430 (e.g., by comparing the address tag 206 to the cache tags 326 of the ways 420).

The
cache logic 210 can include, implement, and/or be coupled to partition logic 310, which may be configured to partition the cache memory 120 into a first portion 124 and a second portion 126. As disclosed herein, the first portion 124 may be allocated for storage of metadata 122 pertaining to the address space, and the second portion 126 may be allocated for storage of cache data 128 (e.g., as available cache capacity). The cache logic 210 may allocate cache memory 120 between the first portion 124 and the second portion 126 in accordance with a partition scheme 312. The partition scheme 312 may specify an amount of cache memory 120 to be allocated to the metadata 122 (the first portion 124). The partition scheme 312 may also specify the manner in which cache units 220 are allocated to the first portion 124 and/or the second portion 126. In the FIG. 4-1 example, the cache logic 210 is configured to partition the cache memory 120 by way 420 (e.g., may implement a way-based partition scheme 312-1). The way partition scheme 312-1 may specify that the first portion 124 is allocated zero or more ways 420 within zero or more sets 430 of the cache memory 120. In some implementations, the way partition scheme 312-1 specifies that the first portion 124 is allocated zero or more ways 420 within each set 430 of the cache memory 120.

In some implementations, the
cache memory 120 includes a plurality of banks (e.g., SRAM banks). The ways 420 of the cache memory 120 may be organized within respective banks. More specifically, the ways 420 of each set 430 may be split across multiple banks of the cache memory 120. In some examples, each way 420 may be implemented by a respective one of the banks. For instance, way 420-1 of each set 430-1 through 430-S may be implemented by a first bank, and way 420-2 of each set 430-1 through 430-S by a second bank. This continues up to way 420-N of each set 430-1 through 430-S, which is implemented by an Nth bank of the cache memory 120. The banks of the cache memory 120 may include separate memory blocks. The first portion 124 allocated to the metadata 122 may, therefore, include zero or more banks (or blocks) of the cache memory 120. The metadata mapping scheme 314 may address the banks allocated to the first portion 124 as a linear (or flat) chunk of memory. The metadata mapping scheme 314 may, therefore, enable the metadata 122 to be arranged and/or organized in any suitable manner (e.g., as specified by a prefetcher, the prefetch logic 230, or the like).

In the
FIG. 4-1 example, the partition scheme 312-1 allocates two ways 420 within each set 430 to the prefetch logic 230. The first portion 124 of the cache memory 120 allocated for storage of the metadata 122 may, therefore, include ways 420-1 and 420-2 within each set 430-1 through 430-S (e.g., may include M cache units 220, where M=2·S). The second portion 126 of the cache memory 120 available for storage of cache data 128 may include N−2 ways within each set 430-1 through 430-S (e.g., may include C cache units 220, where C=(N−2)·S). In FIGS. 4-1 through 4-3, ways 420 (and/or cache units 220) that are allocated to the first portion 124 are illustrated with crosshatching to distinguish them from ways 420 that are allocated to the second portion 126. As illustrated, the first portion 124 may include first portions 124-1 through 124-S within sets 430-1 through 430-S of the cache memory 120. Likewise, the second portion 126 may include second portions 126-1 through 126-S within sets 430-1 through 430-S.

Allocating a cache unit 220 (or way 420) to the
first portion 124 may include configuring the address mapping scheme 316 to disable or ignore the cache unit 220. In FIG. 4-1, two ways 420 of each set 430 are allocated to the first portion 124. The address mapping scheme 316 is adapted to disable or ignore ways 420-1 and 420-2 of each set 430 (e.g., by disabling cache tags 326-1 and 326-2 of the corresponding cache units 220-1 and 220-2). In the FIG. 4-1 example, the quantity of sets 430 available for storage of cache data 128 ($S_C$) may be substantially unchanged. As illustrated, the second portion 126 of the cache memory 120 may include S sets 430, each including N−2 ways 420 (ways 420-3 through 420-N). The address mapping scheme 316 may, therefore, distribute addresses 204 among the S sets 430 of the cache memory 120 (by set tag 406 or the like). The way mapping scheme implemented by the cache logic 210 (and/or address mapping scheme 316) may be adapted to modify the associativity of the sets 430. In the FIG. 4-1 example, the address mapping scheme 316 manages the sets 430 as (N−2)-way associative rather than N-way associative. More specifically, the address mapping scheme 316 maps up to N−2 address tags 206 to each set 430 rather than N address tags 206.

As disclosed herein, allocating a
way 420 to the first portion 124 may include evicting cache data 128 from the way 420. Cache data 128 may be selected for eviction from respective sets 430 in accordance with an eviction or replacement policy, as disclosed herein. Allocating R ways 420 of a set 430 to the first portion 124 may include compacting the cache data 128 stored within the set 430 from a capacity of N cache units 220 to N−R cache units 220. The cache logic 210 can select cache data 128 to retain within respective sets 430. It can then move the selected cache data 128 to the N−R ways 420 of the respective sets 430 that are to remain allocated to the second portion 126. In the FIG. 4-1 example, allocating ways 420-1 and 420-2 of each set 430 to the first portion 124 may include compacting the cache data 128 within each set 430 from a capacity of N cache units 220 to a capacity of N−2 cache units 220. This compaction may proceed by, inter alia:

- evicting cache data 128 from a first group of two ways 420 of the set 430, such that a second group of N−2 ways 420 of the set 430 is retained;
- moving cache data 128 stored within the second group of ways 420 to ways 420-3 through 420-N of the set 430 (if necessary); and
- assigning ways 420-1 and 420-2 to the first portion 124.

The
metadata 122 maintained within the first portion 124 of the cache memory 120 may be accessed in accordance with a metadata mapping scheme 314. In the FIG. 4-1 example, the metadata mapping scheme 314 may define a metadata address space (MA) that includes ways 420-1 and 420-2 of each set 430-1 through 430-S. For example, the metadata address space (MA) may include an address range $\{0, \ldots, R \cdot U \cdot S - 1\}$. Here R is the number of ways 420 allocated to the first portion 124 within each of the S sets 430, and U is the capacity of each way 420 (in terms of addressable data units). Alternatively, the metadata address space (MA) may define addresses corresponding to indexes and/or offsets of respective ways 420 (or cache units 220) of the first portion 124, as follows: $\{0, \ldots, R \cdot S - 1\}$.

The cache logic 210 (and/or prefetch logic 230) can be configured to determine and/or monitor one or
more metrics 212. The metrics 212 may be configured to quantify cache and/or prefetch performance, as disclosed herein. The cache logic 210 (and/or partition logic 310) can adapt the way partition scheme 312-1 based, at least in part, on one or more of the metrics 212. The cache logic 210 can adapt the way partition scheme 312-1 in one of two directions. It can increase the size of the first portion 124 (and decrease the size of the second portion 126) when one or more of the metrics 212 exceed a first threshold. It can decrease the size of the first portion 124 (and increase the size of the second portion 126) when one or more of the metrics 212 fall below a second threshold. Increasing the size of the first portion 124 may include increasing the number of ways 420 allocated to the first portion 124 within each set 430 of the cache memory 120. Decreasing the size of the first portion 124 may include decreasing the number of ways 420 allocated to the first portion 124 within each set 430 of the cache memory 120.
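The way-based layout of FIG. 4-1 can be modeled as a toy cache. This is a minimal sketch, not the disclosed hardware. It uses 0-based way indexes (way 0 here corresponds to way 420-1) and integer division to split addresses, which matches bit-field extraction when U and S are powers of two. `WayPartitionedCache` and its methods are hypothetical. `lookup` compares tags only in the data ways, mirroring the disabled cache tags 326 of metadata ways. `metadata_location` maps the flat metadata address space (MA) onto the allocated ways using a way-major (bank-by-bank) ordering, which is an assumption the text does not mandate.

```python
from typing import List, Optional, Tuple


class WayPartitionedCache:
    """Toy model of FIG. 4-1: S sets of N ways with U-byte lines. Ways 0..W-1 of
    every set form the first (metadata) portion; ways W..N-1 form the second
    portion, which therefore behaves as (N - W)-way associative."""

    def __init__(self, sets: int, ways: int, line_bytes: int, metadata_ways: int):
        if not 0 <= metadata_ways < ways:
            raise ValueError("need 0 <= W < N")
        self.s, self.n, self.u, self.w = sets, ways, line_bytes, metadata_ways
        # tags[set][way]: address tag stored in that way, or None when empty.
        self.tags: List[List[Optional[int]]] = [[None] * ways for _ in range(sets)]

    def split(self, address: int) -> Tuple[int, int, int]:
        """Divide an address into (address tag, set index, offset)."""
        line, offset = divmod(address, self.u)
        tag, set_index = divmod(line, self.s)
        return tag, set_index, offset

    def lookup(self, address: int) -> Optional[int]:
        """Return the hitting way, comparing tags of data ways only."""
        tag, set_index, _ = self.split(address)
        for way in range(self.w, self.n):  # metadata ways are never matched
            if self.tags[set_index][way] == tag:
                return way
        return None

    def metadata_location(self, ma: int) -> Tuple[int, int, int]:
        """Map a flat metadata address in {0, ..., W*U*S - 1} to (set, way, offset),
        filling one way (bank) across all sets before moving to the next way."""
        if not 0 <= ma < self.w * self.u * self.s:
            raise IndexError("address outside the metadata address space")
        unit, offset = divmod(ma, self.u)
        way, set_index = divmod(unit, self.s)
        return set_index, way, offset


cache = WayPartitionedCache(sets=4, ways=8, line_bytes=64, metadata_ways=2)
cache.tags[0][1] = 18  # a tag left in a metadata way is ignored
print(cache.split(0x1234), cache.lookup(0x1234))  # (18, 0, 52) None
cache.tags[0][5] = 18
print(cache.lookup(0x1234), cache.metadata_location(300))  # 5 (0, 1, 44)
```

Growing or shrinking the first portion in this model amounts to changing `w`. Before `w` grows, the data in the reallocated way of each set would first have to be compacted.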
FIG. 4-2 illustrates an example 401 in which the number of ways 420 allocated for storage of metadata 122 pertaining to the address space is increased (e.g., from two ways 420 within each set 430 to three ways 420 within each set 430). The amount of cache memory 120 allocated to the first portion 124 may be increased in response to determining and/or monitoring the metrics 212 (e.g., in response to prefetch performance quantified by the metrics 212 exceeding a first threshold). As illustrated, the first portion 124 allocated for storage of the metadata 122 may include ways 420-1 through 420-3 of each set 430-1 through 430-S. The capacity available for storage of the metadata 122 may increase to M=3·S·U, where U is the capacity of a cache unit 220 (or CMU 320) in terms of addressable data units. Allocating the way 420-3 to the first portion 124 may include modifying the metadata mapping scheme 314 to reference way 420-3 within each set 430. For example, the scheme may define a metadata address space including addresses 0 through (3·S·U)−1, way indexes 0 through (3·S)−1, or the like.

As illustrated in
FIG. 4-2, increasing the size of the first portion 124 may result in decreasing the amount of cache memory 120 allocated for storage of cache data 128 (decreasing the size of the second portion 126). Allocating the way 420-3 of each set 430 to the first portion 124 may include compacting cache data 128 stored within each set 430 into N−3 ways 420, as disclosed herein (e.g., by selecting cache data 128 within each set 430 for eviction and moving the data to be retained within each [N−3]-way associative set 430 to ways 420-4 through 420-N of each set 430). Allocating the way 420-3 may further include modifying the address mapping scheme 316 to associate addresses 204 with [N−3]-way associative sets 430 rather than [N−2]- or N-way associative sets 430. Allocating the way 420-3 of each set 430 to the first portion 124 may include disabling the cache tag 326-3 of way 420-3 within each set 430. -
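The per-set compaction described for FIG. 4-2 can be sketched as follows. This is a minimal Python model, assuming an LRU replacement policy (one of the policies the disclosure lists) and representing each way of a set as either `None` or a `(tag, last_access, dirty)` tuple; `grow_metadata_way` is a hypothetical helper. Ways are zero-indexed in the code, so way 420-3 corresponds to index 2 and ways 420-4 through 420-N to indexes 3 through N−1.

```python
from typing import List, Optional, Tuple

# Hypothetical cache-line record for a data way: (address_tag, last_access, dirty).
Line = Tuple[int, int, bool]


def grow_metadata_way(
    ways: List[Optional[Line]], metadata_ways: int
) -> Tuple[List[Optional[Line]], List[Line]]:
    """Reallocate one more way of a single set to the metadata partition.

    ways[:metadata_ways] already belong to the metadata partition; their cache
    tags are disabled, which this model represents as None. Returns the new
    way layout (metadata ways first) and the evicted lines. Dirty evicted
    lines would still have to be written back to the backing memory.
    """
    n_ways = len(ways)
    if metadata_ways + 1 >= n_ways:
        raise ValueError("at least one data way must remain")
    resident = [line for line in ways[metadata_ways:] if line is not None]
    slots = n_ways - (metadata_ways + 1)  # data ways left after the change
    # LRU policy (assumed): keep the most recently accessed lines.
    resident.sort(key=lambda line: line[1], reverse=True)
    kept, evicted = resident[:slots], resident[slots:]
    # New metadata way(s) first, then retained data compacted into the remaining ways.
    new_ways: List[Optional[Line]] = [None] * (metadata_ways + 1)
    new_ways += kept + [None] * (slots - len(kept))
    return new_ways, evicted
```

Selecting the victims by policy and the reallocated way by position are independent choices, which matches the separate "first group" and "second group" selection described for block 606.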
FIG. 4-3 illustrates another example 402 in which the number of ways 420 allocated for storage of metadata 122 pertaining to the address space is decreased (e.g., to one way 420 within each set 430). The amount of cache memory 120 allocated to the first portion 124 may be reduced in response to determining and/or monitoring the one or more metrics 212 (e.g., in response to prefetch performance quantified by the metrics 212 falling below a second threshold). As illustrated, the first portion 124 allocated for storage of the metadata 122 may include a single way 420-1 within each set 430-1 through 430-S. The capacity available for storage of the metadata 122 may decrease to M=S·U (the capacity of S cache units 220), where U is the capacity of a cache unit 220 (or CMU 320) in terms of addressable data units. Reducing the size of the first portion 124 may include compacting the metadata 122 for storage within a reduced number of ways 420, as disclosed herein (e.g., by removing portions of the metadata 122, such as one or more entries of the metadata 122, and/or the like). Reducing the size of the first portion 124 may further include modifying the metadata mapping scheme 314 to reference the smaller number of ways 420 allocated to the first portion 124. In FIG. 4-3, the metadata mapping scheme 314 may reference way 420-1 within each set 430 and/or define a metadata address scheme including addresses 0 through (S·U)−1, way indexes 0 through S−1, or the like. - As illustrated in
FIG. 4-3, decreasing the amount of cache memory 120 allocated to the first portion 124 may result in increasing the amount of cache memory 120 allocated for storage of cache data 128 within the second portion 126. In the FIG. 4-3 example, ways 420-2 and 420-3 of each set 430 are allocated to the second portion 126. Allocating ways 420-2 and 420-3 to the second portion 126 may include modifying the address mapping scheme 316 to include ways 420-2 and 420-3 of each set 430 (e.g., by enabling cache tags 326-2 and 326-3 of the ways 420-2 and 420-3). -
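A metadata mapping scheme 314 for the way-based layouts of FIGS. 4-2 and 4-3 could translate a metadata address into a set, a way, and an offset within a cache unit. This Python sketch assumes way-major numbering of the allocated cache units (unit index = way × S + set); the disclosure leaves the numbering open, and `map_metadata_address` is a hypothetical name.

```python
from typing import Tuple


def map_metadata_address(
    addr: int, metadata_ways: int, n_sets: int, unit_capacity: int
) -> Tuple[int, int, int]:
    """Translate a metadata address to (set_index, way_index, offset).

    Valid addresses span 0 .. W1*S*U - 1. Cache units are numbered way-major
    (unit index = way * S + set), which is an assumed layout.
    """
    limit = metadata_ways * n_sets * unit_capacity
    if not 0 <= addr < limit:
        raise ValueError(f"metadata address {addr} outside [0, {limit})")
    unit_index, offset = divmod(addr, unit_capacity)  # unit index in [0, W1*S)
    way_index, set_index = divmod(unit_index, n_sets)
    return set_index, way_index, offset
```

Shrinking from three ways (FIG. 4-2) to one way (FIG. 4-3) lowers the valid range from 3·S·U to S·U; addresses above the new limit are rejected, so the metadata must be compacted into the remaining range before the resize.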
FIG. 5-1 illustrates another example 500 of an apparatus for implementing adaptive cache partitioning. The apparatus 500 includes a cache 110 that is configured to cache data pertaining to an address space associated with a memory 108. The cache 110 may include and/or be coupled to an interface 215, which may be configured to couple the cache 110 (and/or cache logic 210) to an interconnect, such as the interconnect 105 for a host 102. The cache 110 may be configured to service requests 202 pertaining to addresses 204 of the address space from a requestor 201. The cache 110 may service the requests 202 by use of cache memory 120, which may include loading data associated with addresses 204 of the address space in respective transfer operations 203. The transfer operations 203 may be implemented in response to cache misses, prefetch operations, and/or the like. The memory 108, cache 110, and requestor 201 may be communicatively coupled through an interconnect 105. - As illustrated, the
cache memory 120 may include a plurality of cache units 220, which may be organized into a plurality of N-way associative sets 430 (e.g., S sets 430-1 through 430-S, each including N ways 420-1 through 420-N). The cache logic 210 can implement, include, and/or be coupled to partition logic 310 configured to partition the cache memory 120 into a first portion 124 and a second portion 126. The cache logic 210 can partition and/or divide the cache memory 120 in accordance with a partition scheme 312, which may specify an amount of cache memory 120 to allocate to storage of the metadata 122 (the first portion 124), cache data 128 (the second portion 126), and/or the like. The amount of cache memory 120 allocated to the first portion 124 may be based, at least in part, on one or more metrics 212 that, inter alia, quantify prefetch performance, as disclosed herein. - In the
FIG. 5-1 example, the cache logic 210 (and/or partition logic 310) partitions the cache memory 120 by set and/or in accordance with a set or set-based partition scheme 312-2. More specifically, the cache logic 210 can allocate zero or more of the sets 430 of the cache memory 120 for storage of metadata 122 pertaining to the address space (the first portion 124) and one or more of the sets 430 for storage of cache data 128 (the second portion 126). In FIG. 5-1, the first portion 124 of the cache memory 120 allocated to the prefetch logic 230 includes two sets 430 (e.g., sets 430-1 and 430-2), and the second portion 126 of the cache memory 120 allocated for use as available cache capacity includes S−2 sets 430 (e.g., sets 430-3 through 430-S). In FIG. 5-1, sets 430 allocated to the first portion 124 are highlighted with a crosshatch fill pattern. - As disclosed above, the address mapping scheme 316 (or address mapping logic) implemented by the
cache logic 210 may be configured to map addresses 204 to cache units 220 by, inter alia, associating the addresses 204 with respective sets 430 and matching address tags 206 of the addresses 204 to cache tags 326 of the associated sets 430. Allocating one or more sets 430 to the first portion 124, however, may reduce the number of sets 430 included in the second portion 126 (reducing the number of sets 430 to which addresses 204 may be mapped). Allocating R sets 430 for metadata storage may reduce the number of available sets 430 to S−R (or S−2 in the FIG. 5-1 example). Allocating R sets to the first portion 124 may include modifying the address mapping scheme 316 (and/or set mapping scheme thereof) to distribute addresses 204 between a group S_C of S−R sets 430 (e.g., sets 430-(R+1) through 430-S, or sets 430-1 through 430-(S−R)), as follows: $S_I = f_S(T_S, S_C)$, where $f_S$ is a set mapping function, $T_S$ is a set tag 406, and $S_I$ is the index or other identifier of a selected one of the available sets 430 in $S_C$. In some implementations, the address mapping scheme 316 modifies the manner in which addresses 204 are divided (and/or the size of respective address regions). The address mapping scheme 316 may adapt the number of bits included in set tags 406-1 based, at least in part, on the quantity of sets 430 allocated to the first portion 124. The address mapping scheme 316 may, for example, reduce the number of bits included in set tags 406-1 by $\log_2 R$, where R is the number of sets 430 allocated to store the metadata 122 (by one bit in the FIG. 5-1 example). - The
metadata mapping scheme 314 may be configured to associate metadata 122 (and/or metadata addresses) with cache memory 120 allocated to the first portion 124. The metadata mapping scheme 314 may define a range of metadata addresses 0 through (R·N·U)−1 or indexes 0 through (R·N)−1, where R is the number of sets 430 allocated to the metadata 122, N is the number of ways 420 included in each set 430, and U is the capacity of each way 420 (and/or corresponding cache unit 220). - The
cache logic 210 can be further configured to adapt the set partition scheme 312-2 based, at least in part, on one or more metrics 212 pertaining to prefetch performance. The cache logic 210 can increase the number of sets 430 allocated for the metadata 122 when one or more of the metrics 212 exceeds a first threshold and can decrease the number of sets 430 allocated for the metadata 122 (and increase the number of sets 430 available to store cache data 128) when one or more of the metrics 212 falls below a second threshold. -
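The set mapping function $f_S$ is left open in the description above. As one hypothetical choice, the Python sketch below uses a modulo over the S−R available sets and keeps the quotient in the address tag, so that a set index, tag, and offset still identify a single address. The function names and the modulo policy are assumptions for illustration, and sets 430-1 through 430-R are taken to be indexes 0 through R−1.

```python
from typing import Tuple


def map_address(
    addr: int, line_size: int, n_sets: int, metadata_sets: int
) -> Tuple[int, int, int]:
    """Map an address to (set_index, address_tag, offset) when sets
    0 .. R-1 hold metadata and sets R .. S-1 hold cache data.

    The set mapping function f_S is assumed to be a modulo over the S - R
    available sets; the tag keeps the quotient, so (set, tag, offset)
    still identifies the address uniquely.
    """
    available = n_sets - metadata_sets
    if available <= 0:
        raise ValueError("no sets left for cache data")
    line_number, offset = divmod(addr, line_size)
    tag, slot = divmod(line_number, available)
    return metadata_sets + slot, tag, offset


def unmap_address(set_index: int, tag: int, offset: int,
                  line_size: int, n_sets: int, metadata_sets: int) -> int:
    """Inverse of map_address, e.g., for writing back an evicted line."""
    available = n_sets - metadata_sets
    line_number = tag * available + (set_index - metadata_sets)
    return line_number * line_size + offset
```

Because the divisor S−R changes whenever sets are reallocated, resident lines would map to different sets after a repartition under this particular $f_S$. Flushing the cache before repartitioning, which the description lists as one option for resizing, is one way to handle that.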
FIG. 5-2 illustrates an example 501 in which the amount of cache memory 120 allocated to the metadata 122 is increased as compared to the example 500 illustrated in FIG. 5-1. The size of the first portion 124 may be increased based on prefetch performance within one or more regions of the address space. In FIG. 5-2, the quantity of sets 430 included in the first portion 124 may be increased to four (e.g., increased from sets 430-1 through 430-2 to sets 430-1 through 430-4). Allocating additional sets 430-3 and 430-4 for metadata storage may include adapting the address mapping scheme 316 to distribute addresses between S−4 sets 430 (as opposed to S−2 or S sets 430). The address mapping scheme 316 may be modified to reduce the number of bits included in set tags 406-2 by two bits (or by a single bit as compared to the set tags 406-1 of the FIG. 5-1 example). Allocating a set 430 to the first portion 124 may include evicting cache data from the set 430, disabling cache tags 326-1 through 326-N of each way 420 of the set 430, and so on. Allocating an additional set 430 to the first portion 124 may further include adapting the metadata mapping scheme 314 to include the additional set 430. In the FIG. 5-2 example, the metadata mapping scheme 314 may be adapted to define a range of metadata addresses 0 through (4·N·U)−1 or indexes 0 through (4·N)−1. -
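The set reallocation steps of FIG. 5-2 (evicting the set's cache data, disabling its tags, and extending the metadata mapping scheme 314) can be modeled with a small class. This is a sketch under stated assumptions: `SetPartitionedCache` and its methods are hypothetical, disabled tags are modeled as empty ways, dirty lines are recorded in a write-back list rather than written to a real memory 108, and sets 430-1 through 430-R correspond to indexes 0 through R−1.

```python
from typing import List, Optional, Tuple

# Hypothetical line record for a data way: (address_tag, dirty).
Line = Tuple[int, bool]


class SetPartitionedCache:
    """Toy model of a set-based partition: sets [0, R) hold metadata and
    sets [R, S) hold cache data."""

    def __init__(self, n_sets: int, n_ways: int, unit_capacity: int, metadata_sets: int):
        self.n_ways = n_ways
        self.unit_capacity = unit_capacity
        self.metadata_sets = metadata_sets
        self.sets: List[List[Optional[Line]]] = [[None] * n_ways for _ in range(n_sets)]
        self.writebacks: List[Tuple[int, int]] = []  # (set_index, tag) destaged to memory

    def metadata_range(self) -> int:
        """Number of metadata addresses: R * N * U (addresses 0 .. R*N*U - 1)."""
        return self.metadata_sets * self.n_ways * self.unit_capacity

    def metadata_location(self, addr: int) -> Tuple[int, int, int]:
        """Translate a metadata address to (set_index, way_index, offset)."""
        if not 0 <= addr < self.metadata_range():
            raise ValueError("metadata address out of range")
        unit, offset = divmod(addr, self.unit_capacity)  # unit index in [0, R*N)
        set_index, way_index = divmod(unit, self.n_ways)
        return set_index, way_index, offset

    def grow_metadata(self) -> None:
        """Move set R from the cache-data partition to the metadata partition."""
        victim = self.metadata_sets
        if victim >= len(self.sets) - 1:
            raise ValueError("at least one data set must remain")
        for line in self.sets[victim]:
            if line is not None and line[1]:  # dirty: destage before reuse
                self.writebacks.append((victim, line[0]))
        self.sets[victim] = [None] * self.n_ways  # evict; tags of this set disabled
        self.metadata_sets += 1
```

Calling `grow_metadata()` twice on a cache that starts with two metadata sets reproduces the transition from FIG. 5-1 to FIG. 5-2: the metadata range grows from 2·N·U to 4·N·U, and dirty lines in sets 430-3 and 430-4 are queued for write-back.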
FIG. 5-3 illustrates an example 502 in which the amount of cache memory 120 allocated to the metadata 122 is decreased as compared to example 501 of FIG. 5-2 (and example 500 of FIG. 5-1). The size of the first portion 124 may be decreased based on prefetch performance within one or more regions of the address space, as disclosed herein. In FIG. 5-3, the quantity of sets 430 included in the first portion 124 may be decreased to one (e.g., decreased to a single set 430-1). Reducing the size of the first portion 124 may, therefore, include allocating sets 430-2 through 430-4 for storage of cache data 128 (to the second portion 126). Allocating one or more sets 430 to the second portion 126 may include compacting the metadata 122 and storing the compacted metadata within a reduced number of cache units 220. In the FIG. 5-3 example, the metadata 122 may be compacted for storage within N cache units 220. Compacting the metadata 122 may include removing portions of the metadata 122, such as one or more metadata entries. The entries may be selected based on any suitable criteria including, but not limited to: age criteria (oldest removed first, youngest removed first, or the like), least recently accessed criteria, least frequently accessed criteria, prefetch performance criteria (e.g., prefetch performance within address regions covered by respective entries of the metadata 122), and/or the like. The metadata mapping scheme 314 may be modified to decrease the number of cache units 220 referenced thereby (reducing the metadata address range to N·U or the metadata index range to N), and so on. The address mapping scheme 316 may be modified to increase the quantity of available sets 430 to S−1. The address mapping scheme 316 may be configured to distribute addresses 204 between a larger number of sets 430 by, inter alia, increasing the number of bits included in set tags 406-2 of the addresses 204.
Allocating a set 430 for cache data storage may further include enabling cache tags 326 of each way 420 of the set 430 (e.g., enabling cache tags 326-1 through 326-N of each way 420-1 through 420-N of the set 430 being allocated for storage of cache data 128). - Example methods are described in this section with reference to the flowcharts and flow diagrams of
FIGS. 6 through 9. These descriptions reference components, entities, and other aspects depicted in FIGS. 1-1 through 5-3 by way of example only. FIG. 6 illustrates with a flow diagram 600 example methods for an apparatus to implement adaptive cache partitioning. The flow diagram 600 includes blocks 602 through 606. In some implementations, a host device 102 (and/or component thereof) can perform one or more operations of the flow diagram 600 (and/or operations of the other flow diagrams described herein) to realize at least one method for adaptive cache partitioning. Alternatively, or in addition, one or more of the operations may be performed by a memory, memory controller, PIM logic, cache 110, cache memory 120, cache logic 210, prefetch logic 230, an embedded processor, and/or the like. - At 602, a first portion of the
cache memory 120 of a cache 110 is allocated for storage of metadata 122 pertaining to an address space associated with a backing memory of the cache 110 (e.g., the address space associated with the memory 108). For example, allocating the first portion may include partitioning the cache memory 120 into a first portion 124 and a second portion 126. The first portion 124 may be allocated for storage of the metadata 122, and the second portion 126 may be allocated for storage of cache data 128 (may be allocated as available cache capacity). The metadata 122 maintained within the first portion of the cache memory 120 may include information pertaining to accesses to respective addresses and/or regions of the address space. A prefetcher and/or prefetch logic 230 of the cache 110 may utilize the metadata 122 to predict addresses 204 of upcoming requests 202 and prefetch data associated with the predicted addresses 204 into the second portion 126 of the cache memory 120. The metadata 122 can include any suitable information pertaining to the address space including, but not limited to: a sequence of previously requested addresses 204 or address offsets, an address history, an address history table, an index table, access frequencies for respective addresses 204, access counts (e.g., accesses within respective windows), access time(s), last access time(s), and/or the like. In some aspects, the metadata 122 includes a plurality of entries, each entry including information pertaining to a respective region of the address space. The metadata 122 pertaining to respective regions of the address space may be used to, inter alia, determine address access patterns within the respective regions, which may be used to inform prefetch operations within the respective regions. - The
cache memory 120 may be partitioned into the first portion 124 (e.g., a first partition) and the second portion 126 (e.g., a second partition) according to any suitable partition scheme 312, such as a sequential scheme, a way-based partition scheme 312-1, a set-based partition scheme 312-2, and/or the like. The first portion 124 may include any suitable portion, quantity, and/or amount of the cache memory resources of the cache memory 120 including, but not limited to, zero or more: cache units 220, CMUs 320, cache blocks, cache lines, hardware cache lines, ways 420 (and/or corresponding cache units 220), sets 430, rows, columns, banks, and/or the like. In some implementations, the cache logic 210 allocates M cache units 220 to the first portion 124 and allocates X−M cache units 220 to the second portion 126 as available cache capacity (where X is the number of available cache units 220 included in the cache memory 120). Allocating the M cache units 220 may include allocating cache units 220-1 through 220-M to the first portion 124 (e.g., according to a sequential scheme); allocating cache units 220 within W1 ways of each set 430 of the cache memory 120, where $W_1 = M/S$ and S is the number of sets 430 included in the cache memory 120 (e.g., according to a way-based partition scheme 312-1); allocating cache units 220 within sets 1 through E1, where $E_1 = M/N$ and N is the number of ways 420 included in each set 430 of the cache memory 120 (e.g., according to a set-based partition scheme 312-2); and/or the like. - Allocating the
M cache units 220 may include flushing and/or destaging cache data 128 from the M cache units 220, which may include writing dirty cache data 128 stored within the cache units 220 to the memory 108, and/or the like. Allocating the M cache units 220 may further include configuring an address mapping scheme 316 by which addresses 204 are mapped to respective cache units 220, sets 430, and/or ways 420 to disable, remove, and/or ignore the M cache units 220, such that the addresses 204 do not map to the M cache units 220 (and the M cache units 220 are not available for storage of cache data 128). In some implementations, the cache logic 210 disables cache tags 326 of the M cache units 220 allocated to the first portion at 602. - In one example, the
cache logic 210 partitions the cache memory 120 by way 420 (e.g., by allocating ways 420 within respective sets 430 of the cache memory 120) in accordance with a way or way-based partition scheme 312-1. In a way partition scheme 312-1, allocating the M cache units 220 to the first portion 124 may include allocating W1 ways 420 within each of S sets 430-1 through 430-S of the cache memory 120 to the first portion 124, where $W_1 = M/S$, such that W2 ways 420 within each set 430 are allocated to the second portion 126, where $W_2 = N - W_1$ or, equivalently, $W_2 = N - M/S$. - In another example, the
cache logic 210 may implement a set-based partition scheme 312-2 by which the cache memory 120 is divided by set 430. Allocating M cache units 220 to the first portion 124 per a set-based partition scheme 312-2 may include allocating E1 sets 430 to the first portion 124, where $E_1 = M/N$ and N is the number of cache units 220 (ways 420) included in each set 430, such that E2 sets 430 are allocated to the second portion 126, where $E_2 = S - E_1$ or, equivalently, $E_2 = S - M/N$. - Allocating the
M cache units 220 for metadata storage may further include configuring a metadata mapping scheme 314 to provide access to the memory storage capacity of the M cache units 220. The metadata mapping scheme 314 implemented by the cache logic 210 may provide access to memory storage capacity of the M cache units 220 included in the first portion 124 of the cache memory 120. The metadata mapping scheme 314 may define a metadata address space (MA), $MA \in \{0, \ldots, (M \cdot U) - 1\}$, where U is the capacity of a cache unit 220 (capacity of a CMU 320). Alternatively, or in addition, the metadata address space may define a range of cache unit indexes (MI), each corresponding to a respective one of the M cache units 220 allocated to the first portion 124, $MI \in \{0, \ldots, M - 1\}$. Although examples of metadata mapping schemes 314 (and/or metadata addressing and/or access schemes) are described herein, the disclosure is not limited in this regard and could be adapted to provide access to cache memory 120 allocated to the first portion 124 through any suitable mechanism or technique. - At 604, data associated with the address space is written to the
second portion 126 of the cache memory 120. For example, the cache logic 210 may load the data into the cache memory 120 in response to requests 202 pertaining to addresses 204 that trigger cache misses (e.g., addresses 204 that have not yet been loaded into the second portion 126 of the cache memory 120). Alternatively, or in addition, the cache logic 210 (and/or prefetch logic 230) may prefetch cache data 128 into the second portion 126 of the cache memory 120 at 604. The prefetch logic 230 may utilize the metadata 122 pertaining to the address space to predict addresses 204 of upcoming requests 202 and configure the cache logic 210 to prefetch cache data 128 corresponding to the predicted addresses 204 before requests 202 pertaining to the predicted addresses 204 are received. Prefetched cache data 128 may be transferred into the relatively faster cache memory 120 from the relatively slower memory 108 in transfer operations 203. Transfer operations 203 to prefetch cache data 128 may be implemented as background operations (e.g., during idle periods in which the cache 110 is not servicing requests 202). - In some aspects, the cache logic 210 (and/or prefetch logic 230) may be further configured to determine and/or monitor one or
more metrics 212 pertaining to the cache 110 at 604. The metrics 212 may be configured to quantify any suitable aspect of cache and/or prefetch performance including, but not limited to: request latency, average request latency, cache performance, cache hit rate, cache miss rate, prefetch performance, prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like. - In some implementations, the cache logic 210 (and/or prefetch logic 230) may be further configured to record, update, and/or otherwise maintain
metadata 122 pertaining to the address space within the first portion of the cache memory 120 allocated at 602 (e.g., within the first portion 124). The metadata 122 may be accessed by and/or through the metadata mapping scheme 314, as disclosed herein. - At 606, the size of the first portion of the
cache memory 120 allocated for the metadata 122 pertaining to the address space is modified based, at least in part, on one or more metrics 212 pertaining to cache data 128 prefetched into the second portion of the cache memory. The amount of cache memory 120 allocated to the first portion 124 may be increased when one or more of the metrics 212 exceeds a first threshold. At 606, the size of the first portion 124 may be incrementally and/or periodically increased while prefetch performance remains above the first threshold and/or until a maximum or upper bound is reached. Conversely, at 606, the amount of cache memory allocated to the first portion 124 may be decreased when one or more of the metrics 212 is below a second threshold. At 606, the size of the first portion 124 may be incrementally and/or periodically decreased while prefetch performance remains below the second threshold and/or until a lower bound is reached. In some aspects, at the lower bound, no cache resources are allocated for storage of the metadata 122 and substantially all of the cache memory 120 is available as cache capacity. - At 606, the amount of
cache memory 120 allocated to the metadata 122 may be increased when the workload on the cache 110 is suitable for prefetching and may be decreased when the workload is not suitable for prefetching (as indicated by the one or more metrics 212). The cache 110 may, therefore, be capable of adapting to different workload conditions. For example, increasing the amount of cache memory 120 allocated to prefetch metadata 122 when servicing workloads that are suitable for prefetching may result in improved performance despite decreases in available cache capacity, whereas decreasing the amount of cache memory 120 allocated for the metadata 122 may enable the available capacity of the cache 110 to be increased, resulting in improved performance under workloads that are not suitable for prefetching. - In some implementations, modifying the size of the
first portion 124 of the cache memory 120 allocated for the metadata 122 may include completing pending requests 202 (e.g., draining a pipeline of the cache 110), flushing the cache 110, resetting the prefetch logic 230 (and/or prefetcher), repartitioning the cache memory 120 to modify the amount of cache memory 120 allocated to the first portion 124 and/or second portion 126, and resuming operation using the repartitioned cache memory 120 (e.g., using the resized first portion 124 and/or second portion 126 of the cache memory 120). Alternatively, modifying the size of the first portion 124 of the cache memory 120 may include preserving cache and/or prefetcher state. When increasing the amount of cache memory 120 allocated to the first portion 124, the cache logic 210 can preserve cache state by, inter alia, compacting the cache data 128 maintained within the second portion 126 (e.g., selecting cache data 128 of R cache units 220 for eviction), moving cache data 128 from cache units 220 that are designated for allocation to the first portion 124 to cache units 220 that are to remain allocated to the second portion 126, and so on. When decreasing the amount of cache memory 120 allocated to the first portion, the cache logic 210 can preserve prefetcher state by, inter alia, compacting the metadata 122 maintained within the first portion 124 for storage within a smaller number of cache units 220 (e.g., by removing portions of the metadata 122, such as entries associated with address regions exhibiting poor prefetch performance), moving the compacted metadata 122 to cache units 220 that are to remain allocated to the first portion 124, and so on. - Increasing the amount of
cache memory 120 allocated to the first portion (e.g., first portion 124) at 606 may include allocating one or more cache units 220 from the second portion (e.g., second portion 126) to the first portion 124. Allocating the one or more cache units 220 to the first portion 124 may include flushing and/or destaging the cache units 220, modifying the address mapping scheme 316 to disable, remove, and/or ignore the cache units 220 (e.g., disable cache tags 326 of the cache units 220), modifying the metadata mapping scheme 314 to include and/or reference the cache units 220, and so on, as disclosed herein. Increasing the amount of cache memory 120 allocated to the first portion may, therefore, include decreasing the amount of cache memory 120 allocated to the second portion (and/or decreasing the amount of cache memory 120 available for storage of cache data 128). Decreasing the size of the second portion 126 may include compacting the cache data 128 stored within the second portion 126 of the cache memory 120, which may include selecting cache data 128 to remove and/or evict from the cache 110. The cache data 128 may be selected according to any suitable replacement or eviction policy, such as FIFO, LIFO, LRU, TLRU, MRU, LFU, random replacement, or the like. Compacting the cache data 128 may include reducing the amount of cache memory 120 consumed by the cache data 128 by R cache units 220, where R is the number of cache units 220 being allocated from the second portion 126 to the first portion 124 (or by R·U, where U is the capacity of a cache unit 220, CMU 320, or way 420). - In some aspects, the
cache logic 210 selects a first group of cache units 220 to reallocate to the first portion 124 and selects a second group of cache units 220 for eviction. The first group and the second group may each include R cache units 220, where R is the quantity of cache units 220 to be reallocated to the first portion 124. The first group and second group may be selected independently and/or in accordance with respective selection criteria. The first group of cache units 220 may be selected in accordance with the address mapping scheme 316, metadata mapping scheme 314, partition scheme 312, or the like (which may allocate cache units 220 for storage of the metadata 122 per a predetermined pattern or scheme, such as a sequential scheme, way-based partition scheme 312-1, set-based partition scheme 312-2, or the like). The second group of cache units 220 may be selected in accordance with an eviction or replacement policy, as disclosed herein. Reallocating the R cache units 220 may include: a) flushing the second group of cache units 220, and b) moving cache data 128 from cache units 220 that are included in the first group (and are not included in the second group) to the second group of cache units 220. The cache logic 210 may, therefore, retain more frequently accessed data within the cache memory 120 when reducing the available cache data 128 capacity of the cache 110. - In some implementations, the
cache logic 210 partitions the cache memory 120 according to a way or way-based partition scheme 312-1. Allocating R cache units 220 from the second portion 126 to the first portion 124 may include allocating one or more ways 420 within each set 430 of the cache memory 120 to the first portion 124. Allocating R cache units 220 from the second portion 126 to the first portion 124 may include allocating an additional W1A ways 420 within each of S sets 430-1 through 430-S from the second portion 126 to the first portion 124, where $W_{1A} = R/S$. - Alternatively, or in addition, the
cache logic 210 may partition the cache memory 120 according to a set or set-based partition scheme 312-2. Allocating R cache units 220 from the second portion 126 to the first portion 124 may include allocating an additional E1A sets 430 of the cache memory 120 from the second portion 126 to the first portion 124, where $E_{1A} = R/N$ and N is the number of cache units 220 (or ways 420) included in each set 430. - Decreasing the amount of
cache memory 120 allocated to the first portion (e.g., first portion 124) at 606 may include allocating one or more cache units 220 from the first portion to the second portion (e.g., second portion 126). Allocating the one or more cache units 220 to the second portion 126 may include modifying the address mapping scheme 316 to enable, include, and/or otherwise reference the cache units 220 (e.g., enable cache tags 326 of the cache units 220), modifying the metadata mapping scheme 314 to remove the cache units 220, and so on, as disclosed herein. - Decreasing the amount of
cache memory 120 allocated to the first portion may further include compacting the metadata 122. The metadata 122 may be compacted for storage within R fewer cache units 220, where R is the quantity of cache units 220 to be allocated from the first portion 124 to the second portion 126. Compacting the metadata 122 at 606 may include removing a portion of the metadata 122, such as one or more entries of the metadata 122. The portion of the metadata 122 may be selected based on a removal criterion, such as an age criterion (oldest removed first, youngest removed first, or the like), least recently accessed criterion, least frequently accessed criterion, and/or the like. - Alternatively, or in addition, portions of the
metadata 122 may be selected for removal based, at least in part, on one or more metrics 212. The metadata 122 may include a plurality of entries, each entry including access information pertaining to a respective region of the address space. The prefetch logic 230 may utilize respective entries of the metadata 122 to implement prefetch operations within the address regions covered by the respective entries. The one or more metrics 212 may be configured to quantify prefetch performance within the address regions covered by the respective entries of the metadata 122. Compacting the metadata 122 may include selecting entries of the metadata 122 for removal based, at least in part, on prefetch performance within the address regions covered by the entries, as quantified by the metrics 212. In some implementations, entries of the metadata 122 in which prefetch performance is below a threshold may be removed (and/or the amount of memory capacity allocated to the entries may be reduced). Alternatively, entries of the metadata 122 exhibiting higher prefetch performance may be retained, whereas entries exhibiting lower prefetch performance may be removed (e.g., the R lowest-performing entries of the metadata 122 may be selected for removal). Compacting the metadata may, therefore, include removing metadata 122 from one or more cache units 220 and/or moving metadata 122 (and/or entries of the metadata 122) from cache units 220 being reallocated to the second portion 126 to the remaining cache units 220 allocated to the first portion 124. -
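Performance-based compaction of the metadata 122 can be sketched as a ranking over entries. The Python below assumes each entry is keyed by a hypothetical address-region id and carries a per-region prefetch accuracy (for example, useful prefetches divided by issued prefetches); it drops entries below an optional threshold and then keeps the best-performing entries that fit the reduced capacity. `compact_metadata` and its parameters are illustrative names, not part of the disclosure.

```python
from typing import Dict, List, Optional


def compact_metadata(entries: Dict[int, float], capacity: int,
                     min_accuracy: Optional[float] = None) -> List[int]:
    """Select which metadata entries survive a shrink of the metadata portion.

    entries: region id -> prefetch accuracy observed for that region.
    capacity: number of entries that fit in the reduced first portion.
    min_accuracy: optional threshold; entries below it are removed first.
    Returns the surviving region ids, best-performing first.
    """
    candidates = [(region, acc) for region, acc in entries.items()
                  if min_accuracy is None or acc >= min_accuracy]
    # Rank by per-region prefetch performance; the lowest performers fall off the end.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [region for region, _ in candidates[:capacity]]
```

With a capacity one entry smaller than the current table, this removes exactly the lowest-performing entry, which corresponds to the "R lowest-performing entries" option with R equal to one.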
FIG. 7 illustrates with a flow diagram 700 further examples of methods for an apparatus to implement adaptive cache partitioning. The flow diagram 700 includes blocks 702 through 708. At 702, logic of a cache 110 (e.g., cache logic 210) implements a partition scheme 312 to, inter alia, partition a cache memory 120 into a first portion 124 and a second portion 126. The first portion 124 may include a first portion of the cache memory 120, and the second portion 126 may include a second portion of the cache memory 120, different from the first portion 124. The first portion 124 may be allocated for storage of metadata 122 pertaining to an address space, such as an address space associated with a backing memory of the cache 110 (a memory 108). The second portion 126 may be allocated for storage of cache data 128 pertaining to the address space (e.g., may be available cache capacity of the cache 110). Partitioning the cache memory 120 may include implementing a metadata mapping scheme 314 to access cache units 220 allocated to the first portion 124 and an address mapping scheme 316 to map addresses 204 of the address space to cache units 220 allocated to the second portion 126. - At 704, the
cache 110 services requests pertaining to the address space, which may include maintaining metadata pertaining to the address space within the first portion 124 (e.g., withinmetadata 122 maintained within the first portion 124) and loading data associated with addresses of the address space into thesecond portion 126. Data may be loaded into thecache memory 120 in response to cache misses, such asrequests 202 pertaining toaddresses 204 that are not available within thecache 110. Alternatively, or in addition, data may be prefetched into thecache memory 120 at 704. A prefetcher (and/orprefetch logic 230 of the cache 110) may utilize themetadata 122 maintained within thefirst portion 124 to predictaddresses 204 ofupcoming requests 202, and data corresponding to the predicted addresses 204 may be prefetched into thesecond portion 126 beforerequests 202 pertaining to the predicted addresses 204 are received at thecache 110. - At 706, the cache logic 210 (and/or prefetch logic 230) may determine whether to adapt the
partition scheme 312 of thecache memory 120. More specifically, the cache logic 210 (and/or prefetch logic 230) may determine whether to modify the size of thefirst portion 124 allocated for the metadata 122 (and/or modify the size of thesecond portion 126 allocated for storage of cache data 128) at 706. The determination may be based, at least in part, on one ormore metrics 212, which may be configured to quantify prefetch performance, as disclosed herein. Determining whether to adapt thepartition scheme 312 may include determining and/or monitoring one ormore metrics 212 pertaining to data prefetched into thesecond portion 126 and comparing themetrics 212 to one or more thresholds. Thepartition scheme 312 may be adapted at 708 responsive to one or more of themetrics 212 being greater than a first threshold and/or being lower than a second threshold; otherwise, the flow may continue at 704 where thecache 110 may continue to service requests pertaining to the address space. - At 708, the
cache logic 210 adapts the partitioning scheme to, inter alia, modify the amount ofcache memory 120 allocated to thefirst portion 124 and/orsecond portion 126. At 708, the size of thefirst portion 124 allocated for themetadata 122 may be increased (and the size of thesecond portion 126 allocated forcache data 128 may be decreased) when themetrics 212 exceed one or more first thresholds (e.g., when prefetch performance exceeds one or more first thresholds). Conversely, the size of thefirst portion 124 may be decreased (and the size of thesecond portion 126 may be increased) when themetrics 212 are below one or more second thresholds (e.g., when prefetch performance is below one or more second thresholds). - Increasing the size of the
first portion 124 may include allocating cache resources from thesecond portion 126 to the first portion 124 (e.g., one ormore cache units 220,ways 420, sets 430, and/or the like). Increasing the size of thefirst portion 124 may include reducing the size of thesecond portion 126. Reducing the size of thesecond portion 126 may include compactingcache data 128 stored within thesecond portion 126, as disclosed herein (e.g., by selectingcache data 128 for eviction, movingcache data 128 to remainingcache units 220 allocated to thesecond portion 126, and so on). Conversely, decreasing the size of thefirst portion 124 may include allocating cache resources from thefirst portion 124 to thesecond portion 126. Decreasing the size of thefirst portion 124 may include compactingmetadata 122 stored within thefirst portion 124 of thecache memory 120, as disclosed herein (e.g., by selecting portions of themetadata 122 for removal, moving portions of themetadata 122 to remainingcache units 220 allocated to thefirst portion 124, and so on). In response to adapting thepartition scheme 312 of thecache memory 120 at 708, the flow may continue at 704 where thecache 110 may service requests pertaining to the address space, as disclosed herein. -
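The decision at 706 and the resize at 708 reduce to a small control step. The following Python sketch makes several assumptions that are not part of the described method: the first portion 124 is sized in whole cache units 220, each adjustment moves a single unit, and the size is clamped to `[0, total_units]`. The names `adapt_partition` and `simulate` are hypothetical.

```python
from typing import List


def adapt_partition(meta_units: int, total_units: int, metric: float,
                    first_threshold: float, second_threshold: float, step: int = 1) -> int:
    """Blocks 706/708: return the new number of cache units in the metadata portion."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if metric > first_threshold:
        # Prefetching is paying off: move units from the data portion to the metadata portion.
        return min(meta_units + step, total_units)
    if metric < second_threshold:
        # Prefetching is underperforming: give units back to the data portion.
        return max(meta_units - step, 0)
    return meta_units  # within the band: keep servicing requests at 704 unchanged


def simulate(meta_units: int, total_units: int, metric_samples: List[float],
             first_threshold: float, second_threshold: float) -> List[int]:
    """Apply one adaptation per observed metric sample and record the resulting sizes."""
    history = []
    for metric in metric_samples:
        meta_units = adapt_partition(meta_units, total_units, metric,
                                     first_threshold, second_threshold)
        history.append(meta_units)
    return history


print(simulate(4, 16, [0.9, 0.9, 0.5, 0.1], first_threshold=0.7, second_threshold=0.3))
# [5, 6, 6, 5]
```

The gap between the two thresholds acts as a hysteresis band. Metric values inside the band leave the partition unchanged, which keeps the scheme from oscillating on every small fluctuation.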
- FIG. 8 illustrates another example flow diagram 800 depicting operations for adaptive cache partitioning based, at least in part, on metrics 212 pertaining to prefetch performance. The flow diagram 800 includes blocks 802 through 816. At 802, cache logic 210 (and/or prefetch logic 230) divides a cache memory 120 into a first portion 124 and a second portion 126. The first portion 124 includes a first partition of the cache memory 120 allocated for metadata 122 pertaining to an address space. The second portion 126 may include a second partition of the cache memory 120 allocated for cache data 128, with the first portion 124 being separate from the second portion 126.
- At 804, the cache 110 services requests pertaining to the address space. Servicing the requests may include loading data into the second portion 126 of the cache memory 120 and retrieving data associated with respective addresses 204 of the address space from the second portion 126 in response to requests 202 pertaining to the addresses 204. It may also include maintaining metadata 122 pertaining to accesses to respective addresses and/or regions of the address space within the first portion 124 of the cache memory 120, utilizing the metadata 122 maintained within the first portion 124 to prefetch cache data 128 into the second portion 126, and so on.
- At 806, the cache logic 210 (and/or prefetch logic 230) determines whether to evaluate the partition scheme 312 of the cache memory 120. In some implementations, the partition scheme 312 may be evaluated in background operations and/or by use of idle resources of the cache 110. The determination of 806 may be based, at least in part, on whether the cache 110 is idle (e.g., is not servicing any requests 202), whether idle resources are available, and/or the like. The determination of 806 may be based on one or more time-based criteria (e.g., the partition scheme 312 may be evaluated periodically and/or at a determined interval), a predetermined schedule, and/or the like. Alternatively, or in addition, the determination of 806 may be triggered by workload conditions and/or prefetch performance metrics (e.g., one or more metrics 212). The cache logic 210 may be configured to determine and/or monitor metrics 212 pertaining to prefetch performance periodically and/or continuously. Evaluation of the partition scheme 312 may then be triggered at 806 in response to metrics 212 that exceed and/or are below one or more thresholds.
- If the determination at 806 is to evaluate the partition scheme 312, the flow continues at 808. Otherwise, the flow continues to service requests pertaining to the address space at 804.
- At 808, the cache logic 210 (and/or prefetch logic 230) determines and/or monitors one or more aspects of prefetch performance. Examples include prefetch hit rate, prefetch miss rate, quantity of useful prefetches, quantity of bad prefetches, ratio of useful prefetches to bad prefetches, and/or the like. At 808, the cache logic 210 (and/or prefetch logic 230) may determine and/or monitor one or more metrics 212 pertaining to prefetch performance, as disclosed herein. Prefetch hit rate may be based on access metrics of prefetched cache data 128 maintained within cache metadata 322 associated with the prefetched cache data 128. The cache data 128 that was prefetched into the cache memory 120 may be identified by use of prefetch indicators, such as prefetch flags associated with the cache data 128. These indicators may be maintained within cache metadata 322 associated with the cache units 220 in which the cache data 128 are stored.
- At 810, the prefetch performance determined at 808 is compared to a first threshold. If the prefetch performance exceeds the first threshold, the flow continues at 812; otherwise, the flow continues at 814. In some implementations, the determination of 810 requires both that the prefetch performance determined at 808 exceeds the first threshold and that the amount of cache memory 120 currently allocated to the first portion 124 is below a maximum amount, threshold, or upper bound. If so, the flow continues at 812; otherwise, the flow continues at 814.
- At 812, the cache logic 210 modifies the partition scheme 312 to increase the amount of cache memory 120 allocated for storage of the metadata 122 pertaining to the address space (e.g., to increase the size of the first portion 124 of the cache memory 120). Increasing the amount of cache memory 120 allocated to the first portion 124 may include decreasing the amount of cache memory 120 allocated to the second portion 126 (e.g., reducing the available capacity of the cache 110). At 812, the cache logic 210 may reassign designated cache memory resources from the second portion 126 to the first portion 124, such as one or more cache units 220, ways 420, sets 430, and/or the like. For example, the cache memory 120 may be partitioned into a first portion 124 comprising a first group of cache units 220 and a second portion 126 comprising a second, different group of cache units 220. Increasing the amount of cache memory allocated to the first partition (first portion 124) may include allocating one or more cache units 220 of the second group to the first group. This allocation may involve, inter alia, evicting and/or moving cache data 128 from the one or more cache units 220, removing the one or more cache units 220 from the address mapping scheme 316 (e.g., disabling cache tags 326 of the one or more cache units 220), adding the one or more cache units 220 to the metadata mapping scheme 314, and so on.
- The cache logic 210 may be further configured to compact cache data 128 stored within the second portion 126 for storage within a smaller amount of the cache memory 120. It may also configure the address mapping scheme 316 to remove, disable, and/or dereference the designated cache memory resources, and configure the metadata mapping scheme 314 to include, reference, and/or otherwise provide access to the designated cache resources for use in storing the metadata 122, and so on, as disclosed herein. In response to implementing the modified partition scheme 312 to increase the amount of cache memory 120 allocated for the metadata 122 (and decrease the amount of available cache capacity), the flow may continue at 804.
- At 814, the prefetch performance determined and/or monitored at 808 is compared to a second threshold. If the prefetch performance is below the second threshold, the flow continues at 816; otherwise, the flow continues at 804. In some implementations, the determination of 814 requires both that the prefetch performance determined at 808 is below the second threshold and that the amount of cache memory 120 currently allocated to the first portion 124 is above a minimum amount, threshold, or lower bound. If so, the flow continues at 816; otherwise, the flow continues at 804.
- At 816, the cache logic 210 modifies the partition scheme 312 to decrease the amount of cache memory 120 allocated for storage of the metadata 122 pertaining to the address space (e.g., to decrease the size of the first portion 124 of the cache memory 120). Decreasing the amount of cache memory 120 allocated to the first portion 124 may include increasing the amount of cache memory 120 allocated to the second portion 126 (e.g., increasing the available capacity of the cache 110). At 816, the cache logic 210 may reassign designated cache memory resources from the first portion 124 to the second portion 126, such as one or more cache units 220, ways 420, sets 430, and/or the like. At 816, the cache logic 210 may be further configured to compact metadata 122 stored within the first portion 124 for storage within a smaller amount of the cache memory 120. It may also configure the address mapping scheme 316 to enable, reference, and/or otherwise utilize the designated cache memory resources for cache data 128, and configure the metadata mapping scheme 314 to remove, exclude, and/or dereference the designated cache resources, and so on, as disclosed herein. In response to implementing the modified partition scheme 312 to decrease the amount of cache memory 120 allocated for the metadata 122 (and increase the amount of available cache capacity), the flow may continue at 804.
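Blocks 808 through 816 can be modeled in a few lines of Python. This sketch makes its own assumptions rather than describing the hardware:

- Prefetch flags are tracked as a set of addresses, standing in for the flags kept in cache metadata 322.
- A prefetch counts as useful when a demand request hits it, and as bad when it is evicted unused.
- "Prefetch hit rate" is taken to be useful / (useful + bad).

`PrefetchStats`, `evaluate_partition`, and the bound parameters are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PrefetchStats:
    """Hypothetical counters derived from per-line prefetch flags."""
    prefetched: set = field(default_factory=set)  # addresses whose prefetch flag is still set
    useful: int = 0  # prefetched lines later hit by a demand request
    bad: int = 0     # prefetched lines evicted without ever being used

    def on_prefetch_fill(self, addr: int) -> None:
        self.prefetched.add(addr)  # set the prefetch flag

    def on_demand_hit(self, addr: int) -> None:
        if addr in self.prefetched:
            self.prefetched.discard(addr)  # clear the flag so each prefetch counts once
            self.useful += 1

    def on_evict(self, addr: int) -> None:
        if addr in self.prefetched:
            self.prefetched.discard(addr)
            self.bad += 1

    def hit_rate(self) -> float:
        resolved = self.useful + self.bad
        return self.useful / resolved if resolved else 0.0


def evaluate_partition(meta_units: int, stats: PrefetchStats, first_threshold: float,
                       second_threshold: float, max_units: int, min_units: int,
                       step: int = 1) -> int:
    """Blocks 808-816: return the updated size of the metadata portion."""
    perf = stats.hit_rate()                                  # 808
    if perf > first_threshold and meta_units < max_units:     # 810
        return min(meta_units + step, max_units)              # 812
    if perf < second_threshold and meta_units > min_units:    # 814
        return max(meta_units - step, min_units)              # 816
    return meta_units                                         # back to 804
```

When performance is high but the metadata portion is already at its upper bound, this sketch falls through to the 814 check, as the flow diagram does. It then leaves the size unchanged.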
- FIG. 9 illustrates an example flow diagram 900 depicting operations for adaptive cache partitioning based, at least in part, on metrics pertaining to cache and/or prefetch performance. Flow diagram 900 includes blocks 902 through 916. At 902, a cache 110 partitions its cache memory 120 into a first portion 124 and a second portion 126. The first portion 124 may include zero or more cache resources of the cache memory 120 (e.g., cache units 220, cache lines, hardware cache lines, ways 420, sets 430, and/or the like). The second portion 126 may include a portion of the cache memory 120 different from the first portion 124. The first portion 124 may be allocated for metadata 122 pertaining to an address space, and the second portion 126 may be allocated for storage of cache data 128 (e.g., may constitute the available cache capacity).
- At 904, the cache 110 services requests pertaining to the address space. This may include, inter alia, receiving requests 202, loading cache data 128 into the second portion 126 of the cache memory 120 in response to cache misses, servicing the requests 202 by use of cache data 128 stored within the second portion 126 of the cache memory 120, and so on.
- At 906, the cache 110, cache logic 210, prefetch logic 230, and/or a prefetcher coupled to the cache 110 maintains metadata 122 pertaining to address access characteristics within the first portion 124 of the cache memory 120. The metadata 122 may include any suitable information pertaining to accesses to respective addresses and/or address regions of the address space, as disclosed herein. At 908, cache data 128 are prefetched into the second portion 126 of the cache memory 120 based, at least in part, on the metadata 122 maintained within the first portion 124 of the cache memory 120.
- At 910, the cache 110, cache logic 210, prefetch logic 230, and/or a prefetcher coupled to the cache 110 determines and/or monitors one or more metrics 212. The metrics 212 may be configured to quantify cache and/or prefetch performance, as disclosed herein. At 912, the metrics 212 are evaluated to determine whether to adapt the partition scheme 312 of the cache memory 120 (e.g., whether to adapt the amount of cache memory 120 allocated to the first portion 124 or second portion 126). The determination of 912 may be based, at least in part, on the metrics 212 determined and/or monitored at 910, and may adapt the partition scheme 312 based on cache performance and/or prefetch performance. The partition scheme 312 may be adapted at 914 in response to: a) metrics 212 that are outside of one or more thresholds, b) prefetch performance that is outside of one or more prefetch thresholds, c) cache performance that is outside of one or more cache thresholds, and/or the like. The determination of 912 may be based on whether prefetch performance (e.g., prefetch hit rate) is above an upper prefetch threshold or below a lower prefetch threshold. It may also be based on whether cache performance (e.g., cache hit rate) is above an upper cache threshold or below a lower cache threshold, and/or the like. In some implementations, the determination of 912 may be based on both prefetch and cache performance (e.g., may be configured to balance prefetch and cache performance). For example, the determination of 912 may be based on whether: a) prefetch performance exceeds a first prefetch threshold and cache performance is below a first cache threshold, b) prefetch performance is below a second prefetch threshold and cache performance is above a second cache threshold, and/or the like.
- Alternatively, or in addition, the determination of 912 may be based on, inter alia, an amount of cache memory 120 currently allocated to the first portion 124 for the metadata 122 (metadata capacity). One condition is whether prefetch performance is above a first prefetch performance threshold and metadata capacity is below a first capacity threshold. Another is whether prefetch performance is below a second prefetch performance threshold and metadata capacity is above a second capacity threshold, and/or the like. In some implementations, the determination of 912 is based on cache performance, such as whether cache performance quantified by the metrics 212 (e.g., a cache performance metric 212) is below a cache performance threshold. The amount of cache memory 120 allocated for storage of metadata 122 pertaining to the address space may be iteratively and/or periodically adjusted (e.g., either increased or decreased) at 914 to improve cache performance.
- At 914, size adjustments for the first portion 124 and/or second portion 126 are determined. The size adjustments may be based, at least in part, on the metrics 212 determined and/or monitored at 910 (and/or the evaluation of the metrics 212 at 912). At 914, the size of the first portion 124 allocated for metadata 122 pertaining to the address space may be increased when prefetch performance quantified by the metrics 212 is at or above an upper prefetch threshold (and the metadata capacity is below a determined maximum). Conversely, the size of the first portion 124 may be decreased when the prefetch performance quantified by the metrics 212 is at or below a lower prefetch threshold. In another example, the amount of cache memory 120 allocated to the first portion 124 a) may be increased when prefetch performance is above a first prefetch threshold and cache performance is below a first cache threshold, or b) may be decreased when prefetch performance is below a second prefetch threshold and cache performance is above a second cache threshold, and/or the like.
- Alternatively, or in addition, the size adjustments may be based on, inter alia, an amount of cache memory 120 currently allocated to the first portion 124 for the metadata 122 (metadata capacity). The amount of cache memory 120 allocated to the first portion 124 may be increased when prefetch performance is above a first prefetch performance threshold and the amount of cache memory 120 currently allocated to the first portion 124 is below a first capacity threshold. Conversely, the amount of cache memory 120 allocated to the first portion 124 may be decreased when prefetch performance is below a second prefetch performance threshold and the amount of cache memory 120 currently allocated to the first portion 124 is above a second capacity threshold, or the like. In some implementations, the size adjustments at 914 may be based on cache performance metrics, such as cache hit rate. At 914, the amount of cache memory 120 allocated for storage of metadata 122 pertaining to the address space may be iteratively and/or periodically adjusted (e.g., either increased or decreased) to achieve improved cache hit rates. In some implementations, the determination of 912 and the size adjustments at 914 may be implemented in accordance with an optimization algorithm. Such an algorithm may be configured to converge to an optimal (or locally optimal) partition scheme 312 that results in optimal (or locally optimal) cache performance, as quantified by the metrics 212.
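The source does not name the optimization algorithm used at 912 and 914. One simple choice, shown here as an assumption, is hill climbing. It tries a one-step change in either direction, keeps the change if the measured cache hit rate improves, and stops when neither neighbour is better, which converges to a local optimum. In hardware, `measure` might correspond to observing the metrics 212 over a sampling window after applying a candidate partition scheme 312. Here it is any Python callable, and all names are hypothetical.

```python
from typing import Callable


def hill_climb_partition(measure: Callable[[int], float], start: int,
                         min_units: int, max_units: int, step: int = 1,
                         max_iters: int = 100) -> int:
    """Adjust the metadata-portion size one step at a time, keeping any move that raises
    the measured cache performance; stop at a local optimum or after `max_iters` rounds."""
    if not min_units <= start <= max_units:
        raise ValueError("start must lie within [min_units, max_units]")
    current, best = start, measure(start)
    for _ in range(max_iters):
        improved = False
        for candidate in (current + step, current - step):  # grow first, then shrink
            if min_units <= candidate <= max_units:
                score = measure(candidate)
                if score > best:
                    current, best, improved = candidate, score, True
                    break
        if not improved:
            break  # neither neighbour is better: local optimum reached
    return current


# Toy model in which cache hit rate peaks when 6 units hold metadata.
print(hill_climb_partition(lambda m: -(m - 6) ** 2, start=2, min_units=0, max_units=16))  # 6
```

Real hit-rate measurements are noisy. A practical controller would therefore probably require a minimum improvement margin before accepting a move, which this sketch omits.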
FIG. 10 illustrates anexample system 1000 for adaptive cache partitioning. Thesystem 1000 may include acache apparatus 1001, which may include acache 110 and/or means for implementing acache 110, as disclosed herein. The description ofFIG. 10 refers to aspects described above, such as thecache 110, which is depicted in multiple other figures (e.g.,FIGS. 1-1 to 5-3 ). Thesystem 1000 may further include aninterface 1015 for coupling thecache apparatus 1001 aninterconnect 1005, receivingrequests 202 pertaining toaddresses 204 of an address space associated with a memory 108 (e.g., from a requestor 201), implementingtransfer operations 203 to fetchcache data 128 from amemory 108, and so on. Theinterface 1015 may be configured to couple thecache apparatus 1001 to any suitable interconnect including, but not limited to: an interconnect, a physical interconnect, a bus, aninterconnect 105 for ahost device 102, a front-end interconnect 105A, a back-end interconnect 105B, and/or the like. Theinterface 1015 may include, but is not limited to: circuitry, logic circuitry, interface circuitry, interface logic, switch circuitry, switch logic, routing circuitry, routing logic, interconnect circuitry, interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry,logic 220, aninterface 215, afirst interface 215A, asecond interface 215B, or the like. - The
cache apparatus 1001 may include and/or be coupled to acache memory 120, which may include, but is not limited to: a memory, a memory array, semiconductor memory, volatile memory, RAM, SRAM, DRAM, SDRAM, and/or the like. In theFIG. 10 example, thecache memory 120 includes a plurality of cache units 220 (e.g., cache units 220-1 through 220-X), eachcache unit 220 including and/or corresponding to arespective CMU 320 and/orcache tag 326. In some aspects, thecache units 220 are arranged into a plurality of sets 430 (e.g., sets 430-1 through 430-S), each set 430 including a plurality of ways 420 (e.g., ways 420-1 through 420-N), eachway 420 including and/or corresponding to arespective cache unit 220. - The
system 1000 may include acomponent 1010 for allocating afirst portion 124 of thecache memory 120 formetadata 122 pertaining to the address space, caching data within thesecond portion 126 of thecache memory 120 different from thefirst portion 124 of thecache memory 120, and/or modifying a size of thefirst portion 124 of thecache memory 120 allocated for themetadata 122 based, at least in part, on a metric 212 pertaining to data prefetched into thesecond portion 126 of thecache memory 120. Thecomponent 1010 may be configured to divide thecache memory 120 into afirst partition 1024 that includes afirst portion 124 of thecache memory 120 and asecond partition 1026 that includes asecond portion 126 of thecache memory 120. Thefirst partition 1024 may be allocated to store themetadata 122, and thesecond partition 1026 may be allocated to storecache data 128. Thecomponent 1010 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, memory interface logic, switch circuitry, switch logic, routing circuitry, routing logic, memory interconnect circuitry, memory interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry,cache logic 210,partition logic 310, apartition scheme 312, a metadata mapping scheme 314 (and/or metadata logic 1014), an address mapping scheme 316 (and/or address logic 1016), and/or the like. - The
component 1010 may be configured to partition thecache memory 120 in accordance with apartition scheme 312. Thepartition scheme 312 may define logic, rules, criteria, and/or other mechanisms for dividing cache memory resources of the cache memory 120 (e.g., cache units 220) between thefirst partition 1024 and thesecond partition 1026. Thepartition scheme 312 may be further configured to specify an amount, quantity, capacity and/or size of thefirst partition 1024 and/or second partition 1026 (e.g., may specify the amount, quantity, capacity, and/or size of thefirst portion 124 and/or second portion 126). Thepartition scheme 312 may define logic, rules, criteria, and/or other mechanisms by which cache memory resources are dynamically reallocated and/or reassigned between thefirst partition 1024 and/orsecond partition 1026, a such a cache-unit-based scheme, a way-based partition scheme 312-1, a set-based partition scheme 312-2, and/or the like. In theFIG. 10 example, thepartition scheme 312 configures thecomponent 1010 to allocateM cache units 220 to the first partition 1024 (andX-M cache units 220 to the second partition 1026). - In some examples, the
partition scheme 312 defines a cache-unit-based scheme. In a cache-unit-based scheme, allocatingM cache units 220 to thefirst partition 1024 may include allocating cache units 220-1 through 220−M to thefirst portion 124 and/or allocating 220−M+1 through 220-X to thesecond portion 126, as illustrated inFIG. 10 . In other examples, thepartition scheme 312 defines a way-based scheme (e.g., a way partition scheme 312-1). In a way-based scheme, allocatingM cache units 220 to thefirst partition 1024 may include allocatingW1 ways 420 within each set 430 of thecache memory 120 to thefirst partition 1024, where -
- and S is the quantity of
sets 430 included in thecache memory 120 such that W2 ways 420 within eachset 430 are allocated to thesecond partition 1026, where W2=N−W1 or -
- Alternatively, the
partition scheme 312 may define a set-based scheme (e.g., a set partition scheme 312-2). In a set-based scheme, allocatingM cache units 220 to thefirst partition 1024 may include allocating E1 sets 430 to thefirst partition 1024, where -
- and N is the number of
ways 420 included in each set 430 such that E2 sets are allocated to thesecond partition 1026, where E2=S−E1 or -
- The
component 1010 may implement, include, and/or be coupled tometadata logic 1014. Themetadata logic 1014 may be configured for mapping, addressing, associating, referencing, and/or otherwise accessing (and/or providing access to)cache units 220 allocated to thefirst partition 1024. Themetadata logic 1014 may implement and/or include ametadata mapping scheme 314, as disclosed herein. Themetadata logic 1014 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, memory interface logic, switch circuitry, switch logic, routing circuitry, routing logic, memory interconnect circuitry, memory interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry,cache logic 210,partition logic 310, apartition scheme 312, ametadata mapping scheme 314, and/or the like. - The
component 1010 may implement, include, and/or be coupled to addresslogic 1016. Theaddress logic 1016 may be configured for mapping, addressing, associating, referencing, and/or otherwise accessing (and/or providing access to)cache units 220 allocated to thesecond partition 1026. Theaddress logic 1016 may be configured to map and/or associate addresses 204 of the address space withcache data 128 stored withincache units 220 allocated to thesecond partition 1026. Theaddress logic 1016 may implement and/or include anaddress mapping scheme 316, as disclosed herein. Theaddress logic 1016 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, memory interface logic, switch circuitry, switch logic, routing circuitry, routing logic, memory interconnect circuitry, memory interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry,cache logic 210,partition logic 310, apartition scheme 312, anaddress mapping scheme 316, and/or the like. - The
component 1010 may be further configured to adapt thepartition scheme 312 based, at least in part, on one ormetrics 212. Themetrics 212 may be configured to quantify prefetch performance, as disclosed herein. Alternatively, or in addition, themetrics 212 may be configured to quantify other aspects, such as cache performance (e.g., cache hit rate, cache miss rate, and/or the like). Thecomponent 1010 may be configured to determine and/or monitor themetrics 212. Thecomponent 1010 can modify a size of the first partition 1024 (and/or first portion 124) of thecache memory 120 allocated for themetadata 122 based, at least in part, on one or more of themetrics 212. - The
component 1010 may implement, include, and/or be coupled to aprefetcher 1030 for updating themetadata 122 maintained within thefirst portion 124 of thecache memory 120 in response torequests 202 pertaining toaddresses 204 of the address space and/or selecting data to prefetch into thesecond portion 126 of thecache memory 120 based, at least in part, on themetadata 122 maintained within thefirst portion 124 of thecache memory 120. Themetadata 122 may include any suitable information pertaining to addresses of the address space, including, but not limited to: access characteristics, access statistics, an address sequence, address history, index table, delta sequence, stride pattern, correlation pattern, feature vector, ML feature, ML feature vector, ML model, ML modeling data, and/or the like. Theprefetcher 1030 may include, but is not limited to: circuitry, logic circuitry, memory interface circuitry, cache circuitry, switch circuitry, switch logic, routing circuitry, routing logic, interconnect circuitry, interconnect logic, I/O circuitry, analog circuitry, digital circuitry, logic gates, registers, switches, multiplexers, ALU, state machines, microprocessors, embedded processors, PIM circuitry, acache logic 210,prefetch logic 230, a stride prefetcher, a correlation prefetcher, an ML prefetcher, an LSTM prefetcher, and/or the like. - In some aspects, the
component 1010 is configured to determine and/or monitor the metric 212 pertaining to the data prefetched into the second portion 126 of the cache memory 120, and to modify the size of the first portion 124 of the cache memory 120 in response to the monitoring. The component 1010 may be configured to increase the size of the first portion 124 of the cache memory 120 allocated for the metadata 122 (and decrease the size of the second portion 126) in response to the metric 212 being above a first threshold, or to decrease the size of the first portion 124 (and increase the size of the second portion 126) in response to the metric 212 being below a second threshold. Alternatively, or in addition, the component 1010 may be configured to increase the size of the first portion 124 in response to a current size of the first portion 124 being below a metadata capacity threshold and one or more of: a) a prefetch performance metric 212 that is above a prefetch performance threshold; or b) a cache performance metric 212 that is below a cache performance threshold. Conversely, the component 1010 may be configured to decrease the size of the first portion 124 of the cache memory 120 in response to the current size of the first portion 124 being above a prefetch capacity threshold and one or more of: a) a prefetch performance metric 212 that is below a prefetch performance threshold; or b) a cache performance metric 212 that is above a cache performance threshold.
- The component 1010 may be configured to allocate one or more cache units 220 to the first partition 1024. Allocating a cache unit 220 to the first partition 1024 (and/or first portion 124) may include configuring the metadata logic 1014 to address, reference, and/or otherwise provide access to the one or more cache units 220 for storage of the metadata 122, and/or removing, disabling, ignoring, or otherwise excluding the cache unit 220 from the address logic 1016. Conversely, allocating a cache unit 220 to the second partition 1026 and/or second portion 126 may include configuring the address logic 1016 to address, reference, and/or otherwise utilize the cache unit 220 as available cache capacity (e.g., for storage of cache data 128), and/or removing, disabling, ignoring, or otherwise excluding the cache unit 220 from the metadata logic 1014. Allocating a cache unit 220 to the first portion 124 may include evicting cache data 128 from the cache unit 220 and disabling a cache tag 326 of the cache unit 220. Allocating a cache unit 220 to the second portion 126 may include removing metadata 122 from the cache unit 220 and enabling the cache tag 326 of the cache unit 220.
- The component 1010 may be configured to increase the size of the first portion 124 (e.g., in response to a metric 212 that is above a first threshold). Increasing the size of the first portion 124 may include compacting the cache data 128 stored within the second portion 126. The component 1010 may be configured to preserve at least a portion of the cache data 128 maintained within the cache 110 when increasing the size of the first portion 124 (and decreasing the size of the second portion 126). In response to increasing the size of the first portion 124, the component 1010 may be configured to evict cache data 128 from a selected cache unit 220 that is to remain allocated to the second portion 126, and then to move into the selected cache unit 220 the cache data 128 held by a cache unit 220 that is being reallocated from the second portion 126 to the first portion 124.
- Conversely, the component 1010 may be configured to decrease the size of the first portion 124 (e.g., in response to a metric 212 that is below a second threshold). Decreasing the size of the first portion 124 may include compacting the metadata 122 stored within the first portion 124. The component 1010 may be configured to preserve at least a portion of the metadata 122 when decreasing the size of the first portion 124. The component 1010 can be configured to reduce the amount of the cache memory 120 allocated for the metadata 122 from a first group of cache units 220 to a second, smaller group of cache units 220, and to compact the metadata 122 for storage within the second group. To do so, the component 1010 may move metadata 122 stored within a cache unit 220 of the first group to a cache unit 220 of the second group.
- Although implementations for adaptive cache partitioning have been described in language specific to certain features and/or methods, the subject of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example implementations for adaptive cache partitioning.
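The threshold-driven resizing, cache-unit reallocation, and compaction described above can be modeled in software. What follows is a minimal Python sketch under stated assumptions, not the patented implementation: each cache unit 220 is treated as one way of a single cache set, metadata ways sit at the low end of the set, each metadata way holds a fixed number of entries (the hypothetical `META_ENTRIES_PER_WAY`), victims are chosen by LRU, and the metric 212 is a single number compared against two hypothetical thresholds. The names `Way`, `AdaptiveSet`, `grow_metadata`, `shrink_metadata`, and `adapt` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical capacity: metadata entries that fit in one way.
META_ENTRIES_PER_WAY = 4


@dataclass
class Way:
    """One cache unit, modeled as a single way of a cache set."""
    tag: Optional[int] = None      # address tag of cached data; None means empty
    data: Optional[str] = None     # cached data (second portion)
    meta: List[int] = field(default_factory=list)  # metadata entries (first portion)
    tag_enabled: bool = True       # tag matching is disabled for metadata ways
    last_use: int = 0              # timestamp for LRU victim selection


class AdaptiveSet:
    """Hypothetical model: ways [0, meta_ways) hold metadata, the rest hold data."""

    def __init__(self, num_ways: int, meta_ways: int,
                 min_meta: int = 0, max_meta: Optional[int] = None) -> None:
        self.ways = [Way() for _ in range(num_ways)]
        self.meta_ways = meta_ways
        self.min_meta = min_meta
        self.max_meta = num_ways - 1 if max_meta is None else max_meta  # keep one data way
        self.clock = 0
        for way in self.ways[:meta_ways]:
            way.tag_enabled = False

    def lookup(self, tag: int) -> Optional[str]:
        self.clock += 1
        for way in self.ways[self.meta_ways:]:
            if way.tag_enabled and way.tag == tag:
                way.last_use = self.clock
                return way.data
        return None  # miss

    @staticmethod
    def _victim(candidates: List[Way]) -> Way:
        # Prefer an empty way; otherwise pick the least recently used one.
        for way in candidates:
            if way.tag is None:
                return way
        return min(candidates, key=lambda w: w.last_use)

    def fill(self, tag: int, data: str) -> None:
        # Load data into the data portion (e.g., on a miss or a prefetch).
        self.clock += 1
        way = self._victim(self.ways[self.meta_ways:])
        way.tag, way.data, way.last_use = tag, data, self.clock

    def record_metadata(self, entry: int) -> bool:
        for way in self.ways[:self.meta_ways]:
            if len(way.meta) < META_ENTRIES_PER_WAY:
                way.meta.append(entry)
                return True
        return False  # metadata portion is full

    def grow_metadata(self) -> None:
        """Reallocate the first data way to the metadata portion."""
        moving = self.ways[self.meta_ways]
        remaining = self.ways[self.meta_ways + 1:]
        if moving.tag is not None and remaining:
            # If the displaced block is newer than the LRU way that stays in the
            # data portion, evict that way and move the block into it.
            target = self._victim(remaining)
            if target.tag is None or target.last_use < moving.last_use:
                target.tag, target.data = moving.tag, moving.data
                target.last_use = moving.last_use
        moving.tag, moving.data, moving.meta = None, None, []
        moving.tag_enabled = False  # disable the tag: the way now holds metadata
        self.meta_ways += 1

    def shrink_metadata(self) -> None:
        """Reallocate the last metadata way to the data portion."""
        releasing = self.ways[self.meta_ways - 1]
        pending = list(releasing.meta)
        # Compact into free slots of the remaining metadata ways; overflow is dropped.
        for way in self.ways[:self.meta_ways - 1]:
            free = max(0, META_ENTRIES_PER_WAY - len(way.meta))
            way.meta.extend(pending[:free])
            pending = pending[free:]
        releasing.meta = []
        releasing.tag_enabled = True  # re-enable the tag: the way now caches data
        self.meta_ways -= 1

    def adapt(self, metric: float, grow_above: float, shrink_below: float) -> str:
        """Compare the metric with two thresholds and resize by one way."""
        if metric > grow_above and self.meta_ways < self.max_meta:
            self.grow_metadata()
            return "grow"
        if metric < shrink_below and self.meta_ways > self.min_meta:
            self.shrink_metadata()
            return "shrink"
        return "hold"
```

For example, with four ways, one metadata way, and three filled data ways, `adapt(0.9, grow_above=0.7, shrink_below=0.3)` moves one way into the metadata portion and disables its tag; the block that way held is moved into the least recently used remaining data way if it is more recent than that way's block, and is otherwise dropped. Using two separate thresholds leaves a band in which the partition is held, which keeps the boundary from oscillating when the metric hovers near a single cut-off.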
Claims (21)
1.-20. (canceled)
21. A method comprising:
implementing a partitioning scheme to partition a cache memory into a first portion and a second portion;
servicing multiple requests relating to an address space, including:
maintaining metadata pertaining to the address space within the first portion of the cache memory, and
loading data associated with addresses of the address space into the second portion of the cache memory; and
modifying the partitioning scheme to adapt a size of the first portion based, at least in part, on a metric quantifying performance of the loading.
22. The method of claim 21, further comprising:
implementing a metadata mapping scheme to access cache units allocated to the first portion of the cache memory; and
implementing an address mapping scheme to map addresses of the address space to cache units allocated to the second portion of the cache memory.
23. The method of claim 21, further comprising:
loading data into the second portion of the cache memory in response to cache misses, including requests pertaining to addresses that are not available within the cache memory.
24. The method of claim 21, further comprising:
prefetching data into the second portion of the cache memory based, at least in part, on the metric quantifying performance of the loading.
25. The method of claim 24, further comprising:
utilizing the metadata maintained within the first portion of the cache memory to predict addresses of upcoming requests; and
prefetching data corresponding to the predicted addresses into the second portion of the cache memory before requests pertaining to the predicted addresses are received at the cache memory.
26. The method of claim 21, further comprising:
modifying the partitioning scheme to adapt a size of the second portion based, at least in part, on the metric quantifying performance of the loading.
27. The method of claim 26, further comprising:
modifying the partitioning scheme to adapt the size of the first portion and the size of the second portion based, at least in part, on a metric quantifying prefetch performance of the loading.
28. The method of claim 27, further comprising:
monitoring the metric quantifying prefetch performance that pertains to data prefetched into the second portion of the cache memory; and
comparing the metric quantifying prefetch performance to one or more thresholds.
29. The method of claim 21, further comprising:
increasing the size of the first portion of the cache memory allocated for the metadata and decreasing a size of the second portion of the cache memory allocated for the data responsive to the metric quantifying performance exceeding at least one threshold.
30. The method of claim 21, further comprising:
increasing the size of the first portion and decreasing a size of the second portion, including compacting data stored within the second portion of the cache memory; and
decreasing the size of the first portion and increasing the size of the second portion, including compacting metadata stored within the first portion of the cache memory.
31. An apparatus comprising:
a memory array configured as a cache memory; and
logic coupled to the memory array, the logic configured to:
implement a partitioning scheme to partition the cache memory into a first portion and a second portion;
service multiple requests relating to an address space, including:
maintaining metadata pertaining to the address space within the first portion of the cache memory, and
loading data associated with addresses of the address space into the second portion of the cache memory; and
modify the partitioning scheme to adapt a size of the first portion based, at least in part, on a metric quantifying performance of the loading.
32. The apparatus of claim 31, wherein the logic is further configured to:
maintain within the first portion of the cache memory one or more of an address sequence, address history, index table, delta sequence, stride pattern, correlation pattern, feature vector, machine-learned (ML) feature, ML feature vector, ML model, or ML modeling data.
33. The apparatus of claim 31, wherein:
the metric quantifying performance of the loading comprises a prefetch metric; and
the logic is further configured to monitor one or more of a prefetch hit rate, quantity of useful prefetches, quantity of bad prefetches, or ratio of useful prefetches to bad prefetches.
34. The apparatus of claim 31, wherein the logic is further configured to:
increase the size of the first portion of the cache memory for the metadata pertaining to the address space in response to the metric exceeding at least one threshold; and
decrease a size of the second portion of the cache memory for the data associated with addresses of the address space in response to the metric exceeding the at least one threshold.
35. The apparatus of claim 34, wherein the logic is further configured to:
decrease the size of the first portion of the cache memory for the metadata pertaining to the address space in response to the metric being below the at least one threshold; and
increase the size of the second portion of the cache memory for the data associated with addresses of the address space in response to the metric being below the at least one threshold.
36. The apparatus of claim 31, wherein the logic is further configured to:
allocate a quantity of ways of the cache memory for the metadata pertaining to the address space; and
modify the quantity of ways of the cache memory allocated for the metadata pertaining to the address space based, at least in part, on the metric quantifying performance of the loading.
37. The apparatus of claim 36, wherein the logic is further configured to:
divide ways of a set of the cache memory into a first group allocated for the metadata pertaining to the address space and a second group allocated for the data associated with addresses of the address space; and
move data from a way within the first group to a way within the second group.
38. The apparatus of claim 31, wherein the logic is further configured to:
allocate a quantity of sets of the cache memory for the metadata pertaining to the address space; and
modify the quantity of sets of the cache memory allocated for the metadata pertaining to the address space based, at least in part, on the metric quantifying performance of the loading.
39. The apparatus of claim 31, wherein the logic is further configured to:
responsive to allocating a group of cache units from the second portion to the first portion,
evicting data from cache units of the group of cache units,
disabling cache tags associated with the cache units of the group of cache units,
evicting data from a selected cache unit, the selected cache unit to remain allocated to the second portion, and
moving to the selected cache unit data stored within a cache unit of the group of cache units being allocated from the second portion to the first portion.
40. An apparatus comprising:
a memory array configured as a cache memory;
means for implementing a partitioning scheme to partition the cache memory into a first portion and a second portion;
means for maintaining metadata pertaining to an address space within the first portion of the cache memory;
means for loading data associated with addresses of the address space into the second portion of the cache memory; and
means for modifying the partitioning scheme to adapt a size of the first portion based, at least in part, on a metric quantifying performance of the loading of the data associated with addresses of the address space.
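Claim 33 names several candidate prefetch metrics. A minimal Python sketch of how they might be counted follows. It assumes, hypothetically, that a prefetch is useful when a demand request hits the prefetched line before eviction and bad when the line is evicted unreferenced, and it defines hit rate as useful prefetches divided by issued prefetches, which is one common definition among several. `PrefetchStats` and its methods are illustrative names, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PrefetchStats:
    """Hypothetical counters for the prefetch metrics named in claim 33."""
    issued: int = 0
    useful: int = 0  # prefetched lines hit by a demand request before eviction
    bad: int = 0     # prefetched lines evicted without being referenced
    unused: Set[int] = field(default_factory=set)  # prefetched, not yet referenced

    def on_prefetch(self, addr: int) -> None:
        self.issued += 1
        self.unused.add(addr)

    def on_demand_hit(self, addr: int) -> None:
        # Count only the first demand hit on each prefetched line.
        if addr in self.unused:
            self.unused.remove(addr)
            self.useful += 1

    def on_evict(self, addr: int) -> None:
        # A prefetched line evicted before any demand hit counts as bad.
        if addr in self.unused:
            self.unused.remove(addr)
            self.bad += 1

    def hit_rate(self) -> float:
        """Useful prefetches divided by issued prefetches (one possible definition)."""
        return self.useful / self.issued if self.issued else 0.0

    def useful_to_bad(self) -> float:
        """Ratio of useful to bad prefetches; infinite if none were bad yet."""
        if self.bad == 0:
            return float("inf") if self.useful else 0.0
        return self.useful / self.bad
```

With four prefetches, two of them later hit by demand requests and one evicted unused, `hit_rate()` returns 0.5 and `useful_to_bad()` returns 2.0; either value could serve as the metric compared against the thresholds recited in claims 34 and 35.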
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/057,628 US20230169011A1 (en) | 2020-08-19 | 2022-11-21 | Adaptive Cache Partitioning |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/997,811 US11507516B2 (en) | 2020-08-19 | 2020-08-19 | Adaptive cache partitioning |
US18/057,628 US20230169011A1 (en) | 2020-08-19 | 2022-11-21 | Adaptive Cache Partitioning |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/997,811 Continuation US11507516B2 (en) | 2020-08-19 | 2020-08-19 | Adaptive cache partitioning |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230169011A1 (en) | 2023-06-01 |
Family ID: 80269585
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/997,811 Active 2041-02-10 US11507516B2 (en) | 2020-08-19 | 2020-08-19 | Adaptive cache partitioning |
US18/057,628 Pending US20230169011A1 (en) | 2020-08-19 | 2022-11-21 | Adaptive Cache Partitioning |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/997,811 Active 2041-02-10 US11507516B2 (en) | 2020-08-19 | 2020-08-19 | Adaptive cache partitioning |
Country Status (2)
Country | Link |
---|---|
US (2) | US11507516B2 (en) |
CN (1) | CN114077553A (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11294808B2 (en) | 2020-05-21 | 2022-04-05 | Micron Technology, Inc. | Adaptive cache |
US11409657B2 (en) | 2020-07-14 | 2022-08-09 | Micron Technology, Inc. | Adaptive address tracking |
US11422934B2 (en) | 2020-07-14 | 2022-08-23 | Micron Technology, Inc. | Adaptive address tracking |
KR20220120016A (en) * | 2021-02-22 | 2022-08-30 | 에스케이하이닉스 주식회사 | Memory controller and operating method thereof |
TWI768731B (en) * | 2021-02-25 | 2022-06-21 | 威盛電子股份有限公司 | Computer system |
US20220358235A1 (en) * | 2021-05-05 | 2022-11-10 | EMC IP Holding Company LLC | Access Control of Protected Data Using Storage System-Based Multi-Factor Authentication |
US12118236B2 (en) * | 2021-08-30 | 2024-10-15 | International Business Machines Corporation | Dynamically allocating memory controller resources for extended prefetching |
US11704245B2 (en) * | 2021-08-31 | 2023-07-18 | Apple Inc. | Dynamic allocation of cache memory as RAM |
US11893251B2 (en) | 2021-08-31 | 2024-02-06 | Apple Inc. | Allocation of a buffer located in system memory into a cache memory |
US20230205872A1 (en) * | 2021-12-23 | 2023-06-29 | Advanced Micro Devices, Inc. | Method and apparatus to address row hammer attacks at a host processor |
US20230350598A1 (en) * | 2022-04-28 | 2023-11-02 | Micron Technology, Inc. | Performance monitoring for a memory system |
US12099723B2 (en) * | 2022-09-29 | 2024-09-24 | Advanced Micro Devices, Inc. | Tag and data configuration for fine-grained cache memory |
US20240111677A1 (en) * | 2022-09-30 | 2024-04-04 | Advanced Micro Devices, Inc. | Region pattern-matching hardware prefetcher |
KR20240050567A (en) * | 2022-10-12 | 2024-04-19 | 에스케이하이닉스 주식회사 | Memory system and host device |
US20240220409A1 (en) * | 2022-12-28 | 2024-07-04 | Advanced Micro Devices, Inc. | Unified flexible cache |
KR102561809B1 (en) * | 2023-01-10 | 2023-07-31 | 메티스엑스 주식회사 | Method and apparatus for adaptively managing cache pool |
CN118606229A (en) * | 2024-08-07 | 2024-09-06 | 深圳市芯存科技有限公司 | SLC cache management method, device, equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110093654A1 (en) * | 2009-10-20 | 2011-04-21 | The Regents Of The University Of Michigan | Memory control |
US20130166831A1 (en) * | 2011-02-25 | 2013-06-27 | Fusion-Io, Inc. | Apparatus, System, and Method for Storing Metadata |
US20150113223A1 (en) * | 2013-10-18 | 2015-04-23 | Fusion-Io, Inc. | Systems and methods for adaptive reserve storage |
US9129033B1 (en) * | 2011-08-31 | 2015-09-08 | Google Inc. | Caching efficiency using a metadata cache |
US10101934B1 (en) * | 2016-03-24 | 2018-10-16 | Emc Corporation | Memory allocation balancing for storage systems |
US20220179785A1 (en) * | 2019-08-26 | 2022-06-09 | Huawei Technologies Co., Ltd. | Cache space management method and apparatus |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5721874A (en) | 1995-06-16 | 1998-02-24 | International Business Machines Corporation | Configurable cache with variable, dynamically addressable line sizes |
US7590800B2 (en) | 2006-06-30 | 2009-09-15 | Seagate Technology Llc | 2D dynamic adaptive data caching |
US7873791B1 (en) * | 2007-09-28 | 2011-01-18 | Emc Corporation | Methods and systems for incorporating improved tail cutting in a prefetch stream in TBC mode for data storage having a cache memory |
US8069311B2 (en) | 2007-12-28 | 2011-11-29 | Intel Corporation | Methods for prefetching data in a memory storage structure |
US8266409B2 (en) | 2009-03-03 | 2012-09-11 | Qualcomm Incorporated | Configurable cache and method to configure same |
US8180763B2 (en) | 2009-05-29 | 2012-05-15 | Microsoft Corporation | Cache-friendly B-tree accelerator |
US20120278560A1 (en) | 2009-10-04 | 2012-11-01 | Infinidat Ltd. | Pre-fetching in a storage system that maintains a mapping tree |
US8589627B2 (en) | 2010-08-27 | 2013-11-19 | Advanced Micro Devices, Inc. | Partially sectored cache |
US8527704B2 (en) | 2010-11-11 | 2013-09-03 | International Business Machines Corporation | Method and apparatus for optimal cache sizing and configuration for large memory systems |
CN103380417B (en) | 2011-02-18 | 2016-08-17 | 英特尔公司(特拉华州公司) | The method and system of the data for being stored from memory requests |
US9753858B2 (en) | 2011-11-30 | 2017-09-05 | Advanced Micro Devices, Inc. | DRAM cache with tags and data jointly stored in physical rows |
US9769030B1 (en) * | 2013-02-11 | 2017-09-19 | Amazon Technologies, Inc. | Page prefetching |
US9740621B2 (en) * | 2014-05-21 | 2017-08-22 | Qualcomm Incorporated | Memory controllers employing memory capacity and/or bandwidth compression with next read address prefetching, and related processor-based systems and methods |
US10001927B1 (en) | 2014-09-30 | 2018-06-19 | EMC IP Holding Company LLC | Techniques for optimizing I/O operations |
KR102130578B1 (en) * | 2014-12-02 | 2020-07-06 | 에스케이하이닉스 주식회사 | Semiconductor device |
JP6438777B2 (en) | 2015-01-30 | 2018-12-19 | ルネサスエレクトロニクス株式会社 | Image processing apparatus and semiconductor device |
US9563564B2 (en) | 2015-04-07 | 2017-02-07 | Intel Corporation | Cache allocation with code and data prioritization |
US9696931B2 (en) * | 2015-06-12 | 2017-07-04 | International Business Machines Corporation | Region-based storage for volume data and metadata |
KR102362941B1 (en) | 2015-09-03 | 2022-02-16 | 삼성전자 주식회사 | Method and apparatus for adaptive cache management |
US9734070B2 (en) | 2015-10-23 | 2017-08-15 | Qualcomm Incorporated | System and method for a shared cache with adaptive partitioning |
US9767028B2 (en) | 2015-10-30 | 2017-09-19 | Advanced Micro Devices, Inc. | In-memory interconnect protocol configuration registers |
US9832277B2 (en) | 2015-11-13 | 2017-11-28 | Western Digital Technologies, Inc. | Systems and methods for adaptive partitioning in distributed cache memories |
US10387315B2 (en) | 2016-01-25 | 2019-08-20 | Advanced Micro Devices, Inc. | Region migration cache |
US10503652B2 (en) | 2017-04-01 | 2019-12-10 | Intel Corporation | Sector cache for compression |
US10678692B2 (en) | 2017-09-19 | 2020-06-09 | Intel Corporation | Method and system for coordinating baseline and secondary prefetchers |
US10394717B1 (en) * | 2018-02-16 | 2019-08-27 | Microsoft Technology Licensing, Llc | Central processing unit cache friendly multithreaded allocation |
US11294808B2 (en) | 2020-05-21 | 2022-04-05 | Micron Technology, Inc. | Adaptive cache |
US11422934B2 (en) | 2020-07-14 | 2022-08-23 | Micron Technology, Inc. | Adaptive address tracking |
US11409657B2 (en) | 2020-07-14 | 2022-08-09 | Micron Technology, Inc. | Adaptive address tracking |
- 2020-08-19, US: US16/997,811 (US11507516B2), Active
- 2021-07-23, CN: CN202110838429.8A (CN114077553A), Pending
- 2022-11-21, US: US18/057,628 (US20230169011A1), Pending
Also Published As
Publication number | Publication date |
---|---|
US11507516B2 (en) | 2022-11-22 |
US20220058132A1 (en) | 2022-02-24 |
CN114077553A (en) | 2022-02-22 |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: MICRON TECHNOLOGY, INC., IDAHO. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ROBERTS, DAVID ANDREW; PAWLOWSKI, JOSEPH THOMAS; REEL/FRAME: 061845/0383. Effective date: 20200819 |
STPP | Information on status: patent application and granting procedure in general | DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |