EP4371011A1 - Level-aware cache replacement - Google Patents

Level-aware cache replacement

Info

Publication number
EP4371011A1
Authority
EP
European Patent Office
Prior art keywords
cache
data
request
level
address
Prior art date
Legal status
Pending
Application number
EP22751611.9A
Other languages
German (de)
English (en)
Inventor
Amit Kumar
Current Assignee
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date
Filing date
Publication date
Priority claimed from US17/666,429 (US20230012880A1)
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of EP4371011A1
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12 Replacement control
    • G06F12/121 Replacement control using replacement algorithms
    • G06F12/123 Replacement control using replacement algorithms with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list

Definitions

  • This application relates generally to microprocessor technology including, but not limited to, methods, systems, and devices for controlling cache replacement in a cache for a processing cluster having multiple processors.
  • Caching improves computer performance by keeping recently used or often used data items (e.g., references to physical addresses of often used data) in caches that are faster to access compared to physical memory stores.
  • Caches are updated to store newly fetched information to reflect current and/or anticipated data needs.
  • Because caches are limited in their storage size, they often require demotion of data currently stored in the caches to lower cache levels, or eviction of data currently stored in the cache to a lower cache or memory store, in order to make space for the newly fetched information.
  • the level-aware cache replacement policy defines a level of a table (e.g., within a table walk process) from which the cache entry is obtained or generated. In some implementations, the level-aware cache replacement policy determines whether data in a cache entry satisfies cache promotion criteria based on a level of a table (e.g., within a table walk process) from which the data is obtained. In some implementations, the level-aware cache replacement policy includes a first set of one or more cache management rules for cache entries that store data that satisfy cache promotion criteria, and a second set of one or more cache management rules for cache entries that store data that does not satisfy cache promotion criteria.
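  • The following is a minimal illustrative sketch (not claim language) of how such a level-aware policy could key its two rule sets off the table level that produced a cache entry; the names PROMOTE_LEVELS and rule_set_for are hypothetical:

        # Hypothetical Python sketch of a level-aware replacement decision.
        PROMOTE_LEVELS = {2}  # assumption: outputs of a level 2 table satisfy the criteria

        def satisfies_promotion_criteria(source_level: int) -> bool:
            # The table level (within a table walk) from which the data was
            # obtained determines whether the data satisfies the criteria.
            return source_level in PROMOTE_LEVELS

        def rule_set_for(source_level: int) -> str:
            # Preferential entries follow the first set of cache management
            # rules; all other entries follow the second set.
            if satisfies_promotion_criteria(source_level):
                return "first rule set (preferential entries)"
            return "second rule set (non-preferential entries)"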
  • an electronic device includes a first processing cluster that includes one or more processors and a cache coupled to the one or more processors in the first processing cluster.
  • the cache stores a plurality of data entries.
  • the electronic device is configured to transmit an address translation request of a first address from the first processing cluster to the cache.
  • the electronic device transmits the address translation request to memory (e.g., a lower-level cache or system memory) that is distinct from the cache.
  • the electronic device replaces an entry (e.g., a cache entry) at a first priority level (e.g., a first cache level) in the cache with the data.
  • the electronic device replaces an entry (e.g., a cache entry) at a second priority level (e.g., a second cache level) in the cache with the data including the second address.
  • the second priority level is a higher priority level in the cache than the first priority level (e.g., the second cache level stores data that is more recently used than the first cache level).
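  • A compact sketch (assumed data structures, not the claimed circuit) of the behavior above: a miss fills the cache at a first priority level, and a later hit re-stores the translation at a second, higher priority level:

        # Hypothetical sketch: each cached translation carries a priority level.
        FIRST_PRIORITY, SECOND_PRIORITY = 0, 1  # higher number = higher priority

        def translate(cache: dict, memory: dict, first_address: int) -> int:
            if first_address in cache:
                second_address, _ = cache[first_address]
                # Reuse: replace the entry at a second, higher priority level.
                cache[first_address] = (second_address, SECOND_PRIORITY)
                return second_address
            # Miss: forward the request to memory distinct from the cache,
            # then replace an entry at the first priority level with the data.
            second_address = memory[first_address]
            cache[first_address] = (second_address, FIRST_PRIORITY)
            return second_address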
  • Figure 1 is a block diagram of an example system module in a typical electronic device, in accordance with some implementations.
  • Figure 2 is a block diagram of an example electronic device having one or more processing clusters, in accordance with some implementations.
  • Figure 3A illustrates an example method of a table walk for fetching data from memory, in accordance with some implementations.
  • Figure 3B illustrates an example of caching table walk outputs for increased speed in data fetching, in accordance with some implementations.
  • Figure 4A illustrates an example method of a two-stage table walk for fetching data from memory, in accordance with some implementations.
  • Figure 4B illustrates an example of caching table walk outputs for increased speed in data fetching, in accordance with some implementations.
  • Figure 5 illustrates levels in a cache, in accordance with some implementations.
  • Figures 6A - 6D illustrate cache replacement policies for cache entries that store data that do not satisfy cache promotion criteria, in accordance with some implementations.
  • Figures 7A - 7B illustrate cache replacement policies for cache entries that store data that satisfies cache promotion criteria, in accordance with some implementations.
  • Figures 8A - 8C illustrate a flow chart of an example method of controlling cache entry replacement in a cache, in accordance with some implementations.
  • FIG. 1 is a block diagram of an example system module 100 in a typical electronic device in accordance with some implementations.
  • System module 100 in this electronic device includes at least a system on a chip (SoC) 102, memory modules 104 for storing programs, instructions and data, an input/output (I/O) controller 106, one or more communication interfaces such as network interfaces 108, and one or more communication buses 150 for interconnecting these components.
  • I/O controller 106 allows SoC 102 to communicate with an I/O device (e.g., a keyboard, a mouse or a track-pad) via a universal serial bus interface.
  • network interfaces 108 include one or more interfaces for Wi-Fi, Ethernet and Bluetooth networks, each allowing the electronic device to exchange data with an external source, e.g., a server or another electronic device.
  • communication buses 150 include circuitry (sometimes called a chipset) that interconnects and controls communications among various system components included in system module 100.
  • memory modules 104 include high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices.
  • memory modules 104 include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • memory modules 104, or alternatively the non-volatile memory device(s) within memory modules 104 include a non-transitory computer readable storage medium.
  • memory slots are reserved on system module 100 for receiving memory modules 104. Once inserted into the memory slots, memory modules 104 are integrated into system module 100.
  • system module 100 further includes one or more components selected from:
  • a memory controller 110 that controls communication between SoC 102 and memory components, including memory modules 104, in electronic device, including controlling memory management unit (MMU) line replacement (e.g., cache entry replacement, cache line replacement) in a cache in accordance with a cache replacement policy;
  • SSDs 112 that apply integrated circuit assemblies to store data in the electronic device, and in many implementations, are based on NAND or NOR memory configurations;
  • a hard drive 114 that is a conventional data storage device used for storing and retrieving digital information based on electromechanical magnetic disks;
  • a power supply connector 116 that is electrically coupled to receive an external power supply;
  • a power management integrated circuit (PMIC) 118 that provides one or more internal supply voltages (e.g., to SoC 102);
  • a graphics module 120 that generates a feed of output images to one or more display devices according to their desirable image/video formats; and
  • a sound module 122 that facilitates the input and output of audio signals to and from the electronic device under control of computer programs.
  • communication buses 150 also interconnect and control communications among various system components including components 110-122.
  • non-transitory computer readable storage media can be used, as new data storage technologies are developed for storing information in the non-transitory computer readable storage media in the memory modules 104 and in SSDs 112.
  • These new non-transitory computer readable storage media include, but are not limited to, those manufactured from biological materials, nanowires, carbon nanotubes and individual molecules, even though the respective data storage technologies are currently under development and yet to be commercialized.
  • SoC 102 is implemented on an integrated circuit that integrates one or more microprocessors or central processing units, memory, input/output ports and secondary storage on a single substrate. SoC 102 is configured to receive one or more internal supply voltages provided by PMIC 118. In some implementations, both the SoC 102 and PMIC 118 are mounted on a main logic board, e.g., on two distinct areas of the main logic board, and electrically coupled to each other via conductive wires formed in the main logic board. As explained above, this arrangement introduces parasitic effects and electrical noise that could compromise performance of the SoC, e.g., cause a voltage drop at an internal voltage supply.
  • SoC 102 and PMIC 118 are vertically arranged in an electronic device, such that they are electrically coupled to each other via electrical connections that are not formed in the main logic board. Such vertical arrangement of SoC 102 and PMIC 118 can reduce a length of electrical connections between SoC 102 and PMIC 118 and avoid performance degradation caused by the conductive wires of the main logic board. In some implementations, vertical arrangement of SoC 102 and PMIC 118 is facilitated in part by integration of thin film inductors in a limited space between SoC 102 and PMIC 118.
  • FIG. 2 is a block diagram of an example electronic device 200 having one or more processing clusters 202 (e.g., first processing cluster 202-1, Mth processing cluster 202-M), in accordance with some implementations.
  • Electronic device 200 further includes a cache 220 and a memory 104 in addition to processing clusters 202.
  • Cache 220 is coupled to processing clusters 202 on SoC 102, which is further coupled to memory 104 that is external to SoC 102.
  • Each processing cluster 202 includes one or more processors 204 and a cluster cache 212.
  • Cluster cache 212 is coupled to one or more processors 204, and maintains one or more request queues 214 for one or more processors 204.
  • Each processor 204 further includes a respective data fetcher 208 to control cache fetching (including cache prefetching) associated with the respective processor 204.
  • each processor 204 further includes a core cache 218 that is optionally split into an instruction cache and a data cache, and core cache 218 stores instructions and data that can be immediately executed by the respective processor 204.
  • first processing cluster 202-1 includes first processor 204-1 through N-th processor 204-N, and first cluster cache 212-1, where N is an integer greater than 1.
  • First cluster cache 212-1 has one or more first request queues, and each first request queue includes a queue of demand requests and prefetch requests received from a subset of processors 204 of first processing cluster 202-1.
  • SoC 102 only includes a single processing cluster 202-1.
  • SoC 102 includes at least an additional processing cluster 202, e.g., M-th processing cluster 202-M.
  • M-th processing cluster 202-M includes first processor 206-1 through N’-th processor 206-N’, and M-th cluster cache 212-M, where N’ is an integer greater than 1 and M-th cluster cache 212-M has one or more M-th request queues.
  • the one or more processing clusters 202 are configured to provide a central processing unit for an electronic device and are associated with a hierarchy of caches.
  • the hierarchy of caches includes three levels that are distinguished based on their distinct operational speeds and sizes.
  • a reference to “the speed” of a memory relates to the time required to write data to or read data from the memory (e.g., a faster memory has shorter write and/or read times than a slower memory)
  • a reference to “the size” of a memory relates to the storage capacity of the memory (e.g., a smaller memory provides less storage space than a larger memory).
  • the core cache 218, cluster cache 212, and cache 220 correspond to a first level (L1) cache, a second level (L2) cache, and a third level (L3) cache, respectively.
  • Each core cache 218 holds instructions and data to be executed directly by a respective processor 204, and has the fastest operational speed and smallest size among the three levels of memory.
  • the cluster cache 212 is slower operationally than the core cache 218 and bigger in size, and holds data that is more likely to be accessed by processors 204 of respective processing cluster 202.
  • Cache 220 is shared by the plurality of processing clusters 202, and bigger in size and slower in speed than each core cache 218 and cluster cache 212.
  • Each processing cluster 202 controls prefetches of instructions and data to core caches 218 and/or cluster cache 212.
  • Each individual processor 204 further controls prefetches of instructions and data from respective cluster cache 212 into respective individual core cache 218.
  • a first cluster cache 212-1 of first processing cluster 202-1 is coupled to a single processor 204-1 in the same processing cluster, and not to any other processors (e.g., 204-N). In some implementations, first cluster cache 212-1 of first processing cluster 202-1 is coupled to a plurality of processors 204-1 and 204-N in the same processing cluster. In some implementations, first cluster cache 212-1 of first processing cluster 202-1 is coupled to the one or more processors 204 in the same processing cluster 202-1, and not to processors in any cluster other than the first processing cluster 202-1 (e.g., processors 206 in cluster 202-M). In such cases, first cluster cache 212-1 of first processing cluster 202-1 is sometimes referred to as a second-level cache (e.g., L2 cache).
  • each request queue optionally includes a queue of demand requests and prefetch requests received from a subset of processors 204 of respective processing cluster 202.
  • Each data retrieval request received from respective processor 204 is distributed to one of the request queues associated with the respective processing cluster.
  • a request queue receives only requests received from a specific processor 204.
  • a request queue receives requests from more than one processor 204 in the processing cluster 202, allowing a request load to be balanced among the plurality of request queues.
  • a request queue receives only one type of data retrieval request (e.g., prefetch requests) from different processors 204 in the same processing cluster 202.
  • Each processing cluster 202 includes or is coupled to one or more data fetchers 208 in processors 204, and the data fetch requests (e.g., demand requests, prefetch requests) are generated and processed by one or more data fetchers 208.
  • each processor 204 in processing cluster 202 includes or is coupled to a respective data fetcher 208.
  • two or more of processors 204 in processing cluster 202 share the same data fetcher 208.
  • a respective data fetcher 208 may include any of a demand fetcher for fetching data for demand requests and a prefetcher for fetching data for prefetch requests.
  • a data fetch request (e.g., a demand request or a prefetch request) is received at a processor (e.g., processor 204-1) of a processing cluster 202.
  • the data fetch request is an address translation request to retrieve data from memory (e.g., memory 104) that includes information for translating a virtual address into a physical address (e.g., to retrieve data that includes a virtual address to physical address translation or a virtual address to physical address mapping, which includes, for example, a page entry in a page table).
  • a data fetcher of the processor begins the data fetching process by querying a translation lookaside buffer (TLB) to see if a requested data 390 (e.g., the requested address translation) is stored in the TLB.
  • the data is retrieved from the TLB and passed onto the processor.
  • data fetcher 208 starts searching for requested data 390 in a core cache 218 associated with the processor (e.g., core cache 218-1 associated with processor 204-1).
  • data fetcher 208-1 queries cluster cache 212-1.
  • data fetcher 208-1 queries cache 220, and in accordance with a determination that requested data 390 is not stored in cache 220, data fetcher 208-1 queries memory 104.
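  • A simplified sketch of the lookup order just described, with each store modeled as a plain dictionary (the structure is an assumption for illustration):

        def fetch(addr, tlb, core_cache, cluster_cache, shared_cache, memory):
            # Query the TLB, then core cache 218, cluster cache 212, and
            # cache 220 in order; fall back to memory 104 last.
            for store in (tlb, core_cache, cluster_cache, shared_cache):
                if addr in store:
                    return store[addr]
            return memory[addr]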
  • data fetcher 208 performs a table walk process in the respective cache.
  • the table walk process is a one-stage table walk process (e.g., single-stage table walk process), such as the table walk process shown in Figures 3A and 3B.
  • the table walk process is a two-stage table walk process, such as the two-stage table walk process shown in Figures 4A and 4B.
  • Figure 3A illustrates an example of a one-stage table walk process 300 for fetching data by a processing cluster 202 (e.g., by a data fetcher 208 of first processing cluster 202-1 of Figure 2), in accordance with some implementations.
  • Address translation information (e.g., the page table) is stored in a multi-level hierarchy that includes at least one level 0 table, a plurality of level 1 tables, a plurality of level 2 tables, and a plurality of level 3 tables.
  • a level 0 table stores page entries that include table descriptors that identify a specific level 1 table (e.g., a specific table of the plurality of level 1 tables, a first table of the plurality of level 1 tables), a level 1 table stores page entries that include table descriptors that identify a specific level 2 table (e.g., a specific table of the plurality of level 2 tables, a first table of the plurality of level 2 tables), a level 2 table stores page entries that include table descriptors that identify a specific level 3 table (e.g., a specific table of the plurality of level 3 tables, a first table of the plurality of level 3 tables), and a level 3 table stores page entries that include page descriptors that identify a specific page table in memory 104.
  • Table walk process 300 begins at the level 0 table and continues until the requested data 390 stored in the page entry in memory 104 (e.g., the page table in memory 104) is identified.
  • a data fetch process begins with a processor (e.g., processor 204-1) of a processing cluster (e.g., processing cluster 202-1) receiving an address translation request 310 that includes a virtual address 312 to be translated.
  • Virtual address 312 includes a translation table base register (TTBR), which identifies the level 0 table at which a data fetcher of the processor (e.g., data fetcher 208-1 of processor 204-1) can begin table walk process 300.
  • Table walk process 300 is initiated in accordance with a determination that requested data 390 (e.g., data requested by address translation request 310) is not stored in the TLB (e.g., a TLB “miss”).
  • Data fetcher 208 begins table walk process 300 by identifying a first table descriptor 322 that is stored in a page table entry in the level 0 table 320.
  • First table descriptor 322 includes information that identifies a level 1 table 330 (e.g., a specific level 1 table) for which data fetcher 208 can query to continue table walk process 300.
  • At least a portion (e.g., a first portion 312-1) of virtual address 312 is used to find first table descriptor 322 in level 0 table 320.
  • a first portion 312-1 of virtual address 312 may include a reference to the page table entry in level 0 table 320 that stores first table descriptor 322.
  • Data fetcher 208 identifies level 1 table 330 based on first table descriptor 322 obtained (e.g., output) from level 0 table 320, and identifies a second table descriptor 332 that is stored in a page table entry in level 1 table 330.
  • Second table descriptor 332 includes information that identifies a level 2 table 340 (e.g., a specific level 2 table) for which data fetcher 208 can query to continue table walk process 300.
  • at least a portion (e.g., a second portion 312-2) of virtual address 312 is used to find second table descriptor 332 in level 1 table 330.
  • a second portion 312-2 of virtual address 312 may include a reference to the page table entry in level 1 table 330 that stores second table descriptor 332.
  • In addition to providing second table descriptor 332, level 1 table 330 also provides a first block descriptor 334 that identifies a first contiguous portion 390-1 within memory 104, e.g., a first contiguous portion 390-1 in memory 104 within which requested data 390 is stored.
  • Data fetcher 208 identifies level 2 table 340 based on second table descriptor 332 obtained from level 1 table 330, and identifies a third table descriptor 342 that is stored in a page table entry in level 2 table 340.
  • Third table descriptor 342 includes information that identifies a level 3 table 350 (e.g., a specific level 3 table) for which data fetcher 208 can query to continue table walk process 300.
  • At least a portion (e.g., a third portion 312-3) of virtual address 312 is used to find third table descriptor 342 in level 2 table 340.
  • a third portion 312-3 of virtual address 312 may include a reference to the page table entry in level 2 table 340 that stores third table descriptor 342.
  • In addition to providing (e.g., outputting) third table descriptor 342, level 2 table 340 also provides a second block descriptor 344 that identifies a second contiguous portion 390-2 within memory 104 (e.g., a second contiguous portion 390-2 in memory 104 within which requested data 390 (e.g., requested address translation) is stored).
  • second contiguous portion 390-2 in memory 104 includes a smaller portion of memory 104 compared to first contiguous portion 390-1 in memory 104, and first contiguous portion 390-1 in memory 104 includes second contiguous portion 390-2 in memory 104.
  • For example, first contiguous portion 390-1 in memory 104 includes 16 MB of space in memory 104, and second contiguous portion 390-2 in memory 104 includes 32 KB of space in the memory.
  • Data fetcher 208 identifies level 3 table 350 based on third table descriptor 342 obtained (e.g., output) from level 2 table 340, and identifies a page descriptor 352 that is stored in a page table entry in level 3 table 350.
  • Page descriptor 352 includes information that identifies a page table 360 in memory 104 for which data fetcher 208 can query to continue table walk process 300.
  • at least a portion (e.g., a fourth portion 312-4) of virtual address 312 is used to find page descriptor 352 in memory 104.
  • a fourth portion 312-4 of virtual address 312 may include a reference to the page table entry in level 3 table 350 that stores page descriptor 352.
  • Data fetcher 208 queries page table 360 in memory 104, as identified by page descriptor 352 output from level 3 table 350, to find a page entry 362 that stores requested data 390 (e.g., stores the requested virtual address to physical address translation).
  • at least a portion (e.g., a fifth portion 312-5) of virtual address 312 is used to find page entry 362 in page table 360.
  • a fifth portion 312-5 of virtual address 312 may include a reference to the byte on page table 360 that stores requested data 390.
  • Upon completing table walk process 300, a data fetcher of a processor (e.g., data fetcher 208-1 of processor 204-1) is able to obtain requested data 390 (e.g., requested address translation 390, physical address 390 corresponding to request 310) and pass requested data 390 to the processor.
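  • The walk above can be summarized in a short sketch; the 9-bit index widths and bit positions below are illustrative assumptions rather than values from the patent:

        def table_walk(tables: dict, ttbr: str, va: int):
            # tables maps a table identifier to a dict of {index: descriptor};
            # descriptors at levels 0-2 name the next table, and the level 3
            # descriptor names the page table holding the requested entry.
            table_id = ttbr  # the TTBR selects the level 0 table
            for level in range(4):  # level 0 table .. level 3 table
                portion = (va >> (39 - 9 * level)) & 0x1FF  # portions 312-1..312-4
                table_id = tables[table_id][portion]
            page_entry = (va >> 3) & 0x1FF  # fifth portion 312-5 (assumed width)
            return tables[table_id][page_entry]  # requested translation 390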
  • the table walk process introduces latency into system operations.
  • outputs from a table walk process are stored in a cache to speed up the data fetching process.
  • Figure 3B illustrates an example of caching outputs from the table walk process to increase data fetching speed, in accordance with some implementations.
  • Table descriptors 322, 332, and 342 output from level 0 table 320, level 1 table 330, and level 2 table 340, respectively, can be stored in a cache 392 such that future data requests for the same data (e.g., for the same address translation) can be quickly retrieved from cache 392, allowing data fetcher 208 to skip at least a portion of table walk process 300.
  • Cache 392 may correspond to any of cache 218, cache 212, and cache 220.
  • the table walk outputs are stored in cache 212, which is the highest level cache shared by a plurality of processing cores 204.
  • In some implementations, third table descriptor 342 is stored in cache 392. In response to a new request for an address translation for virtual address 312 (e.g., a request for physical address 390), data fetcher 208 is able to skip querying level 0 table 320, level 1 table 330, and level 2 table 340, and instead uses cached third table descriptor 342 to directly identify level 3 table 350.
  • In some implementations, cache 392 stores physical address 390, and data fetcher 208 can return requested data 390 directly without performing any portion of table walk process 300.
  • In some implementations, second table descriptor 332 is stored in cache 392. In response to a new request for an address translation for virtual address 312 (e.g., a request for physical address 390), data fetcher 208 is able to skip querying level 0 table 320 and level 1 table 330. Instead, data fetcher 208 can directly obtain second table descriptor 332, since it is stored in cache 392, and complete the table walk process by using second table descriptor 332 to directly identify level 2 table 340 (e.g., without having to query level 0 table 320 and level 1 table 330).
  • Data fetcher 208 completes table walk process 300 by traversing level 2 table 340, level 3 table 350, and page table 360 to retrieve requested data 390 (e.g., physical address 390).
  • table walk outputs are stored in cache 392, and particularly, table walk outputs from level 2 table 340 are stored over other outputs from the table walk process since outputs from level 2 table 340 provide the biggest shortcut in the table walk process.
  • In some implementations, cache 392 directly stores requested data 390 (e.g., physical address 390) for level 2 table 340. Storing table walk outputs from level 2 table 340 in this manner directly returns requested data 390 without requiring data fetcher 208 to perform a table walk.
  • cache 392 stores page descriptor 352 for level 2 table 340.
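  • One way to realize the shortcut described above is to key cached outputs by the virtual address and the table level that produced them, then resume the walk at the deepest cached level. This sketch (hypothetical names and structure) prefers level 2 outputs because they skip the most steps:

        def resume_point(walk_cache: dict, va_tag: int):
            # walk_cache maps (va_tag, level) -> cached table walk output.
            for level in (2, 1, 0):  # deeper level => bigger shortcut
                if (va_tag, level) in walk_cache:
                    return level + 1, walk_cache[(va_tag, level)]  # resume here
            return 0, None  # nothing cached: perform the full table walk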
  • cache replacement policies include different policies for cache entries that store data that satisfy cache promotion criteria (also referred to herein as “preferential cache entries”) versus cache entries that store data that does not satisfy cache promotion criteria (also referred to herein as “non-preferential cache entries”).
  • data satisfies cache promotion criteria when the data corresponds to outputs from level 2 table 340 (e.g., cache entries that store outputs from level 2 table 340 are preferential cache entries).
  • table walk caches can also be employed in two-stage table walks, which are used in virtual machines that require translation of a virtual address to an intermediate physical address (IPA) and translation of the IPA to a physical address.
  • FIG. 4A illustrates an example method of implementing a two-stage table walk process 400 for fetching data from memory 104, in accordance with some implementations.
  • the two-stage table walk process 400 includes a stage 1 table walk (also called a guest table walk) and a stage 2 table walk.
  • the stage 1 table walk is similar to the one-stage table walk process 300 shown in Figures 3A and 3B, such that the guest table walk first identifies and queries a stage 1 level 0 table (e.g., S1L0) to find a table descriptor that identifies a stage 1 level 1 table (e.g., S1L1).
  • Data fetcher 208 then uses a table descriptor obtained from (e.g., output from) the stage 1 level 1 table to identify and query a stage 1 level 2 table (e.g., S1L2) to find a table descriptor that identifies a stage 1 level 3 table (e.g., S1L3).
  • Data fetcher 208 then uses a page descriptor obtained from (e.g., output from) the stage 1 level 3 table to identify and query a page table in memory 104 to find the requested data (e.g., requested address translation, requested physical address).
  • each stage 1 table (e.g., tables S1L0, S1L1, S1L2, and S1L3) outputs an IPA that is used in a second stage portion of the two-stage table walk to identify the next table in the first stage (e.g., table S1L0 outputs an IPA that points to a stage 2 level 0 table, and a second stage table walk is performed to identify table S1L1).
  • Request 410 (e.g., request for an address translation) includes a virtual address that includes a translation table base register (TTBR).
  • the TTBR identifies a stage 2 level 0 table (e.g., S2L0, represented by block “1”) at which a data fetcher of the processor (e.g., data fetcher 208-1 of processor 204-1) begins the two-stage table walk process 400.
  • Two-stage table walk process 400 starts by performing the second stage of table walk process.
  • data fetcher 208 queries the stage 2 tables (e.g., S2L0, S2L1, S2L2, and S2L3 tables) to find descriptors (e.g., IPAs) that identify which stage 1 tables (e.g., S1L0, S1L1, S1L2, and S1L3 tables) to query during the first stage of table walk process 400.
  • Data fetcher 208 starts by performing the second stage of table walk process 400, starting at a stage 2 level 0 table (e.g., S2L0, represented by block “1”) which provides a descriptor that identifies a stage 2 level 1 table (e.g., S2L1, represented by block “2”), then progressing to stage 2 level 1 table (e.g., S2L1, represented by block “2”) which provides a descriptor that identifies a stage 2 level 2 table (e.g., S2L2, represented by block “3”), then to stage 2 level 2 table which provides a descriptor that identifies a stage 2 level 3 table (e.g., S2L3, represented by block “4”), then to stage 2 level 3 table which provides a descriptor that identifies a stage 1 level 0 table (e.g., S1L0).
  • After the stage 1 level 0 table is identified, data fetcher 208 can query the S1L0 table for an IPA that identifies a stage 2 level 0 table in the next row (e.g., S2L0, represented by block “6”), and data fetcher 208 performs another second stage of table walk process 400 to identify a stage 1 level 1 table in the second row (e.g., S1L1, represented by block “10”). This process is repeated until data fetcher 208 identifies the S1L3 table.
  • Data fetcher 208 then queries the S1L3 table to identify a stage 2 level 0 table in the fifth row (e.g., S2L0, represented by block “21”) and performs a second stage of table walk process 400 until a stage 2 level 3 table (e.g., S2L3, represented by block “24”) is identified.
  • Data fetcher 208 queries the stage 2 level 3 table (e.g., S2L3, represented by block “24”) to find a page descriptor that points to a page table in memory 104 where requested data 490 (e.g., requested address translation 490, requested physical address 490) is stored.
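  • The nesting described above (a full stage 2 walk to locate every stage 1 table) can be sketched as follows; stage2_walk is assumed to be a callable that resolves an IPA to a physical location, and the index arithmetic is illustrative:

        def two_stage_walk(stage1_tables: dict, stage2_walk, ttbr_ipa: int, va: int):
            # First stage 2 walk (blocks "1"-"4") locates the S1L0 table.
            location = stage2_walk(ttbr_ipa)
            for level in range(4):  # S1L0 .. S1L3
                portion = (va >> (39 - 9 * level)) & 0x1FF
                ipa = stage1_tables[location][portion]  # stage 1 output is an IPA
                location = stage2_walk(ipa)  # nested stage 2 walk per stage 1 level
            return location  # final stage 2 walk (blocks "21"-"24") yields the PA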
  • the two-stage table walk process 400 shown in Figure 4A can be sped up by caching the outputs (e.g., IPAs, table descriptors, page descriptors, and physical addresses) obtained during two-stage table walk process 400.
  • Outputs from any stage 2 table (e.g., S2L0, S2L1, S2L2, and S2L3 in any row) and any stage 1 table (e.g., S1L0, S1L1, and S1L3) can be cached.
  • FIG. 4B illustrates an example of caching table walk outputs for increased speed in data fetching, in accordance with some implementations.
  • In some implementations, a cache (e.g., cache 392, 218, 212, or 220) stores an output from the tables involved in table walk process 400 (e.g., stage 2 tables S2L0, S2L1, S2L2, and S2L3 in any row, and stage 1 tables S1L0, S1L1, and S1L3).
  • cache 212 is the upper-most cache that is shared by a plurality of processing cores 204, and is used to store the outputs from table walk process 400.
  • In response to a new request for physical address 490, data fetcher 208 is configured to skip the second stage of the table walk for the first row of the S2L0 table (block “1”), S2L1 table (block “2”), S2L2 table (block “3”), and S2L3 table (block “4”), and to directly start the table walk at the second stage of the table walk for the second row of stage 2 tables, including the S2L0 table (block “6”), S2L1 table (block “7”), S2L2 table (block “8”), and S2L3 table (block “9”).
  • In response to a new request for physical address 490, data fetcher 208 is able to skip querying the first three rows of the stage 2 tables and to skip the S1L0, S1L1, and S1L2 tables in the table walk.
  • Data fetcher 208 can use the cached output to identify the stage 2 level 0 table in the fourth row (e.g., S2L0 (block “16”)) and perform the two-stage table walk process 400 until physical address 490 is retrieved (e.g., obtained, acquired, identified).
  • In some implementations, outputs from the stage 2 tables in the fifth row (e.g., S2L0, S2L1, S2L2 and S2L3 tables represented by blocks “21,” “22,” “23,” and “24,” respectively) are stored in cache 392. In response to a new request for physical address 490, data fetcher 208 is able to skip the stage 1 table walk entirely, skip the first four rows of the second stage of the table walk, and directly start the table walk at the fifth row of stage 2 tables.
  • In some implementations, cache 392 stores physical address 490 and does not store descriptors when caching outputs from stage 2 tables S2L0, S2L1, S2L2 and S2L3 in the fifth row, thereby further increasing the data fetch speed and reducing latency.
  • In some implementations, all outputs from two-stage table walk process 400 are stored in cache 392.
  • Cache 392 stores table walk outputs from the stage 1 level 2 table (e.g., S1L2, represented by block “15” and shown with a patterned fill) and the stage 2 tables in the fifth row (e.g., S2L0, S2L1, S2L2 and S2L3 tables represented by blocks “21,” “22,” “23,” and “24,” respectively, each shown with a patterned fill).
  • Those outputs provide the biggest shortcut (e.g., the most steps skipped) in two-stage table walk process 400.
  • Storing table walk outputs from the stage 1 level 2 table (e.g., S1L2, represented by block “15”) and the stage 2 tables in the fifth row (e.g., S2L0, S2L1, S2L2 and S2L3 tables represented by blocks “21,” “22,” “23,” and “24,” respectively) in cache 392 reduces a corresponding latency and improves data fetching speeds.
  • cache replacement policies include different policies for cache entries (e.g., preferential cache entries) storing data that satisfies cache promotion criteria and cache entries (e.g., non-preferential cache entries) storing data that does not satisfy cache promotion criteria.
  • data satisfies cache promotion criteria when the data corresponds to an output from any of the stage 1 level 2 table (e.g., S1L2, represented by block “15” and shown with a patterned fill) and the stage 2 tables S2L0, S2L1, S2L2 and S2L3 in the fifth row (e.g., tables represented by blocks “21,” “22,” “23,” and “24,” respectively, each shown with a patterned fill).
  • a new cache entry is added to cache 392.
  • Examples of the new cache entry optionally include, but are not limited to, a new cache line and an MMU line that stores table walk outputs including physical address translations, table descriptors, and page descriptors.
  • a cache entry within cache 392 is removed to make space for the new cache entry.
  • Cache 392 relies on a cache replacement policy to determine where in cache 392 the new cache line is stored, e.g., where in cache 392 to insert the new cache line, at what level in cache 392 to insert the new cache line.
  • the cache replacement policy is also used by cache 392 to determine which cache entry in cache 392 is replaced, demoted to a lower cache line, or evicted to make space for the new cache line.
  • the cache entry selected for replacement, demotion, or eviction is called a “victim.” More details regarding cache lines in a cache are discussed below with respect to Figure 5, and more details regarding a cache replacement policy are discussed below with respect to Figures 6A - 6D and 7A - 7B.
  • FIG. 5 illustrates cache lines 501 (e.g., cache lines 501-1 through 501-P, also referred to herein as “cache levels”) in a cache 392, in accordance with some implementations.
  • Cache 392 may correspond to any of caches 218, 212, and 220 (shown in Figure 2).
  • Cache 392 includes P number of cache lines 501, with P being any integer number.
  • Cache lines 501 are ordered such that cache line 501-1 is the lowest cache line and cache line 501-P is the highest cache line.
  • cache line 501-2 is higher than first cache line 501-1 and lower than cache line 501-3.
  • cache lines 501 are organized from most recently used (MRU) (e.g., most recently accessed) to least recently used (LRU) (e.g., least recently accessed).
  • a cache entry stored at MRU cache line 501-P is more recently used (e.g., more recently accessed, more recently requested by a processor) than a cache entry stored at LRU+1 cache line 501-2.
  • cache 392 is organized based on how recently a cache entry (e.g., the data in the cache entry) was accessed.
  • cache entries of cache 392 store data (e.g., address translation) as well as a tag corresponding to the data.
  • the tag includes one or more bits that indicate how recently the data was used (e.g., accessed, requested). For example, first data is stored in a first cache entry at LRU+1 cache line 501-2 and is then requested; thus, a tag corresponding to the first data is updated to indicate that the data was recently accessed.
  • In response to receiving a request for the first data, the first cache entry (which stores the first data) is promoted to a higher cache line.
  • the first cache entry is moved to MRU cache line 501-P or to LRU+2 cache line 501-3.
  • Which cache line 501 in cache 392 the first cache entry is moved to depends on the cache replacement policy of the cache.
  • all cache lines below the new cache line are updated in accordance with promotion of the first data. For example, if the first cache entry is promoted from LRU+1 cache line 501-2 to LRU+3 cache line 501-4, cache lines 501-1 through 501-3 are updated.
  • In some implementations, data previously stored in cache line 501-4 is demoted to cache line 501-3 so that the first cache entry can be stored at cache line 501-4, data previously stored in cache line 501-3 is demoted to cache line 501-2, data previously stored in cache line 501-2 is demoted to cache line 501-1, data previously stored in cache line 501-1 is evicted from cache 392, and cache lines above 501-4 are not affected (e.g., MRU cache line 501-P is not affected as long as P > 4).
  • data previously stored in cache line 501-4 is demoted to cache line 501-3 so that the first cache entry can be stored at cache line 501-4 and data previously stored in cache line 501-3 is evicted out of the cache.
  • data previously stored in cache line 501-4 is evicted out of the cache.
  • one of cache lines 501 in cache 392 is selected to store a new cache entry. In some implementations, one of the cache entries currently stored in cache 392 is selected to be replaced when a new cache entry is added to cache 392. In some embodiments, one of cache lines 501 in cache 392 is selected to receive a cache entry (that is already stored in cache 392) to be moved in response to a request for data from the cache entry.
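  • The promotion and demotion mechanics above can be modeled with an ordered list, where index 0 stands for LRU cache line 501-1 and the last index for MRU cache line 501-P (a sketch, not the claimed hardware):

        def promote(lines: list, entry, target_index: int) -> None:
            # Moving an entry up demotes every entry between its old and new
            # cache lines by one line, as in the 501-2 -> 501-4 example above.
            lines.remove(entry)
            lines.insert(target_index, entry)

        def insert_new(lines: list, capacity: int, entry, insert_index: int) -> None:
            if len(lines) == capacity:
                lines.pop(0)  # evict the victim at LRU cache line 501-1
            lines.insert(insert_index, entry)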
  • a cache replacement policy includes a first set of one or more rules for cache entries (e.g., preferential cache entries) storing data that satisfies cache promotion criteria and a second set of one or more rules, which differ from the first set of one or more rules, for cache entries (e.g., non-preferential cache entries) storing data that does not satisfy cache promotion criteria.
  • implementing the cache replacement policy includes storing an indicator (e.g., marker, tag) in cache entries storing data that satisfy the cache promotion criteria (e.g., in preferential cache entries) that indicates (e.g., identifies, determines) that data stored in the cache entry satisfies the cache promotion criteria.
  • implementing the cache replacement policy includes storing, in a cache entry, an indicator (e.g., marker, tag) that indicates whether or not data stored in the cache entry satisfies the cache promotion criteria (e.g., whether the cache entry is a preferential cache entry or a non-preferential cache entry).
  • the inclusion of different sets of rules for preferential cache entries versus non-preferential cache entries can be useful in maintaining useful (e.g., relevant) information in a cache. For example, when storing outputs from a table walk process in a cache, the cache preferentially retains cache entries that store physical addresses over cache entries that store outputs (e.g., table walk descriptors) that do not provide as big of a shortcut in the table walk process.
  • the cache stores cache entries that store physical addresses at high cache lines in order to provide a longer lifetime for the cache entry in the cache compared to storing the cache entry at a lower cache line in the cache.
  • Figures 6A - 6D and 7A - 7B illustrate a replacement policy for a cache 392, in accordance with some implementations.
  • Cache 392 may correspond to any of caches 218, 212, and 220 (shown in Figure 2).
  • cache 392 corresponds to a level 2 cache (e.g., a secondary cache, cache 212).
  • memory controller 110 shown in Figure 1 is configured to execute cache replacement policies when adding a new cache entry to the cache, replacing an existing cache entry from the cache, and reorganizing cache lines (including promoting an existing cache entry in the cache to a higher cache line and/or demoting an existing cache entry in the cache to a lower cache line).
  • a cache entry includes data (such as a physical address translation, an intermediate address translation, a block descriptor, or a page descriptor) and a tag that includes one or more indicators regarding the cache entry or the data stored in the cache entry.
  • a tag corresponding to a cache entry may include (e.g., bits in a tag portion of a cache entry include) information regarding any of: (i) whether the cache entry corresponds to a prefetch request or a demand request, (ii) whether or not data in the cache entry satisfies cache promotion criteria (e.g., whether the cache entry is a preferential cache entry or a non-preferential cache entry), (iii) whether or not the cache entry has seen reuse while stored in the cache.
  • a tag may include a plurality of bits.
  • the cache replacement policy handles a cache entry based on the information stored in the tag corresponding to the cache entry.
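  • A plausible encoding of such a tag as bit flags (the bit positions below are assumptions for illustration, not the patent’s layout):

        TAG_DEMAND       = 1 << 0  # set: demand request; clear: prefetch request
        TAG_PREFERENTIAL = 1 << 1  # data satisfies the cache promotion criteria
        TAG_REUSED       = 1 << 2  # entry has seen reuse while stored in the cache

        def is_preferential(tag: int) -> bool:
            return bool(tag & TAG_PREFERENTIAL)

        def mark_reuse(tag: int) -> int:
            return tag | TAG_REUSED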
  • the cache replacement policy biases away from selecting preferential cache entries as victims (e.g., memory controller 110 will select a non-preferential cache entry for replacement before selecting a preferential cache entry for replacement, regardless of which cache line(s) the preferential cache entry and the non-preferential cache entry are stored at).
  • Figures 6A - 6D illustrate cache replacement policies for cache entries (e.g., non-preferential cache entries) that store data that does not satisfy cache promotion criteria, in accordance with some implementations.
  • Data stored in cache entry 601 does not satisfy cache promotion criteria and thus, cache entry 601 is a non-preferential cache entry (e.g., non-preferential cache line, non-preferential MMU line).
  • Cache entry 601 includes a tag having one or more bits that indicate that data stored in cache entry 601 does not satisfy cache promotion criteria.
  • memory controller 110 receives instructions to store the data as a non-preferential cache entry 601 in cache 392 (e.g., add non-preferential cache entry 601 to cache 392).
  • In accordance with cache entry 601 being a non-preferential cache entry, memory controller 110 adds non-preferential cache entry 601 at a cache line 501 that is at or below a pre-determined cache line 501-x (e.g., a threshold cache line 501-x, a predefined cache line 501-x).
  • Cache 392 stores non-preferential cache entry 601 at the selected cache line (in this example, LRU+1 cache line 501-2) until memory controller 110 selects cache entry 601 as a victim for replacement from cache 392 (e.g., to make space for a new cache entry), until cache entry 601 is moved (e.g., demoted) to a lower cache line (e.g., LRU cache line 501-1) as new cache entries are added to cache 392 over time and cache entry 601 becomes older (e.g., less recently used), until cache entry 601 is evicted from cache 392, or until another request (e.g., prefetch request or demand request) for data stored in non-preferential cache entry 601 is received by a processor that is in communication with cache 392 (e.g., until any of processors 204-1 through 204-N of processing cluster 202-1 that is in communication with cache 212-1 receives a request for data stored in non-preferential cache entry 601).
  • If non-preferential cache entry 601 is selected for replacement before a request for data stored in non-preferential cache entry 601 is received at a processor that is in communication with cache 392, memory controller 110 demotes non-preferential cache entry 601 to a lower cache line in cache 392 or evicts cache entry 601 (e.g., cache entry 601 is no longer stored at cache 392) to make space for a new cache entry.
  • Figures 6B and 6C illustrate promotion of non-preferential cache entry 601 in accordance with a determination that a second request (e.g., subsequent to and distinct from the first request) for data stored in non-preferential cache entry 601 is received at a processor that is in communication with cache 392 while non-preferential cache entry 601 is stored in cache 392 (regardless of which cache line 501 cache entry 601 is stored at).
  • the data fetcher passes data stored in non-preferential cache entry 601 to the processor (e.g., data fetcher 208 passes data stored in non-preferential cache entry 601 to processor 204-1) and memory controller 110 promotes non-preferential cache entry 601 to be stored at the highest cache line (e.g., MRU cache line 501-P), thereby increasing (e.g., maximizing) a lifetime of non-preferential cache entry 601 in cache 392.
  • In response to receiving the second request for data stored in non-preferential cache entry 601, the tag associated with data stored in non-preferential cache entry 601 is updated to indicate that cache entry 601 has seen re-use (e.g., cache entry 601 was accessed while stored in cache 392). In some implementations, in response to receiving the second request for data stored in non-preferential cache entry 601 and in accordance with a determination that the second request is a demand request, the tag associated with data stored in non-preferential cache entry 601 is updated to indicate that cache entry 601 corresponds to a demand request.
  • the data fetcher passes data stored in non-preferential cache entry 601 to the processor (e.g., data fetcher 208 passes data stored in non-preferential cache entry 601 to processor 204-1) and memory controller 110 promotes non-preferential cache entry 601 to be stored at a cache line (e.g., any of cache lines 501-3 through 501-P) that is higher than a cache line at which non-preferential cache entry 601 is currently stored, thereby increasing the lifetime of non-preferential cache entry 601 in cache 392.
  • memory controller 110 may promote non-preferential cache entry 601 to any of cache lines 501-3 through 501-P.
  • memory controller 110 may promote non-preferential cache entry 601 to any of cache lines 501-2 through 501-P.
  • memory controller 110 promotes non-preferential cache entry 601 to be stored at a cache line (e.g., any of cache lines 501-3 through 501-(P-1)) that is higher than the cache line at which non-preferential cache entry 601 is currently stored, other than the highest cache line (e.g., MRU cache line 501-P).
  • In response to receiving the second request for data stored in non-preferential cache entry 601, the tag associated with data stored in non-preferential cache entry 601 is updated to indicate that cache entry 601 has seen re-use (e.g., cache entry 601 was accessed while stored in cache 392). In some implementations, in response to receiving the second request for data stored in non-preferential cache entry 601 and in accordance with a determination that the second request is a prefetch request, the tag associated with data stored in non-preferential cache entry 601 is updated to indicate that cache entry 601 corresponds to a prefetch request.
  • In accordance with a determination that a third request (e.g., subsequent to and distinct from each of the first request and the second request) for data stored in non-preferential cache entry 601 is received at a processor that is in communication with cache 392 while non-preferential cache entry 601 is stored in cache 392 (regardless of which cache line 501 cache entry 601 is stored at), the data fetcher passes data stored in non-preferential cache entry 601 to the processor (e.g., data fetcher 208 passes data stored in non-preferential cache entry 601 to processor 204-1), and memory controller 110 promotes non-preferential cache entry 601 to be stored at the highest cache line (e.g., MRU cache line 501-P), thereby increasing (e.g., maximizing) a lifetime of non-preferential cache entry 601 in cache 392.
  • memory controller 110 promotes non-preferential cache entry 601 to be stored in cache 392 at LRU+3 cache line 501-4 in response to the second request, and memory controller 110 promotes non-preferential cache entry 601 to be stored in cache 392 at MRU cache line 501-P in response to the third request.
  • In response to receiving the third request for data stored in non-preferential cache entry 601, the tag associated with data stored in non-preferential cache entry 601 is updated to indicate that cache entry 601 has seen multiple reuses (e.g., cache entry 601 was accessed at least twice while stored in cache 392). In some implementations, the tag associated with non-preferential cache entry 601 is updated to indicate the number of times cache entry 601 has been accessed while stored in cache 392 (e.g., the tag indicates that cache entry 601 was accessed twice while stored in cache 392).
  • In response to subsequent requests (e.g., each subsequent request after the third request) for data stored in cache entry 601, memory controller 110 promotes cache entry 601 to MRU cache line 501-P if cache entry 601 is stored in cache 392 at a cache line that is different from MRU cache line 501-P.
  • In response to each subsequent request, the tag associated with cache entry 601 is updated to indicate the number of times cache entry 601 has been accessed while stored in cache 392.
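  • Drawing the non-preferential behavior of Figures 6A - 6D together, a sketch of one possible hit handler follows (it reuses the promote helper from the Figure 5 sketch above; the exact promotion targets are implementation choices, not claim language):

        def on_nonpreferential_hit(lines: list, entry, is_demand: bool, hits: int) -> None:
            mru = len(lines) - 1
            if is_demand or hits >= 2:
                # Demand reuse, or the third request: promote to MRU 501-P.
                promote(lines, entry, mru)
            else:
                # Prefetch reuse: a higher cache line other than MRU 501-P.
                current = lines.index(entry)
                promote(lines, entry, min(current + 1, mru - 1))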
  • Figures 7A - 7B illustrate cache replacement policies for cache entries of a cache 392 that store data that satisfies cache promotion criteria, in accordance with some implementations.
  • Data stored in cache entry 701 satisfies the cache promotion criteria and thus, cache entry 701 is a preferential cache entry (e.g., preferential cache line, preferential MMU line).
  • Cache entry 701 includes a tag having one or more bits that indicate that data in cache entry 701 satisfies the cache promotion criteria.
  • data stored in a cache entry satisfies the cache promotion criteria (and thus the cache entry storing the data is a preferential cache entry) when the data includes any of: (i) table walk outputs from a level 2 table (such as a cache entry that stores table descriptor 342 or physical address 390 associated with an output from level 2 table 340 in a one-stage table walk process 300 shown in Figure 3B), (ii) table walk outputs from a stage 1 level 2 table (such as a cache entry that stores a table descriptor, intermediate physical address, or physical address 490 associated with an output from the S1L2 table (e.g., block “15”) in a two-stage table walk process 400 shown in Figure 4B), and (iii) table walk outputs from any stage 2 table in the fifth row of a two-stage table walk (such as a cache entry that stores a table descriptor, page descriptor, intermediate physical address, or physical address 490 associated with an output from any of the S2L0, S2L1, S2L2, and S2L3 tables in the fifth row (e.g., blocks “21” through “24”) shown in Figure 4B).
  • memory controller 110 receives instructions to store the data as a preferential cache entry 701 in cache 392 (e.g., add preferential cache entry 701 to cache 392).
• in accordance with cache entry 701 being a preferential cache entry, memory controller 110 adds preferential cache entry 701 at a cache line 501 that is at or above the pre-determined cache line 501-x (e.g., a threshold cache line 501-x, a predefined cache line 501-x).
  • memory controller 110 adds preferential cache entry 701 to cache 392 at any cache line that is at LRU+2 cache line 501-3 or higher (e.g., any of LRU+2 cache line 501-3 through MRU cache line 501-P) (e.g., such that preferential cache entry 701 is stored at any cache line between (and including) LRU+2 cache line 501-3 through MRU cache line 501-P of cache 392).
• memory controller 110 adds preferential cache entry 701 at a cache line 501 that is at or above the pre-determined cache line 501-x (e.g., a threshold cache line 501-x, a predefined cache line 501-x) other than MRU cache line 501-P.
• memory controller 110 adds preferential cache entry 701 to cache 392 at any cache line that is at LRU+2 cache line 501-3 or higher with the exception of MRU cache line 501-P (e.g., any of LRU+2 cache line 501-3 through MRU-1 cache line 501-(P-1)) (e.g., such that preferential cache entry 701 is stored at any cache line between (and including) LRU+2 cache line 501-3 through MRU-1 cache line 501-(P-1) of cache 392).
• in accordance with a determination that the first request is a demand request, the data is stored in preferential cache entry 701 at MRU cache line 501-P.
• in accordance with a determination that the first request is a prefetch request, the data is stored in preferential cache entry 701 at any cache line 501 that is at or above the pre-determined cache line 501-x (e.g., a threshold cache line 501-x, a predefined cache line 501-x) other than MRU cache line 501-P.
  • Cache 392 stores preferential cache entry 701 at the selected cache line (in this example, LRU+3 cache line 501-4) until cache entry 701 is evicted from cache 392 (e.g., to make space for a new cache entry), until cache entry 701 is moved (e.g., demoted) to a lower cache line (e.g., LRU+2 cache line 501-3, LRU+1 cache line 501-2, or LRU cache line 501-1) as new cache entries are added to cache 392 over time and cache entry 701 becomes older (e.g., less recently used), or until another request (e.g., prefetch request or demand request) for data stored in preferential cache entry 701 is received by a processor that is in communication with cache 392 (e.g., until any of processors 204-1 through 204-N of processing cluster 202-1 that is in communication with cache 212-1 receives a request for data stored in preferential cache entry 701).
• if preferential cache entry 701 is selected for replacement before a request for data stored in preferential cache entry 701 is received at a processor that is in communication with cache 392, memory controller 110 demotes preferential cache entry 701 to a lower cache line in cache 392 or evicts preferential cache entry 701 from cache 392 (e.g., cache entry 701 is no longer stored at cache 392) to make space for a new cache entry.
  • the cache replacement policy instructs memory controller 110 to bias away from selecting preferential cache entries that store data that satisfy the cache promotion criteria, such as preferential cache entry 701, for replacement.
• a preferential cache entry (such as preferential cache entry 701) would not be selected for replacement if cache 392 includes at least one non-preferential cache entry (such as non-preferential cache entry 601).
  • cache 392 may also store other information in addition to cache entries.
  • cache 392 may store instructions for a processor that is in communication with cache 392 (e.g., instructions for any of processors 204-1 through 204-N that are in communication with cache 212-1).
  • memory controller 110 may select other data (e.g., instructions, data that is not stored in a preferential cache entry) stored in cache 392 for replacement before selecting a preferential cache entry 701 for replacement.
• the cache replacement policy may instruct memory controller 110 to bias away from selecting cache entries that provide a largest shortcut in a table walk process and thus, bias away from selecting preferential cache entries (e.g., cache entries that store data corresponding to any of: (i) an output from level 2 table 340 (shown in Figure 3B) in a one-stage table walk process, (ii) an output from a stage 1 level 2 table (e.g., S1L2 table in Figure 4B) in a two-stage table walk process, and (iii) an output from any stage 2 table in the fifth row (e.g., S2L0, S2L1, S2L2, S2L3 in Figure 4B) of a two-stage table walk) for replacement.
• when selecting a victim from cache 392, memory controller 110 considers selecting a cache entry that is stored in LRU cache line 501-1. In accordance with a determination that cache line 501-1 stores a preferential cache entry (such as preferential cache entry 701), memory controller 110 selects a non-preferential cache entry (such as non-preferential cache entry 601) for replacement instead of selecting a preferential cache entry. In some implementations, memory controller 110 selects a non-preferential cache entry for replacement instead of selecting a preferential cache entry independently of the cache line at which the non-preferential cache entry is stored and independently of the cache line at which the preferential cache entry is stored. For example, memory controller 110 may select a non-preferential cache entry for replacement instead of selecting a preferential cache entry even if the non-preferential cache entry is stored at a higher cache line than the preferential cache entry.
• Figure 7B illustrates promotion of preferential cache entry 701 in accordance with a determination that a second request (e.g., subsequent to and distinct from the first request) for data stored in preferential cache entry 701 is received at a processor that is in communication with cache 392 while preferential cache entry 701 is stored in cache 392 (regardless of the cache line 501 at which cache entry 701 is stored).
  • the data fetcher passes data stored in preferential cache entry 701 to the processor (e.g., data fetcher 208 passes data stored in preferential cache entry 701 to processor 204-1) and memory controller 110 promotes preferential cache entry 701 to be stored at the highest cache line (e.g., MRU cache line 501-P), thereby increasing (e.g., maximizing) a lifetime of preferential cache entry 701 in cache 392.
• the tag associated with data stored in preferential cache entry 701 is updated to indicate that cache entry 701 has seen re-use (e.g., cache entry 701 was accessed while stored in cache 392).
• in response to subsequent requests (e.g., each subsequent request after the second request) for data stored in cache entry 701, memory controller 110 promotes cache entry 701 to MRU cache line 501-P if cache entry 701 is stored in cache 392 at a cache line that is different from MRU cache line 501-P.
• in response to each subsequent request, the tag associated with cache entry 701 is updated to indicate the number of times cache entry 701 has been accessed while stored in cache 392.
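The victim-selection bias and the preferential insertion described above can be sketched in the same illustrative Python model. The threshold position, the scan order, and the helper names `select_victim` and `insert_preferential` are assumptions of this sketch, not the claimed implementation.

```python
THRESHOLD = 2   # assumed index of the threshold cache line (e.g., LRU+2, 501-3)

def select_victim(lines):
    """Bias away from preferential entries: evict a non-preferential entry
    if one exists, even one stored at a higher (more recent) position."""
    for entry in lines:               # scan from LRU toward MRU
        if not entry.preferential:
            return entry
    return lines[0]                   # all entries preferential: fall back to LRU

def insert_preferential(cache_set, entry, demand):
    entry.preferential = True
    if len(cache_set.lines) == cache_set.num_lines:
        cache_set.lines.remove(select_victim(cache_set.lines))
    if demand:
        cache_set.lines.append(entry)             # demand fill: straight to MRU
    else:
        # prefetch fill: at or above the threshold line but below MRU
        pos = min(THRESHOLD, max(len(cache_set.lines) - 1, 0))
        cache_set.lines.insert(pos, entry)
```

Note the design point this makes concrete: a non-preferential entry can be chosen as the victim even when it sits at a more recently used position than a preferential entry.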
  • Figures 8A - 8C illustrate a flow chart of an example method of controlling cache entry (e.g., cache line, memory management unit line) replacement in a cache, in accordance with some implementations.
  • Method 800 is implemented at an electronic device 200 that includes a first processing cluster 202-1 having one or more processors 204, and a cache 212-1 that is coupled to one or more processors 204 in first processing cluster 202-1.
  • Cache 212-1 stores a plurality of data entries.
• Electronic device 200 transmits (810) an address translation request (e.g., address translation request 310 or 410) for translation of a first address from the first processing cluster 202-1 to cache 212-1.
• in accordance with a determination that the address translation request is not satisfied by the data entries in cache 212-1, the electronic device 200 transmits (830) the address translation request to memory (e.g., a lower level cache such as L3 cache 220 or system memory 104, such as DRAM) distinct from cache 212-1.
  • the electronic device 200 receives (840) data including a second address (e.g., the requested address translation, such as physical address 390 or 490) corresponding to the first address (e.g., the received data is requested and retrieved from the lower level cache (such as cache 220) or system memory 104).
• in accordance with a determination (850) that the data does not satisfy cache promotion criteria (e.g., the data will not be stored as a preferential cache entry), the electronic device 200 replaces an entry (e.g., a cache entry) at a first priority level (e.g., a first cache line) in cache 212-1 with the data; the replaced entry is optionally stored at a level that is lower than the first priority level or evicted from (e.g., no longer stored at) cache 212-1.
• in accordance with a determination that the data satisfies the cache promotion criteria (e.g., the data will be stored as a preferential cache entry), the electronic device 200 replaces an entry (e.g., a cache entry) at a second priority level (e.g., a second cache line) in cache 212-1 with the data; the replaced entry is optionally stored at a level that is lower than the second priority level or evicted from (e.g., no longer stored at) cache 212-1. The second priority level is a higher priority level in cache 212-1 than the first priority level.
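Putting the two fill levels together, the miss path of method 800 can be summarized with the sketch below, reusing the `CacheSet` helpers above. Here `walk_memory` and `satisfies_criteria` stand in for the lower-level table walk and the promotion-criteria check; both names are assumptions of this sketch.

```python
def handle_translation_request(cache_set, addr, walk_memory,
                               satisfies_criteria, is_demand):
    """Sketch of method 800: look up the cache; on a miss, fetch the
    translation from memory distinct from the cache, then fill at a
    priority level chosen by the cache promotion criteria."""
    entry = cache_set.access(addr)
    if entry is not None:
        return entry                              # request satisfied by the cache
    data = walk_memory(addr)                      # receive the second address
    entry = Entry(addr)
    if satisfies_criteria(data):
        insert_preferential(cache_set, entry, demand=is_demand)  # higher level
    else:
        cache_set.insert_nonpreferential(entry)                  # lower level
    return entry
```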
  • the address translation request includes a request for translation of a virtual address 312 to a physical address (e.g., physical address 390 or 490).
  • the address translation request includes a request for translation of a virtual address 312 to an intermediate physical address.
  • the address translation request includes a request for translation of an intermediate physical address to another intermediate physical address.
  • the address translation request includes a request for translation of an intermediate physical address to a physical address.
• the address translation request (e.g., request 310 or 410) is a demand request transmitted from the one or more processors (e.g., any of processors 204-1 through 204-N) of the first processing cluster 202-1.
  • the address translation request is transmitted in accordance with the one or more processors 204 executing an instruction requiring translation of the first address (e.g., address 312).
  • the second priority level indicates a most recently used (MRU) entry in the cache 212-1.
• when the address translation is performed in accordance with a demand request, the retrieved translated address is stored in a cache level (e.g., cache line) that indicates a most recently used entry (e.g., at MRU cache line 501-P) or one of a threshold number of most recently used entries in the cache (e.g., one of two, three, or other number of most recently used entries, such as any cache line that is at or above a threshold cache line 501-x).
  • Figure 6B illustrates implementation of the cache replacement policy in accordance with a determination that the address translation request is a demand request.
• the address translation request is a prefetch request (e.g., the address translation request is transmitted independently of execution of an instruction requiring translation of the first address). In some implementations, the address translation prefetch request is transmitted in the absence of a specific request (e.g., demand request) from the one or more processors for translation of the first address. In some implementations, the address translation prefetch request is transmitted from prefetching circuitry of the first processing cluster 202-1.
• the retrieved translated address is stored in a cache level that indicates an entry more recently used than the least recently used entry, but not necessarily the most recently used entry (e.g., the translated address is stored at a lower cache level, such as a cache line that is below a threshold cache line 501-x).
• the translated address is stored at a lower cache line that is below a threshold cache line 501-x but not at the LRU cache line 501-1.
• the translated address is stored at the LRU cache line 501-1.
  • the first priority level indicates a least recently used (LRU) entry in the cache 212-1.
• An example of storing retrieved data that does not satisfy cache promotion criteria in a cache entry (e.g., non-preferential cache entry, such as cache entry 601) at LRU cache line 501-1 is provided with respect to Figure 6A.
  • the received data is stored in a cache level that indicates the least recently used entry in accordance with a determination that the address translation request is a prefetch request.
• the received data is stored at the LRU cache line 501-1 of cache 392.
  • the data is moved to a cache level that indicates a more recently used entry in response to a subsequent data retrieval request (e.g., a demand request) for the same data (e.g., as described herein with reference to operation 880 - 886).
• the cache entry is moved to a higher cache line than a cache line at which the cache entry is currently stored in the cache.
• An example of storing the retrieved data at a cache entry at LRU cache line 501-1 in response to a first request and promoting the cache entry to a higher cache line (e.g., a higher cache level) in response to a second request is provided above in Figures 6A - 6C.
  • the first request and the second request are both prefetch requests.
• the first priority level (e.g., a cache level that is below a threshold cache line 501-x) indicates one of a threshold number of least recently used entries in the cache 212-1 (e.g., one of two, three, or other number of least recently used entries).
• the first priority level indicates the second least recently used entry in the cache 212-1 (e.g., LRU+1 cache line 501-2), the third least recently used entry in the cache (e.g., LRU+2 cache line 501-3), or another less recently used entry in the cache.
  • the received data is stored in a cache level that indicates one of the threshold number of least recently used entries in accordance with a determination that the address translation request is a prefetch request.
  • the data is moved to a cache level that indicates a more recently used entry in response to a subsequent data retrieval request (e.g., a demand request) for the same data (e.g., as described herein with reference to operation 880 - 886).
  • Figure 6A illustrates examples of adding data that does not satisfy cache promotion criteria to cache 392 by storing the data in a non-preferential cache entry (such as non-preferential cache entry 601) at a cache line 501 that is below a cache line threshold 501-x (e.g., cache level threshold).
  • the data satisfies the cache promotion criteria in accordance with a determination that the address translation request for translation of the first address is a request for translation of an intermediate physical address of a respective level to an intermediate physical address of a next level.
  • the data corresponds to an output from a stage 1 level 2 table (e.g., S1L2 (block “15”) in Figures 4A and 4B) in a two- stage table walk process 400.
  • the translation of the intermediate physical address of the respective level to the intermediate physical address of the next level constitutes a last level of translation during a first stage of a two-stage table walk.
  • the data satisfies the cache promotion criteria in accordance with a determination that the address translation request for translation of the first address is a request for translation of an intermediate physical address to a physical address.
• the data corresponds to an output from a stage 2 table (e.g., S2L0, S2L1, S2L2, and S2L3 tables) in a two-stage table walk process 400.
  • the translation of the intermediate physical address to the physical address constitutes a second stage of translation of a two-stage table walk.
  • the intermediate physical address is obtained from the first stage (e.g., a last level of translation of the first stage, stage 1 level 3 table (S1L3)) of translation of the two-stage table walk.
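The level-aware part of the criteria can be made concrete with a small classifier. The request encoding below (a dict with `walk`, `kind`, and flag fields) is purely an assumption used to restate the criteria, not a format used by the application.

```python
def satisfies_promotion_criteria(req):
    """Preferential table-walk outputs are the ones whose cached result
    removes the most remaining walk steps."""
    if req["walk"] == "one-stage":
        return req["table_level"] == 2        # output of level 2 table 340
    if req["kind"] == "ipa_to_next_ipa":
        return req["last_level_of_stage1"]    # output of the S1L2 table
    if req["kind"] == "ipa_to_pa":
        return req["ipa_from_s1l3"]           # fifth-row S2L0-S2L3 outputs
    return False
```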
  • method 800 further includes forgoing (870) selecting, for replacement by the data, one or more respective entries (e.g., preferential cache entries, such as preferential cache entry 701 storing data that satisfies cache promotion criteria) in the cache that satisfy the cache promotion criteria.
  • the electronic device 200 avoids selecting any respective entry (e.g., any preferential cache entry that stores data that satisfies cache promotion criteria) that satisfies the cache promotion criteria as a victim for replacement.
• the replaced entry is selected for replacement in accordance with a determination that the replaced entry fails to satisfy the cache promotion criteria (e.g., a non-preferential cache entry that stores data that does not satisfy cache promotion criteria is selected as a victim for replacement).
  • a cache entry satisfies the cache promotion criteria in accordance with a determination that the entry has satisfied an address translation request to the cache.
  • the cache entry has seen reuse while being stored in the cache.
  • whether a cache entry has satisfied an address translation request is indicated using one or more reuse bits associated with the entry (e.g., a tag stored with the data in the cache entry).
  • method 800 further includes receiving (880) a data retrieval request (e.g., a second address translation request for translation of the first address, such as a demand request from the first processing cluster 202-1) for data at the cache 212-1, and in response to (882) receiving the data retrieval request for the data at the cache, transmitting (884) the data from the cache 212-1 to the first processing cluster 202-1.
• method 800 further includes, in accordance with a determination that the data satisfies the cache promotion criteria, replacing (886) an entry (e.g., cache entry) at a third level in the cache 212-1 with the data.
  • the third level is a higher priority level in the cache 212-1 than the respective level at which the data is stored.
  • the entry at the third level ceases to be stored at the third level, and is optionally stored at a level lower than the third level.
  • the preferential cache entry (such as preferential cache entry 701) that stores the data is promoted (e.g., moved) to a higher cache line such that the preferential cache entry storing the data is stored at a new cache line that is higher than a cache line at which the preferential cache entry is currently stored.
  • the data is stored at a level indicating a least recently used entry or one of a threshold number of least recently used entries (e.g., at a lower cache line that is below the threshold cache line 501-x) as a result of a prefetch request for the data.
  • the data is moved to progressively lower levels in the cache if data retrieval requests for the data are not received (e.g., the data is demoted or degraded over time with nonuse).
  • a subsequent demand request for the data causes the data to be promoted to a higher priority level in the cache (optionally, a level indicating a most recently used entry (e.g., MRU cache line 501-P), or a level indicating one of a threshold number of most recently used entries (e.g., a higher cache line that is at or above the threshold cache line 501-x)) if the data satisfies the cache promotion criteria.
  • method 800 further includes receiving (890) a data retrieval request (e.g., a second address translation request for translation of the first address, such as a demand request from the first processing cluster 202-1) for data at the cache 212-1.
  • Method 800 further includes, in response to (892) receiving the data retrieval request for the data at the cache and in accordance with a determination (894) that the data does not satisfy the cache promotion criteria, storing the data at a level that is a first number of levels higher in the cache than a respective level (e.g., the first priority level) at which the data is stored (e.g., storing the data in a non-preferential cache entry at a cache line that is higher than a cache line at which the non-preferential cache entry is stored).
• the first number (e.g., an integer) is greater than zero, and the data is moved from the respective level (e.g., the first priority level) to a higher priority level; the entry previously stored at the higher priority level ceases to be stored at the higher priority level, and is optionally stored at a level lower than the higher priority level. In some implementations, the first number of levels is zero, and the data continues to be stored at the respective level.
  • Method 800 further includes, in response to (892) receiving the data retrieval request for the data at the cache and in accordance with a determination (896) that the data satisfies the cache promotion criteria, storing the data at a level that is a second number of levels higher in the cache than a respective level (e.g., the second priority level) at which the data is stored (e.g., storing the data in a preferential cache entry at a cache line that is higher than a cache line at which the preferential cache entry is stored).
  • the second number of levels is greater than the first number of levels.
  • the cache is configured to replace the entry previously stored at the higher priority level in the cache with the data.
• in response to a subsequent request for data stored in the cache (e.g., a demand request for prefetched data), if the data satisfies the cache promotion criteria, the data is promoted in the cache more than if the data does not satisfy the cache promotion criteria, as sketched below.
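The first-number/second-number scheme of operations 894 and 896 amounts to promoting every hit, but promoting criteria-satisfying entries further. A sketch under the same list model follows; the two step sizes are chosen arbitrarily for illustration.

```python
FIRST_STEP = 1    # assumed promotion distance for non-preferential hits
SECOND_STEP = 4   # assumed larger promotion distance for preferential hits

def promote_on_hit(lines, entry):
    """Move a hit entry up the recency stack; entries that satisfy the
    cache promotion criteria move up by more levels (second > first)."""
    i = lines.index(entry)
    step = SECOND_STEP if entry.preferential else FIRST_STEP
    new_pos = min(i + step, len(lines) - 1)   # clamp at the MRU position
    lines.pop(i)
    lines.insert(new_pos, entry)
```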
• Address translation to physical addresses is implemented such that each physical address can be accessed using a virtual address as an input.
• a memory management unit (MMU) performs the table-walk process, which includes a sequence of memory accesses to the page tables stored in the memory.
  • these memory accesses of the table-walk process are line-size accesses, e.g., to 64B cache lines that are allowed to be cached in a cache hierarchy distinct from a TLB hierarchy.
  • these cache lines associated with the line-size accesses are applied in the L2 and/or L3 cache and not in the L1 cache.
  • each of the 64B lines applied in the L2 cache holds multiple descriptors, and the table-walk process identifies at least a subset of descriptors.
  • Various implementations of this application can be applied to enable cache replacement in the L2 cache.
• a set of levels or steps of the table-walk process (e.g., certain memory accesses or replacements to the L2 cache) are associated with a higher priority and given preferential treatment in the L2 cache compared with other L2 cache accesses or replacements.
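As a worked example of why line-size walk accesses are attractive to cache: assuming 8-byte descriptors (a typical size that the application does not specify), each cached 64B line holds eight descriptors, and the walk selects one of them from the low-order bits of the descriptor's address.

```python
LINE_BYTES = 64
DESC_BYTES = 8                                # assumed descriptor size
DESCS_PER_LINE = LINE_BYTES // DESC_BYTES     # 64 / 8 = 8 descriptors per line

def descriptor_slot(descriptor_addr):
    """Index of the descriptor within its cached 64B line."""
    return (descriptor_addr % LINE_BYTES) // DESC_BYTES
```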
• Clause 1 An electronic device comprising: a first processing cluster including one or more processors; and a cache coupled to the one or more processors in the first processing cluster and storing a plurality of data entries; wherein the electronic device is configured to: transmit to the cache an address translation request for translation of a first address; in accordance with a determination that the address translation request is not satisfied by the data entries in the cache: transmit the address translation request to memory distinct from the cache; in response to the address translation request, receive data including a second address corresponding to the first address; in accordance with a determination that the data does not satisfy cache promotion criteria, replace an entry at a first priority level in the cache with the data; and in accordance with a determination that the data satisfies the cache promotion criteria, replace an entry at a second priority level in the cache with the data including the second address, wherein the second priority level is a higher priority level in the cache than the first priority level.
• Clause 2 The electronic device of clause 1, wherein the address translation request is a demand request transmitted from the one or more processors of the first processing cluster.
• Clause 3 The electronic device of clause 2, wherein the second priority level indicates a most recently used entry in the cache.
• Clause 4 The electronic device of any of clauses 1-3, wherein the address translation request is a prefetch request.
  • Clause 5 The electronic device of any of the preceding clauses, wherein the first priority level indicates a least recently used entry in the cache.
  • Clause 6 The electronic device of any of clauses 1-4, wherein the first priority level indicates one of a threshold number of least recently used entries in the cache.
  • Clause 7 The electronic device of any of the preceding clauses, wherein the data satisfies the cache promotion criteria in accordance with a determination that the address translation request for translation of the first address is a request for translation of an intermediate physical address of a respective level to an intermediate physical address of a next level.
  • Clause 8 The electronic device of any of clauses 1-6, wherein the data satisfies the cache promotion criteria in accordance with a determination that the address translation request for translation of the first address is a request for translation of an intermediate physical address to a physical address.
  • Clause 9 The electronic device of any of the preceding clauses, including forgoing selecting, for replacement by the data, one or more respective entries in the cache that satisfy the cache promotion criteria.
  • Clause 10 The electronic device of any of the preceding clauses, wherein the cache is configured to: receive a data retrieval request for the data; in response to receiving the data retrieval request for the data: transmit the data; and in accordance with a determination that the data satisfies the cache promotion criteria, replace an entry at a third level in the cache with the data, wherein the third level is a higher priority level in the cache than the respective level at which the data is stored.
  • Clause 11 The electronic device of any of clauses 1-9, wherein the cache is configured to: receive a data retrieval request for the data; in response to receiving the data retrieval request for the data: transmit the data; and in accordance with a determination that the data does not satisfy the cache promotion criteria, storing the data at a level that is a first number of levels higher in the cache than a respective level at which the data is stored; and in accordance with a determination that the data satisfies the cache promotion criteria, storing the data at a level that is a second number of levels higher in the cache than a respective level at which the data is stored, and the second number of levels is greater than the first number of levels.
  • Clause 12 A method executed at an electronic device that includes a first processing cluster having one or more processors, and a cache coupled to the one or more processors in the first processing cluster and storing a plurality of data entries, the method comprising: transmitting an address translation request for translation of a first address to the cache; in accordance with a determination that the address translation request is not satisfied by the data entries in the cache: transmitting the address translation request to memory distinct from the cache; in response to the address translation request, receiving data including a second address corresponding to the first address; in accordance with a determination that the data does not satisfy cache promotion criteria, replacing an entry at a first priority level in the cache with the data; and in accordance with a determination that the data satisfies the cache promotion criteria, replacing an entry at a second priority level in the cache with the data including the second address, wherein the second priority level is a higher priority level in the cache than the first priority level.
  • Clause 13 The method of clause 12, wherein the address translation request is a demand request transmitted from the one or more processors of the first processing cluster.
  • Clause 14 The method of clause 13, wherein the second priority level indicates a most recently used entry in the cache.
• Clause 15 The method of any of clauses 12-14, wherein the address translation request is a prefetch request.
  • Clause 16 The method of any of clauses 12-15, wherein the first priority level indicates a least recently used entry in the cache.
  • Clause 17 The method of any of clauses 12-15, wherein the first priority level indicates one of a threshold number of least recently used entries in the cache.
  • Clause 18 The method of any of clauses 12-17, wherein the data satisfies the cache promotion criteria in accordance with a determination that the address translation request for translation of the first address is a request for translation of an intermediate physical address of a respective level to an intermediate physical address of a next level.
  • Clause 19 The method of any of clauses 12-17, wherein the data satisfies the cache promotion criteria in accordance with a determination that the address translation request for translation of the first address is a request for translation of an intermediate physical address to a physical address.
  • Clause 20 The method of any of clauses 12-19, further comprising: forgoing selecting, for replacement by the data, one or more respective entries in the cache that satisfy the cache promotion criteria.
  • Clause 21 The method of any of clauses 12-20, further comprising: receiving a data retrieval request for the data at the cache; in response to receiving the data retrieval request for the data at the cache: transmitting the data from the cache; and in accordance with a determination that the data satisfies the cache promotion criteria, replacing an entry at a third level in the cache with the data, wherein the third level is a higher priority level in the cache than the respective level at which the data is stored.
  • Clause 22 The method of any of clauses 12-20, further comprising: receiving a data retrieval request for the data at the cache; in response to receiving the data retrieval request for the data at the cache: transmitting the data from the cache; and in accordance with a determination that the data does not satisfy the cache promotion criteria, storing the data at a level that is a first number of levels higher in the cache than a respective level at which the data is stored; and in accordance with a determination that the data satisfies the cache promotion criteria, storing the data at a level that is a second number of levels higher in the cache than a respective level at which the data is stored, and the second number of levels is greater than the first number of levels.
• Clause 23 A non-transitory computer readable storage medium storing one or more programs configured for execution by an electronic device that comprises a first processing cluster including one or more processors, and a cache coupled to the one or more processors in the first processing cluster and storing a plurality of data entries, the one or more programs including instructions that, when executed by the electronic device, cause the electronic device to perform a method of any of clauses 12-22.
  • Clause 24 An electronic device that includes a first processing cluster having one or more processors, and a cache coupled to the one or more processors in the first processing cluster and storing a plurality of data entries, comprising at least one means for performing a method of any of clauses 12-22.
• the term “if” is, optionally, construed to mean “when” or “upon” or “in response to determining” or “in response to detecting” or “in accordance with a determination that,” depending on the context.
• the phrase “if it is determined” or “if [a stated condition or event] is detected” is, optionally, construed to mean “upon determining” or “in response to determining” or “upon detecting [the stated condition or event]” or “in response to detecting [the stated condition or event]” or “in accordance with a determination that [a stated condition or event] is detected,” depending on the context.
  • stages that are not order dependent may be reordered and other stages may be combined or broken out. While some reordering or other groupings are specifically mentioned, others will be obvious to those of ordinary skill in the art, so the ordering and groupings presented herein are not an exhaustive list of alternatives. Moreover, it should be recognized that the stages can be implemented in hardware, firmware, software or any combination thereof.


Abstract

The invention relates to an electronic device that includes one or more processors and a cache that stores data entries. The electronic device transmits a request for translation of a first address to the cache. If it is determined that the request is not satisfied by the data entries in the cache, the electronic device transmits the request to a memory that is distinct from the cache, and receives data including a second address corresponding to the first address. If it is determined that the data does not satisfy cache promotion criteria, the electronic device replaces an entry at a first priority level in the cache with the data. If it is determined that the data satisfies the cache promotion criteria, the electronic device replaces an entry at a second priority level, which is a higher priority level in the cache than the first priority level, with the data including the second address.
EP22751611.9A 2021-07-14 2022-07-11 Remplacement de mémoire cache sensible au niveau Pending EP4371011A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202163221875P 2021-07-14 2021-07-14
US17/666,429 US20230012880A1 (en) 2021-07-14 2022-02-07 Level-aware cache replacement
PCT/US2022/073591 WO2023288192A1 (fr) 2021-07-14 2022-07-11 Remplacement de mémoire cache sensible au niveau

Publications (1)

Publication Number Publication Date
EP4371011A1 (fr) 2024-05-22

Family

ID=82839259

Family Applications (1)

Application Number Title Priority Date Filing Date
EP22751611.9A Pending EP4371011A1 (fr) 2021-07-14 2022-07-11 Remplacement de mémoire cache sensible au niveau

Country Status (2)

Country Link
EP (1) EP4371011A1 (fr)
WO (1) WO2023288192A1 (fr)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6995723B2 (ja) * 2018-09-19 2022-01-17 キオクシア株式会社 メモリシステム、ストレージシステム、および制御方法
US11210232B2 (en) * 2019-02-08 2021-12-28 Samsung Electronics Co., Ltd. Processor to detect redundancy of page table walk
US11232042B2 (en) * 2019-11-15 2022-01-25 Microsoft Technology Licensing, Llc Process dedicated in-memory translation lookaside buffers (TLBs) (mTLBs) for augmenting memory management unit (MMU) TLB for translating virtual addresses (VAs) to physical addresses (PAs) in a processor-based system

Also Published As

Publication number Publication date
WO2023288192A1 (fr) 2023-01-19

Similar Documents

Publication Publication Date Title
US11074190B2 (en) Slot/sub-slot prefetch architecture for multiple memory requestors
US7552286B2 (en) Performance of a cache by detecting cache lines that have been reused
US8176255B2 (en) Allocating space in dedicated cache ways
JP6505132B2 (ja) Memory controllers employing memory capacity compression, and related processor-based systems and methods
US8806137B2 (en) Cache replacement using active cache line counters
KR101483849B1 (ko) Coordinated prefetching in hierarchically cached processors
US20230153251A1 (en) Cache Memory That Supports Tagless Addressing
JP6859361B2 (ja) Performing memory bandwidth compression using multiple last-level cache (LLC) lines in a central processing unit (CPU)-based system
US8583874B2 (en) Method and apparatus for caching prefetched data
US8185692B2 (en) Unified cache structure that facilitates accessing translation table entries
US10628318B2 (en) Cache sector usage prediction
JP2017516234A (ja) Memory controllers employing memory capacity and/or bandwidth compression with next-read-address prefetching, and related processor-based systems and methods
CN1955948A (zh) Digital data processing apparatus and method for managing cache data
CN1352771A (zh) Techniques for improving access performance of virtual memory systems
US7809889B2 (en) High performance multilevel cache hierarchy
US20110320720A1 (en) Cache Line Replacement In A Symmetric Multiprocessing Computer
US20060106991A1 (en) Victim prefetching in a cache hierarchy
US6772299B2 (en) Method and apparatus for caching with variable size locking regions
CN1607510A (zh) Method and system for improving cache performance
JP5976225B2 (ja) System cache with sticky removal engine
US20120159086A1 (en) Cache Management
US20230012880A1 (en) Level-aware cache replacement
EP4371011A1 (fr) Remplacement de mémoire cache sensible au niveau
US20230064603A1 (en) System and methods for invalidating translation information in caches
CN117642731A (zh) Level-aware cache replacement

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20231116

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR