CN118159952A - Use of retired page history for instruction translation look-aside buffer (TLB) prefetching in a processor-based device


Info

Publication number: CN118159952A
Application number: CN202380014159.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: A·K·拉特; C·布拉斯科
Current and original assignee: Qualcomm Inc
Legal status: Pending
Priority date: 2022-08-01
Filing date: 2023-06-26
Publication date: 2024-06-07
Priority claimed from: US 18/340,291 (US20240037042A1); PCT/US2023/069044 (WO2024030707A1)
Prior art keywords: page, instruction, HTP, determining, processor

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Disclosed herein are methods and apparatus for using retired page history for instruction translation look-aside buffer (TLB) prefetching in a processor-based device. In some exemplary aspects, a processor-based device is provided. The processor-based device includes a history-based instruction TLB prefetcher (HTP) circuit configured to determine that a first instruction of a first page has been retired. The HTP circuit is also configured to determine a first page Virtual Address (VA) of the first page. The HTP circuit is further configured to determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit. The HTP circuit is further configured to store the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit.

Description

Use of retired page history for instruction translation look-aside buffer (TLB) prefetching in a processor-based device
Claim of Priority
The present application claims priority to U.S. Provisional Patent Application Ser. No. 63/369,996, filed on August 1, 2022 and entitled "using retired pages history for instruction translation lookaside buffer (TLB) prefetch in processor-based device," the contents of which are incorporated herein by reference in their entirety.
The present application also claims priority to U.S. Patent Application Ser. No. 18/340,291, entitled "USING RETIRED PAGES HISTORY FOR INSTRUCTION TRANSLATION LOOKASIDE BUFFER (TLB) PREFETCHING IN PROCESSOR-BASED DEVICES," filed on June 23, 2023, the contents of which are incorporated herein by reference in their entirety.
Technical Field
The present technology relates generally to instruction translation look-aside buffer (TLB) prefetching in a processor-based apparatus.
Background
Microprocessors (also referred to herein as "processors") perform computing tasks for various applications. Conventional processors employ a processing technique called an instruction pipeline whereby the throughput of computer instructions being executed can be increased by dividing the processing of each instruction into a series of steps, which are then executed within an execution pipeline consisting of multiple stages. Optimal processor performance may be achieved if all stages in the execution pipeline are capable of processing instructions simultaneously and sequentially as they are ordered in the execution pipeline. Conventional processors also utilize virtual memory, which refers to a memory management mechanism that maps memory addresses referenced by executing processes (i.e., virtual addresses) to physical addresses within system memory. By using virtual memory, a processor-based system is able to provide access to virtual memory space that is larger than the actual physical memory space, and is able to enhance inter-process security through memory isolation. The mapping of virtual memory addresses to their corresponding physical memory addresses is accomplished using a data structure called a page table. To further improve performance, page table entries retrieved from the page table during virtual to physical memory address translations are cached in a data structure referred to as a translation look-aside buffer or TLB.
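To make the role of the instruction TLB concrete, the following sketch models a virtually addressed instruction fetch in which page-table translations are cached in a small TLB. It is an illustrative software model only; the 4 KiB page size, the dictionary-based page table, and the simple FIFO eviction are assumptions made for the sketch and are not taken from this disclosure.

```python
# Illustrative model: translating an instruction's virtual address, with a TLB
# caching page-table translations so repeated fetches from the same page avoid a walk.
PAGE_SHIFT = 12  # assume 4 KiB pages
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class SimpleInstructionTLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # page VA -> physical frame number

    def translate(self, virtual_address, page_table):
        page_va = virtual_address >> PAGE_SHIFT
        frame = self.entries.get(page_va)
        if frame is None:
            # Instruction TLB demand miss: the page table must be consulted,
            # which is where a real pipeline would stall.
            frame = page_table[page_va]
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))  # naive FIFO eviction
            self.entries[page_va] = frame                   # cache the translation
        return (frame << PAGE_SHIFT) | (virtual_address & PAGE_OFFSET_MASK)

# Example: the second fetch from the same page hits in the TLB.
page_table = {0x400: 0x9A}             # page VA 0x400 maps to physical frame 0x9A
itlb = SimpleInstructionTLB()
itlb.translate(0x400_123, page_table)  # miss: walks the page table, fills the TLB
itlb.translate(0x400_456, page_table)  # hit: translation served from the TLB
```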
Performance of conventional processors may be negatively impacted by instruction TLB demand misses, which occur when an instruction TLB does not contain a TLB entry corresponding to a page containing instructions to be fetched and executed. An instruction TLB demand miss may require a stall during which the processor must wait to perform a virtual-to-physical memory address translation for the virtual address of the page containing the instruction. Such stalls waste processor cycles during which the processor could otherwise be performing productive work.
One approach to reducing the impact of instruction TLB demand misses employs a mechanism known as a history-based TLB prefetcher (HTP). The HTP associates an instruction TLB demand miss with a history of one or more previous instruction TLB demand misses, such that a recurrence of a previous instruction TLB demand miss may trigger an instruction TLB prefetch and thereby, ideally, prevent a repeat of the subsequent instruction TLB demand miss. However, conventional HTPs may provide suboptimal performance because instruction TLB prefetching depends on the occurrence of previous instruction TLB demand misses that may not actually occur. Furthermore, the accuracy of conventional HTPs may be negatively impacted by prefetches performed on speculative execution paths.
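For contrast with the aspects disclosed below, a conventional miss-driven HTP of the kind described above can be sketched as follows; the class name, single-successor table, and callback interface are assumptions made for illustration only. Because the trigger is itself a demand miss, no prefetch is issued unless the earlier miss recurs, and misses taken on speculative paths can pollute the recorded history.

```python
# Illustrative sketch of a conventional HTP: each instruction TLB demand miss is
# recorded as the successor of the previous miss, and a recurrence of a previous
# miss triggers a prefetch of its recorded successor.
class MissHistoryPrefetcher:
    def __init__(self, issue_prefetch):
        self.issue_prefetch = issue_prefetch
        self.last_miss_page_va = None
        self.successor = {}  # page VA of a miss -> page VA of the miss that followed it

    def on_itlb_demand_miss(self, page_va):
        if self.last_miss_page_va is not None:
            self.successor[self.last_miss_page_va] = page_va  # miss-to-miss association
        predicted = self.successor.get(page_va)
        if predicted is not None:
            self.issue_prefetch(predicted)  # prefetch is only ever triggered by another miss
        self.last_miss_page_va = page_va
```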
Disclosure of Invention
Aspects disclosed in the detailed description include using retired page history for instruction translation look-aside buffer (TLB) prefetching in a processor-based device. Related apparatus and methods are also disclosed. In this regard, in some exemplary aspects disclosed herein, a processor-based device provides a history-based TLB prefetcher (HTP) circuit configured to track a last retired page Virtual Address (VA), which represents the page VA of the page containing the most recently retired instruction. When the HTP circuit detects that an instruction belonging to a page having a VA different from the last retired page VA has been retired, the HTP circuit captures the page VA of the page containing the retired instruction and stores the captured page VA as the last retired page VA. The HTP circuit also tracks each subsequent instruction TLB demand miss by creating a corresponding history table entry, in a history table of the HTP circuit, that associates the page VA of the instruction TLB demand miss with the last retired page VA. When the HTP circuit updates the last retired page VA, the HTP circuit consults the history table and, if a history table entry corresponding to the (new) last retired page VA is identified, the HTP circuit initiates an instruction TLB prefetch request for the page VA of the instruction TLB demand miss associated with the last retired page VA in that history table entry. By associating instruction TLB demand misses with the last retired page VA, instruction TLB prefetching may be performed in a more accurate and timely manner, thus improving processor performance.
In another aspect, a processor-based device is provided. The processor-based device includes an HTP circuit configured to determine that a first instruction of a first page has been retired. The HTP circuit is also configured to determine a first page VA of the first page. The HTP circuit is further configured to determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit. The HTP circuit is further configured to store the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit. The HTP circuit is also configured to determine that an instruction TLB demand for a second page VA of a second page results in a miss. The HTP circuit is further configured to store, in response to determining that the instruction TLB demand for the second page VA results in a miss, a history table entry in a history table of the HTP circuit representing an association of the second page VA with the value of the last retired page VA indicator. The HTP circuit is additionally configured to identify a history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA. The HTP circuit is also configured to initiate an instruction TLB prefetch request for the second page VA.
In another aspect, a processor-based device is provided. The processor-based device includes an HTP circuit configured to determine that a first instruction of a first page has been retired. The HTP circuit is also configured to determine a first page VA of the first page. The HTP circuit is further configured to determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit. The HTP circuit is further configured to store the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit.
In another aspect, a processor-based device is provided. The processor-based device includes means for determining that a first instruction of a first page has been retired. The processor-based apparatus further includes means for determining a first page VA of the first page. The processor-based apparatus further includes means for determining that the first page VA is different from the value of the last retired page VA indicator. The processor-based apparatus additionally includes means for storing the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator.
In another aspect, a method is provided for using retired page history for instruction TLB prefetching in a processor-based device. The method includes determining, by an HTP circuit of a processor-based device, that a first instruction of a first page has been retired. The method further includes determining, by the HTP circuit, a first page VA of the first page. The method also includes determining, by the HTP circuit, that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit. The method additionally includes storing, by the HTP circuit, the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit.
Drawings
FIG. 1 is a block diagram of an exemplary processor-based device including a history-based translation look-aside buffer (TLB) prefetcher (HTP) circuit configured to use retired page history for instruction TLB prefetching, according to some aspects;
FIG. 2 is a block diagram illustrating exemplary operations and communication flows for using retired page history for instruction TLB prefetching, according to some aspects;
FIGS. 3A and 3B provide a flowchart illustrating exemplary operations performed by the HTP circuit of FIGS. 1 and 2 for using retired page history for instruction TLB prefetching, according to some aspects; and
FIG. 4 is a block diagram of an exemplary processor-based device that may include the HTP circuit of FIGS. 1 and 2.
Detailed Description
Referring now to the drawings, several exemplary aspects of the present disclosure are described. The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects.
Aspects disclosed in the detailed description include using retired page history for instruction translation look-aside buffer (TLB) prefetching in a processor-based device. Related apparatus and methods are also disclosed. In this regard, in some exemplary aspects disclosed herein, a processor-based device provides a history-based TLB prefetcher (HTP) circuit configured to track a last retired page Virtual Address (VA), which represents the page VA of the page containing the most recently retired instruction. When the HTP circuit detects that an instruction belonging to a page having a VA different from the last retired page VA has been retired, the HTP circuit captures the page VA of the page containing the retired instruction and stores the captured page VA as the last retired page VA. The HTP circuit also tracks each subsequent instruction TLB demand miss by creating a corresponding history table entry, in a history table of the HTP circuit, that associates the page VA of the instruction TLB demand miss with the last retired page VA. When the HTP circuit updates the last retired page VA, the HTP circuit consults the history table and, if a history table entry corresponding to the (new) last retired page VA is identified, the HTP circuit initiates an instruction TLB prefetch request for the page VA of the instruction TLB demand miss associated with the last retired page VA in that history table entry. By associating instruction TLB demand misses with the last retired page VA, instruction TLB prefetching may be performed in a more accurate and timely manner, thus improving processor performance.
In this regard, FIG. 1 is a schematic diagram of an exemplary processor-based device 100 including a processor 102. The processor 102 (which may also be referred to as a "processor core" or a "Central Processing Unit (CPU) core") may be an in-order processor or an out-of-order processor (OoP), and/or may be one of a plurality of processors 102 provided by the processor-based device 100. In the example of FIG. 1, the processor 102 includes an instruction processing circuit 104 that includes one or more instruction pipelines I0-IN for processing instructions 106 fetched from an instruction memory ("INSTR. MEMORY" in FIG. 1) 108 by a fetch circuit 110 for execution. As a non-limiting example, the instruction memory 108 may be provided in or as part of a system memory in the processor-based device 100. An instruction cache ("INSTR. CACHE" in FIG. 1) may also be provided in the processor 102 to cache the instructions 106 fetched from the instruction memory 108 to reduce latency in the fetch circuit 110.
The fetch circuit 110 in the example of FIG. 1 is configured to provide the instructions 106 as fetched instructions 106F into the one or more instruction pipelines I0-IN to be pre-processed in the instruction processing circuit 104 before the fetched instructions 106F reach an execution circuit ("EXEC. CIRCUIT" in FIG. 1) 114 to be executed. The instruction pipelines I0-IN are provided across different processing circuits or stages of the instruction processing circuit 104 to pre-process and process the fetched instructions 106F in a series of steps that are performed concurrently to increase throughput before the fetched instructions 106F are executed by the execution circuit 114.
With continued reference to FIG. 1, the instruction processing circuit 104 includes a decode circuit 118 configured to decode the fetched instructions 106F fetched by the fetch circuit 110 into decoded instructions 106D to determine the required instruction type and action. The type of instruction and the required actions encoded in the decoded instruction 106D may also be used to determine in which instruction pipeline I0-IN the decoded instruction 106D should be placed. In this example, the decoded instructions 106D are placed in one or more of the instruction pipelines I0-IN and are then provided to the renaming circuitry 120 in the instruction processing circuit 104. The renaming circuit 120 is configured to determine whether any register names in the decoded instruction 106D should be renamed to decouple any register dependencies that would prevent parallel or out-of-order processing.
The instruction processing circuit 104 in the processor 102 of FIG. 1 also includes a register access circuit (labeled "RACC CIRCUIT" in FIG. 1) 122. The register access circuit 122 is configured to access physical registers in a Physical Register File (PRF) (not shown), based on mapping entries of logical registers in a Register Map Table (RMT) (not shown) mapped to source register operands of the decoded instruction 106D, to retrieve the values generated by the executed instruction 106E in the execution circuit 114. The register access circuit 122 is also configured to provide the retrieved value generated by the executed instruction 106E as a source register operand for the decoded instruction 106D to be executed.
Further, in the instruction processing circuit 104, a scheduler circuit ("SCHED. CIRCUIT" in FIG. 1) 124 is provided in the instruction pipelines I0-IN and is configured to store the decoded instruction 106D in a reservation entry until all source register operands for the decoded instruction 106D are available. The scheduler circuit 124 issues the decoded instructions 106D that are ready for execution to the execution circuit 114. A write circuit 126 is also provided in the instruction processing circuit 104 to write or commit the value generated by the executed instruction 106E back to memory (e.g., the PRF), cache memory, or system memory.
As seen in FIG. 1, the processor-based device 100 further includes a memory system 128 that provides a memory management unit ("MMU" in FIG. 1) 130 configured to manage memory accesses. The MMU 130 is communicatively coupled to an instruction translation look-aside buffer ("INSTRUCTION TLB" in FIG. 1) 132 for caching recently used virtual-to-physical memory address translations of pages containing instructions to be fetched. As shown in FIG. 1, the MMU 130 is also communicatively coupled to a memory controller 134 configured to perform memory read and write operations on a system memory 136. As a non-limiting example, in some aspects, the system memory 136 may comprise Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM). In some aspects, the instruction TLB 132 may be provided as an integral element of the MMU 130.
The MMU 130 of FIG. 1 is responsible for performing virtual-to-physical memory address translation operations to support the virtual memory functionality of the processor-based device 100. In some aspects, the MMU 130 may include a plurality of hierarchical page tables (not shown) comprising page table entries that each represent a mapping for a subdivision of the addressable virtual memory space having a particular size. The mappings stored by the page table entries of the hierarchical page tables of the MMU 130 may be cached in TLB entries (not shown) of the instruction TLB 132. In this way, frequently used virtual-to-physical memory address mappings do not have to be recomputed for each memory access request performed by the MMU 130.
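As one way to picture the hierarchical translation described above, the sketch below walks a multi-level page table to resolve a virtual page to a physical frame that could then be cached in a TLB entry. The four-level layout, 4 KiB pages, and 9-bit indices per level mirror a common 64-bit configuration and are assumptions of the sketch rather than details recited in this disclosure.

```python
# Illustrative walk of hierarchical page tables: each level is a dict mapping a
# 9-bit index to either the next-level table or, at the leaf, a physical frame number.
def walk_page_tables(root_table, virtual_address, levels=4, page_shift=12, bits_per_level=9):
    table = root_table
    shift = page_shift + bits_per_level * (levels - 1)
    for level in range(levels):
        index = (virtual_address >> shift) & ((1 << bits_per_level) - 1)
        entry = table.get(index)
        if entry is None:
            raise LookupError("page fault: no mapping for this virtual address")
        if level == levels - 1:
            return entry      # leaf entry: frame number to be cached in the instruction TLB
        table = entry         # descend to the next-level page table
        shift -= bits_per_level
```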
However, as described above, the performance of the processor-based apparatus 100 may be negatively impacted by instruction TLB demand misses that occur when the instruction TLB 132 does not contain TLB entries corresponding to pages containing instructions to be fetched and executed. Such instruction TLB demand misses may require the processor 102 to stall while it waits to perform a virtual-to-physical memory address translation for the page containing the instruction. Conventional approaches to minimizing instruction TLB demand misses may provide suboptimal performance due to their dependence on the occurrence of previous instruction TLB demand misses that may not occur, and due to their sensitivity to corruption by prefetching on speculative execution paths.
In this regard, in some exemplary aspects disclosed herein, the processor-based device 100 provides an HTP circuit 138 that includes a last retired page VA indicator 140 and a history table 142. As discussed in more detail below with respect to FIG. 2, the HTP circuit 138 uses the last retired page VA indicator 140 to track the page VA of the page containing the most recently retired instruction (i.e., the last retired page VA), and uses history table entries (not shown) of the history table 142 to associate the value of the last retired page VA indicator 140 with subsequent instruction TLB demand misses on the instruction TLB 132. The HTP circuit 138 may later query the history table 142 and, if a history table entry corresponding to the value of the last retired page VA indicator 140 is identified, the HTP circuit 138 initiates an instruction TLB prefetch request for the page VA whose instruction TLB demand miss is associated with the value of the last retired page VA indicator 140 in that history table entry.
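As a software model of this behavior, the following sketch pairs a last-retired-page VA with a history table keyed by it, mirroring the last retired page VA indicator 140 and the history table 142. The class and method names (RetiredPageHTP, on_instruction_retired, on_itlb_demand_miss), the unbounded table, and the single-successor entries are illustrative assumptions, not elements of the disclosed circuit.

```python
# Illustrative software model of the HTP circuit 138: retirement of an instruction on a
# new page updates the last retired page VA, consults the history table, and (if an entry
# is found) issues a prefetch; each instruction TLB demand miss is recorded against the
# current last retired page VA.
class RetiredPageHTP:
    def __init__(self, issue_prefetch):
        self.issue_prefetch = issue_prefetch
        self.last_retired_page_va = None  # models the last retired page VA indicator 140
        self.history_table = {}           # models history table 142: retired page VA -> missed page VA

    def on_instruction_retired(self, page_va):
        # Called with the page VA of a retired (committed, non-speculative) instruction.
        if page_va == self.last_retired_page_va:
            return                            # same page as before: nothing to update
        self.last_retired_page_va = page_va   # capture the new last retired page VA
        predicted = self.history_table.get(page_va)
        if predicted is not None:
            # A past demand miss followed retirement from this page: prefetch it now.
            self.issue_prefetch(predicted)

    def on_itlb_demand_miss(self, missed_page_va):
        if self.last_retired_page_va is not None:
            # Associate the miss with the page of the most recently retired instruction.
            self.history_table[self.last_retired_page_va] = missed_page_va
```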
FIG. 2 illustrates exemplary operations of the HTP circuit 138 of FIG. 1 for using retired page history for instruction TLB prefetching, according to some aspects. As shown in FIG. 2, the HTP circuit 138 includes the last retired page VA indicator 140 and the history table 142 of FIG. 1. The history table 142 includes a plurality of history table entries 200(0)-200(H). FIG. 2 also shows a page access sequence 202, in which a series of pages 204(0)-204(2) are shown as exemplary pages, each having a corresponding page VA 206(0)-206(2). At a time prior to the point shown in FIG. 2, when instruction 208(0) in page 204(0) is retired, the value of the last retired page VA indicator 140 is set to page VA 206(0). Subsequently, the next instruction to retire is instruction 208(1) in page 204(1). The HTP circuit 138 determines that the instruction 208(1) has been retired (e.g., by receiving a notification from the instruction processing circuit 104 of FIG. 1, by monitoring the instruction processing circuit 104, or by otherwise communicating with the instruction processing circuit 104). As used herein, an instruction that has been "retired" is one that has been executed and committed by the processor-based device 100 and is no longer speculative.
The HTP circuit 138 determines the page VA 206(1) of the page 204(1) containing the instruction 208(1), and determines that the page VA 206(1) is different from the value of the last retired page VA indicator 140 (i.e., page VA 206(0)). Accordingly, the HTP circuit 138 stores page VA 206(1) as the value of the last retired page VA indicator 140, as indicated by arrow 210. Later in the example of FIG. 2, the HTP circuit 138 determines that an instruction TLB demand for page VA 206(2) of page 204(2), which contains instruction 208(2), resulted in a miss. In response, the HTP circuit 138 stores a history table entry 200(0) representing an association of the value of the last retired page VA indicator 140 (as indicated by arrow 212) with page VA 206(2) (as indicated by arrow 214). In some aspects, the history table entry 200(0) may comprise a Markov chain that associates the value of the last retired page VA indicator 140 with page VA 206(2).
During a subsequent iteration of the page access sequence 202, when the HTP circuit 138 again stores page VA 206(1) as the value of the last retired page VA indicator 140, the HTP circuit 138 also identifies the history table entry 200(0) as corresponding to page VA 206(1) and indicating a previous instruction TLB demand miss for page VA 206(2) of page 204(2). Accordingly, the HTP circuit 138 initiates an instruction TLB prefetch request 216 for page VA 206(2) (i.e., to the MMU 130 and/or the instruction TLB 132 of FIG. 1), as indicated by arrow 218.
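The FIG. 2 sequence can be replayed against the RetiredPageHTP sketch above; the numeric page VA values below are arbitrary placeholders standing in for page VAs 206(0)-206(2).

```python
# Replaying the FIG. 2 sequence with the RetiredPageHTP sketch
# (0xA0, 0xA1, 0xA2 stand in for page VAs 206(0), 206(1), 206(2)).
prefetched = []
htp = RetiredPageHTP(issue_prefetch=prefetched.append)

htp.on_instruction_retired(page_va=0xA0)      # instruction 208(0) retires in page 204(0)
htp.on_instruction_retired(page_va=0xA1)      # instruction 208(1) retires: indicator updated (arrow 210)
htp.on_itlb_demand_miss(missed_page_va=0xA2)  # demand miss on page VA 206(2) recorded (arrows 212, 214)

# Subsequent iteration of the page access sequence 202:
htp.on_instruction_retired(page_va=0xA0)
htp.on_instruction_retired(page_va=0xA1)      # retiring from page 204(1) again triggers the prefetch
assert prefetched == [0xA2]                   # instruction TLB prefetch request 216 for page VA 206(2)
```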
To illustrate exemplary operations performed by the HTP circuit 138 of FIGS. 1 and 2 for using retired page history for instruction TLB prefetching, according to some aspects, FIGS. 3A and 3B provide a flowchart 300. For clarity, reference is made to elements of FIGS. 1 and 2 in describing FIGS. 3A and 3B. It should be appreciated that, in some aspects, some of the operations shown in FIGS. 3A and 3B may be performed in a different order than shown herein and/or may be omitted. The operations in FIG. 3A begin with an HTP circuit (e.g., the HTP circuit 138 of FIGS. 1 and 2) determining that an instruction (e.g., the instruction 208(1) of FIG. 2) of a page (e.g., the page 204(1) of FIG. 2) has been retired (block 302). In some aspects, the operations of block 302 for determining that the instruction 208(1) of the page 204(1) has been retired may comprise the HTP circuit 138 determining that the instruction 208(1) has been executed and committed by the processor-based device 100 and is no longer speculative (block 304).
The HTP circuit 138 next determines the page VA of the page 204(1) (e.g., the page VA 206(1) of FIG. 2) (block 306). The HTP circuit 138 then determines whether the page VA 206(1) is different from the value of a last retired page VA indicator of the HTP circuit 138 (e.g., the last retired page VA indicator 140 of FIGS. 1 and 2) (block 308). If so, the HTP circuit 138 stores the page VA 206(1) as the value of the last retired page VA indicator 140 (block 310). In some aspects, if the HTP circuit 138 determines at decision block 308 that the page VA 206(1) is not different from the value of the last retired page VA indicator 140, processing continues in a conventional manner (block 312). Operations according to some aspects may then continue at block 314 of FIG. 3B.
Referring now to FIG. 3B, some aspects may provide that the HTP circuit 138 determines whether an instruction TLB demand for a page VA (e.g., the page VA 206(2) of FIG. 2) of a page (e.g., the page 204(2) of FIG. 2) resulted in a miss (block 314). If so, the HTP circuit 138 stores, in the history table 142 of the HTP circuit 138, a history table entry (e.g., the history table entry 200(0) of FIG. 2) representing an association of the page VA 206(2) with the value of the last retired page VA indicator 140 (e.g., the page VA 206(1) of FIG. 2) (block 316). If the HTP circuit 138 determines at decision block 314 that an instruction TLB demand miss has not occurred, operations in some aspects may continue at block 318.
According to some aspects, the HTP circuit 138 may identify a history table entry (e.g., the history table entry 200(0) of FIG. 2) of the history table 142 of the HTP circuit 138 corresponding to the page VA 206(1) and indicating a previous instruction TLB demand miss for the page VA 206(2) of the page 204(2) (block 318). The HTP circuit 138 then initiates an instruction TLB prefetch request (e.g., the instruction TLB prefetch request 216 of FIG. 2) for the page VA 206(2) (block 320).
The HTP circuit according to aspects disclosed herein and discussed with reference to FIGS. 1, 2, 3A, and 3B may be provided in or integrated into any processor-based device. Examples include, but are not limited to, set top boxes, entertainment units, navigation devices, communications devices, fixed location data units, mobile location data units, Global Positioning System (GPS) devices, mobile phones, cellular phones, smart phones, Session Initiation Protocol (SIP) phones, tablet computers, phablets, servers, computers, portable computers, mobile computing devices, laptop computers, wearable computing devices (e.g., smart watches, health or fitness trackers, eyeglasses, etc.), desktop computers, Personal Digital Assistants (PDAs), monitors, computer monitors, televisions, tuners, radio units, satellite radio units, music players, digital music players, portable music players, Digital Video Disc (DVD) players, portable digital video players, automobiles, vehicle components, avionics systems, drones, and multicopters.
In this regard, FIG. 4 illustrates an example of a processor-based device 400 that includes an HTP circuit as shown and described with respect to FIGS. 1, 2, 3A, and 3B. In this example, the processor-based device 400, which corresponds in functionality to the processor-based device 100 of FIG. 1, includes a processor 402 that includes one or more CPUs 404 coupled to a cache memory 406. The CPU 404 is also coupled to a system bus 408 and may interconnect devices included in the processor-based device 400. As is well known, the CPU 404 communicates with these other devices by exchanging address, control, and data information over the system bus 408. For example, the CPU 404 may communicate bus transaction requests to a memory controller 410. Although not shown in FIG. 4, a plurality of system buses 408 may be provided, wherein each system bus 408 constitutes a different fabric.
Other devices may be connected to the system bus 408. As shown in FIG. 4, these devices may include, by way of example, a memory system 412, one or more input devices 414, one or more output devices 416, one or more network interface devices 418, and one or more display controllers 420. The input device 414 may include any type of input device, including, but not limited to, input keys, switches, a voice processor, etc. The output device 416 may include any type of output device, including, but not limited to, audio, video, other visual indicators, and the like. The network interface device 418 may be any device configured to allow data to be exchanged to and from a network 422. The network 422 may be any type of network, including, but not limited to, a wired or wireless network, a private or public network, a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Bluetooth network, and the Internet. The network interface device 418 may be configured to support any type of communication protocol desired. The memory system 412 may include the memory controller 410 coupled to one or more memory arrays 424, and an HTP circuit 426 (e.g., the HTP circuit 138 of FIGS. 1 and 2).
The CPU 404 may also be configured to access the display controller 420 via the system bus 408 to control information sent to one or more displays 428. The display controller 420 sends information to the display 428 for display via one or more video processors 430, which process the information to be displayed into a format suitable for the display 428. The display 428 may include any type of display, including, but not limited to, a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, a Light Emitting Diode (LED) display, and the like.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the aspects disclosed herein may be implemented as electronic hardware, instructions stored in memory or other computer readable media and executed by a processor or other processing device, or combinations of both. As an example, the master and slave devices described herein may be used in any circuit, hardware component, integrated Circuit (IC), or IC chip. The memory disclosed herein may be any type and size of memory and may be configured to store any desired type of information. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How these functions are implemented depends on the particular application, design choice and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented with: a processor, digital Signal Processor (DSP), application Specific Integrated Circuit (ASIC), field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
Aspects disclosed herein may be embodied in hardware and in instructions that are stored in hardware and that may be located, for example, in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
It is also noted that the operational steps described in any of the exemplary aspects herein are described to provide examples and discussion. The described operations may be performed in a variety of different orders than that shown. Furthermore, operations described in a single operational step may actually be performed in a plurality of different steps. Additionally, one or more of the operational steps discussed in the exemplary aspects may be combined. It will be understood that the operational steps shown in the flow diagrams may be subject to many different modifications, as will be apparent to those skilled in the art. Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations as well. Thus, the disclosure is not limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Implementation examples are described in the numbered clauses below.
1. A processor-based device comprising a history-based translation look-aside buffer (TLB) prefetcher (HTP) circuit configured to:
determine that a first instruction of a first page has been retired;
determine a first page Virtual Address (VA) of the first page;
determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit;
in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit, store the first page VA as the value of the last retired page VA indicator;
determine that an instruction TLB demand for a second page VA of a second page results in a miss;
in response to determining that the instruction TLB demand for the second page VA results in a miss, store, in a history table of the HTP circuit, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator;
identify a history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA; and
initiate an instruction TLB prefetch request for the second page VA.
2. A processor-based device comprising a history-based translation look-aside buffer (TLB) prefetcher (HTP) circuit configured to:
determine that a first instruction of a first page has been retired;
determine a first page Virtual Address (VA) of the first page;
determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit; and
in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit, store the first page VA as the value of the last retired page VA indicator.
3. The processor-based device of clause 2, wherein the HTP circuit is configured to determine that the first instruction has been retired by being configured to determine that the first instruction has been executed and committed by the processor-based device and is no longer speculative.
4. The processor-based device of any of clauses 2-3, wherein the HTP circuit is further configured to:
determine that an instruction TLB demand for a second page VA of a second page results in a miss; and
in response to determining that the instruction TLB demand for the second page VA results in a miss, store, in a history table of the HTP circuit, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator.
5. The processor-based device of clause 4, wherein the history table entry comprises a Markov chain.
6. The processor-based device of any of clauses 4-5, wherein the HTP circuit is further configured to:
identify a history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA; and
initiate an instruction TLB prefetch request for the second page VA.
7. The processor-based device of any of clauses 2-6, integrated into a device selected from the group consisting of: a set top box; an entertainment unit; a navigation device; a communication device; a fixed location data unit; a mobile location data unit; a Global Positioning System (GPS) device; a mobile telephone; a cellular telephone; a smart phone; a Session Initiation Protocol (SIP) telephone; a tablet computer; a phablet; a server; a computer; a portable computer; a mobile computing device; a wearable computing device; a desktop computer; a Personal Digital Assistant (PDA); a monitor; a computer monitor; a television set; a tuner; a radio unit; a satellite radio unit; a music player; a digital music player; a portable music player; a digital video player; a video player; a Digital Video Disc (DVD) player; a portable digital video player; an automobile; a vehicle component; avionics systems; a drone; and a multicopter.
8. A processor-based device, comprising:
means for determining that a first instruction of a first page has been retired;
means for determining a first page Virtual Address (VA) of the first page;
means for determining that the first page VA is different from a value of a last retired page VA indicator; and
means for storing the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator.
9. The processor-based device of clause 8, wherein the means for determining that the first instruction has been retired comprises means for determining that the first instruction has been executed and committed by the processor-based device and is no longer speculative.
10. The processor-based device of any of clauses 8-9, further comprising:
means for determining that an instruction translation look-aside buffer (TLB) demand for a second page VA of a second page results in a miss; and
means for storing, in a history table, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator in response to determining that the instruction TLB demand for the second page VA resulted in a miss.
11. The processor-based device of clause 10, wherein the history table entry comprises a Markov chain.
12. The processor-based device of any of clauses 10-11, further comprising:
means for identifying a history table entry corresponding to the first page VA and indicating a previous instruction translation look-aside buffer (TLB) demand miss for the second page VA; and
means for initiating an instruction TLB prefetch request for the second page VA.
13. A method of using retired page history for instruction translation look-aside buffer (TLB) prefetching, comprising:
determining, by a history-based TLB prefetcher (HTP) circuit of a processor-based device, that a first instruction of a first page has been retired;
determining, by the HTP circuit, a first page Virtual Address (VA) of the first page;
determining, by the HTP circuit, that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit; and
storing, by the HTP circuit, the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit.
14. The method of clause 13, wherein determining that the first instruction has been retired comprises determining that the first instruction has been executed and committed by a processor-based device and is no longer speculative.
15. The method according to any one of clauses 13-14, further comprising:
determining that an instruction TLB demand for a second page VA of a second page results in a miss; and
in response to determining that the instruction TLB demand for the second page VA results in a miss, storing, in a history table of the HTP circuit, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator.
16. The method of clause 15, wherein the history table entry comprises a Markov chain.
17. The method according to any one of clauses 15-16, further comprising:
identifying a history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA; and
initiating an instruction TLB prefetch request for the second page VA.

Claims (17)

1. A processor-based device comprising a history-based translation look-aside buffer (TLB) prefetcher (HTP) circuit configured to:
determine that a first instruction of a first page has been retired;
determine a first page Virtual Address (VA) of the first page;
determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit;
in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit, store the first page VA as the value of the last retired page VA indicator;
determine that an instruction TLB demand for a second page VA of a second page results in a miss;
in response to determining that the instruction TLB demand for the second page VA results in a miss, store, in a history table of the HTP circuit, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator;
identify the history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA; and
initiate an instruction TLB prefetch request for the second page VA.
2. A processor-based device comprising a history-based translation look-aside buffer (TLB) prefetcher (HTP) circuit configured to:
determine that a first instruction of a first page has been retired;
determine a first page Virtual Address (VA) of the first page;
determine that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit; and
in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit, store the first page VA as the value of the last retired page VA indicator.
3. The processor-based device of claim 2 wherein the HTP circuitry is configured to determine that the first instruction has been retired by being configured to determine that the first instruction has been executed and committed by the processor-based device and is no longer speculative.
4. The processor-based device of claim 2, wherein the HTP circuit is further configured to:
determine that an instruction TLB demand for a second page VA of a second page results in a miss; and
in response to determining that the instruction TLB demand for the second page VA results in a miss, store, in a history table of the HTP circuit, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator.
5. The processor-based device of claim 4, wherein the history table entry comprises a Markov chain.
6. The processor-based device of claim 4, wherein the HTP circuit is further configured to:
identify the history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA; and
initiate an instruction TLB prefetch request for the second page VA.
7. The processor-based device of claim 2, integrated into a device selected from the group consisting of: a set top box; an entertainment unit; a navigation device; a communication device; a fixed location data unit; a mobile location data unit; a Global Positioning System (GPS) device; a mobile telephone; a cellular telephone; a smart phone; a Session Initiation Protocol (SIP) telephone; a tablet computer; a phablet; a server; a computer; a portable computer; a mobile computing device; a wearable computing device; a desktop computer; a Personal Digital Assistant (PDA); a monitor; a computer monitor; a television set; a tuner; a radio unit; a satellite radio unit; a music player; a digital music player; a portable music player; a digital video player; a video player; a Digital Video Disc (DVD) player; a portable digital video player; an automobile; a vehicle component; avionics systems; a drone; and a multicopter.
8. A processor-based device, comprising:
means for determining that a first instruction of a first page has been retired;
means for determining a first page Virtual Address (VA) of the first page;
means for determining that the first page VA is different from a value of a last retired page VA indicator; and
means for storing the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator.
9. The processor-based device of claim 8 wherein the means for determining that the first instruction has been retired comprises means for determining that the first instruction has been executed and committed by the processor-based device and is no longer speculative.
10. The processor-based device of claim 8, further comprising:
means for determining that an instruction translation look-aside buffer (TLB) demand for a second page VA of a second page results in a miss; and
means for storing, in a history table, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator in response to determining that the instruction TLB demand for the second page VA resulted in a miss.
11. The processor-based device of claim 10, wherein the history table entry comprises a Markov chain.
12. The processor-based device of claim 10, further comprising:
means for identifying the history table entry corresponding to the first page VA and indicating a previous instruction translation look-aside buffer (TLB) demand miss for the second page VA; and
means for initiating an instruction TLB prefetch request for the second page VA.
13. A method of using retired page history for instruction translation look-aside buffer (TLB) prefetching, comprising:
determining, by a history-based TLB prefetcher (HTP) circuit of a processor-based device, that a first instruction of a first page has been retired;
determining, by the HTP circuit, a first page Virtual Address (VA) of the first page;
determining, by the HTP circuit, that the first page VA is different from a value of a last retired page VA indicator of the HTP circuit; and
storing, by the HTP circuit, the first page VA as the value of the last retired page VA indicator in response to determining that the first page VA is different from the value of the last retired page VA indicator of the HTP circuit.
14. The method of claim 13, wherein determining that the first instruction has been retired comprises determining that the first instruction has been executed and committed by the processor-based device and is no longer speculative.
15. The method of claim 13, further comprising:
determining that an instruction TLB demand for a second page VA of a second page results in a miss; and
in response to determining that the instruction TLB demand for the second page VA results in a miss, storing, in a history table of the HTP circuit, a history table entry representing an association of the second page VA with the value of the last retired page VA indicator.
16. The method of claim 15, wherein the history table entry comprises a Markov chain.
17. The method of claim 15, further comprising:
identifying the history table entry corresponding to the first page VA and indicating a previous instruction TLB demand miss for the second page VA; and
initiating an instruction TLB prefetch request for the second page VA.

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/369,996 2022-08-01
US18/340,291 US20240037042A1 (en) 2022-08-01 2023-06-23 Using retired pages history for instruction translation lookaside buffer (tlb) prefetching in processor-based devices
US18/340,291 2023-06-23
PCT/US2023/069044 WO2024030707A1 (en) 2022-08-01 2023-06-26 Using retired pages history for instruction translation lookaside buffer (tlb) prefetching in processor-based devices

Publications (1)

Publication Number: CN118159952A
Publication Date: 2024-06-07
Family ID: 91290758


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination