WO2021104502A1 - Method and device for accelerating hardware page table traversal (一种加速硬件页表遍历的方法及装置) - Google Patents

Method and device for accelerating hardware page table traversal

Info

Publication number
WO2021104502A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2020/132489
Other languages: English (en), French (fr)
Inventor
张乾龙
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021104502A1 publication Critical patent/WO2021104502A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10: Address translation
    • G06F 12/1009: Address translation using page tables, e.g. page table structures

Definitions

  • This application relates to the field of computer storage, and in particular to a method and device for accelerating hardware page table traversal.
  • The memory management unit (Memory Management Unit, MMU) in the processor core is responsible for translating a virtual address (Virtual Address, VA) used by an application program into a physical address (Physical Address, PA).
  • The conversion from VA to PA requires querying the page table. Since part of the page table is cached in the MMU's Translation Lookaside Buffer (TLB), the TLB can generally assist the MMU in completing the translation (for example, if the required page table base address happens to be stored in the TLB, i.e. a TLB hit, the VA-to-PA conversion can be completed directly); otherwise, a hardware page table walk (HPTW) over the stored page table is needed to obtain the final PA.
  • When a TLB miss occurs, an automatic hardware page table walk is triggered, after which the TLB is refilled.
  • HPTW is currently performed serially, and the number of memory accesses is directly related to the number of page table levels (for example, with a four-level page table, at least four memory accesses are required to complete the VA-to-PA conversion).
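For illustration only, the serial walk described above can be sketched as follows. The nine index bits per level, 4KB pages, and the `read_entry` callback are assumptions made for the sketch, not details taken from the patent:

```c
#include <stdint.h>

#define LEVELS     4   /* four-level page table (assumed) */
#define IDX_BITS   9   /* index bits per level (assumed) */
#define PAGE_SHIFT 12  /* 4KB pages (assumed) */

/* One memory access per level; a callback stands in for DRAM here. */
typedef uint64_t (*read_entry_fn)(uint64_t table_base, uint64_t index);

/* Serial HPTW: each level's access must complete before the next can
 * begin, so a four-level table costs at least four dependent accesses. */
uint64_t serial_walk(uint64_t root, uint64_t va,
                     read_entry_fn read_entry, int *accesses)
{
    uint64_t base = root;
    for (int level = 0; level < LEVELS; level++) {
        int shift = PAGE_SHIFT + (LEVELS - 1 - level) * IDX_BITS;
        uint64_t index = (va >> shift) & ((1ull << IDX_BITS) - 1);
        base = read_entry(base, index);  /* one serial memory access */
        (*accesses)++;
    }
    return base | (va & ((1ull << PAGE_SHIFT) - 1)); /* append page offset */
}

/* Toy backing store for demonstration: entry value = base + index. */
static uint64_t toy_read(uint64_t base, uint64_t index) { return base + index; }
```

The loop-carried dependence on `base` is exactly why the walk cannot be parallelized: the address of each access is only known after the previous access returns.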
  • During the walk, the data obtained by each memory access must be returned to the load store unit (LSU) in the processor core and processed by the arithmetic logic unit (ALU) before the next access can be issued, which results in low page table traversal efficiency.
  • Solution 1: add a cache to the MMU to hold the base addresses of the intermediate-level page tables needed during HPTW, such as the base addresses of the third- and fourth-level page tables. After a TLB miss occurs, this cache is queried; on a hit, the walk jumps directly to the page table of the corresponding level (for example, if the third-level page table base address is found in the cache, that base address is retrieved and HPTW continues over only the remaining levels).
  • However, adding an additional cache to the processor core increases production cost; moreover, because the additional cache has a small capacity, hits cannot be guaranteed for all queries, so it is difficult to effectively improve memory access efficiency.
  • Solution 2: add a prefetch engine to each level of cache to prefetch the linked-list-structured data within that cache level, so memory addresses need not be returned to the LSU in the core for processing.
  • However, a TLB structure must then be added to each level of cache (to translate VAs into PAs), since data prefetching cannot be performed otherwise. Adopting the second solution therefore incurs too much overhead.
  • In view of this, the embodiments of the present invention provide a method and device for accelerating hardware page table traversal, which improve the hardware page table walk process, accelerate page table traversal, and improve address translation efficiency.
  • The control unit of the i-th level first cache is configured to:
  • send the K-th level base address to the calculation unit of the i-th level first cache.
  • The calculation unit of the i-th level first cache is configured to:
  • determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address, where the K-th level offset address is determined from the K-th level high-order address.
  • In the embodiment of the present invention, a calculation unit is added to each level of the first cache, so that after any level of the first cache obtains a base address, it can directly calculate the next-level base address from that base address and the corresponding offset.
  • Specifically, the control unit of the i-th level first cache receives the memory access request and, provided the i-th level first cache (that is, the current-level cache) stores the K-th level base address, sends the K-th level base address in the memory access request to the calculation unit of the current-level cache. The current-level cache then calculates the (K+1)-th level base address (that is, the next-level base address) from the K-th level base address and the offset address determined from the high-order address included in the memory access request.
  • In the prior art, each base address must be returned to the processor core before the next-level base address can be queried; in the embodiment of the present invention, whichever level of the first cache determines the K-th level base address can obtain the next-level base address directly through the calculation unit added to that cache level. This improves base address access efficiency and accelerates the hardware page table walk, ultimately improving the efficiency of virtual-to-physical address translation.
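As a sketch of the per-cache calculation unit's job, the address of the next page table entry can be formed from the current base address and an index drawn from the high-order bits of the virtual address. The nine-bit index, 8-byte entries, and four-level layout are illustrative assumptions, not the patent's exact arithmetic:

```c
#include <stdint.h>

/* Hypothetical sketch of the per-cache calculation unit: given the K-th
 * level base address and the high-order address bits carried in the
 * memory access request, compute the address of the K-th level page
 * table entry, i.e. where the (K+1)-th level base address is read from.
 * Field widths (9 index bits, 8-byte entries, 4 levels) are assumed. */
uint64_t next_entry_addr(uint64_t base_k, uint64_t high_addr,
                         int level_k /* 1..4 */)
{
    const int levels = 4, idx_bits = 9, page_shift = 12;
    int shift = page_shift + (levels - level_k) * idx_bits;
    uint64_t index = (high_addr >> shift) & ((1ull << idx_bits) - 1);
    return base_k + index * 8;  /* K-th base + K-th offset */
}
```

Because this shift-and-add can be done by a small unit inside the cache itself, the intermediate base address never has to travel back to the processor core between levels.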
  • The control unit of the i-th level first cache is further configured to: when it is determined that the i-th level first cache does not store the K-th level base address and i < N, send the memory access request to the control unit of the (i+1)-th level first cache.
  • Sending the memory access request to the lower-level cache (that is, the (i+1)-th level first cache) covers the other outcome of judging whether the i-th level first cache stores the K-th level base address, making the embodiment of the present invention more complete and avoiding execution errors or system failures.
  • The device further includes a home node coupled to the i-th level first cache; the control unit of the i-th level first cache is further configured to: when the i-th level first cache does not store the K-th level base address, send the memory access request to the home node.
  • the home node is used to receive the memory access request.
  • With the home node added, when it is determined that the K-th level base address is not stored in the i-th level first cache, the memory access request can be sent not only to the (i+1)-th level first cache but also to the home node at the same time; this also covers the handling at the N-th level first cache. Sending the memory access request to the home node helps determine the storage status of the base address across all caches.
  • The home node is further configured to: after receiving the memory access request, determine, according to the K-th level base address, whether each level of the N levels of first cache stores the K-th level base address; and, upon determining that a target first cache stores the K-th level base address, send the memory access request to the control unit of the target first cache.
  • That is, the home node judges whether any level of the first cache stores the required K-th level base address; after determining that the target first cache (one or several cache levels) stores the K-th level base address, the memory access request is sent to that cache level.
  • For example, suppose the third-level cache determines that it does not store the third-level page table base address while the second-level cache does store it; since the third-level cache does not send requests back up to the second-level cache, the home node can be responsible for returning the request to the second-level cache.
  • The device further includes a memory controller coupled with the home node and a memory coupled with the memory controller; the home node includes a buffer. The home node is further configured to send the memory access request to the memory controller when it is determined that no level of the first cache stores the K-th level base address.
  • The memory controller is configured to: fetch the K-th level base address from the memory according to the memory access request, and send the K-th level base address to the home node.
  • the memory is used to store the K-th level base address.
  • the buffer is used to determine the (K+1)th level base address according to the Kth level base address and the Kth level offset address.
  • That is, when the home node finds that none of the caches stores the required base address, it sends the memory access request to the memory controller, instructing the memory controller in the processor chip to obtain the required base address from the memory.
  • This supplements a situation that may occur in the foregoing embodiment, avoiding execution errors and ensuring that the page table walk can continue even when the caches do not store the required base address.
  • the device further includes a second cache coupled with the home node, the second cache including a control unit of the second cache and a calculation unit of the second cache;
  • The home node is further configured to: after receiving the memory access request, determine whether the second cache stores the K-th level base address; and, upon determining that the second cache stores the K-th level base address, send the memory access request to the control unit of the second cache.
  • This supplements the case where the processor chip has multiple processor cores: the home node can check whether the caches of other cores hold the required base address so that the page table query can continue.
  • The device further includes a memory controller coupled with the home node and a memory coupled with the memory controller; the home node includes a buffer. The home node is further configured to send the memory access request to the memory controller when it is determined that neither any level of the first cache nor the second cache stores the K-th level base address.
  • The memory controller is configured to: fetch the K-th level base address from the memory according to the memory access request, and send the K-th level base address to the home node.
  • the memory is used to store the K-th level base address.
  • the buffer is used to determine the (K+1)th level base address according to the Kth level base address and the Kth level offset address.
  • That is, when the processor chip has multiple processor cores and the home node detects that none of the caches stores the required base address, the corresponding request is sent to the memory controller so that the required base address can be obtained from the memory in time, ensuring that the page table base address query can continue even when no cache is hit.
  • A third cache (that is, a newly added cache) is added to the memory management unit, and the newly added cache can store some of the page table base addresses.
  • When a TLB miss occurs, the newly added cache can be searched for all the page table base addresses needed to complete the VA-to-PA conversion. On a hit in the newly added cache, all the page table base addresses are obtained directly, so no page table walk is needed and the required physical address is obtained quickly.
  • The memory access request further includes a base address identifier of the K-th level base address, and the base address identifier is used to indicate the level of the K-th level base address.
  • This supplements the form of the memory access request in the embodiment of the present invention. Specifically, a dedicated field or data segment can be added to the memory access request so that a cache's control unit or the home node can identify which level of base address the request is querying.
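A minimal sketch of such a request follows; the field names and widths are assumptions for illustration, and the actual command format is the one defined in the patent's FIG. 10:

```c
#include <stdint.h>

/* Assumed layout of a memory access request carrying a base address
 * identifier; field names and widths are illustrative only. */
typedef struct {
    uint64_t base_addr; /* K-th level page table base address */
    uint64_t high_addr; /* high-order bits of the virtual address */
    uint8_t  base_id;   /* level K of the carried base address */
} mem_access_req;

/* A cache control unit or the home node can branch on the level field,
 * e.g. to detect that the walk has reached the last level. */
int carries_last_level(const mem_access_req *req, uint8_t last_level)
{
    return req->base_id == last_level;
}
```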
  • The control unit of the i-th level first cache is further configured to: upon determining the (M+1)-th level base address, send the (M+1)-th level base address to the memory management unit.
  • the memory management unit is further configured to receive the (M+1)th level base address.
  • The embodiment of the present invention thus covers the cache's handling after the last-level base address (that is, the (M+1)-th level base address) is determined: whichever cache level determines the last-level base address sends it to the memory management unit, and the memory management unit receives this final page base address and adds the page offset to obtain the physical address.
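The final composition step at the memory management unit can be sketched as follows, assuming 4KB pages (a 12-bit page offset):

```c
#include <stdint.h>

/* Once the last-level base address (the physical page base) reaches
 * the MMU, the physical address is formed by combining the page base
 * with the page offset taken from the low bits of the virtual address.
 * 4KB pages (12-bit offset) are assumed for this sketch. */
uint64_t compose_pa(uint64_t last_level_base, uint64_t va)
{
    const uint64_t off_mask = (1ull << 12) - 1;
    return (last_level_base & ~off_mask) | (va & off_mask);
}
```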
  • In a second aspect, an embodiment of the present invention provides a method for accelerating page table traversal, which includes: receiving a memory access request through the control unit of the i-th level first cache, the memory access request including a K-th level base address and a K-th level high-order address.
  • The method further includes: when it is determined that the i-th level first cache does not store the K-th level base address and i < N, sending the memory access request through the control unit of the i-th level first cache to the control unit of the (i+1)-th level first cache.
  • The method further includes: when it is determined that the i-th level first cache does not store the K-th level base address, sending the memory access request to the home node through the control unit of the i-th level first cache.
  • The method further includes: after receiving the memory access request, judging by the home node, according to the K-th level base address, whether each level of the N levels of first cache stores the K-th level base address; and, upon determining that a target first cache stores the K-th level base address, sending the memory access request to the control unit of the target first cache.
  • The method further includes: when it is determined that no level of the first cache stores the K-th level base address, sending the memory access request to the memory controller through the home node; fetching, by the memory controller, the K-th level base address from the memory according to the memory access request; sending the K-th level base address to the home node through the memory controller; and determining the (K+1)-th level base address through the buffer of the home node according to the K-th level base address and the K-th level offset address.
  • The method further includes: after receiving the memory access request, judging by the home node whether the second cache stores the K-th level base address; and, if the second cache stores the K-th level base address, sending the memory access request from the home node to the control unit of the second cache.
  • The method further includes: when it is determined that neither any level of the first cache nor the second cache stores the K-th level base address, sending the memory access request to the memory controller through the home node; fetching, by the memory controller, the K-th level base address from the memory according to the memory access request; sending the K-th level base address to the home node through the memory controller; and determining the (K+1)-th level base address through the buffer of the home node according to the K-th level base address and the K-th level offset address.
  • The method further includes: before sending a memory access request to the level-1 first cache, judging by the memory management unit whether the third cache stores the level-1 base address; when it is determined that the third cache stores the level-1 base address, obtaining the level-1 base address through the memory management unit; and when it is determined that the third cache does not store the level-1 base address, sending the memory access request to the level-1 first cache through the memory management unit.
  • The memory access request further includes a base address identifier of the K-th level base address, and the base address identifier is used to indicate the level of the K-th level base address.
  • The home node may first determine whether the K-th level base address is stored in the first cache, and then determine whether the K-th level base address is stored in the second cache.
  • Alternatively, the home node may simultaneously determine whether the K-th level base address is stored in the first cache and in the second cache.
  • For whichever cache stores the base address, the home node sends it the corresponding memory access request; the home node may then receive results fed back from multiple caches, in which case the highest-level base address among the results is taken as the traversal result and the walk continues from there.
  • During traversal across multiple caches, the cache that completes the last step of the walk (that is, the cache that obtains the final base address) can feed the result back to the memory management unit directly.
  • an embodiment of the present invention provides an apparatus for accelerating page table traversal, including: an N-level first cache; each of the N-level first caches includes a calculation unit and a control unit; The N-level first cache stores one or more base addresses among the M-level base addresses; N is an integer greater than 0, and M is an integer greater than 1; where,
  • the calculation unit of the i-th level first cache is configured to: determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address; the K-th level offset address is based on the The high-order address of the Kth level is determined.
  • The device further includes a home node coupled to the N-th level first cache; the control unit of the N-th level first cache is further configured to: when the N-th level first cache does not store the K-th level base address, send the K-th level memory access request to the home node. The home node is configured to receive the K-th level memory access request.
  • The method further includes: before sending the level-1 memory access request to the level-1 first cache, judging by the memory management unit whether the third cache stores the level-1 base address; when it is determined that the third cache stores the level-1 base address, obtaining the level-1 base address through the memory management unit; and when it is determined that the third cache does not store the level-1 base address, sending the level-1 memory access request to the level-1 first cache through the memory management unit.
  • an embodiment of the present invention provides a terminal, the terminal includes a processor, and the processor is configured to support the terminal to perform a corresponding function in the method for accelerating hardware page table traversal provided in the second aspect.
  • the terminal may also include a memory, which is used for coupling with the processor and stores necessary program instructions and data for the terminal.
  • the terminal may also include a communication interface for the terminal to communicate with other devices or communication networks.
  • an embodiment of the present invention provides a chip system, which may include the accelerated page table traversal device as described in the above first aspect, and an auxiliary circuit coupled to the accelerated page table traversal device.
  • an embodiment of the present invention provides an electronic device, which may include: the accelerated page table traversal device as described in the above first aspect, and a discrete device coupled to the outside of the accelerated page table traversal device.
  • an embodiment of the present invention provides a chip system, and the chip system can execute any method involved in the above-mentioned second aspect, so that related functions can be realized.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data.
  • the chip system can be composed of chips, or include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a hardware page table traversal process according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an application architecture provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of another application architecture provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an accelerated hardware page table traversal device provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a specific accelerated hardware page table traversal device provided by an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the internal structure of a cache provided by an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a page table query process provided by an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of interaction of part of the hardware corresponding to FIG. 8 provided by an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a command format of a memory access request provided by an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of another page table query process provided by an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of interaction of part of the hardware corresponding to FIG. 11 provided by an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of another page table query process provided by an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of interaction of part of the hardware corresponding to FIG. 13 provided by an embodiment of the present invention.
  • FIG. 15 is a schematic diagram of another page table query process provided by an embodiment of the present invention.
  • FIG. 16 is a schematic diagram of interaction of part of the hardware corresponding to FIG. 15 provided by an embodiment of the present invention.
  • FIG. 17 is a schematic diagram of an accelerated hardware page table traversal device provided by an embodiment of the present invention.
  • FIG. 18 is a schematic diagram of another specific accelerated hardware page table traversal device provided by an embodiment of the present invention.
  • FIG. 19 is a schematic diagram of a page table query process in a multi-core situation according to an embodiment of the present invention.
  • FIG. 20 is a schematic diagram of interaction of part of the hardware corresponding to FIG. 19 according to an embodiment of the present invention.
  • FIG. 21 is a schematic diagram of an accelerated hardware page table traversal device in a multi-core situation according to an embodiment of the present invention.
  • FIG. 22 is a schematic diagram of an accelerated hardware page table traversal method provided by an embodiment of the present invention.
  • FIG. 23 is a schematic diagram of another method for accelerating hardware page table traversal provided by an embodiment of the present invention.
  • FIG. 24 is a schematic structural diagram of a chip provided by an embodiment of the present invention.
  • The terms “first”, “second”, “third” and “fourth” in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a specific order; moreover, the objects described by the terms “first”, “second”, “third” and “fourth” may also be the same objects, or contain each other, or have other relationships.
  • The terms “including” and “having” and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or other steps or units inherent to the process, method, product, or device.
  • Terms such as “component” used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution.
  • For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, a program, and/or a computer.
  • Both an application running on a computing device and the computing device itself can be components.
  • One or more components may reside in processes and/or threads of execution, and components may be located on one computer and/or distributed among two or more computers.
  • these components can be executed from various computer readable media having various data structures stored thereon.
  • A component can communicate through local and/or remote processes based on a signal having one or more data packets (for example, data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems through signals).
  • Physical address: also called real address or binary address; it exists in electronic form on the address bus so that the data bus can access a specific storage unit of main memory. Memory addresses are numbered starting from 0 and increment by 1 each time, so the physical address space of the memory grows linearly. A virtual address is translated to generate a physical address.
  • Memory is used to temporarily store computation data in the processor and data exchanged with external storage such as hard disks. The processor transfers the data to be computed into the memory for calculation and transmits the result out after the calculation is completed.
  • Dynamic random access memory (DRAM) is very cost-effective and scales well; it is the most important component of main memory.
  • The central processing unit mainly includes two parts, the controller and the arithmetic unit, and also includes the high-speed cache memory and the buses that connect them for data and control.
  • The arithmetic and logic unit (ALU) is the core component of the processor; it mainly performs arithmetic operations such as addition, subtraction, multiplication, and division, logic operations such as AND, OR, NOT, and XOR, and operations such as shifting, comparing, and transferring.
  • The Memory Management Unit (MMU), sometimes called the paged memory management unit (PMMU), is computer hardware responsible for handling memory access requests from the central processing unit (CPU). Its functions include virtual-to-physical address translation (that is, virtual memory management), memory protection, and control of the CPU's cache; in relatively simple computer architectures it is also responsible for bus arbitration and memory bank switching.
  • Cache is located between the CPU and the main memory (DRAM). It is a small but high-speed memory, usually composed of static random access memory (SRAM); as long as the SRAM remains powered, the stored data is retained. SRAM can generally be divided into five major parts: the memory cell array, the row/column address decoder, the sense amplifier, the control circuit, and the drive circuit. The cache stores part of the data that the CPU has just used or cycles through frequently; if the CPU needs this data again, it can be fetched directly from the cache, which speeds up data access and reduces CPU wait time.
  • Cache is generally divided into level 1 cache (L1 cache), level 2 cache (L2 cache), level 3 cache (L3 cache), and so on. The L1 cache is mainly integrated inside the CPU; the L2 cache is integrated on the motherboard or inside the CPU; the L3 cache is integrated on the motherboard or inside the CPU and is shared by multiple processor cores in the CPU.
  • Buffer is a reserved storage space with a certain capacity for buffering input or output data.
  • the buffer is divided into input buffer and output buffer according to whether it corresponds to an input device or an output device.
  • the processor core is the core of the processor, used to complete all calculations, accept/store commands, and process data.
  • the cores of various processors have a fixed logical structure, involving the layout of logical units such as level one cache, level two cache, execution unit, instruction level unit, and bus interface.
  • Translation Lookaside Buffer (TLB): all current desktop and server processors (such as x86) use a TLB.
  • The TLB has a fixed number of slots for storing page table entries that map virtual addresses to physical addresses.
  • The search key is the virtual memory address, and the search result is the physical address. If the requested virtual address exists in the TLB, a match is returned very quickly, and the obtained physical address can then be used to access memory.
  • If the requested virtual address is not in the TLB, the page table is used for virtual-to-physical translation, and accessing the page table is much slower than accessing the TLB. Some systems allow the page table to be swapped out to secondary memory, in which case the translation may take a very long time.
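The hit/miss behavior described above can be sketched as a minimal fully associative TLB; the slot count and the replacement policy (caller-chosen slot) are illustrative assumptions:

```c
#include <stdint.h>

#define TLB_SLOTS 8  /* fixed number of slots (illustrative) */

typedef struct { uint64_t vpn, pfn; int valid; } tlb_entry;
typedef struct { tlb_entry slots[TLB_SLOTS]; } tlb;

/* Search key: virtual page number. On a hit the physical page number
 * is returned immediately; on a miss the hardware page table walk must
 * run, after which the TLB is refilled with the translation. */
int tlb_lookup(const tlb *t, uint64_t vpn, uint64_t *pfn)
{
    for (int i = 0; i < TLB_SLOTS; i++) {
        if (t->slots[i].valid && t->slots[i].vpn == vpn) {
            *pfn = t->slots[i].pfn;
            return 1;  /* TLB hit */
        }
    }
    return 0;          /* TLB miss: trigger HPTW, then refill */
}

void tlb_refill(tlb *t, int slot, uint64_t vpn, uint64_t pfn)
{
    t->slots[slot] = (tlb_entry){ vpn, pfn, 1 };
}
```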
  • the page table is a special data structure that is placed in the page table area of the system space to store the correspondence between logical pages and physical page frames.
  • a fixed-size page (Page) is used to describe the logical address space, and a page frame (Frame) of the same size is used to describe the physical memory space.
• The operating system implements the page mapping from logical pages to physical page frames, and is responsible for the management of all pages and the control of process operation.
• Memory Controller: a bus circuit controller used to manage and schedule data transfer between the memory and the CPU; it can be a separate chip or integrated into a related larger chip. It performs the necessary control of memory accesses according to certain timing rules, including control of the address signals, data signals, and various command signals, so that the CPU can use the storage resources of the memory on demand.
• PoS: Point of Serialization.
• HTNW: Home Node/Ordering Point.
• CPU pipeline technology decomposes instructions into multiple steps and overlaps the operations of different instructions, so that several instructions are processed in parallel and program execution is sped up. Each step of an instruction is handled by its own independent circuit; when a step completes, the instruction moves to the next step while the previous stage processes subsequent instructions.
  • the pipeline structure of the processor is the most basic element of the processor micro-architecture, which carries and determines the details of other micro-architectures of the processor.
  • the bus is the key to the multi-core connection of the processor, and is used to transfer information between the core and the cache and memory.
• Extended Page Table: consists of four levels of page tables, namely the four-level page mapping table (page map level 4 table, PML4), the page directory pointer table (page-directory-pointer table, PDPT), the page directory table (page-directory, PD), and the page table (page table, PT).
  • Hardware Page Table Walk is the process of hardware module querying the page table.
• The virtual address used when a program accesses memory must be translated into a physical address before memory can be accessed; after a TLB miss occurs, the hardware can complete a traversal of the page tables to find the missing page table entry. For example, a 48-bit virtual address is traversed completely to determine the corresponding physical address.
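Under the 48-bit, four-level layout referenced here, the virtual address splits into four 9-bit table indices plus a 12-bit page offset. A minimal sketch (field positions follow the common x86-64 layout; the function names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* Indices for a 4-level walk over a 48-bit virtual address:
 * [47:39] -> level 1, [38:30] -> level 2, [29:21] -> level 3,
 * [20:12] -> level 4, [11:0] -> page offset. */
static inline unsigned vtable_index(uint64_t va, int level) {
    return (unsigned)((va >> (39 - 9 * (level - 1))) & 0x1FF); /* level in 1..4 */
}

static inline unsigned page_offset(uint64_t va) {
    return (unsigned)(va & 0xFFF);
}
```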
  • FIG. 1 is a schematic diagram of a hardware page table traversal process according to an embodiment of the present invention.
  • the page base address register stores the physical base address of the page table.
  • the MMU part of the processor core provides the HPTW function.
• The physical address in CR3 is the base address of the first-level page table.
• The HPTW process needs to query the L2 cache. If the page table hits in the L2 cache, the above process can be accelerated to a certain extent (memory does not need to be accessed at every step); otherwise, the above HPTW process requires four memory access operations to complete.
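The four memory accesses of a baseline walk can be illustrated with a toy model. The flat `mem` array (indexed in entries, not bytes), the table layout, and the `hptw` name are assumptions for illustration only:

```c
#include <assert.h>
#include <stdint.h>

#define ENTRIES 512                  /* 9-bit index => 512 entries per table */
static uint64_t mem[4 * ENTRIES];    /* toy memory: four tables back to back */

static unsigned idx(uint64_t va, int level) {
    return (unsigned)((va >> (39 - 9 * (level - 1))) & 0x1FF);
}

/* Each loop iteration is one memory access of the baseline walk; it is
 * exactly these accesses that hitting a cache level avoids. */
uint64_t hptw(uint64_t cr3, uint64_t va) {
    uint64_t base = cr3;                      /* level-1 table base (from CR3) */
    for (int level = 1; level <= 4; level++)
        base = mem[base + idx(va, level)];    /* fetch the next-level base */
    return base | (va & 0xFFF);               /* page base + page offset */
}
```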
• IP: Intellectual Property.
• The base address (or physical base address), referred to simply as the base address, can be understood as the base location where data (such as a page table or a page) is stored in memory, and is the calculation basis for a relative offset (offset address).
• A certain page table entry in one level of page table can be determined, according to the corresponding offset address, as holding the base address of the next level of page table.
  • FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present invention.
• The system architecture includes a processor chip 10, the processor core 11, the bus 12, the memory controller 14, and the memory 20.
  • the processor chip 10 includes a processor core 11, a bus 12 and a memory controller 14.
  • the processor core 11 may include a processor core 1, a processor core 2, ... and a processor core Q (Q is an integer greater than 0).
  • the multiple processor cores and the memory controller 14 are all connected to the bus 12.
• The processor chip 10 processes data through the processor core 11 (for example, in the embodiment of the present invention, the processor core is responsible for translating virtual addresses into physical addresses; the core can complete the hardware page table traversal process together with other on-chip devices, and in the case of multiple processor cores where the cache of one core does not store all the base addresses, multiple cores interact with each other and cooperate with the other on-chip devices to complete the hardware page table traversal); the processor chip 10 interacts with the memory 20 through the memory controller 14, for example to read data or write data.
  • the bus is a channel for data interaction between multiple cores and other components (such as memory controllers, etc.).
  • FIG. 3 is a schematic diagram of an application architecture provided by an embodiment of the present invention; as shown in FIG. 3, the embodiment of the present invention can be applied to the processor chip 10; the processor chip 10 is connected to the memory 20.
  • the processor chip 10 includes a processor core 11, a bus 12, a home node 13 and a memory controller 14.
• The embodiment of the present invention does not limit the specific connection mode between the modules, units, or devices. Among them:
  • the memory 20 is used to store the base address of each level in the M-level physical base address (ie, the M-level base address), and M is an integer greater than one.
  • the memory stores the base addresses of the first-level page table to the fourth-level page table, a total of four-level base addresses.
• The memory controller 14 is used to receive the Kth-level memory access request sent by the home node, obtain the corresponding Kth-level base address from the memory according to the request, and return the Kth-level base address to the home node; K is an integer greater than 0.
  • Home node 13 is used for:
  • a request is received.
  • the type of request may include HPTW request and other types of requests.
• Determine whether the received request is a memory access request (i.e., an HPTW request).
• If the home node judges that the request is a memory access request, it identifies the level of the target page table base address of the request (for example, for a Kth-level memory access request, the query is for the Kth-level page table base address);
  • the level of the base address of the target page table is identified through the home node buffer (including the identification module).
• After the home node determines which level of the N-level first cache stores the base address of the target page table, it sends the memory access request to the first cache that stores that base address.
  • the K-th level memory access request is sent to the memory controller.
• The home node calculates the (K+1)th-level base address after obtaining the Kth-level base address, and continues the query based on the (K+1)th-level base address. For example, if the home node obtains the third-level base address from the memory and calculates the fourth-level base address, it can immediately continue to search for the fourth-level base address in the home node's PoS.
  • the processor core 11 may include a processor core pipeline 110, an arithmetic unit 111, a memory access unit 112, a memory management unit 113, and a first cache 114.
  • the number of multi-level caches included is not limited.
  • the processor core pipeline 110 is used for parallel processing of instructions between units or devices such as the arithmetic unit 111 and the memory access unit 112. It can be understood that the arithmetic unit or execution unit may include an arithmetic logic unit ALU.
  • the memory fetching unit 112 is configured to send a virtual address provided by an application program to the memory management unit MMU, and receive a physical address corresponding to the virtual address fed back by the MMU.
  • the memory fetching unit 112 may interact with the first cache.
  • the memory fetching unit 112 may obtain the page base address returned by the first cache 114, and then send the page base address to the ALU, and the ALU performs calculation processing. For example, the page base address is fed back to the ALU, and the ALU adds the page base address to the page offset to obtain the final required physical address.
• The memory management unit 113 is configured to send a level-1 memory access request to the level-1 first cache. For example, after a TLB miss occurs, it determines the base address of the page table missing from the TLB (in the case of a four-level page table, the TLB either stores the base addresses of the level 1-4 page tables or stores none of them; there is no situation where only some levels' page table base addresses are stored), and then sends the first-level memory access request (including the first-level page table base address and the high-order bits of the virtual address); it receives the (M+1)th-level base address, that is, the base address of a certain page in the last-level page table.
• The memory management unit 113 may include a translation lookaside buffer 1130. It can be understood that the memory management unit 113 is a part of the storage unit.
  • the first cache 114 is used to store the base address of the one-level or multi-level page table among the base addresses of the M-level page table.
  • the processor core 11 may be one core as shown in FIG. 3, or may include multiple cores.
  • FIG. 4 is a schematic diagram of another application architecture provided by an embodiment of the present invention.
  • the processor core 11 includes a plurality of cores, such as a processor core 1, a processor core 2, ..., a processor core Q (Q is an integer greater than 1).
  • the home node 13 may include multiple home nodes.
  • a corresponding home node may be set at the connection between each core and the bus.
• The PoS and the Nth-level first cache (for example, the L3 cache, a cache outside the cores shared by multiple cores) are logically and functionally independent; the PoS can be logically located at any position inside the processor.
  • the two can be designed separately (that is, they are located in different physical locations in the processor); or, in order to accelerate the interaction between the two, the two can be placed in one place or close to each other in the physical structure.
  • the embodiment of the present invention does not limit the organization structure of PoS and L3 cache.
  • the home node is used to:
  • the home node sends a Kth level memory access request to the corresponding cache;
  • a K-th level memory access request is sent to the memory controller.
• The processor core 11 may also include a second cache 115; the second cache includes (N-1) levels of first cache, such as the level-1 first cache, the level-2 first cache, the level-3 first cache, ..., the i-th level first cache, ..., and the (N-1)th level first cache, and so on.
• In the multi-core case, the first cache includes the Nth-level first cache and the second caches 115 of all cores; in the single-core case, the first cache includes all N levels of the first cache.
• That is, the first cache can be the caches inside the core together with the cache shared with other cores (for example, the Nth-level first cache shown in FIG. 3; the shared cache can be a general term for one or more caches); the second cache is a general term for the caches in other cores.
• For the processor core pipeline 110, arithmetic unit 111, memory access unit 112, memory management unit 113, and other units or devices shown in FIG. 4 that are consistent with those in FIG. 3, please refer to the related description of FIG. 3 and the foregoing explanation of terms, which is not repeated here.
• FIG. 5 is a schematic diagram of an accelerated hardware page table traversal device provided by an embodiment of the present invention; as shown in FIG. 5, it mainly describes the interaction among the storage management unit 113, the first cache 114, the home node 13, the memory controller 14, and the memory 20.
  • the memory management unit 113 is connected to the second-level cache 1141; the third-level cache 1142 is connected to the second-level cache 1141, and is also connected to the memory controller 14 through the bus 12.
  • the node where the three-level cache 1142 is connected to the bus may be provided with a home node 13.
  • the memory controller 14 in the processor chip 10 is connected to the memory 20.
• Each of the 2 levels of the first cache stores one or more of the 4 levels of base addresses; that is, the first cache may store only some of the base addresses, or all of them.
• FIG. 6 is a schematic diagram of a specific accelerated hardware page table traversal device provided by an embodiment of the present invention; as shown in FIG. 6, the processor chip includes a processor core 11 and a third-level cache (L3 cache, namely the level-2 first cache), and the processor core 11 includes a second-level cache (L2 cache, that is, the level-1 first cache).
  • L3 cache in the figure is an out-of-core cache, but the embodiment of the present invention does not limit this, that is, the L3 cache may also be an in-core cache.
  • FIG. 6 is only an exemplary situation.
  • a multi-level cache capable of storing the base address of a page table, such as a fourth-level cache, a fifth-level cache, and a sixth-level cache, refer to FIG. 6 and the description of the corresponding embodiment.
  • Figure 7 is a schematic diagram of the internal structure of a cache provided by an embodiment of the present invention.
• The i-th level first cache includes the control unit of the i-th level first cache and the computing unit of the i-th level first cache.
• The second-level base address is determined from the first-level base address and the first-level offset address. Assuming there are 4 levels of page tables in total, the fifth-level base address is determined from the fourth-level base address and the fourth-level offset address (the fifth-level base address is the physical base address of a page recorded in the fourth-level page table); therefore, a base address can be the physical base address of a page table or of a page.
• The L2 cache determines, according to the offset address, which entry of the first-level page table holds the base address of the second-level page table.
• Assuming bits [47:39] of the 48-bit virtual address (i.e., the first-level high-order address) are 000100000, which corresponds to 32, then, on the premise that 32 falls within the index range of the first-level page table, the cache looks up entry 32 of the first-level page table and obtains the physical base address of the second-level page table stored there, i.e., determines the base address of the second-level page table.
  • the specific base address determination and address splicing description please refer to the explanation of some terms (19), which will not be repeated here.
• Figure 8 is a schematic diagram of a page table query process provided by an embodiment of the present invention; as shown in Figure 8, for the Kth-level memory access request, the Kth-level base address, and the (K+1)th-level base address, the value of K includes 1, 2, 3, ..., M.
• Figure 9 is an interaction schematic diagram of part of the hardware corresponding to Figure 8 provided by an embodiment of the present invention. Assuming that the L2 cache hits all four levels of page table base addresses (that is, each of the four levels of base addresses is stored in the L2 cache), as shown in FIG. 9, in the device for accelerating hardware page table traversal shown in FIG. 6, its various functional modules can perform the corresponding operations in the following sequence. The specific steps are as follows:
  • the first-level memory access request is received from the MMU113; the command format of the memory access request sent from the MMU to the second-level cache is shown in FIG. 10, which is a command format of a memory access request provided by an embodiment of the present invention Schematic diagram; as shown in Figure 10, the command format can include high-order address, request type, i-th level page table base address and bit field.
• The corresponding command format is {high-order address [47:12], request type (i.e., HPTW request), i-th level page table base address, bit field [1:0]}. Take the command format of the first-level memory access request as an example.
• [47:12] is the high-order part of the virtual address excluding the low-order page-offset bits [11:0], and comprises [47:39] (the first-level high-order address), [38:30] (the second-level high-order address), [29:21] (the third-level high-order address), and [20:12] (the fourth-level high-order address).
  • the bit field 00 is used to indicate that the memory access request queries the base address of the second level page table.
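The command layout {high-order address [47:12], request type, bit field [1:0]} can be sketched as a pack/unpack pair. The exact bit widths and positions below are illustrative assumptions (the text fixes only the fields, not their packing), and the i-th level page table base address field is omitted for brevity:

```c
#include <assert.h>
#include <stdint.h>

#define REQ_HPTW 1u      /* hypothetical encoding of the HPTW request type */

typedef struct {
    uint64_t high_addr;  /* virtual address bits [47:12] */
    unsigned req_type;   /* request type, e.g. REQ_HPTW */
    unsigned bitfield;   /* 2-bit level indicator [1:0] */
} hptw_cmd;

uint64_t cmd_pack(hptw_cmd c) {
    return (c.high_addr << 6) | ((uint64_t)(c.req_type & 0xFu) << 2)
         | (uint64_t)(c.bitfield & 0x3u);
}

hptw_cmd cmd_unpack(uint64_t w) {
    hptw_cmd c;
    c.high_addr = w >> 6;
    c.req_type  = (unsigned)((w >> 2) & 0xFu);
    c.bitfield  = (unsigned)(w & 0x3u);
    return c;
}
```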
  • control unit of the second-level cache receives the first-level memory access request from the MMU; wherein, the high-order address included in the first-level memory access request may be [47:12].
• The hardware page table traversal process starts by obtaining the base address of the starting page table from CR3 and ends when the base address of the target page table at the required level has been queried. For example, under the premise of a four-level page table, if only the page table entry containing the base address of the second-level page table is missing from the TLB, the traversal ends once the base address of the second-level page table has been queried.
• The tag is a part of the base address; that is, a part of the base address is used as the identification for judging whether it is stored. Optionally, the received first-level base address is compared with all stored base addresses to judge whether the second-level cache stores the same base address.
• If the current memory access request queries the base address of the first-level page table, then the tag of the first-level page table base address is used to determine whether the second-level cache hits it, and the first-level page table base address is read out after the hit.
• The corresponding first-level offset address is spliced onto the first-level base address to obtain the second-level base address; that is, adding the first-level offset address determines which page table entry in the first-level page table holds the base address of the second-level page table.
• Step 6 (S6): The control unit 71 of the second-level cache determines, according to the second-level high-order address (which is included in the high-order address), that the second-level cache holds the second-level base address, and then looks up the stored second-level base address accordingly.
  • step 2 please refer to the description of step 2 above, which will not be repeated here.
  • Step (L-1) The calculation unit 72 of the second-level cache adds an offset address to the fourth-level base address to calculate the fifth-level base address (that is, the page base address in the fourth-level page table).
• The page table entry in the fourth-level page table is determined according to the fourth-level offset address, and that entry gives the base address of the page corresponding to the required physical address.
  • Step L (SL, the following are all referred to as step L/SL, which represents the final step): the calculation unit 72 of the second-level cache feeds back the fifth-level base address to the control unit 71 of the second-level cache.
  • control unit 71 of the secondary cache may feed back the page base address to the MMU113.
  • the MMU may complete the conversion from VA to PA according to the page base address and the page offset.
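The MMU's final step, combining the fed-back page base address with the page offset, amounts to a one-line computation (4 KiB pages assumed; the function name is invented):

```c
#include <assert.h>
#include <stdint.h>

/* Combine the page base address fed back by the cache with the page
 * offset [11:0] of the virtual address to form the physical address. */
static inline uint64_t make_pa(uint64_t page_base, uint64_t va) {
    return (page_base & ~0xFFFULL) | (va & 0xFFFULL);
}
```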
  • a calculation unit is added to each level of the first cache, so that after each level of the first cache obtains the base address, it can directly calculate the base address of the next level based on the base address and the corresponding offset.
• The control unit of the i-th level first cache receives the Kth-level memory access request; on the premise that the i-th level first cache (i.e., the current-level cache) stores the Kth-level base address, the control unit of the current-level cache sends the Kth-level base address in the request to the computing unit.
• The current-level cache then calculates the (K+1)th-level base address (i.e., the next-level base address) from the Kth-level base address and the offset address determined from the high-order address included in the request. This differs from the prior art, in which any level of cache, after determining the Kth-level base address, must return it to the memory access unit in the processor core so that the arithmetic logic unit in the core can calculate the next-level base address before the query continues. In this embodiment of the present invention, whichever level of first cache determines the Kth-level base address can obtain the next-level base address through a shift calculation in the computing unit added to that level of cache, which improves the access efficiency of the base address and accelerates the hardware page table traversal process, ultimately improving the efficiency of virtual-to-physical address translation.
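The in-cache computing unit's shift calculation can be sketched as follows; 8-byte page table entries and the 9-bit indices of the 48-bit layout above are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Given the Kth-level base address and the 9-bit index taken from the
 * high-order address, form the address of the page table entry holding
 * the (K+1)th-level base address. Entries are assumed 8 bytes wide, so
 * "splicing" the offset onto the base is an index shifted left by 3. */
static inline uint64_t next_entry_addr(uint64_t base_k, uint64_t va, int k) {
    uint64_t index = (va >> (39 - 9 * (k - 1))) & 0x1FF;
    return base_k + (index << 3);   /* done inside the cache, no round trip */
}
```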
  • identifiers S1, S2, S3, etc. shown in the figure correspond to steps 1, step 2, step 3, etc.
• The following embodiments use the same identifiers, which are not described again. It is understandable that the steps corresponding to the identifiers in the figures of this application are exemplary descriptions.
  • Figure 11 is a schematic diagram of another page table query process provided by an embodiment of the present invention
• The memory access request can be a Kth-level memory access request, and the base address can be a Kth-level base address; the value of K includes 1, 2, 3, ..., M.
• Figure 12 is an interaction schematic diagram of part of the hardware corresponding to Figure 11 provided by an embodiment of the present invention. Suppose that the L2 cache hits the first-level base address (i.e., the first-level page table base address), and the L3 cache hits the second- to fourth-level base addresses.
  • each of its functional modules can perform corresponding operations in accordance with the following sequence. The specific steps are as follows:
• The second-level memory access request is sent to the third-level cache (that is, the level-2 first cache).
  • the third-level cache may send a second-level memory access request to the home node.
• The control unit 71 of the second-level cache may also send the second-level memory access request to the home node at the same time. It is understandable that, in this embodiment of the present invention, the home node and the control unit 81 of the third-level cache perform basically the same operations after receiving the second-level memory access request; therefore, only the interaction between the third-level cache and the second-level cache is described in the figure.
  • Step 9 (S9) The control unit 81 of the third-level cache sends the second-level base address and the second-level high-order address to the calculation unit of the third-level cache.
• Figure 13 is a schematic diagram of another page table query process provided by an embodiment of the present invention; as shown in Figure 13, the memory access request can be a Kth-level memory access request, and the base address can be a Kth-level base address; the value of K includes 1, 2, 3, ..., M.
  • Figure 14 is an interactive schematic diagram of part of the hardware corresponding to Figure 13 provided by an embodiment of the present invention; assuming that L2 cache hits the first level base address, the third level base address, and the fourth level base address, and the L3 cache hits the second level base address.
  • each of its functional modules can perform corresponding operations according to the following timing. The specific steps are as follows:
• The control unit 71 of the second-level cache may also send the second-level memory access request to the home node at the same time. It is understandable that, in this embodiment of the present invention, the home node and the control unit 81 of the third-level cache perform basically the same operations after receiving the second-level memory access request; therefore, only the interaction between the third-level cache and the second-level cache is described in the figure.
  • Step 9 (S9) The control unit 81 of the third-level cache sends the second-level base address and the second-level high-order address to the calculation unit of the third-level cache.
• The control unit 81 of the third-level cache sends a memory access request to the home node PoS 13; compared with the request in the prior art, a new command is added.
  • Requests from the cache need to be sent to PoS for unified processing.
  • a new command needs to be added to the request command.
• The Opcode field carries the command encoding, and 0x3B-0x3F can be reserved commands (that is, reserve commands).
  • the embodiment of the present invention can implement the HPTW request by using the 0x3B command.
• The original command length (137) is increased accordingly.
• For the third-level base address, determine which cache or caches among all the caches store it.
• The first column stores the tag (i.e., physical address) of the cache line, and the second column records in which caches (such as the second-level caches) the cache line is stored.
• For example, if the second column records which second-level caches store the cache line, then 0001 can indicate that the data is in the first L2 cache, and 1111 can indicate that the data is in all four L2 caches.
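The directory entry described here (a tag column plus a 4-bit presence vector over four L2 caches) might be sketched as follows; the struct and function names are invented:

```c
#include <assert.h>
#include <stdint.h>

/* One directory entry: the tag of the cache line plus a 4-bit presence
 * vector over four L2 caches (0001 = first L2 only, 1111 = all four). */
typedef struct { uint64_t tag; unsigned presence; } dir_entry;

int holds_line(dir_entry e, int l2_id) {   /* l2_id in 1..4 */
    return (int)((e.presence >> (l2_id - 1)) & 1u);
}
```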
• The home node also includes a buffer, to which, for example, four new entries are added.
• The four entries have the same structure as the entries of the home node, but can be dedicated to HPTW (that is, only HPTW requests enter the above four entries for processing; for example, the first-level page table enters the first entry, or the two data columns of the first entry store only the information related to the first-level page table).
  • the buffer in the home node determines which level of the first cache stores the level 3 base address according to the tag of the level 3 base address.
• The home node first judges, according to the bit field (such as 10), that the memory access request queries the third-level base address, and then judges in which cache the third-level base address is stored (for example, judges that the second-level cache stores the third-level base address).
  • the control unit 71 of the second-level cache may determine whether the third-level base address is stored in the current level cache according to the memory access request.
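The home node's dispatch, decoding the queried level from the bit field and then routing the request to the cache holding the base address (or to the memory controller on a miss), can be sketched as follows. The level encoding is an assumption based on the example above (binary 10 for the third level); the text's encodings are not fully consistent, so treat this as illustrative:

```c
#include <assert.h>

enum target { TO_L2_CACHE, TO_L3_CACHE, TO_MEMCTRL };

/* Assumed encoding: the 2-bit field plus one gives the queried level,
 * so binary 10 queries the third-level base address. */
int queried_level(unsigned bitfield) { return (int)bitfield + 1; }

/* Route a request to the cache the directory says holds the base
 * address, or to the memory controller when no cache does. */
enum target route(int cached_in) {     /* 2 = L2, 3 = L3, 0 = none */
    if (cached_in == 2) return TO_L2_CACHE;
    if (cached_in == 3) return TO_L3_CACHE;
    return TO_MEMCTRL;
}
```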
  • control unit 71 of the second-level cache feeds back the fifth-level base address to the MMU.
  • Figure 15 is a schematic diagram of another page table query process provided by an embodiment of the present invention.
• The memory access request can be a Kth-level memory access request, and the base address can be a Kth-level base address; the value of K includes 1, 2, 3, ..., M.
• Fig. 16 is an interaction schematic diagram of part of the hardware corresponding to Fig. 15 provided by an embodiment of the present invention.
• The control unit 71 of the second-level cache may also send the second-level memory access request to the home node at the same time; or, in step 6, only the second-level memory access request is sent to the home node. It is understandable that, in this embodiment of the present invention, the home node and the control unit 81 of the third-level cache perform basically the same operations after receiving the second-level memory access request; therefore, only the interaction between the third-level cache and the second-level cache is described in the figure.
  • Step 9 (S9) The control unit 81 of the third-level cache sends the second-level base address and the second-level high-order address to the calculation unit of the third-level cache.
  • the home node also includes a buffer.
  • the buffer determines which level of the first cache stores the level 3 base address according to the tag of the level 3 base address. For example, the home node determines that the third-level base address is stored in the second-level cache.
  • control unit 71 of the second-level cache may determine whether the third-level base address is stored in the current level cache according to the memory access request.
• When the third-level cache receives the fourth-level memory access request and the fourth-level base address is not stored, it is determined that the current-level cache does not have the fourth-level base address.
• The home node determines that the request is a fourth-level memory access request.
  • a request is sent to the memory controller, instructing the memory controller 14 to obtain the fourth-level base address from the memory 20.
  • the home node includes a buffer, and the buffer is used to determine which level of page table the request is for querying, and after the determination is completed, the current level of query is started and subsequent processing is started.
  • the buffer of the home node may also include a buffer calculation unit; the buffer is also used to perform shift calculation on the base address. Further optionally, after the home node obtains the level 5 base address, it sends the level 5 base address to the MMU.
  • the page table query process involved in the embodiment of the present invention may include, but is not limited to, the four processes provided above.
  • level 4 cache, level 5 cache, and other levels of cache, etc. reference may be made to the description of the foregoing illustration.
• FIG. 17 is a schematic diagram of an accelerated hardware page table traversal device provided by an embodiment of the present invention.
• When the memory management unit 113 queries the TLB and a TLB miss occurs, it first queries whether the page table base address is stored in the third cache. If the third cache is hit and the whole page table traversal can be completed by relying on the third cache, the solution provided in the foregoing embodiments need not be executed. Otherwise, for the page table base addresses missing from both the TLB 1130 and the third cache 1131, the page table traversal solution provided in the foregoing embodiments of the present invention continues to be executed.
  • FIG. 18 is a schematic diagram of another specific accelerated hardware page table traversal device provided by an embodiment of the present invention.
  • the processor chip 10 includes a processor core 11, and the processor core 11 may include Processor core 1 and processor core 2.
  • the second cache 115 of the processor core 2 is a second cache (ie, the first level second cache).
• The second cache has only one level, and the second cache 115 may be a second-level cache (L2 cache).
• The third-level cache is an out-of-core cache shared by the processor core 1 and the processor core 2, that is, the level-2 first cache. Both the level-1 first cache and the level-1 second cache may be second-level (L2) caches.
  • a home node 13 can be set where the processor core 1 is connected to the bus, and a home node 13 can be set where the processor core 2 is connected to the bus.
  • the home node 13 may include multiple home nodes, but the functions of the multiple home nodes are the same.
  • FIG. 19 is a schematic diagram of a page table query process in a multi-core situation provided by an embodiment of the present invention
  • the memory access request may be a K-th level memory access request.
  • the base address can be the K-th level base address; among them, the value of K includes 1, 2, 3,...,M.
  • Fig. 20 is a schematic diagram of the interaction of part of the hardware corresponding to Fig. 19.
  • L2 cache that is, the second-level cache in the first cache
  • L3 cache That is, the third-level cache in the first cache
  • the fourth-level base address is stored in the second-level cache of the second cache. As shown in FIG. 20, in the accelerated hardware page table traversal apparatus described above, the various functional modules can perform corresponding operations in the following sequence. The specific steps are as follows:
  • Step 9 (S9) The control unit 81 of the third-level cache sends the second-level base address and the second-level high-order address to the calculation unit of the third-level cache.
  • the control unit 81 of the third-level cache receives the fourth-level memory access request.
  • for step 1 to step 21, please refer to step 1 to step 21 in the corresponding embodiments of FIG. 19 and FIG. 20, which will not be repeated here.
  • the second-level cache in the second cache may also include a control unit and a calculation unit; reference may be made to the description of the first cache, which will not be repeated here.
  • the base address calculation can occur in multiple cores or home nodes.
  • the third-level base address is hit in the third-level cache, and the fourth-level base address is obtained through the calculation unit of the third-level cache.
  • the third-level cache did not hit the fourth-level base address.
  • the home node found that the fourth-level base address hit in the second-level cache of processor core 2, and the fourth-level memory access request was sent to the second-level cache of processor core 2 through the home node.
  • reference may be made to FIG. 15 and FIG. 16, which are not repeated here.
  • the i-th level first cache (for example, the second-level cache or the third-level cache) in the foregoing embodiments refers to a cache capable of storing page table base addresses, and does not involve the first-level cache in the prior art (because the current first-level cache does not store page table base addresses). However, it is not ruled out that the first-level cache may store page table base addresses in the future; if the first-level cache can store page table base addresses, the level-1 first cache may be the first-level cache. Otherwise, the level-1 first cache in the embodiments of the present invention is generally a second-level cache. In addition, the embodiments of the present invention do not limit the number and levels of caches.
  • FIG. 21 is a schematic diagram of an accelerated hardware page table traversal apparatus in a multi-core situation provided by an embodiment of the present invention. As shown in FIG. 21, a third cache 1131 is added to the memory management unit 113 of processor core 1. After the memory management unit 113 queries the TLB and a TLB miss occurs, it first queries whether all required page table base addresses are stored in the third cache. If the third cache hits and the entire page table walk can be completed by relying on the third cache, the solution provided in the foregoing embodiments need not be executed. Otherwise, for the page table base addresses missing from the TLB 1130 and the third cache 1131, the page table traversal solution provided in the foregoing embodiments of the present invention continues to be executed.
  • each processor core can include a third cache.
  • the illustrated processor core 1 has a third cache 1131; then the processor core 2 or the processor core Q can also have a third cache.
  • FIG. 22 is a schematic diagram of an accelerated hardware page table traversal method according to an embodiment of the present invention. As shown in FIG. 22, steps S2201-S2212 may be included, of which steps S2205-S2212 are optional.
  • Step S2201 Receive a memory access request through the control unit of the i-th level first cache.
  • the memory access request further includes a base address identifier of the K-th base address, and the base address identifier is used to indicate the level of the K-th base address.
  • Step S2202 Through the control unit of the i-th level first cache, determine whether the i-th level first cache stores the K-th level base address according to the K-th level base address.
  • Step S2203: When it is determined that the i-th level first cache stores the K-th level base address, the control unit of the i-th level first cache sends the K-th level base address to the calculation unit of the i-th level first cache.
  • Step S2204 Determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address through the calculation unit of the i-th level first cache.
  • the Kth level offset address is determined according to the Kth level high address.
  • Step S2205: When it is judged that the i-th level first cache does not store the K-th level base address and i ≠ N, the control unit of the i-th level first cache sends the memory access request to the control unit of the (i+1)-th level first cache.
  • Step S2206 In the case where it is determined that the K-th base address is not stored in the i-th level first cache, send the memory access request to the home node through the control unit of the i-th level first cache.
  • Step S2207 After receiving the memory access request, the home node determines whether each level of the first cache in the N-level first cache stores the K-th level base address according to the K-th level base address.
  • Step S2208 In the case where it is determined that the K-th base address is stored in the target first cache, send the memory access request to the control unit of the target first cache.
  • Step S2209 In the case where it is determined that the K-th base address is not stored in the first cache of each level, send the memory access request to the memory controller through the home node.
  • Step S2210 Determine the Kth level base address in the memory according to the Kth level base address through the memory controller.
  • Step S2211 Send the Kth level base address to the home node through the memory controller.
  • Step S2212 According to the Kth level base address and the Kth level offset address, the (K+1)th level base address is determined through the buffer of the home node.
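The lookup-then-compute loop of steps S2201-S2204 (with the fall-through of S2205 and onward) can be sketched as follows. This is a minimal illustrative model, not the claimed apparatus: the per-level cache contents, the 9-bit index fields, and the 8-byte entry size are assumptions borrowed from the 48-bit, four-level example later in the description.

```python
LEVEL_SHIFTS = (39, 30, 21, 12)  # bit positions of the level-1..4 high-order fields
ENTRY_SIZE = 8                   # assumed bytes per page-table entry

def offset_for(k, high_addr):
    """K-th level offset address, derived from the K-th level high-order bits."""
    return ((high_addr >> LEVEL_SHIFTS[k - 1]) & 0x1FF) * ENTRY_SIZE

def handle_request(cache_levels, k, base_k, high_addr):
    """cache_levels: per-level sets of base addresses held by the first cache.
    S2202: each level's control unit checks for the K-th level base address;
    S2203/S2204: on a hit, that level's calculation unit forms the (K+1)-th
    level base address locally instead of returning to the core. Returns
    (next_base, hit_level), or (None, None) when every level misses and the
    request escalates to the home node (S2206 and onward)."""
    for i, stored in enumerate(cache_levels, start=1):
        if base_k in stored:
            return base_k + offset_for(k, high_addr), i
    return None, None
```

The point of the sketch is that the next-level base address is produced at the cache level that hit, so a miss at one level simply forwards the request rather than bouncing data back through the core.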
  • FIG. 23 is a schematic diagram of another accelerated hardware page table traversal method provided by an embodiment of the present invention; as shown in FIG. 23, it may include steps S2301-step S2317.
  • Step S2301 Receive a memory access request through the control unit of the i-th level first cache.
  • the memory access request further includes a base address identifier of the K-th base address, and the base address identifier is used to indicate the level of the K-th base address.
  • Step S2302 The control unit of the i-th level first cache judges whether the i-th level first cache stores the K-th level base address according to the K-th level base address.
  • Step S2303: When it is determined that the i-th level first cache stores the K-th level base address, the control unit of the i-th level first cache sends the K-th level base address to the calculation unit of the i-th level first cache.
  • Step S2304 Determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address through the calculation unit of the i-th level first cache.
  • the Kth level offset address is determined according to the Kth level high address.
  • Step S2305: When it is determined that the i-th level first cache does not store the K-th level base address and i ≠ N, the control unit of the i-th level first cache sends the memory access request to the control unit of the (i+1)-th level first cache.
  • Step S2306 In the case where it is determined that the K-th base address is not stored in the i-th level first cache, send the memory access request to the home node through the control unit of the i-th level first cache.
  • Step S2307 After receiving the memory access request, the home node determines whether each level of the first cache in the N level first cache stores the K level base address according to the K level base address.
  • Step S2308 In the case where it is determined that the K-th level base address is stored in the target first cache, the memory access request is sent to the control unit of the target first cache.
  • Step S2309 After receiving the memory access request, determine whether the second cache stores the K-th base address through the home node.
  • Step S2310 In the case where it is determined that the K-th base address is stored in the second cache, send the memory access request to the second cache through the home node.
  • Step S2311 In a case where it is determined that the K-th base address is not stored in the first cache and the second cache at each level, send the memory access request to the memory controller through the home node.
  • Step S2312 Determine the Kth level base address in the memory by the memory controller according to the Kth level base address.
  • Step S2313 Send the Kth level base address to the home node through the memory controller.
  • Step S2314 According to the Kth level base address and the Kth level offset address, the (K+1)th level base address is determined through the buffer of the home node.
  • Step S2315 Before sending the memory access request to the level 1 first cache, determine whether the third cache stores the level 1 base address through the memory management unit.
  • Step S2316 In a case where it is determined that the third cache stores the level 1 base address, obtain the level 1 base address through the memory management unit.
  • optionally, when the third cache stores the level-1 base address, the third cache also stores the base addresses of the remaining levels, so as to complete the conversion of the target virtual address into a physical address.
  • Step S2317 In a case where it is determined that the third cache does not store the level 1 base address, the memory management unit sends the memory access request to the level 1 first cache.
  • FIG. 24 is a schematic structural diagram of a chip provided by an embodiment of the present invention.
  • the accelerated hardware page table traversal apparatus in the foregoing embodiment can be implemented with the structure shown in FIG. 24.
  • the device includes at least one processor 241 and at least one memory 242.
  • the device may also include general components such as antennas, which will not be described in detail here.
  • the processor 241 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the above scheme programs.
  • the memory 242 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or another magnetic storage device; or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the memory 242 is used to store application program codes for executing the above solutions, and the processor 241 controls the execution.
  • the processor 241 is configured to execute the application program code stored in the memory 242. The details are as follows:
  • the control unit of the i-th level first cache it is judged according to the K-th level base address whether the i-th level first cache stores the K-th level base address;
  • the control unit of the i-th level first cache sends the K-th level base address to the calculation unit of the i-th level first cache;
  • the calculation unit of the i-th level first cache determines the (K+1)-th level base address according to the K-th level base address and the K-th level offset address, and the K-th level offset address is determined according to the K-th level high-order address.
  • the code stored in the memory 242 can execute the accelerated hardware page table traversal method provided in FIG. 22 or FIG. 23.
  • the control unit of the i-th level first cache sends the memory access request to the control unit of the (i+1)-th level first cache.
  • the home node determines whether each level of the first cache in the N-level first cache stores the K-th level base address according to the K-th level base address; In a case where it is determined that the target first cache stores the K-th level base address, the memory access request is sent to the control unit of the target first cache.
  • the memory access request is sent to the memory controller through the home node; the memory controller determines the K-th level base address in the memory according to the K-th level base address; the K-th level base address is sent to the home node through the memory controller; and, according to the K-th level base address and the K-th level offset address, the (K+1)-th level base address is determined through the buffer of the home node.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the above-mentioned units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc., specifically a processor in a computer device) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media may include a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.


Abstract

The embodiments of the present invention disclose a method and apparatus for accelerating hardware page table traversal. The method includes: receiving a memory access request through the control unit of an i-th level first cache, the memory access request including a K-th level base address and a K-th level high-order address; judging, through the control unit of the i-th level first cache and according to the K-th level base address, whether the i-th level first cache stores the K-th level base address; when it is judged that the i-th level first cache stores the K-th level base address, sending the K-th level base address to the calculation unit of the i-th level first cache through the control unit of the i-th level first cache; and determining the (K+1)-th level base address through the calculation unit of the i-th level first cache according to the K-th level base address and a K-th level offset address. By implementing the embodiments of the present invention, the page table traversal process is accelerated and the efficiency of virtual address translation is improved.

Description

Method and apparatus for accelerating hardware page table traversal
This application claims priority to Chinese Patent Application No. 201911195523.5, filed with the China National Intellectual Property Administration on November 28, 2019 and entitled "Method and apparatus for accelerating hardware page table traversal", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer storage, and in particular to a method and apparatus for accelerating hardware page table traversal.
Background
In a processor architecture, the memory management unit (MMU) in the processor core is responsible for translating the virtual addresses (VA) used by application programs into physical addresses (PA). When a paging management mechanism is applied, the translation from VA to PA requires querying the page table. Since part of the page table is stored in the translation lookaside buffer (TLB) of the MMU, the TLB can generally assist the MMU in completing the virtual address translation (for example, if all required page table base addresses happen to be stored in the TLB, i.e., a TLB hit, the VA-to-PA translation can be completed); otherwise, a hardware page table walk (HPTW) must be performed on the stored page table to obtain the final PA.

At present, in the vast majority of commercial processor architectures, a TLB miss triggers an automatic hardware page table walk, after which the TLB is refilled. However, the HPTW is currently performed serially, and the number of memory accesses is directly related to the number of page table levels (for example, with a four-level page table, at least four data accesses are needed to complete the VA-to-PA translation). Specifically, after a given level's page table base address hits a given cache level, the obtained base address is returned to the load store unit (LSU) in the core, and the next level's page table base address is computed by the arithmetic logic unit (ALU). In this page table query process, every piece of fetched data must be returned to the LSU in the processor core for further processing, which makes page table traversal inefficient.

In the prior art, the following two acceleration schemes are generally provided to speed up the HPTW.

Scheme 1: an extra cache is added in the MMU to store the intermediate-level page table base addresses needed during the HPTW, such as the third- to fourth-level page table base addresses. After a TLB miss, this cache is queried; if it hits, the walk jumps directly to the corresponding page table level (for example, after a TLB miss this cache is queried; if the third-level page table hits in this cache, its base address is fetched, and the HPTW then continues to query the remaining levels). However, adding an extra cache in the processor core increases production cost; moreover, because the extra cache is small, it cannot guarantee hits for all queries, making it difficult to effectively improve memory access efficiency.

Scheme 2: a prefetch engine is added at every cache level to prefetch the linked-list-structured data at each level, so the memory access address does not need to be brought back to the LSU in the core for processing. However, a TLB structure (used to translate VA to PA) must be added at every cache level, otherwise data prefetching is impossible. Scheme 2 therefore incurs too much overhead.

Therefore, how to improve the hardware page table walk process, so as to accelerate page table traversal and increase address translation efficiency, is a problem to be solved urgently.
Summary
The embodiments of the present invention provide a method and apparatus for accelerating hardware page table traversal, which can improve the hardware page table walk process, accelerate page table traversal, and increase address translation efficiency.
According to a first aspect, an embodiment of the present invention provides an apparatus for accelerating page table traversal, including a control unit of an i-th level first cache and a calculation unit of the i-th level first cache; i = 1, 2, ..., N, N being a positive integer.

The control unit of the i-th level first cache is configured to:

receive a memory access request, the memory access request including a K-th level base address and a K-th level high-order address, K being a positive integer;

judge, according to the K-th level base address, whether the i-th level first cache stores the K-th level base address; and

when it is judged that the i-th level first cache stores the K-th level base address, send the K-th level base address to the calculation unit of the i-th level first cache.

The calculation unit of the i-th level first cache is configured to:

determine a (K+1)-th level base address according to the K-th level base address and a K-th level offset address, the K-th level offset address being determined according to the K-th level high-order address.
In the embodiments of the present invention, a calculation unit is added to every level of the first cache, so that after a first-cache level obtains a base address, it can directly compute the next level's base address from that base address and the corresponding offset. Specifically, the control unit of the i-th level first cache receives the memory access request and, on the premise that the i-th level first cache (i.e., the current cache level) stores the K-th level base address, sends the K-th level base address from the memory access request to the calculation unit of the current cache level. The current cache level computes the (K+1)-th level base address (i.e., the next-level base address) from the K-th level base address and the offset address determined from the high-order address contained in the memory access request. This differs from the prior art, in which any cache level that determines the K-th level base address must return that base address to the load store unit in the processor core, where the arithmetic logic unit computes the next-level base address before the query of the next level continues. In the embodiments of the present invention, no matter at which first-cache level the K-th level base address is determined, the calculation unit added at that cache level can obtain the next-level base address through a shift calculation. This improves base address access efficiency and accelerates the hardware page table walk, thereby ultimately improving the efficiency of virtual-to-physical address translation.
In a possible implementation, the control unit of the i-th level first cache is further configured to: when it is judged that the i-th level first cache does not store the K-th level base address and i ≠ N, send the memory access request to the control unit of the (i+1)-th level first cache. In the embodiments of the present invention, when the i-th level first cache judges that it does not store the K-th level base address and a next cache level exists, it sends the memory access request to the next cache level (i.e., the (i+1)-th level first cache). This supplements the other outcome of judging whether the i-th level first cache stores the K-th level base address, making the embodiments more complete and preventing execution errors or system failures.
In a possible implementation, the apparatus further includes a home node coupled to the i-th level first cache. The control unit of the i-th level first cache is further configured to send the memory access request to the home node when it is judged that the i-th level first cache does not store the K-th level base address, and the home node is configured to receive the memory access request. In the embodiments of the present invention, with the home node added, when it is judged that the i-th level first cache does not store the K-th level base address, the memory access request can be sent not only to the (i+1)-th level first cache but also, at the same time, to the home node. This further completes the handling at the N-th level first cache. Sending the memory access request to the home node helps determine where the base address is stored across all caches.
In a possible implementation, the home node is further configured to: after receiving the memory access request, judge according to the K-th level base address whether each first-cache level among the N levels of first cache stores the K-th level base address; and when it is judged that a target first cache stores the K-th level base address, send the memory access request to the control unit of the target first cache. In the embodiments of the present invention, after receiving the memory access request, the home node judges whether any first-cache level stores the needed K-th level base address. After judging that a target first cache (one or several cache levels) stores the K-th level base address, it sends the memory access request to that cache level. This avoids accessing memory when some cache does store the base address; for example, the level-3 cache judges that it does not hold the third-level page table base address, while the level-2 cache does hold it. Since the level-3 cache will not go back to the level-2 cache for the access, the home node can be responsible for returning the request to the level-2 cache.
In a possible implementation, the apparatus further includes a memory controller coupled to the home node and a memory coupled to the memory controller; the home node includes a buffer. The home node is further configured to send the memory access request to the memory controller when it is judged that none of the first-cache levels stores the K-th level base address. The memory controller is configured to determine the K-th level base address in the memory according to the K-th level base address and to send the K-th level base address to the home node. The memory is configured to store the K-th level base address. The buffer is configured to determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address. In the embodiments of the present invention, when the home node finds that none of the caches stores the needed base address, it sends the memory access request to the memory controller, instructing the memory controller in the processor chip to obtain the needed base address from memory. This supplements a situation that may occur in the foregoing embodiments, avoids execution errors, and ensures that the page table walk can continue even when no cache stores the needed base address.
In a possible implementation, the apparatus further includes a second cache coupled to the home node, the second cache including a control unit of the second cache and a calculation unit of the second cache. The home node is further configured to: after receiving the memory access request, judge whether the second cache stores the K-th level base address; and when it is judged that the second cache stores the K-th level base address, send the memory access request to the control unit of the second cache. The embodiments of the present invention thus cover the case where the processor chip has multiple processor cores (including the second cache). When the caches of one processor core do not store the needed base address, the home node can check whether the caches of other cores hold the needed base address, so that the page table query can continue.
In a possible implementation, the apparatus further includes a memory controller coupled to the home node and a memory coupled to the memory controller; the home node includes a buffer. The home node is further configured to send the memory access request to the memory controller when it is judged that neither any first-cache level nor the second cache stores the K-th level base address. The memory controller is configured to determine the K-th level base address in the memory according to the K-th level base address and to send the K-th level base address to the home node. The memory is configured to store the K-th level base address. The buffer is configured to determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address. The embodiments of the present invention thus describe how, when the processor chip has multiple processor cores and the home node detects that no cache stores the needed base address, a corresponding instruction is sent to the memory controller so that the needed base address can be obtained from memory in time, ensuring that the page table base address query can continue even when no cache hits.
In a possible implementation, the apparatus further includes a memory management unit coupled to the level-1 first cache, the memory management unit including a third cache; the third cache is configured to store K levels of base addresses, K = 1, 2, ..., M, M being an integer greater than 1. The memory management unit is configured to: before sending the memory access request to the level-1 first cache, judge whether the third cache stores the level-1 base address; when it is judged that the third cache stores the level-1 base address, obtain the level-1 base address; and when it is judged that the third cache does not store the level-1 base address, send the memory access request to the level-1 first cache. In the embodiments of the present invention, a third cache (i.e., a newly added cache) is added in the memory management unit; this new cache can store some page table base addresses. After a TLB miss occurs, the new cache can be searched for all the page table base addresses needed to complete the VA-to-PA translation. If the new cache hits, all the page table base addresses can be obtained directly, so the page table walk need not be performed and the needed physical address is obtained quickly.
In a possible implementation, the memory access request further includes a base address identifier of the K-th level base address, used to indicate the level of the K-th level base address. This supplements the form of the memory access request in the embodiments of the present invention. A specific field or data segment can be added to the memory access request so that the control unit of a cache or the home node can identify which level of base address the memory access request queries and accesses.
In a possible implementation, the control unit of the i-th level first cache is further configured to send the (M+1)-th level base address to the memory management unit when the (M+1)-th level base address is determined, and the memory management unit is further configured to receive the (M+1)-th level base address. The embodiments of the present invention supplement the possible handling by a cache after the last-level base address (i.e., the (M+1)-th level base address) is determined. For example, the cache level that determines the last-level base address can send it to the memory management unit. Further, upon receiving the final page base address (i.e., the last-level base address), the memory management unit adds the page offset to obtain the physical address.
According to a second aspect, an embodiment of the present invention provides a method for accelerating page table traversal, including: receiving a memory access request through the control unit of an i-th level first cache, the memory access request including a K-th level base address and a K-th level high-order address, K being a positive integer and i = 1, 2, ..., N; judging, through the control unit of the i-th level first cache and according to the K-th level base address, whether the i-th level first cache stores the K-th level base address; when it is judged that the i-th level first cache stores the K-th level base address, sending the K-th level base address to the calculation unit of the i-th level first cache through the control unit of the i-th level first cache; and determining a (K+1)-th level base address through the calculation unit of the i-th level first cache according to the K-th level base address and a K-th level offset address, the K-th level offset address being determined according to the K-th level high-order address.
In a possible implementation, the method further includes: when it is judged that the i-th level first cache does not store the K-th level base address and i ≠ N, sending the memory access request to the control unit of the (i+1)-th level first cache through the control unit of the i-th level first cache.

In a possible implementation, the method further includes: when it is judged that the i-th level first cache does not store the K-th level base address, sending the memory access request to the home node through the control unit of the i-th level first cache.

In a possible implementation, the method further includes: after the memory access request is received, judging, through the home node and according to the K-th level base address, whether each first-cache level among the N levels of first cache stores the K-th level base address; and when it is judged that a target first cache stores the K-th level base address, sending the memory access request to the control unit of the target first cache.
In a possible implementation, the method further includes: when it is judged that none of the first-cache levels stores the K-th level base address, sending the memory access request to the memory controller through the home node; determining, through the memory controller, the K-th level base address in memory according to the K-th level base address; sending the K-th level base address to the home node through the memory controller; and determining the (K+1)-th level base address through the buffer of the home node according to the K-th level base address and the K-th level offset address.
In a possible implementation, the method further includes: after receiving the memory access request, judging through the home node whether the second cache stores the K-th level base address; and when it is judged that the second cache stores the K-th level base address, sending the memory access request to the control unit of the second cache through the home node.
In a possible implementation, the method further includes: when it is judged that neither any first-cache level nor the second cache stores the K-th level base address, sending the memory access request to the memory controller through the home node; determining, through the memory controller, the K-th level base address in memory according to the K-th level base address; sending the K-th level base address to the home node through the memory controller; and determining the (K+1)-th level base address through the buffer of the home node according to the K-th level base address and the K-th level offset address.
In a possible implementation, the method further includes: before sending the memory access request to the level-1 first cache, judging through the memory management unit whether the third cache stores the level-1 base address; when it is judged that the third cache stores the level-1 base address, obtaining the level-1 base address through the memory management unit; and when it is judged that the third cache does not store the level-1 base address, sending the memory access request to the level-1 first cache through the memory management unit.
In a possible implementation, the memory access request further includes a base address identifier of the K-th level base address, used to indicate the level of the K-th level base address.
In a possible implementation, after the home node receives the memory access request, it may first judge whether the first cache stores the K-th level base address and then judge whether the second cache stores the K-th level base address. Optionally, after receiving the memory access request, the home node judges simultaneously whether the first cache and the second cache store the K-th level base address. Further optionally, the home node sends a corresponding memory access request to any cache that stores the needed K-th level base address; the home node can accept the results fed back by multiple caches, take the highest-level base address among those results as the final traversal result, and then continue the traversal process. Alternatively, when the traversal has been completed in multiple caches, the final base address can be fed back to the memory management unit directly by the cache that completed the last step of the traversal (i.e., the cache that obtained the final base address).
According to a third aspect, an embodiment of the present invention provides an apparatus for accelerating page table traversal, including N levels of first cache; each first-cache level among the N levels includes a calculation unit and a control unit; the N levels of first cache store one or more of M levels of base addresses; N is an integer greater than 0 and M is an integer greater than 1.

The control unit of the i-th level first cache is configured to: receive a K-th level memory access request, the K-th level memory access request including the K-th level base address and the K-th level high-order address; 0 < K ≤ M, K being an integer; i = 1, 2, ..., N; judge according to the K-th level base address whether the i-th level first cache stores the K-th level base address; and when it is judged that the i-th level first cache stores the K-th level base address, send the K-th level base address to the calculation unit of the i-th level first cache.

The calculation unit of the i-th level first cache is configured to determine the (K+1)-th level base address according to the K-th level base address and the K-th level offset address; the K-th level offset address is determined according to the K-th level high-order address.
In a possible implementation, the apparatus further includes a home node coupled to the N-th level first cache. The control unit of the N-th level first cache is further configured to send the K-th level memory access request to the home node when it is judged that the N-th level first cache does not store the K-th level base address, and the home node is configured to receive the K-th level memory access request.
In a possible implementation, the home node is further configured to: after receiving the K-th level memory access request, judge according to the K-th level base address whether each first-cache level stores the K-th level base address; and when it is judged that the p-th level first cache stores the K-th level base address, send the K-th level memory access request to the control unit of the p-th level first cache, p = 1, 2, ..., N and p ≠ i.
In a possible implementation, the method further includes: before sending the level-1 memory access request to the level-1 first cache, judging through the memory management unit whether the third cache stores the level-1 base address; when it is judged that the third cache stores the level-1 base address, obtaining the level-1 base address through the memory management unit; and when it is judged that the third cache does not store the level-1 base address, sending the level-1 memory access request to the level-1 first cache through the memory management unit.
According to a fourth aspect, an embodiment of the present invention provides a terminal, which includes a processor configured to support the terminal in performing the corresponding functions of the method for accelerating hardware page table traversal provided in the second aspect. The terminal may further include a memory coupled to the processor, storing the program instructions and data necessary for the terminal. The terminal may further include a communication interface for the terminal to communicate with other devices or a communication network.
According to a fifth aspect, an embodiment of the present invention provides a chip system, which may include the accelerated page table traversal apparatus of the first aspect and auxiliary circuits coupled to the accelerated page table traversal apparatus.
According to a sixth aspect, an embodiment of the present invention provides an electronic device, which may include the accelerated page table traversal apparatus of the first aspect and discrete devices coupled externally to the accelerated page table traversal apparatus.
According to a seventh aspect, an embodiment of the present invention provides a chip system that can perform any method involved in the second aspect, so that the related functions are realized. In a possible design, the chip system further includes a memory for storing the necessary program instructions and data. The chip system may be composed of a chip, or may include a chip and other discrete devices.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments.
FIG. 1 is a schematic flowchart of a hardware page table walk provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of an application architecture provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of another application architecture provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of an accelerated hardware page table traversal apparatus provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of a specific accelerated hardware page table traversal apparatus provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of the internal structure of a cache provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of a page table query process provided by an embodiment of the present invention;

FIG. 9 is a schematic diagram of the interaction of part of the hardware corresponding to FIG. 8 provided by an embodiment of the present invention;

FIG. 10 is a schematic diagram of the command format of a memory access request provided by an embodiment of the present invention;

FIG. 11 is a schematic diagram of another page table query process provided by an embodiment of the present invention;

FIG. 12 is a schematic diagram of the interaction of part of the hardware corresponding to FIG. 11 provided by an embodiment of the present invention;

FIG. 13 is a schematic diagram of yet another page table query process provided by an embodiment of the present invention;

FIG. 14 is a schematic diagram of the interaction of part of the hardware corresponding to FIG. 13 provided by an embodiment of the present invention;

FIG. 15 is a schematic diagram of still another page table query process provided by an embodiment of the present invention;

FIG. 16 is a schematic diagram of the interaction of part of the hardware corresponding to FIG. 15 provided by an embodiment of the present invention;

FIG. 17 is a schematic diagram of an accelerated hardware page table traversal apparatus provided by an embodiment of the present invention;

FIG. 18 is a schematic diagram of another specific accelerated hardware page table traversal apparatus provided by an embodiment of the present invention;

FIG. 19 is a schematic diagram of a page table query process in a multi-core situation provided by an embodiment of the present invention;

FIG. 20 is a schematic diagram of the interaction of part of the hardware corresponding to FIG. 19 provided by an embodiment of the present invention;

FIG. 21 is a schematic diagram of an accelerated hardware page table traversal apparatus in a multi-core situation provided by an embodiment of the present invention;

FIG. 22 is a schematic diagram of an accelerated hardware page table traversal method provided by an embodiment of the present invention;

FIG. 23 is a schematic diagram of another accelerated hardware page table traversal method provided by an embodiment of the present invention;

FIG. 24 is a schematic structural diagram of a chip provided by an embodiment of the present invention.
Detailed Description

The embodiments of the present invention are described below with reference to the accompanying drawings.
In the specification, claims, and drawings of this application, the terms "first", "second", "third", "fourth", and so on are used to distinguish different objects rather than to describe a particular order; moreover, the objects described by "first", "second", "third", and "fourth" may be the same object, or may contain one another or have other relationships. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The terms "component", "module", "system", and the like used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside in a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate by local and/or remote processes, for example according to signals having one or more data packets (e.g., data from two components interacting with another component in a local system, a distributed system, and/or across a network, such as the Internet interacting with other systems by signals).
First, some terms used in this application are explained to facilitate understanding by those skilled in the art.
(1) A physical address, also called a real address or binary address, is a memory address that exists in electronic form on the address bus and enables the data bus to access a particular storage unit of main memory. Addresses are numbered from 0 and increase by 1 sequentially, so the physical address space of the memory grows linearly.

(2) In the paging management mechanism (i.e., the paging mechanism), virtual addresses are transformed to produce physical addresses.

(3) Memory is used to temporarily store operational data in the processor and data exchanged with external storage such as hard disks. During computer operation, the processor moves the data to be computed into memory for computation, and transfers the result out when the computation is completed. Dynamic random access memory (DRAM) offers a good cost-performance ratio and good scalability, and constitutes the main part of typical memory.

(4) The central processing unit (CPU) interprets computer instructions and processes data in the computer. In the computer, it is responsible for reading instructions, decoding them, and executing them. The CPU mainly includes two parts, a controller and an arithmetic unit, as well as cache memory and the data and control buses implementing the connections between them.

(5) The arithmetic logic unit (ALU) is a core component of the processor; it mainly performs various arithmetic and logic operations, such as the four arithmetic operations of addition, subtraction, multiplication, and division; logic operations such as AND, OR, NOT, and XOR; and operations such as shifting, comparison, and transfer.

(6) A memory management unit (MMU), sometimes called a paged memory management unit (PMMU), is computer hardware responsible for handling memory access requests from the central processing unit (CPU). Its functions include translating virtual addresses into physical addresses (i.e., virtual memory management), memory protection, and control of the CPU cache; in simpler computer architectures, it is responsible for bus arbitration and bank switching.

(7) A cache is located between the CPU and the DRAM main memory; it is a small but very fast memory, usually composed of static random access memory (SRAM). As long as SRAM remains powered, the data it stores is retained. SRAM can generally be divided into five parts: the storage cell array, row/column address decoders, sense amplifiers, control circuits, and drive circuits. Specifically, a cache can hold a portion of data that the CPU has just used or uses cyclically; if the CPU needs that data again, it can be fetched directly from the cache, speeding up data access and reducing the CPU's waiting time. Caches are generally divided into a level-1 cache (L1 cache), a level-2 cache (L2 cache), a level-3 cache (L3 cache), and so on. The L1 cache is mainly integrated inside the CPU; the L2 cache is integrated on the motherboard or inside the CPU; the L3 cache is integrated on the motherboard or inside the CPU, and when inside the CPU, the L3 cache is shared by multiple processor cores.

(8) A buffer is a reserved amount of storage space used to buffer input or output data. Depending on whether it corresponds to an input device or an output device, a buffer is classified as an input buffer or an output buffer.

(9) A processor core (core) is the heart of the processor, completing all computation, command acceptance/storage, data processing, and so on. Every processor core has a fixed logical structure, involving the layout of logic units such as the L1 cache, L2 cache, execution units, instruction-level units, and bus interfaces.

(10) A translation lookaside buffer (TLB), also rendered as page table cache or address-translation lookaside buffer, is a CPU cache used to improve the speed of virtual-to-physical address translation. All current desktop and server processors (such as x86) use TLBs. A TLB has a fixed number of slots that store page table entries mapping virtual addresses to physical addresses. Its search key is a virtual memory address and its search result is a physical address. If the requested virtual address exists in the TLB, a very fast match result is produced, after which the obtained physical address can be used to access memory. If the requested virtual address is not in the TLB, the page table is used for virtual-to-physical address translation, and accessing the page table is much slower than the TLB. Some systems allow page tables to be swapped to secondary storage, in which case the translation may take a very long time.
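The hit/miss behavior described in term (10) can be sketched in a few lines. This is a toy model under stated assumptions: the TLB and page table are plain dictionaries keyed by virtual page number, and a 12-bit page offset is assumed; it is not the claimed hardware.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages

def translate(tlb, page_table, va):
    """Return (physical address, hit). A TLB hit yields the frame directly;
    a miss falls back to the (much slower) page table and refills the TLB."""
    vpn = va >> PAGE_SHIFT
    offset = va & ((1 << PAGE_SHIFT) - 1)
    if vpn in tlb:                       # fast path: TLB hit
        return (tlb[vpn] << PAGE_SHIFT) | offset, True
    frame = page_table[vpn]              # slow path: consult the page table
    tlb[vpn] = frame                     # TLB refill after the miss
    return (frame << PAGE_SHIFT) | offset, False
```

A first access to a page misses and refills the TLB; a second access to the same page hits, which is the speedup the TLB provides.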
(11) A page table is a special data structure, placed in the page table area of system space, that stores the correspondence between logical pages and physical page frames. The logical address space is described with fixed-size pages, and the physical memory space is described with frames of the same size; the operating system implements the page mapping from logical pages to physical frames, and is responsible for managing all pages and controlling process execution.

(12) A memory controller is a bus circuit controller that manages and schedules the transfer speed between memory and the CPU; it may be a separate chip or be integrated into a related large chip. It performs the necessary control over memory accesses according to certain timing rules, including control of address signals, data signals, and various command signals, so that the CPU can use the storage resources of memory as required.

(13) A point of serialization (PoS), or home node (Home Node/Ordering Point), also called a serialization point, is the key point for maintaining coherence among the cores of a multi-core processor. At the PoS, whether the data in the caches of all processor cores has been modified, and its state, can be monitored to guarantee data consistency among the caches. Therefore, in this application, handling the HPTW through the PoS can ensure that stale page table data is not fetched, guaranteeing data correctness.

(14) CPU pipelining is a technique that decomposes instructions into multiple steps and overlaps the steps of different instructions, thereby processing several instructions in parallel and accelerating program execution. Each step of an instruction is handled by its own independent circuit; when one step completes, the instruction proceeds to the next step, while the preceding stage processes the subsequent instruction. The pipeline structure is the most basic element of the processor microarchitecture, carrying and determining the details of the processor's other microarchitectural features.

(15) The bus is the key to connecting the multiple cores of a processor and is used to transfer information between the cores and the caches, memory, and so on.

(16) An extended page table (EPT) consists of four levels of page tables: the page map level 4 table (PML4), the page-directory-pointer table (PDPT), the page directory (PD), and the page table (PT).

(17) A hardware page table walk (HPTW) is the process by which a hardware module queries the page tables. The virtual address used when a program accesses memory must be converted into a physical address before memory can be accessed; after a TLB miss occurs, the hardware can complete the traversal of the page tables to look up the missing page table entries. For example, a 48-bit virtual address is fully walked to finally determine the corresponding physical address. Referring to FIG. 1, FIG. 1 is a schematic flowchart of a hardware page table walk provided by an embodiment of the present invention. As shown in FIG. 1, the page base address register stores the physical base address of the page table. After a TLB miss occurs, the MMU part of the processor core provides the HPTW function, taking the physical address in CR3 as the base address of the first-level page table. After that data is obtained, the PML4 field of the VA is used as the address offset [47:39] to query and obtain the base address of the second-level page table; the query continues by concatenating [38:30] of the VA, and so on until the final physical address is obtained. The HPTW process needs to query the L2 cache; if the page tables hit in the L2 cache, the above process can be accelerated to a certain extent (there is no need to fetch data from memory every time); otherwise the HPTW requires four memory accesses to complete.
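The four-level walk sketched in term (17) can be modeled end to end as follows. This is a minimal sketch, not the claimed hardware: page tables are modeled as nested dictionaries keyed by base address, whereas real hardware reads entries from memory; the 9-bit index fields and 12-bit offset follow the 48-bit example above.

```python
PAGE_SHIFT = 12
LEVEL_BITS = 9                       # each level indexes 512 entries
LEVEL_SHIFTS = (39, 30, 21, 12)     # PML4 [47:39], PDPT [38:30], PD [29:21], PT [20:12]

def va_indices(va):
    """Split a 48-bit VA into four 9-bit table indices and the 12-bit page offset."""
    idx = tuple((va >> s) & ((1 << LEVEL_BITS) - 1) for s in LEVEL_SHIFTS)
    return idx, va & ((1 << PAGE_SHIFT) - 1)

def hptw(tables, cr3, va):
    """Walk the four levels starting from the CR3 base address. Each table maps
    an index to the next level's base address; the last level yields the page
    base, to which the page offset is added to form the physical address."""
    idx, offset = va_indices(va)
    base = cr3
    for i in idx:
        base = tables[base][i]       # fetch the entry holding the next-level base
    return base + offset
```

Each iteration of the loop corresponds to one of the (up to four) memory accesses the text mentions; hitting a cached base address lets hardware skip ahead in this loop.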
(18) An on-chip bus protocol provides a special mechanism for integrating the processor with other intellectual property (IP) cores and peripherals.

(19) A base address, or physical base address (base address for short), can be understood as the basic address at which data (such as a page table or page) is stored in memory; it is the reference point for computing relative offsets (offset addresses). For example, with multi-level page tables, after the base address of a given page table level is determined, the corresponding offset address determines which entry of that page table is the base address of the next-level page table.
The following first describes a system architecture on which the embodiments of the present invention are based. Referring to FIG. 2, FIG. 2 is a schematic diagram of a system architecture provided by an embodiment of the present invention. As shown in FIG. 2, the system architecture includes a processor chip 10, processor cores 11, a bus 12, a memory controller 14, and a memory 20. The processor chip 10 internally includes the processor cores 11, the bus 12, and the memory controller 14. The processor cores 11 may include processor core 1, processor core 2, ..., and processor core Q (Q is an integer greater than 0). The multiple processor cores and the memory controller 14 are all connected to the bus 12. The processor chip 10 processes data through the processor cores 11 (for example, in the embodiments of the present invention, a processor core is responsible for converting virtual addresses into physical addresses; specifically, when there is only one processor core, that core can complete the hardware page table walk together with the other on-chip devices; when there are multiple processor cores and the caches of one core do not store all the base addresses, the cores interact with one another and cooperate with the other on-chip devices to complete the hardware page table walk). The processor chip 10 interacts with the memory 20 through the memory controller 14, for example to read or write data. The bus is a channel for data interaction among the multiple cores and other components (such as the memory controller).
Based on the foregoing system architecture, and taking one processor core as an example, one of the application architectures on which the embodiments of the present invention are based is described next. Referring to FIG. 3, FIG. 3 is a schematic diagram of an application architecture provided by an embodiment of the present invention. As shown in FIG. 3, the embodiments of the present invention can be applied to the processor chip 10, which is connected to the memory 20. The processor chip 10 includes a processor core 11, a bus 12, a home node 13, and a memory controller 14. The embodiments of the present invention do not limit the specific connection manner among modules, units, or devices. Specifically:
The memory 20 is configured to store each level of the M levels of physical base addresses (i.e., M levels of base addresses), M being an integer greater than 1. For example, with four-level page tables, the memory stores the base addresses of the level-1 to level-4 page tables, four levels of base addresses in total.

The memory controller 14 is configured to receive the K-th level memory access request sent by the home node; obtain the corresponding K-th level base address from memory according to the information about the K-th level base address in the K-th level memory access request; and return the K-th level base address to the home node; K is an integer greater than 0.
The home node 13 is configured to:

receive requests, where the types of requests may include HPTW requests and other types of requests;

judge whether a received request is a memory access request (i.e., an HPTW request);

on the premise that the home node judges the request to be a memory access request, identify the level of the target page table base address queried by the memory access request (for example, a K-th level memory access request queries the K-th level page table base address); optionally, identify the level of the target page table base address through the home node buffer (which contains an identification module);

judge which cache level among the N levels of the first cache stores the base address of the target page table;

after judging which cache level among the N levels of the first cache stores the base address of the target page table, send the memory access request to the first cache that stores the target page table base address; and

when it is judged that none of the N first-cache levels stores the K-th level base address, send the K-th level memory access request to the memory controller. Optionally, the home node computes the (K+1)-th level base address from the obtained K-th level base address and performs further queries according to the (K+1)-th level base address. For example, the home node obtains the level-3 base address from memory and computes the level-4 base address, and can immediately continue searching for the level-4 base address at the home node PoS.
The processor core 11 may include a processor core pipeline 110, an arithmetic unit 111, a load store unit 112, a memory management unit 113, and a first cache 114. The first cache 114 may include N levels of first cache, such as a level-1 first cache, a level-2 first cache, a level-3 first cache, ..., an i-th level first cache, ..., and an N-th level first cache; i = 1, 2, ..., N, i.e., the value of i may be 1, 2, 3, ..., or N, N being an integer greater than 0. The embodiments of the present invention do not limit the number of cache levels included in the first cache.

The processor core pipeline 110 is used for parallel processing of instructions among units or devices such as the arithmetic unit 111 and the load store unit 112. It can be understood that the arithmetic unit, or execution unit, may include the arithmetic logic unit ALU.
The load store unit 112 is configured to send the virtual address provided by the application program to the memory management unit MMU and to receive the physical address corresponding to that virtual address fed back by the MMU.

Optionally, when the MMU is not enabled, the load store unit 112 can interact with the first cache. The load store unit 112 can obtain the page base address returned by the first cache 114 and then send the page base address to the ALU for computation. For example, the page base address is fed back to the ALU, and the ALU adds the page offset to the page base address to obtain the finally required physical address.
The memory management unit 113 is configured to send a level-1 memory access request to the level-1 first cache. For example, after a TLB miss occurs and it is determined that the TLB lacks the page table base addresses (in the case of four-level page tables, the TLB either stores the base addresses of the level-1 to level-4 page tables or stores none of them; there is no situation in which only some levels' base addresses are stored), the level-1 memory access request (including the level-1 page table base address and the high-order address of the virtual address) is sent. The memory management unit 113 is also configured to receive the (M+1)-th level base address, i.e., the base address of a page in the final page table. Optionally, the memory management unit 113 may include a translation lookaside buffer 1130. It can be understood that the memory management unit 113 is part of the storage unit.

The first cache 114 is configured to store the base addresses of one or more of the M levels of page tables.
Optionally, the processor core 11 may be the single core shown in FIG. 3, or multiple cores may be included. For example, referring to FIG. 4, FIG. 4 is a schematic diagram of another application architecture provided by an embodiment of the present invention. As shown in FIG. 4, the processor cores 11 include multiple cores, such as processor core 1, processor core 2, ..., and processor core Q (Q is an integer greater than 1). The home node 13 may include multiple home nodes; for example, a corresponding home node may be set where each core connects to the bus. The PoS and the N-th level first cache (e.g., an L3 cache shared by multiple cores outside the cores) are logically and functionally independent; the PoS can logically be located anywhere inside the processor. In the processor design, the two can be designed separately (i.e., located at different physical positions in the processor); or, to accelerate their interaction, the two can be physically placed together or adjacent. The embodiments of the present invention do not limit the organizational structure of the PoS and the L3 cache.
Specifically, in the architecture shown in FIG. 4, the home node is configured to:

when it is judged that any one of the second cache and the N-th level first cache stores the K-th level base address, send the K-th level memory access request to the corresponding cache; and

when it is judged that neither the second cache nor the N-th level first cache stores the K-th level base address, send the K-th level memory access request to the memory controller.

It should be noted that for the other functions of the home node (such as receiving memory access requests, identifying the base address level, and so on), reference may be made to the foregoing description of FIG. 1, which will not be repeated here.
The processor core 11 may further include a second cache 115; the second cache includes (N-1) levels of first cache, such as a level-1 first cache, a level-2 first cache, a level-3 first cache, ..., an i-th level first cache, ..., and an (N-1)-th level first cache. It can be understood that in the multi-core case, the first cache includes the N-th level first cache and the second caches 115 of all cores; in the single-core case, the first cache includes all N levels of first cache. It can also be understood that in the multi-core case, for a given core, the first cache is the general term for the caches inside that core together with the caches shared with other cores (for example, the N-th level first cache shown in FIG. 3; optionally, there may be one or more shared caches); the second cache is the general term for the caches inside the other cores.

It should be noted that for the processor core pipeline 110, arithmetic unit 111, load store unit 112, memory management unit 113, and other units or devices in FIG. 4 that are consistent with those in FIG. 3, reference may be made to the related description of FIG. 3 and the foregoing explanation of terms, which will not be repeated here.

It should be noted that this application can be specifically applied in all of the above caches and home nodes to accelerate the HPTW process.
结合图3所述应用架构，下面对本发明实施例涉及的一种加速硬件页表遍历装置进行描述。请参见图5，图5是本发明实施例提供的一种加速硬件页表遍历装置的示意图；如图5所示，主要对图3中涉及的存储器管理单元113、第一缓存114、家节点13、内存控制器14和内存20之间的交互情况进行描述。存储器管理单元113与二级缓存1141连接；三级缓存1142与二级缓存1141连接，还通过总线12与内存控制器14连接。其中，三级缓存1142连接总线的节点可以设置有家节点13。处理器芯片10中的内存控制器14与内存20连接。本发明实施例涉及的部件或者单元的相关内容，请参见前述实施例的描述，在此不再赘述。
在N=2,M=4的情况下,这2级第一缓存中存储了4级基址中一级或多级基址,即第一缓存中可能只存储部分基址或者存储全部基址。请参见图6,图6是本发明实施例提供的一种具体的加速硬件页表遍历装置的示意图;如图6所示,处理器芯片包括处理器内核11和三级缓存(L3 cache,即第2级第一缓存),处理器内核11包括二级缓存(L2 cache,即第1级第一缓存)。图中L3 cache为核外的缓存,但本发明实施例对此不作限定,即L3 cache也可以是核内的缓存。需要说明的是,图6只是一种示例性的情况。对于包含了四级缓存、五级缓存、六级缓存等能够存储页表基址的多级缓存的情况,可以参考图6以及相应实施例的描述。
其中,缓存内部结构请参见图7,图7是本发明实施例提供的一种缓存内部结构示意图;如图7所示,第i级第一缓存中包括第i级第一缓存的控制单元和第i级第一缓存的计算单元。
第i级第一缓存的计算单元，用于根据第K级基址和第K级偏移地址，确定第(K+1)级基址；所述第K级偏移地址为根据虚拟地址中的第K级高位地址确定的；K为大于0且小于或等于M的整数，i=1、2、...、N。例如，根据第1级基址和第1级偏移地址（地址偏移或者偏移量），确定第2级基址；假设一共有4级页表，根据第4级基址和第4级偏移地址确定第5级基址（第5级基址就是第4级页表中某一页的物理基地址）；所以，基址可以为页表或者页的物理基地址。具体地，例如，在第一级页表基址命中L2 cache的情况下，L2 cache在确定第一级页表基址后，根据偏移地址确定第一级页表中哪一项是第二级页表基址；假设48位的虚拟地址中[47:39]（即第1级高位地址）为000100000，对应的是32，在32属于第一级页表的序号范围内的前提下，cache查看第一级页表第32项，获取里面存放的第二级页表的物理基地址，即确定第二级页表基址。具体的基址确定以及地址拼接的描述，请参见部分用语解释(19)，在此不再赘述。
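上述由计算单元根据第K级基址和第K级高位地址确定下一级基址的过程，可以用如下Python片段示意（仅为假设性的仿真草图：位段划分沿用文中48位虚拟地址、四级页表的例子，页表项宽度假设为8字节，并非对硬件实现的限定）：

```python
# 各级高位地址在48位虚拟地址中的位段（文中例子）：
# 第1级[47:39]、第2级[38:30]、第3级[29:21]、第4级[20:12]
LEVEL_FIELDS = {1: (47, 39), 2: (38, 30), 3: (29, 21), 4: (20, 12)}
ENTRY_SIZE = 8  # 假设每个页表项占8字节

def level_index(va: int, level: int) -> int:
    """从虚拟地址中提取第level级高位地址，作为该级页表内的序号。"""
    hi, lo = LEVEL_FIELDS[level]
    return (va >> lo) & ((1 << (hi - lo + 1)) - 1)

def next_entry_addr(base: int, va: int, level: int) -> int:
    """在第level级基址上加第level级偏移地址，得到存放下一级基址的页表项地址。"""
    return base + level_index(va, level) * ENTRY_SIZE
```

例如，当[47:39]为000100000（即32）时，`level_index`返回32，计算单元即查看第一级页表第32项。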
基于图3所示的架构和相关装置,在N=2,M=4情况下对本发明实施例可能涉及的四种页表查询流程进行描述。
请参见图8和图9,图8是本发明实施例提供的一种页表查询过程示意图;如图8所示,第K级访存请求、第K级基址以及第(K+1)级基址中的K的取值包括1、2、3、…、M。图9是本发明实施例提供的一种对应图8的部分硬件的交互示意图;假设L2 cache命中全部的四级页表基址(即4级基址的每一级都存储在L2 cache中),如图9所示,在前述图6所示的加速硬件页表遍历的装置中,其各个功能模块可以按照以下时序执行相应的操作,具体步骤如下:
步骤1(S1):二级缓存的控制单元71接收第1级访存请求(包括第1级基址和第1级高位地址)。
具体地,从MMU113接收第1级访存请求;从MMU发往二级缓存的访存请求的命令格式,请参见图10,图10是本发明实施例提供的一种访存请求的命令格式示意图;如图10所示,命令格式可以包括高位地址、请求的类型、第i级页表基址和位域。结合前述实施例,在48位虚拟地址和4级页表的例子中,对应的命令格式为{高位地址[47:12]、访存请求(即HPTW请求)、第i级页表基址和位域[1:0]}。以第1级访存请求的命令格式为例,如图9所示,[47:12]为虚拟地址中除了页偏移offset[11:0]该段低位地址以外的高位地址,包括了[47:39](即第1级高位地址)、[38:30](即第2级高位地址)、[29:21](即第3级高位地址)和[20:12](即第4级高位地址)。
可选地,根据“请求的类型”该区域的数据,判断该请求是否为访存请求。
可选地,在判断某请求为HPTW请求后,根据命令格式中的位域判断该HPTW请求是对页表查询的哪一级页表进行的查询。以{[47:12]、访存请求、第1级页表基址、00}为例,位域00用于指示该访存请求对第2级页表基址查询。
可选地,二级缓存的控制单元从MMU接收第1级访存请求;其中,第1级访存请求包含的高位地址可以为[47:12]。
可选地,在MMU触发页表遍历后,该硬件页表遍历的过程从CR3中获取起始页表的基址开始查询,直到查询该级目标页表基址结束。例如,在四级页表的前提下,TLB中只缺失包含第2级页表基址的页表项,那么当遍历过程中查询到第2级页表基址时结束。
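文中访存请求的命令格式{高位地址[47:12]、请求的类型、第i级页表基址、位域[1:0]}，其组装与解析可以用如下Python片段示意（仅为假设性草图：其中位域取值假设为“所携带基址的级别减1”，即第1级访存请求对应位域00，字段名均为示意用的假设命名）：

```python
from collections import namedtuple

# 命令格式：{高位地址[47:12]、请求的类型、第i级页表基址、位域[1:0]}
HPTWRequest = namedtuple("HPTWRequest", ["hi_addr", "req_type", "base", "level_bits"])

def make_request(va: int, base: int, level: int) -> HPTWRequest:
    """组装第level级访存请求；高位地址为虚拟地址[47:12]，共36位。"""
    hi_addr = (va >> 12) & ((1 << 36) - 1)
    return HPTWRequest(hi_addr, "HPTW", base, level - 1)

def is_hptw(req: HPTWRequest) -> bool:
    """根据“请求的类型”区域判断该请求是否为访存请求（即HPTW请求）。"""
    return req.req_type == "HPTW"

def request_level(req: HPTWRequest) -> int:
    """根据位域判断该HPTW请求携带的是哪一级页表基址。"""
    return req.level_bits + 1
```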
步骤2(S2):二级缓存的控制单元71根据第1级基址,判断二级缓存有第1级基址;然后根据第1级高位地址查找到存储的第1级基址。具体地,确定第1级基址的高位,即该地址的tag;判断二级缓存中是否存储该tag,在二级缓存中存储有该tag就可以确定二级缓存中存储有第1级基址。可以理解的是,tag是基址的一部分,可以认为以基址的一部分作为判断有无存储的标识;可选地,将接收的第1级基址和存储的所有基址进行比较,判断二级缓存是否存储有相同的基址。
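上述“以基址的一部分（tag）作为判断有无存储的标识”的做法，可以用如下Python片段示意（假设tag为基址去掉低12位后的高位部分，实际划分取决于缓存组织方式，此处仅为示意）：

```python
def cache_has_base(stored_tags, base, tag_shift=12):
    """确定基址的高位（即该地址的tag），判断本级缓存是否存储有该tag；
    存储有该tag即可认为本级缓存存储有该基址。"""
    return (base >> tag_shift) in stored_tags
```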
步骤3(S3)：二级缓存的控制单元71向二级缓存的计算单元发送第1级基址和第1级高位地址。具体地，控制单元向计算单元发送第1级基址和高位地址（如[47:12]）；可以理解的是，高位地址包含了第1级高位地址（如[47:39]）。可选地，在判断访存请求查询的页表级别后，可以从高位地址中确定对应级别的高位地址。例如，当前的访存请求查询的是第1级页表基址，那么根据第1级页表基址的tag来判断二级缓存是否命中第1级页表基址以及在命中后查询第1级页表基址。
步骤4(S4):二级缓存的计算单元72在第1级基址上加偏移地址计算得到第2级基址。
具体地,在确定了第1级基址后,在第1级基址上拼接对应的第1级偏移地址,得到第2级基址。在第1级基址基础上,再增加第1级偏移地址确定第2级基址(该计算过程请参见前述实施例的相关描述,在此不再赘述),即在第1级页表中通过第1级偏移地址确定了第1级页表中哪一个页表项为第2级页表的基址。
步骤5(S5):在二级缓存的计算单元72得到第2级基址后,向二级缓存的控制单元71发送第2级基址。
步骤6(S6):二级缓存的控制单元71根据第2级高位地址(第2级高位地址包含在高位地址中)判断二级缓存有第2级基址;然后根据第2级基址查找到存储的第2级基址。
具体地,请参见前述步骤2的描述,在此不再赘述。
步骤7(S7):二级缓存的控制单元71向二级缓存的计算单元发送第2级基址和第2级高位地址。
具体地,请参见前述步骤3的描述,在此不再赘述。
……
步骤(L-1):二级缓存的计算单元72在第4级基址上加偏移地址计算得到第5级基址(即第四级页表中的页基址)。
具体地，当获得第4级基址后，根据第4级偏移地址确定第4级页表中的页表项，也就确定了所需物理地址对应的页基址。
步骤L(SL，以下均以步骤L/SL表示最后的步骤)：二级缓存的计算单元72向二级缓存的控制单元71反馈第5级基址。
可选地,二级缓存的控制单元71可以向MMU113反馈该页基址。进一步可选地,在获取第5级基址之后,MMU可以根据页基址和页偏移完成VA到PA的转换。
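MMU根据页基址和页偏移完成VA到PA转换的最后一步，可以用如下Python片段示意（假设4KB页，页偏移为虚拟地址的低12位offset[11:0]）：

```python
def final_pa(page_base: int, va: int) -> int:
    """在页基址上拼接页偏移offset[11:0]，得到最终的物理地址。"""
    return page_base | (va & 0xFFF)
```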
本发明实施例，通过在第一缓存的每一级缓存中增加计算单元，使得每一级第一缓存获取基址之后，能够直接根据基址和对应的偏移计算得到下一级的基址。具体地，第i级第一缓存的控制单元接收第K级访存请求，在判断第i级第一缓存（即本级缓存）存储有第K级基址的前提下，向本级缓存的计算单元发送第K级访存请求中的第K级基址。本级缓存根据第K级基址和从第K级访存请求包含的高位地址中确定的偏移地址，计算得到了第(K+1)级基址（即下一级基址）。区别于现有技术中任何一级缓存在确定第K级基址后，都需要返回该级基址到处理器核内的访存单元，然后通过处理器核内的算术逻辑单元计算得到下一级基址，再继续下一级基址的查询；本发明实施例中，无论在哪一级第一缓存确定了第K级基址后，都可以通过在本级缓存增加的计算单元，移位计算得到下一级基址，提高了基址的访问效率，实现了对硬件页表遍历过程的加速，从而最终提高了虚拟地址到物理地址的转换效率。
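以图9中四级基址均命中L2 cache的情形为例，“每级命中后在本级缓存内直接算出下一级基址、无需返回处理器核”的整体流程可以用如下Python片段仿真示意（仅为假设性草图：用字典模拟本级缓存中缓存的各级页表，返回第5级基址即页基址）：

```python
# 各级高位地址位段，沿用文中48位虚拟地址、四级页表的例子
LEVEL_FIELDS = {1: (47, 39), 2: (38, 30), 3: (29, 21), 4: (20, 12)}

def accelerated_walk(cache, va, l1_base):
    """cache: {某级基址: 该基址处的页表（序号 -> 下一级基址）}。
    四级基址均命中本级缓存时，逐级就地得到下一级基址。"""
    base = l1_base
    for level in (1, 2, 3, 4):
        hi, lo = LEVEL_FIELDS[level]
        index = (va >> lo) & ((1 << (hi - lo + 1)) - 1)  # 由第level级高位地址确定偏移
        base = cache[base][index]  # 控制单元命中后，计算单元就地算出下一级基址
    return base  # 第5级基址，即页基址
```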
需要说明的是,图示的S1、S2、S3等标识对应步骤1、步骤2、步骤3等步骤,以下的实施例也采用相同的标识,后续不再说明。可以理解的是,本申请中图示标识对应部分步骤,是示例性的描述。
请参见图11和图12,图11是本发明实施例提供的另一种页表查询过程示意图;如图11所示,访存请求可以为第K级访存请求、基址可以为第K级基址;其中,K的取值包括1、2、3、…、M。图12是本发明实施例提供的一种对应图11的部分硬件的交互示意图;假设L2 cache命中第1级基址(即第1级页表基址),L3 cache命中第2-4级基址;如图12所示,在前述图6所示的加速硬件页表遍历的装置中,其各个功能模块可以按照以下时序执行相应的操作,具体步骤如下:
步骤1(S1):二级缓存的控制单元71接收第1级访存请求。
步骤2(S2):二级缓存的控制单元71根据第1级基址,判断二级缓存有第1级基址;然后根据第1级基址查找到存储的第1级基址。
步骤3(S3):二级缓存的控制单元71向二级缓存的计算单元发送第1级基址和第1级高位地址。
步骤4(S4):二级缓存的计算单元72在第1级基址上加偏移地址计算得到第2级基址。
步骤5(S5):在二级缓存的计算单元72得到第2级基址后,向二级缓存的控制单元71发送第2级基址。
步骤6(S6)：二级缓存的控制单元71根据第2级基址判断二级缓存没有第2级基址；然后向三级缓存的控制单元81发送第2级访存请求。
具体地,在判断二级缓存(第2级第一缓存)未存储有所述第2级基址且i≠N的情况下,向三级缓存(即第3级第一缓存)发送所述第2级访存请求。在i=N的情况下,例如,在判断三级缓存未存储有第2级基址后,可以由三级缓存向家节点发送第2级访存请求。
可选地，二级缓存的控制单元71在判断二级缓存没有第2级基址后，还可以同时向家节点发送第2级访存请求。可以理解的是，在本发明实施例中，家节点和三级缓存的控制单元81在接收到第2级访存请求后，执行的操作基本一致，所以在图中只描述三级缓存与二级缓存的交互关系。
步骤7(S7):三级缓存的控制单元81接收第2级访存请求,所述第2级访存请求包括第2级基址和第2级高位地址(也包含在高位地址中)。
步骤8(S8):三级缓存的控制单元81根据第2级基址,判断三级缓存有第2级基址;然后根据第2级基址查找到存储的第2级基址。
步骤9(S9):三级缓存的控制单元81向三级缓存的计算单元发送第2级基址和第2级高位地址。
步骤10(S10):三级缓存的计算单元82在第2级基址上加对应的偏移地址,计算得到第3级基址。
……
步骤L(SL):三级缓存的计算单元82向三级缓存的控制单元81反馈第5级基址。
需要说明的是,本发明实施例中各个步骤与前述实施例相似的内容不再赘述,请参见前述实施例对应步骤的描述。
请参见图13和图14,图13是本发明实施例提供的又一种页表查询过程示意图;如图13所示,访存请求可以为第K级访存请求、基址可以为第K级基址;其中,K的取值包括1、2、3、…、M。图14是本发明实施例提供的一种对应图13的部分硬件的交互示意图;假设L2 cache命中第1级基址、第3级基址和第4级基址,L3 cache命中第2级基址;如图14所示,在前述图6所示的加速硬件页表遍历的装置中,其各个功能模块可以按照以下时序执行相应的操作,具体步骤如下:
步骤1(S1):二级缓存的控制单元71接收第1级访存请求。
步骤2(S2):二级缓存的控制单元71根据第1级基址,判断二级缓存有第1级基址;然后根据第1级基址查找到存储的第1级基址。
步骤3(S3):二级缓存的控制单元71向二级缓存的计算单元发送第1级基址和第1级高位地址。
步骤4(S4):二级缓存的计算单元72在第1级基址上加偏移地址计算得到第2级基址。
步骤5(S5):在二级缓存的计算单元72得到第2级基址后,向二级缓存的控制单元71发送第2级基址。
步骤6(S6):二级缓存的控制单元71根据第2级基址判断二级缓存没有第2级基址;然后向三级缓存的控制单元81发送第2级访存请求。
可选地，二级缓存的控制单元71在判断二级缓存没有第2级基址后，还可以同时向家节点发送第2级访存请求。可以理解的是，在本发明实施例中，家节点和三级缓存的控制单元81在接收到第2级访存请求后，执行的操作基本一致，所以在图中只描述三级缓存与二级缓存的交互关系。
步骤7(S7):三级缓存的控制单元81接收第2级访存请求,所述第2级访存请求包括第2级基址和第2级高位地址(也包含在高位地址中)。
步骤8(S8):三级缓存的控制单元81根据第2级基址,判断三级缓存有第2级基址;然后根据第2级基址查找到存储的第2级基址。
步骤9(S9):三级缓存的控制单元81向三级缓存的计算单元发送第2级基址和第2级高位地址。
步骤10(S10):三级缓存的计算单元82在第2级基址上加对应的偏移地址,计算得到第3级基址。
步骤11(S11):在三级缓存单元82得到第3级基址后,向三级缓存的控制单元81发送第3级基址。
步骤12(S12):三级缓存的控制单元81根据第3级基址判断三级缓存没有第3级基址;然后向家节点13发送第3级访存请求。
具体地，从三级缓存的控制单元81向家节点PoS13发送访存请求，相比现有技术中的请求新增了一条命令。从缓存出来的请求，需要发送到PoS进行统一处理。为了实现本发明实施例，需要在请求命令中添加新的命令。在AMBA CHI协议命令编码中，Opcode域是命令编码字段，0x3B-0x3F可以为保留命令（即reserve命令），本发明实施例可以通过使用0x3B命令实现HPTW请求。可选地，在Request flit中额外添加Addr[47:12]域（用于识别原虚拟地址中的高位地址）和Level[1:0]域（指示PoS当前正在处理的是哪一级页表）。可以理解的是，在现有技术中增加了新的命令后，原来的命令长度增加了。例如，原命令长度为137，增加后命令长度是137（原命令长度）+36（即原虚拟地址中的高位地址的长度）+2（即Level[1:0]域的长度）=175。
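新增Addr[47:12]域和Level[1:0]域后命令长度的变化，可以按文中数字简单验证（137+36+2=175）：

```python
ORIG_FLIT_BITS = 137           # 原Request flit命令长度
ADDR_FIELD_BITS = 47 - 12 + 1  # 新增Addr[47:12]域，36位
LEVEL_FIELD_BITS = 2           # 新增Level[1:0]域，2位

NEW_FLIT_BITS = ORIG_FLIT_BITS + ADDR_FIELD_BITS + LEVEL_FIELD_BITS
print(NEW_FLIT_BITS)  # 175
```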
步骤13(S13):家节点13接收第3级访存请求。
步骤14(S14):在家节点13接收第3级访存请求后,根据第3级基址判断每一级第一缓存是否存储有第3级基址。在判断二级缓存存储有第3级基址的情况下,向二级缓存发送第3级访存请求。
具体地，根据第3级基址判断所有的缓存中哪一个或者哪些缓存存储有第3级基址。其中，PoS的结构中有2列数据，第一列存储缓存行（cacheline）的Tag（即物理地址），第二列存储的是cacheline在哪一级缓存的哪个缓存（如二级缓存）中。例如，第二列存储的是缓存行存储在哪一个二级缓存中，那么0001可以代表在第一个L2 cache中，1111可以代表四个L2 cache都有该数据。
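PoS第二列位图的查找方式可以用如下Python片段示意（假设位图第j位为1表示第(j+1)个L2 cache存有该缓存行，0001代表第一个L2 cache，1111代表四个L2 cache都有；数据结构仅为示意）：

```python
def caches_holding(pos_table, tag, num_caches=4):
    """pos_table: {cacheline的Tag: 位图}。
    返回存有该Tag的L2 cache编号列表（从1开始计）。"""
    bitmap = pos_table.get(tag, 0)
    return [j + 1 for j in range(num_caches) if (bitmap >> j) & 1]
```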
可选地，家节点还包括缓冲区buffer。例如，新增四个entry，四个entry与家节点的结构相同，但可以是专门用于HPTW的（即只有HPTW请求才进入上述四个entry进行处理，例如，第一级页表进入第一个entry，或者说第一个entry的两列数据只存储第1级页表的相关信息）。在家节点13接收第3级访存请求后，家节点中的buffer根据第3级基址的tag判断哪一级第一缓存存储有第3级基址。例如，家节点先根据位域（如10）判断该访存请求是对第3级基址的查询，然后判断哪一个缓存中存储有第3级基址（比如，判断出二级缓存中有第3级基址）。
步骤15(S15):二级缓存的控制单元71接收第3级访存请求。可选地,在二级缓存接收了第3级访存请求之后,二级缓存的控制单元71可以根据该访存请求,再次确定本级缓存中是否存储有第3级基址。
……
步骤L(SL):二级缓存的计算单元72向二级缓存的控制单元71反馈第5级基址。
可选地,二级缓存的控制单元71向MMU反馈第5级基址。
需要说明的是,本发明实施例中各个步骤与前述实施例相似的内容不再赘述,请参见前述实施例对应步骤的描述。
请参见图15和图16，图15是本发明实施例提供的再一种页表查询过程示意图；如图15所示，访存请求可以为第K级访存请求、基址可以为第K级基址；其中，K的取值包括1、2、3、…、M。图16是本发明实施例提供的一种对应图15的部分硬件的交互示意图；假设L2 cache命中第1级基址和第3级基址，L3 cache命中第2级基址，而第4级基址存储在内存中；如图16所示，在前述图6所示的加速硬件页表遍历的装置中，其各个功能模块可以按照以下时序执行相应的操作，具体步骤如下：
步骤1(S1):二级缓存的控制单元71接收第1级访存请求。
步骤2(S2):二级缓存的控制单元71根据第1级基址,判断二级缓存有第1级基址;然后根据第1级基址查找到存储的第1级基址。
步骤3(S3):二级缓存的控制单元71向二级缓存的计算单元发送第1级基址和第1级高位地址。
步骤4(S4):二级缓存的计算单元72在第1级基址上加偏移地址计算得到第2级基址。
步骤5(S5):在二级缓存的计算单元72得到第2级基址后,向二级缓存的控制单元71发送第2级基址。
步骤6(S6):二级缓存的控制单元71根据第2级基址判断二级缓存没有第2级基址;然后向三级缓存的控制单元81发送第2级访存请求。
可选地，二级缓存的控制单元71在判断二级缓存没有第2级基址后，还可以同时向家节点发送第2级访存请求。或者，步骤6中只向家节点发送第2级访存请求。可以理解的是，在本发明实施例中，家节点和三级缓存的控制单元81在接收到第2级访存请求后，执行的操作基本一致，所以在图中只描述三级缓存与二级缓存的交互关系。
步骤7(S7):三级缓存的控制单元81接收第2级访存请求,所述第2级访存请求包括第2级基址和第2级高位地址(也包含在高位地址中)。
步骤8(S8):三级缓存的控制单元81根据第2级基址,判断三级缓存有第2级基址;然后根据第2级基址查找到存储的第2级基址。
步骤9(S9):三级缓存的控制单元81向三级缓存的计算单元发送第2级基址和第2级高位地址。
步骤10(S10):三级缓存的计算单元82在第2级基址上加对应的偏移地址,计算得到第3级基址。
步骤11(S11):在三级缓存单元82得到第3级基址后,向三级缓存的控制单元81发送第3级基址。
步骤12(S12):三级缓存的控制单元81根据第3级基址判断三级缓存没有第3级基址;然后向家节点13发送第3级访存请求。
步骤13(S13):家节点13接收第3级访存请求。
步骤14(S14):在家节点13接收第3级访存请求后,根据第3级基址判断每一级第一缓存是否存储有第3级基址。在判断二级缓存存储有第3级基址的情况下,向二级缓存发送第3级访存请求。
可选地,家节点还包括缓冲区buffer。在家节点13接收第3级访存请求后,buffer根据第3级基址的tag判断哪一级第一缓存存储有第3级基址。例如,家节点判断二级缓存中存储有第3级基址。
步骤15(S15):二级缓存的控制单元71接收第3级访存请求。
可选地,在二级缓存接收了第3级访存请求之后,二级缓存的控制单元71可以根据该访存请求,再次确定本级缓存中是否存储有第3级基址。
步骤16(S16):二级缓存的控制单元71根据第3级基址判断二级缓存有第3级基址后,根据第3级基址查找到存储的第3级基址。
步骤17(S17):二级缓存的控制单元71向二级缓存的计算单元72发送第3级基址和第3级高位地址。
步骤18(S18):二级缓存的计算单元72在第3级基址上加偏移地址计算得到第4级基址。
步骤19(S19):在二级缓存的计算单元72得到第4级基址后,向二级缓存的控制单元71发送第4级基址。
步骤20(S20):二级缓存的控制单元71根据第4级基址,判断二级缓存没有第4级基址;然后向三级缓存的控制单元81和家节点13发送第4级访存请求。
步骤21(S21):家节点13和三级缓存的控制单元81都接收第4级访存请求。
具体地，在本发明实施例中，可选地，在三级缓存接收第4级访存请求后且没有存储第4级基址的情况下，判断本级缓存没有第4级基址。
步骤22(S22):在家节点13接收第4级访存请求后,判断二级缓存和三级缓存均没有存储第4级基址,向内存控制器14发送第4级访存请求。
具体地,在家节点接收第4级访存请求后,判断该请求为第4级的访存请求。向内存控制器发送请求,指示内存控制器14从内存20中获取第4级基址。可选地,家节点包括缓冲区buffer,buffer用于判断该请求是对哪一级页表进行查询,在完成判断后从当前级别查询开始往后处理。
步骤23(S23):内存控制器14接收第4级访存请求。
步骤24(S24):内存控制器根据第4级访存请求,从内存中获取第4级基址后,向家节点13发送第4级基址。具体地,内存控制器与内存之间的数据交互在此不展开描述。
步骤25(S25):家节点13根据第4级基址和第4级偏移地址,计算得到第5级基址。
可选地,家节点的buffer还可以包括buffer计算单元;buffer还用于对基址进行移位计算。进一步可选地,在家节点得到第5级基址后,向MMU发送第5级基址。
需要说明的是,本发明实施例中各个步骤与前述实施例相似的内容不再赘述,请参见前述实施例对应步骤的描述。
可以理解的是,本发明实施例涉及的页表查询流程可以包括但不限于上述提供的四种流程。例如,在存储第4级缓存、第5级缓存以及其他级别缓存的情况等等,均可以参考前述图示的描述。
基于图3所示的架构和相关装置，下面对本发明实施例涉及的一种加速硬件页表遍历装置进行描述，请参见图17，图17是本发明实施例提供的一种加速硬件页表遍历装置的示意图；如图17所示，在存储器管理单元113中增加第三缓存1131。当存储器管理单元113查询TLB发生TLB miss之后，先查询第三缓存中是否存储有页表基址。如果命中第三缓存，并且依靠第三缓存就可以完成页表遍历的全过程，就可以不用执行前述实施例提供的方案。否则，针对TLB1130和第三缓存1131中缺失的页表基址，继续执行前述本发明实施例提供的页表遍历方案。
基于图4所示的架构,在N=2,M=4情况下对本发明实施例可能涉及的一种页表查询流程进行描述。
下面先对在N=2,M=4情况下的一种具体的加速硬件页表遍历装置进行描述。
请参见图18，图18是本发明实施例提供的另一种具体的加速硬件页表遍历装置的示意图；如图18所示，处理器芯片10包括处理器内核11，处理器内核11可以包括处理器内核1和处理器内核2。处理器内核2的第二缓存115为二级缓存（即第1级第二缓存），在本发明实施例中二级缓存只有1级，二级缓存可以就是第二缓存115。三级缓存是处理器核1和处理器核2共享的核外缓存，即第2级第一缓存。第1级第一缓存和第1级第二缓存都可以是二级缓存。处理器内核1与总线连接之处可以设置家节点13，处理器内核2与总线连接之处可以设置家节点13。家节点13可以包括多个家节点，但多个家节点的功能一致。
请参见图19，图19是本发明实施例提供的一种多核情况下的页表查询过程示意图；如图19所示，访存请求可以为第K级访存请求，基址可以为第K级基址；其中，K的取值包括1、2、3、…、M。图20是本发明实施例提供的一种对应图19的部分硬件的交互示意图；假设L2 cache（即第一缓存中的二级缓存）命中第1级基址和第3级基址，L3 cache（即第一缓存中的三级缓存）命中第2级基址，而第4级基址存储在第二缓存的二级缓存中；如图20所示，在前述图18所示的加速硬件页表遍历的装置中，其各个功能模块可以按照以下时序执行相应的操作，具体步骤如下：
步骤1(S1):二级缓存的控制单元71接收第1级访存请求。
步骤2(S2):二级缓存的控制单元71根据第1级基址,判断二级缓存有第1级基址;然后根据第1级基址查找到存储的第1级基址。
步骤3(S3):二级缓存的控制单元71向二级缓存的计算单元发送第1级基址和第1级高位地址。
步骤4(S4):二级缓存的计算单元72在第1级基址上加偏移地址计算得到第2级基址。
步骤5(S5):在二级缓存的计算单元72得到第2级基址后,向二级缓存的控制单元71发送第2级基址。
步骤6(S6):二级缓存的控制单元71根据第2级基址判断二级缓存没有第2级基址;然后向三级缓存的控制单元81发送第2级访存请求。
步骤7(S7):三级缓存的控制单元81接收第2级访存请求,所述第2级访存请求包括第2级基址和第2级高位地址(也包含在高位地址中)。
步骤8(S8):三级缓存的控制单元81根据第2级基址,判断三级缓存有第2级基址;然后根据第2级基址查找到存储的第2级基址。
步骤9(S9):三级缓存的控制单元81向三级缓存的计算单元发送第2级基址和第2级高位地址。
步骤10(S10):三级缓存的计算单元82在第2级基址上加对应的偏移地址,计算得到第3级基址。
步骤11(S11):在三级缓存单元82得到第3级基址后,向三级缓存的控制单元81发送第3级基址。
步骤12(S12):三级缓存的控制单元81根据第3级基址判断三级缓存没有第3级基址;然后向家节点13发送第3级访存请求。
步骤13(S13):家节点13接收第3级访存请求。
步骤14(S14):在家节点13接收第3级访存请求后,根据第3级基址判断每一级第一缓存是否存储有第3级基址。在判断二级缓存存储有第3级基址的情况下,向二级缓存发送第3级访存请求。
步骤15(S15):二级缓存的控制单元71接收第3级访存请求。
步骤16(S16):二级缓存的控制单元71根据第3级基址判断二级缓存有第3级基址后,根据第3级基址查找到存储的第3级基址。
步骤17(S17):二级缓存的控制单元71向二级缓存的计算单元72发送第3级基址和第3级高位地址。
步骤18(S18):二级缓存的计算单元72在第3级基址上加偏移地址计算得到第4级基址。
步骤19(S19):在二级缓存的计算单元72得到第4级基址后,向二级缓存的控制单元71发送第4级基址。
步骤20(S20):二级缓存的控制单元71根据第4级基址,判断二级缓存没有第4级基址;然后向三级缓存的控制单元81和家节点13发送第4级访存请求。
步骤21(S21):家节点13接收第4级访存请求。
可选地,三级缓存的控制单元81接收第4级访存请求。
步骤22(S22):在家节点13接收第4级访存请求后,判断二级缓存和三级缓存均没有存储第4级基址而第二缓存中的二级缓存中存储有第4级基址,向第二缓存中的二级缓存115发送第4级访存请求。
步骤23(S23):第二缓存的二级缓存115接收家节点13发送的第4级访存请求。
步骤24(S24):在第二缓存的二级缓存115根据接收的第4级访存请求,确定二级缓存中存储的第4级基址后,第二缓存的二级缓存115向家节点13发送第4级基址。
步骤25(S25)：家节点13根据第4级基址和第4级偏移地址确定第5级基址。
在多核的情况下(即处理器芯片内有第一缓存、第二缓存以及处理器芯片连接内存的情况下),假设L2 cache(即第一缓存中的二级缓存)命中第1级基址和第3级基址,L3 cache(即第一缓存中的三级缓存)命中第2级基址,而第4级基址存储在内存中。前述图19和图20对应的实施例中,具体步骤如下:
步骤1(S1):二级缓存的控制单元71接收第1级访存请求。
……
步骤21(S21):家节点13接收第4级访存请求。
(步骤1-步骤21可以参考前述图19以及图20对应实施例中的步骤1-步骤21,在此不再赘述。)
步骤22(S22)：在家节点13接收第4级访存请求后，判断所有二级缓存和三级缓存均没有存储第4级基址，向内存控制器14发送第4级访存请求。
步骤23(S23):内存控制器14接收第4级访存请求。
步骤24(S24):内存控制器根据第4级访存请求,从内存中获取第4级基址后,向家节点13发送第4级基址。具体地,内存控制器与内存之间的数据交互在此不展开描述。
步骤25(S25):家节点13根据第4级基址和第4级偏移地址,计算得到第5级基址。
可以理解的是,第二缓存中二级缓存也可以包括控制单元和计算单元,可以参考第一缓存中的描述,在此不再赘述。上述步骤可能存在的其他实现方式以及具体描述可以参考前述图15和图16对应的实施例的描述,在此不再赘述。
可以理解的是,在有多个处理器内核的情况下,基址计算节点可以发生在多核内或者家节点。例如,在处理器内核1的二级缓存计算得到第3级基址后,在三级缓存命中了第3级基址,并通过三级缓存的计算单元得到了第4级基址。但是三级缓存没有命中第4级基址,通过家节点发现在处理器内核2的二级缓存中命中第4级基址,通过家节点向处理器内核2的二级缓存发送第4级访存请求。具体的描述可以参见前述图15和图16对应的实施例的描述,在此不再赘述。
需要说明的是,本发明实施例中各个步骤以及硬件结构与前述实施例相似的内容不再赘述,请参见前述实施例对应步骤的描述。其中,前述的实施例中第i级第一缓存(例如,二级缓存、三级缓存)都是指具备存储页表基址能力的缓存,不涉及现有技术中的一级缓存(由于目前的一级缓存不存储页表基址)。但是不排除一级缓存将来可能存储页表基址的情况,那么在一级缓存能够存储页表基址的情况下,第1级第一缓存可以是一级缓存。否则,本发明实施例中的第1级第一缓存一般为二级缓存。并且,本发明实施例对缓存的数量和级别不作限定。
基于图4所示的架构和相关装置，下面对本发明实施例涉及的一种加速硬件页表遍历装置进行描述，请参见图21，图21是本发明实施例提供的一种多核情况下加速硬件页表遍历装置的示意图；如图21所示，在处理器内核1的存储器管理单元113中增加第三缓存1131。当存储器管理单元113查询TLB发生TLB miss之后，先查询第三缓存中是否存储有需要的全部页表基址。如果命中第三缓存，并且依靠第三缓存就可以完成页表遍历的全过程，就可以不用执行前述实施例提供的方案。否则，针对TLB1130和第三缓存1131中缺失的页表基址，继续执行前述本发明实施例提供的页表遍历方案。
需要说明的是,在处理器芯片10有多个处理器内核11的情况下(如处理器内核1、处理内核2、......、处理器内核Q等等),每个处理器内核都可以包括第三缓存。比如,图示的处理器内核1有第三缓存1131;那么处理器内核2或者处理器内核Q也都可以有第三缓存。
结合图3所述应用架构，下面对本发明实施例涉及的一种加速硬件页表遍历方法进行描述。请参见图22，图22是本发明实施例提供的一种加速硬件页表遍历方法的示意图；如图22所示，可以包括步骤S2201-步骤S2212；其中，可选的步骤可以包括步骤S2205-步骤S2212。
步骤S2201:通过第i级第一缓存的控制单元接收访存请求。
具体地,所述访存请求包括第K级基址和第K级高位地址。0<K≤M,K为整数;i=1、2、...、N。在一种可能的实现方式中,所述访存请求还包括所述第K级基址的基址标识,所述基址标识用于指示所述第K级基址的级别。
步骤S2202:通过所述第i级第一缓存的控制单元,根据所述第K级基址判断所述第i级第一缓存是否存储有所述第K级基址。
步骤S2203:在判断所述第i级第一缓存存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元,向所述第i级第一缓存的计算单元发送所述第K级基址。
步骤S2204:通过第i级第一缓存的计算单元,根据所述第K级基址和第K级偏移地址确定第(K+1)级基址。
具体地,所述第K级偏移地址为根据所述第K级高位地址确定的。
步骤S2205:在判断所述第i级第一缓存未存储有所述第K级基址且i≠N的情况下,通过所述第i级第一缓存的控制单元,向第(i+1)级第一缓存发送所述访存请求。
步骤S2206:在判断所述第i级第一缓存未存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元向家节点发送所述访存请求。
步骤S2207:在接收所述访存请求后,通过所述家节点,根据所述第K级基址判断N级第一缓存中每一级第一缓存是否存储有所述第K级基址。
步骤S2208:在判断目标第一缓存存储有所述第K级基址的情况下,向所述目标第一缓存的控制单元发送所述访存请求。
步骤S2209:在判断所述每一级第一缓存均未存储有所述第K级基址的情况下,通过所述家节点向内存控制器发送所述访存请求。
步骤S2210:通过所述内存控制器,根据所述第K级基址在内存中确定所述第K级基址。
步骤S2211:通过所述内存控制器向所述家节点发送所述第K级基址。
步骤S2212:根据所述第K级基址和所述第K级偏移地址,通过所述家节点的缓冲区buffer确定所述第(K+1)级基址。
需要说明的是,本发明实施例中所描述的加速硬件页表遍历方法可参见上述图8-图17中所述的装置实施例中的加速硬件页表遍历装置的相关描述,此处不再赘述。
结合图4所述应用架构,下面对本发明实施例涉及的另一种加速硬件页表遍历方法进行描述。请参见图23,图23是本发明实施例提供的另一种加速硬件页表遍历方法的示意图;如图23所示,可以包括步骤S2301-步骤S2317。
步骤S2301:通过第i级第一缓存的控制单元接收访存请求。
具体地,所述访存请求包括第K级基址和第K级高位地址;0<K≤M,K为整数;i=1、2、...、N。在一种可能的实现方式中,所述访存请求还包括所述第K级基址的基址标识,所述基址标识用于指示所述第K级基址的级别。
步骤S2302：通过所述第i级第一缓存的控制单元，根据所述第K级基址判断所述第i级第一缓存是否存储有所述第K级基址。
步骤S2303:在判断所述第i级第一缓存存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元,向所述第i级第一缓存的计算单元发送所述第K级基址。
步骤S2304:通过第i级第一缓存的计算单元,根据所述第K级基址和第K级偏移地址确定第(K+1)级基址。
具体地,所述第K级偏移地址为根据所述第K级高位地址确定的。
步骤S2305:在判断所述第i级第一缓存未存储有所述第K级基址且i≠N的情况下,通过所述第i级第一缓存的控制单元,向第(i+1)级第一缓存发送所述访存请求。
步骤S2306:在判断所述第i级第一缓存未存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元向家节点发送所述访存请求。
步骤S2307:在接收所述访存请求后,通过所述家节点,根据所述第K级基址判断N级第一缓存中每一级第一缓存是否存储有所述第K级基址。
步骤S2308:在判断目标第一缓存存储有所述第K级基址的情况下,向所述目标第一缓存的控制单元发送所述访存请求。
步骤S2309:在接收所述访存请求后,通过所述家节点判断第二缓存是否存储有所述第K级基址。
步骤S2310:在判断所述第二缓存存储有所述第K级基址的情况下,通过所述家节点向所述第二缓存发送所述访存请求。
步骤S2311:在判断所述每一级第一缓存和所述第二缓存均未存储有所述第K级基址的情况下,通过所述家节点向内存控制器发送所述访存请求。
步骤S2312:根据所述第K级基址,通过所述内存控制器在内存中确定所述第K级基址。
步骤S2313:通过所述内存控制器向所述家节点发送所述第K级基址。
步骤S2314:根据所述第K级基址和所述第K级偏移地址,通过所述家节点的缓冲区buffer确定所述第(K+1)级基址。
步骤S2315:在向第1级第一缓存发送访存请求之前,通过所述存储器管理单元判断第三缓存是否存储有第1级基址。
步骤S2316:在判断所述第三缓存存储有所述第1级基址的情况下,通过所述存储器管理单元获取所述第1级基址。
需要说明的是,在第三缓存存储有第1级基址的前提下,第三缓存也会存储有剩余级别的基址,以供完成目标虚拟地址转换成物理地址。
步骤S2317:在判断所述第三缓存未存储有所述第1级基址的情况下,通过所述存储器管理单元向所述第1级第一缓存发送所述访存请求。
需要说明的是,本发明实施例中所描述的加速硬件页表遍历方法可参见上述图19-图21中所述的装置实施例中的加速硬件页表遍历装置的相关描述,此处不再赘述。
如图24所示，图24是本发明实施例提供的一种芯片的结构示意图。前述实施例中的加速硬件页表遍历装置可以以图24中的结构来实现，该设备包括至少一个处理器241，至少一个存储器242。此外，该设备还可以包括天线等通用部件，在此不再详述。
处理器241可以是通用中央处理器(CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制以上方案程序执行的集成电路。
存储器242可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理器相连接。存储器也可以和处理器集成在一起。
其中,所述存储器242用于存储执行以上方案的应用程序代码,并由处理器241来控制执行。所述处理器241用于执行所述存储器242中存储的应用程序代码。具体如下:
通过第i级第一缓存的控制单元接收访存请求,所述访存请求包括第K级基址和第K级高位地址,K为正整数,i=1、2、...、N;通过所述第i级第一缓存的控制单元,根据所述第K级基址判断所述第i级第一缓存是否存储有所述第K级基址;在判断所述第i级第一缓存存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元,向所述第i级第一缓存的计算单元发送所述第K级基址;通过第i级第一缓存的计算单元,根据所述第K级基址和第K级偏移地址确定第(K+1)级基址,所述第K级偏移地址为根据所述第K级高位地址确定的。
图24所示的芯片为一种加速硬件页表遍历装置时,存储器242存储的代码可执行以上图22或者图23提供的加速硬件页表遍历装置方法,比如,在判断所述第i级第一缓存未存储有所述第K级基址且i≠N的情况下,通过所述第i级第一缓存的控制单元,向第(i+1)级第一缓存的控制单元发送所述访存请求。
或者,在接收所述访存请求后,通过所述家节点,根据所述第K级基址判断N级第一缓存中每一级第一缓存是否存储有所述第K级基址;在判断目标第一缓存存储有所述第K级基址的情况下,向所述目标第一缓存的控制单元发送所述访存请求。
或者,在判断所述每一级第一缓存均未存储有所述第K级基址的情况下,通过所述家节点向内存控制器发送所述访存请求;通过所述内存控制器,根据所述第K级基址在内存中确定所述第K级基址;通过所述内存控制器向所述家节点发送所述第K级基址;根据所述第K级基址和所述第K级偏移地址,通过所述家节点的缓冲区buffer确定所述第(K+1)级基址。
需要说明的是,本发明实施例中所描述的芯片24的功能可参见上述图22-图23中的所述的方法实施例中的相关描述,此处不再赘述。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可能可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以为个人计算机、服务器或者网络设备等，具体可以是计算机设备中的处理器）执行本申请各个实施例上述方法的全部或部分步骤。其中，前述的存储介质可包括：U盘、移动硬盘、磁碟、光盘、只读存储器（Read-Only Memory，缩写：ROM）或者随机存取存储器（Random Access Memory，缩写：RAM）等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (19)

  1. 一种加速硬件页表遍历的装置,其特征在于,包括第i级第一缓存的控制单元和所述第i级第一缓存的计算单元;i=1、2、…、N,N为正整数;其中,
    所述第i级第一缓存的控制单元,用于:
    接收访存请求,所述访存请求包括第K级基址和第K级高位地址,K为正整数;
    根据所述第K级基址判断所述第i级第一缓存是否存储有所述第K级基址;
    在判断所述第i级第一缓存存储有所述第K级基址的情况下,向所述第i级第一缓存的计算单元发送所述第K级基址;
    所述第i级第一缓存的计算单元,用于:
    根据所述第K级基址和第K级偏移地址确定第(K+1)级基址,所述第K级偏移地址为根据所述第K级高位地址确定的。
  2. 根据权利要求1所述的装置,其特征在于,所述第i级第一缓存的控制单元,还用于:
    在判断所述第i级第一缓存未存储有所述第K级基址且i≠N的情况下,向第(i+1)级第一缓存的控制单元发送所述访存请求。
  3. 根据权利要求1或2所述的装置,其特征在于,所述装置还包括家节点;所述家节点与所述第i级第一缓存耦合;
    所述第i级第一缓存的控制单元,还用于:
    在判断所述第i级第一缓存未存储有所述第K级基址的情况下,向所述家节点发送所述访存请求;
    所述家节点,用于接收所述访存请求。
  4. 根据权利要求3所述的装置,其特征在于,所述家节点,还用于:
    在接收所述访存请求后,根据所述第K级基址判断N级第一缓存中每一级第一缓存是否存储有所述第K级基址;
    在判断目标第一缓存存储有所述第K级基址的情况下,向所述目标第一缓存的控制单元发送所述访存请求。
  5. 根据权利要求4所述的装置,其特征在于,所述装置还包括:与所述家节点耦合的内存控制器、与所述内存控制器耦合的内存;所述家节点包括缓冲区buffer;
    所述家节点,还用于:
    在判断所述每一级第一缓存均未存储有所述第K级基址的情况下,向所述内存控制器发送所述访存请求;
    所述内存控制器,用于:
    根据所述第K级基址,在所述内存中确定所述第K级基址;
    向所述家节点发送所述第K级基址;
    所述内存,用于存储所述第K级基址;
    所述buffer,用于根据所述第K级基址和所述第K级偏移地址,确定所述第(K+1)级基址。
  6. 根据权利要求4所述的装置,其特征在于,所述装置还包括与所述家节点耦合的第二缓存,所述第二缓存包括所述第二缓存的控制单元和所述第二缓存的计算单元;
    所述家节点,还用于:
    在接收所述访存请求后,判断所述第二缓存是否存储有所述第K级基址;
    在判断所述第二缓存存储有所述第K级基址的情况下,向所述第二缓存的控制单元发送所述访存请求。
  7. 根据权利要求6所述的装置,其特征在于,所述装置还包括与所述家节点耦合的内存控制器、与所述内存控制器耦合的内存;所述家节点包括缓冲区buffer;
    所述家节点,还用于:
    在判断所述每一级第一缓存和所述第二缓存,均未存储有所述第K级基址的情况下,向所述内存控制器发送所述访存请求;
    所述内存控制器,用于:
    根据所述第K级基址,在所述内存中确定所述第K级基址;
    向所述家节点发送所述第K级基址;
    所述内存用于存储所述第K级基址;
    所述buffer,用于根据所述第K级基址和所述第K级偏移地址,确定所述第(K+1)级基址。
  8. 根据权利要求1-7任一项所述的装置,其特征在于,所述装置还包括与第1级第一缓存耦合的存储器管理单元,所述存储器管理单元包括第三缓存;所述第三缓存用于存储K级基址,K=1、2、…、M,M为大于1的整数;
    所述存储器管理单元,用于:
    在向所述第1级第一缓存发送所述访存请求之前,判断所述第三缓存是否存储有所述第1级基址;
    在判断所述第三缓存存储有所述第1级基址的情况下,获取所述第1级基址;
    在判断所述第三缓存未存储有所述第1级基址的情况下,向所述第1级第一缓存发送所述访存请求。
  9. 根据权利要求1-8任一项所述的装置,其特征在于,所述访存请求还包括所述第K级基址的基址标识,所述第K级基址的基址标识用于指示所述第K级基址的级别。
  10. 一种加速硬件页表遍历的方法,其特征在于,包括:
    通过第i级第一缓存的控制单元接收访存请求,所述访存请求包括第K级基址和第K级高位地址,K为正整数,i=1、2、...、N;
    通过所述第i级第一缓存的控制单元,根据所述第K级基址判断所述第i级第一缓存是否存储有所述第K级基址;
    在判断所述第i级第一缓存存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元,向所述第i级第一缓存的计算单元发送所述第K级基址;
    通过第i级第一缓存的计算单元,根据所述第K级基址和第K级偏移地址确定第(K+1)级基址,所述第K级偏移地址为根据所述第K级高位地址确定的。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    在判断所述第i级第一缓存未存储有所述第K级基址且i≠N的情况下,通过所述第i级第一缓存的控制单元,向第(i+1)级第一缓存的控制单元发送所述访存请求。
  12. 根据权利要求10或11所述的方法,其特征在于,所述方法还包括:
    在判断所述第i级第一缓存未存储有所述第K级基址的情况下,通过所述第i级第一缓存的控制单元向家节点发送所述访存请求。
  13. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    在接收所述访存请求后,通过所述家节点,根据所述第K级基址判断N级第一缓存中每一级第一缓存是否存储有所述第K级基址;
    在判断目标第一缓存存储有所述第K级基址的情况下,向所述目标第一缓存的控制单元发送所述访存请求。
  14. 根据权利要求13所述的方法,其特征在于,所述方法还包括:
    在判断所述每一级第一缓存均未存储有所述第K级基址的情况下,通过所述家节点向内存控制器发送所述访存请求;
    通过所述内存控制器,根据所述第K级基址在内存中确定所述第K级基址;
    通过所述内存控制器向所述家节点发送所述第K级基址;
    根据所述第K级基址和所述第K级偏移地址,通过所述家节点的缓冲区buffer确定所述第(K+1)级基址。
  15. 根据权利要求13所述的方法,其特征在于,所述方法还包括:
    在接收所述访存请求后,通过所述家节点判断第二缓存是否存储有所述第K级基址;
    在判断所述第二缓存存储有所述第K级基址的情况下,通过所述家节点向所述第二缓存的控制单元发送所述访存请求。
  16. 根据权利要求15所述的方法,其特征在于,所述方法还包括:
    在判断所述每一级第一缓存和所述第二缓存均未存储有所述第K级基址的情况下，通过所述家节点向内存控制器发送所述访存请求；
    根据所述第K级基址,通过所述内存控制器在内存中确定所述第K级基址;
    通过所述内存控制器向所述家节点发送所述第K级基址;
    根据所述第K级基址和所述第K级偏移地址,通过所述家节点的缓冲区buffer确定所述第(K+1)级基址。
  17. 根据权利要求10-16任一项所述的方法,其特征在于,所述方法还包括:
    在向第1级第一缓存发送所述访存请求之前,通过存储器管理单元判断第三缓存是否存储有所述第1级基址;
    在判断所述第三缓存存储有所述第1级基址的情况下,通过所述存储器管理单元获取所述第1级基址;
    在判断所述第三缓存未存储有所述第1级基址的情况下,通过所述存储器管理单元向所述第1级第一缓存发送所述访存请求。
  18. 根据权利要求10-17任一项所述的方法,其特征在于,所述访存请求还包括所述第K级基址的基址标识,所述第K级基址的基址标识用于指示所述第K级基址的级别。
  19. 一种芯片系统，其特征在于，所述芯片系统用于执行如权利要求10-18中任意一项所述的方法。
PCT/CN2020/132489 2019-11-28 2020-11-27 一种加速硬件页表遍历的方法及装置 WO2021104502A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911195523.5 2019-11-28
CN201911195523.5A CN112860600A (zh) 2019-11-28 2019-11-28 一种加速硬件页表遍历的方法及装置

Publications (1)

Publication Number Publication Date
WO2021104502A1 true WO2021104502A1 (zh) 2021-06-03

Family

ID=75995973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132489 WO2021104502A1 (zh) 2019-11-28 2020-11-27 一种加速硬件页表遍历的方法及装置

Country Status (2)

Country Link
CN (1) CN112860600A (zh)
WO (1) WO2021104502A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656331A (zh) * 2021-10-20 2021-11-16 北京微核芯科技有限公司 基于高低位的确定访问地址的方法和装置
CN114281720B (zh) * 2021-12-14 2022-09-02 海光信息技术股份有限公司 处理器、用于处理器的地址翻译方法、电子设备
CN114238176B (zh) * 2021-12-14 2023-03-10 海光信息技术股份有限公司 处理器、用于处理器的地址翻译方法、电子设备
WO2023122194A1 (en) * 2021-12-22 2023-06-29 SiFive, Inc. Page table entry caches with multiple tag lengths

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101027652A (zh) * 2004-09-30 2007-08-29 英特尔公司 对于直接存储器存取地址转换的高速缓存支持
US20120297139A1 (en) * 2011-05-20 2012-11-22 Samsung Electronics Co., Ltd. Memory management unit, apparatuses including the same, and method of operating the same
CN107209724A (zh) * 2015-03-27 2017-09-26 华为技术有限公司 数据处理方法、内存管理单元及内存控制设备
CN109313610A (zh) * 2016-06-13 2019-02-05 超威半导体公司 用于高速缓存替换策略的缩放集合竞争
CN109690484A (zh) * 2016-09-08 2019-04-26 英特尔公司 在虚拟机进入时转换

Also Published As

Publication number Publication date
CN112860600A (zh) 2021-05-28
