WO2023093122A1 - Processor, address translation method, apparatus, storage medium and program product - Google Patents


Info

Publication number
WO2023093122A1
Authority
WO
WIPO (PCT)
Prior art keywords: mmu, pool, address translation, physical, page table
Application number
PCT/CN2022/110069
Other languages
English (en)
French (fr)
Inventor
潘伟
罗军平
李涛
陈中仁
刘君龙
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP22897217.0A priority Critical patent/EP4418133A4/en
Publication of WO2023093122A1 publication Critical patent/WO2023093122A1/zh
Priority to US18/673,967 priority patent/US20240330202A1/en

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING:
    • G06F12/1009 — Address translation using page tables, e.g. page table structures
    • G06F12/1027 — Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1036 — TLB for multiple virtual address spaces, e.g. segmentation
    • G06F12/109 — Address translation for multiple virtual address spaces, e.g. segmentation
    • G06F12/1072 — Decentralised address translation, e.g. in distributed shared memory systems
    • G06F15/163 — Interprocessor communication
    • G06F2212/657 — Virtual address space management
    • G06F2212/684 — TLB miss handling

Definitions

  • the present application relates to the field of computer technology, and in particular to a processor, an address translation method, an apparatus, a storage medium and a program product.
  • VA: virtual address
  • PA: physical address
  • Embodiments of the present application provide a processor, an address translation method, an apparatus, a storage medium and a program product, which can improve the address translation efficiency of the processor. The technical solution is as follows:
  • in a first aspect, a processor is provided. The processor includes a plurality of physical cores and a memory management unit (MMU) pool, the MMU pool includes a plurality of MMUs, the plurality of physical cores are connected to the MMU pool through the internal bus of the processor, and the MMU pool provides a VA-to-PA address translation function for the plurality of physical cores.
  • MMU: memory management unit
  • the MMU pool also provides the address translation function for peripherals of the processor. That is, this solution can also improve address translation efficiency when a peripheral accesses the memory.
  • the peripherals of the processor include a physical network card, a graphics card, and the like outside the processor.
  • the peripherals of the processor also include chips or components inside the processor that include functions such as network cards and graphics cards.
  • the MMU pool resides in the processor's home agent (HA) or memory controller (MC). That is, performing address translation (including page table traversal) near the memory can effectively reduce the delay of address translation. Especially in the case of multi-level page tables, address translation is more efficient.
  • HA: home agent
  • MC: memory controller
  • the first physical core is configured to send an address translation request to the MMU pool, where the address translation request carries the first VA to be translated.
  • the first physical core is any one of the multiple physical cores.
  • the MMU pool is used to receive the address translation request, translate the first VA into the first PA, and send an address translation response to the first physical core, where the address translation response carries the first PA.
  • the first physical core is also used for receiving an address translation response.
  • the first physical core is configured to query, from a translation lookaside buffer (TLB) corresponding to the first physical core, the page table entry in which the first VA is located; if the TLB does not cache the page table entry in which the first VA is located, an address translation request is sent to the MMU pool.
  • TLB: translation lookaside buffer
  • a page table entry includes a mapping relationship between a VA and a PA.
  • the MMU pool is used to query the page table entry where the first VA is located from the memory page table, so as to obtain the first PA corresponding to the first VA.
  • the memory page table records page table entries where all VAs of the memory are located.
  • the page table entry is first looked up in the TLB close to the physical core, so that the corresponding page table entry can be obtained quickly from the cache.
  • on a miss, the page table is traversed by the MMU pool close to the memory, so that the corresponding page table entry can be obtained quickly from the memory page table.
  • the memory page table is a one-level page table or a multi-level page table.
  • in a second aspect, an address translation method is provided, and the method is applied to a processor.
  • the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores are connected to the MMU pool through the internal bus of the processor; the method includes:
  • the first physical core sends an address translation request to the MMU pool, where the address translation request carries the first VA to be translated, and the first physical core is any one of the multiple physical cores; the first physical core receives an address translation response sent by the MMU pool, where the address translation response carries the first PA corresponding to the first VA.
  • the MMU pool provides an address translation function from VA to PA for the multiple physical cores.
  • multiple physical cores of the processor share the MMU pool, instead of one physical core corresponding to one MMU.
  • this solution also allows multiple MMUs to serve one physical core, instead of limiting a single MMU to serving that physical core, thereby improving address translation efficiency and speeding up memory access.
  • before the first physical core sends the address translation request to the MMU pool, the method further includes: the first physical core queries, from the TLB corresponding to the first physical core, the page table entry in which the first VA is located, where a page table entry contains a mapping relationship between one VA and one PA; if the page table entry in which the first VA is located is not cached in the TLB, the first physical core performs the operation of sending the address translation request to the MMU pool.
  • in a third aspect, an address translation method is provided. The method is applied to a processor, the processor includes a plurality of physical cores and an MMU pool, the MMU pool includes a plurality of MMUs, and the plurality of physical cores are connected to the MMU pool through the internal bus of the processor; the method includes:
  • the MMU pool receives the address translation request sent by the first physical core, where the address translation request carries the first VA to be translated, and the first physical core is any one of the plurality of physical cores; the MMU pool converts the first VA into the first PA; the MMU pool sends an address translation response to the first physical core, where the address translation response carries the first PA. That is, the MMU pool provides a VA-to-PA address translation function for the multiple physical cores. To put it simply, in this solution, multiple physical cores of the processor share the MMU pool, instead of one physical core corresponding to one MMU.
  • the MMU pool converting the first VA into the first PA includes: the MMU pool queries, from the memory page table, the page table entry in which the first VA is located to obtain the first PA corresponding to the first VA, where the memory page table records the page table entries in which all VAs of the memory are located, and a page table entry contains a mapping relationship between one VA and one PA.
  • the multiple MMUs correspond to a management module; the MMU pool converting the first VA into the first PA includes: the MMU pool selects, through the management module, an MMU from the multiple MMUs as the target MMU; the MMU pool converts the first VA into the first PA through the target MMU.
  • in a fourth aspect, an address translation device is provided, and the address translation device has the function of implementing the behavior of the address translation method in the second aspect above.
  • the address translation device includes one or more modules, and the one or more modules are used to implement the address translation method provided in the second aspect above.
  • an address translation device is provided, where the device is used for a processor, and the processor may be the processor provided in the first aspect above. That is, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores are connected to the MMU pool through the internal bus of the processor; the device is specifically used for the first physical core, and the first physical core is any one of the plurality of physical cores; the device includes:
  • a sending module configured to send an address translation request to the MMU pool, where the address translation request carries the first VA to be translated;
  • the receiving module is configured to receive the address translation response sent by the MMU pool, where the address translation response carries the first PA corresponding to the first VA.
  • the device also includes:
  • the table lookup module is used to query the page table entry where the first VA is located from the TLB corresponding to the first physical core, and a page table entry includes a mapping relationship between a VA and a PA;
  • the triggering module is configured to trigger the sending module to execute the operation of sending the address translation request to the MMU pool if the page table entry where the first VA is located is not cached in the TLB.
  • in a fifth aspect, an address translation device is provided, and the address translation device has the function of implementing the behavior of the address translation method in the third aspect above.
  • the address translation device includes one or more modules, and the one or more modules are used to implement the address translation method provided in the third aspect above.
  • an address translation device is provided, where the device is used for a processor, and the processor may be the processor provided in the first aspect above. That is, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores are connected to the MMU pool through the internal bus of the processor; the device is specifically used for the MMU pool, and the device includes:
  • the receiving module is configured to receive an address conversion request sent by the first physical core, the address conversion request carries the first VA to be converted, and the first physical core is any one of multiple physical cores;
  • an address conversion module configured to convert the first VA into the first PA
  • a sending module configured to send an address translation response to the first physical core, where the address translation response carries the first PA.
  • the address translation module is specifically configured to: query, from the memory page table, the page table entry in which the first VA is located to obtain the first PA corresponding to the first VA.
  • the memory page table records the page table entries in which all VAs of the memory are located.
  • a page table entry includes a mapping relationship between one VA and one PA.
  • the multiple MMUs correspond to a management module; the address translation module is specifically configured to: select, through the management module, an MMU from the multiple MMUs as the target MMU, and convert the first VA into the first PA through the target MMU.
  • in a sixth aspect, a computer device is provided. The computer device includes a processor and a memory, where the processor is the processor provided in the first aspect above, and the memory is used to store a program for executing the address translation methods of the second aspect and the third aspect above.
  • the processor is configured to execute programs stored in the memory.
  • the computer device may further include a communication bus for establishing a connection between the processor and the memory.
  • in a seventh aspect, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute the address translation method of the second aspect and/or the third aspect above.
  • in an eighth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to execute the address translation method of the second aspect and/or the third aspect above.
  • the technical effects obtained by the fourth, fifth, sixth, seventh and eighth aspects are similar to those obtained by the corresponding technical means in the first, second or third aspect, and are not repeated here.
  • multiple physical cores of the processor share the MMU pool, that is, multiple MMUs provide the address translation function from VA to PA for each physical core, instead of one physical core corresponding to one MMU.
  • this solution also allows multiple MMUs to serve one physical core, instead of limiting a single MMU to serving that physical core, thereby improving address translation efficiency and speeding up memory access.
  • FIG. 1 is a schematic structural diagram of a processor provided in an embodiment of the present application.
  • Fig. 2 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 3 is a flow chart of an address translation method provided in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another processor provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a processor in the related art provided by an embodiment of the present application.
  • FIG. 6 is a flow chart of another address translation method provided by an embodiment of the present application.
  • FIG. 7 is a flow chart of an address translation method in the related art provided by the embodiment of the present application.
  • Fig. 8 is a flow chart of accessing data corresponding to a VA in scenario 1 provided by an embodiment of the present application.
  • Fig. 9 is a flow chart of accessing data corresponding to a VA in scenario 1 in the related art provided by an embodiment of the present application.
  • Fig. 10 is a flow chart of accessing data corresponding to a VA in scenario 3 provided by an embodiment of the present application.
  • Fig. 11 is a flow chart of accessing data corresponding to a VA in scenario 3 in the related art provided by an embodiment of the present application.
  • Fig. 12 is a flow chart of another method for accessing data corresponding to a VA provided by an embodiment of the present application.
  • Fig. 13 is a flow chart of another method for accessing data corresponding to a VA in the related art provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an address translation device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of another address translation device provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a processor provided by an embodiment of the present application.
  • the processor includes multiple physical cores (core) and an MMU pool, and the MMU pool includes multiple MMUs.
  • the multiple physical cores include physical core 0 to physical core n, and the multiple MMUs include MMU 0 to MMU m.
  • the multiple physical cores are connected to the MMU pool through the internal bus of the processor.
  • the MMU pool provides an address translation function from VA to PA for the plurality of physical cores.
  • the address translation function includes a page table traversal function based on the memory page table, such as a page table walk (pagetablewalk) function.
  • the MMU pool also provides address translation functions for peripherals of the processor.
  • Peripherals include physical network cards, graphics cards, etc. outside the processor.
  • the peripherals also include chips or components inside the processor that have functions such as network cards and graphics cards.
  • FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of the present application.
  • the computer device includes one or more processors 201 , a bus 202 , a memory 203 and one or more interfaces 204 .
  • the one or more processors 201 include the processor shown in FIG. 1 .
  • the processor 201 is a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, or one or more integrated circuits for implementing the solutions of the present application, for example, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the above-mentioned PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the bus 202 is used to transfer information between the above-mentioned components.
  • the bus 202 is divided into an address bus, a data bus, a control bus, and the like.
  • the bus is also referred to as a communication bus.
  • the memory 203 is a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), an optical disc (including a compact disc read-only memory (CD-ROM), a compact disc, a laser disc, a digital versatile disc, a Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 203 exists independently and is connected to the processor 201 through the bus 202 , or the memory 203 and the processor 201 are integrated together.
  • Interface 204 uses any transceiver-like device for communicating with other devices or a communication network.
  • the interface is also referred to as a communication interface.
  • the interface 204 includes a wired communication interface, and optionally, also includes a wireless communication interface.
  • the wired communication interface is, for example, an Ethernet interface.
  • the Ethernet interface is an optical interface, an electrical interface or a combination thereof.
  • the wireless communication interface is a wireless local area network (wireless local area networks, WLAN) interface, a cellular network communication interface, or a combination thereof.
  • the computer device includes multiple processors, such as processor 201 and processor 205 as shown in FIG. 2 .
  • processors are a single-core processor, or a multi-core processor.
  • a processor herein refers to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
  • the computer device further includes an output device 206 and an input device 207 .
  • Output device 206 is in communication with processor 201 and can display information in a variety of ways.
  • the output device 206 is a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a cathode ray tube (cathode ray tube, CRT) display device, or a projector (projector).
  • the input device 207 communicates with the processor 201 and can receive user input in various ways.
  • the input device 207 is a mouse, a keyboard, a touch screen device or a sensing device, etc.
  • the memory 203 is used to store the program code 210 for implementing the solutions of the present application, and the processor 201 can execute the program code 210 stored in the memory 203 .
  • the program code includes one or more software modules, and the computer device can implement the address conversion method provided in the embodiment of FIG. 3 below through the processor 201 and the program code 210 in the memory 203 .
  • FIG. 3 is a flow chart of an address translation method provided by an embodiment of the present application, and the method is applied to a processor. Please refer to FIG. 3 , the method includes the following steps.
  • Step 301 the first physical core sends an address translation request to the MMU pool, where the address translation request carries the first VA to be translated.
  • the processor includes multiple physical cores and an MMU pool.
  • the MMU pool includes multiple MMUs.
  • the multiple physical cores are connected to the MMU pool through the internal bus of the processor.
  • the MMU pool provides a VA-to-PA address conversion function, that is, an address translation function, for the multiple physical cores. Any physical core in the multiple physical cores can perform address translation through the MMU pool.
  • an introduction will be made by taking the address translation performed by the first physical core through the MMU pool as an example, where the first physical core is any one of the multiple physical cores.
  • the first physical core sends an address translation request to the MMU pool, and the address translation request carries the first VA to be translated.
  • the first physical core obtains the first VA, and generates an address translation request carrying the first VA.
  • for example, the first physical core acquires a virtual address sent by an application program, where the virtual address is used for data access; the first physical core uses this VA as the first VA and generates an address translation request carrying the first VA.
  • the first physical core corresponds to a TLB
  • the first physical core queries the page table entry where the first VA is located from the TLB corresponding to the first physical core. If the page table entry where the first VA is located is not cached in the TLB, the first physical core sends an address translation request to the MMU pool.
  • a page table entry includes a mapping relationship between a VA and a PA. If the TLB caches the page table entry where the first VA is located, the first physical core obtains the first PA corresponding to the first VA from the page table entry.
  • the TLB is used to cache the mapping relationship between the virtual address and the physical address of the memory recently accessed by the processor, and the memory page table stores the mapping relationship between all the virtual addresses and the physical address of the memory. That is, the memory page table records the page table entries where all the virtual addresses of the memory are located, and the TLB stores some page table entries in the memory page table. In some cases, all page table entries in the memory page table may also be cached in the TLB.
  • when address translation is performed, if the TLB hits, the first physical core directly obtains the physical address from the TLB; if the TLB misses, the first physical core performs address translation through the MMU pool.
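The core-side flow above (per-core TLB lookup, with a fall-back request to the shared MMU pool on a miss) can be sketched as follows. This is a minimal illustration, not the patented implementation; the class names, the flat one-level page table, and the 4 KiB page size are all assumptions made for the example.

```python
PAGE_SIZE = 4096

class MMUPool:
    """Stand-in for the shared MMU pool: resolves a VA via the memory page table."""
    def __init__(self, page_table):
        self.page_table = page_table  # {virtual page number: physical page number}

    def translate(self, va):
        vpn, offset = divmod(va, PAGE_SIZE)
        ppn = self.page_table[vpn]    # page table walk (simplified to one level)
        return ppn * PAGE_SIZE + offset

class PhysicalCore:
    def __init__(self, mmu_pool):
        self.mmu_pool = mmu_pool
        self.tlb = {}                 # per-core TLB: {vpn: ppn}

    def translate(self, va):
        vpn, offset = divmod(va, PAGE_SIZE)
        if vpn in self.tlb:                    # TLB hit: answer locally
            return self.tlb[vpn] * PAGE_SIZE + offset
        pa = self.mmu_pool.translate(va)       # TLB miss: request to the MMU pool
        self.tlb[vpn] = pa // PAGE_SIZE        # cache the returned mapping
        return pa

pool = MMUPool(page_table={0: 7, 1: 3})
core = PhysicalCore(pool)
print(core.translate(4100))  # vpn 1 -> ppn 3, offset 4: 3*4096 + 4 = 12292
```

A second `core.translate(4100)` call hits the per-core TLB and never reaches the pool, which is the latency saving the TLB provides.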
  • the address translation function provided by the MMU pool includes a page table traversal function based on the memory page table, such as a pagetablewalk function.
  • the MMU pool is located in the HA or MC of the processor. That is, performing page table traversal near the memory can effectively reduce the delay of memory access.
  • one physical core corresponds to one TLB, and the TLB is located next to the physical core. That is, the TLB is located close to the physical core to quickly obtain the corresponding page table entry from the cache.
  • the TLB can perform hierarchical pooling, and the address translation function provided by the MMU pool includes not only the above-mentioned page table traversal function, but also functions corresponding to the TLB.
  • the MMU pool includes not only multiple page table traversal units (that is, the above-mentioned multiple MMUs), but also multiple hierarchical TLBs.
  • the first physical core sends an address translation request to the MMU pool to instruct the MMU pool to perform address translation through the function corresponding to the TLB and the page table traversal function.
  • Step 302 The MMU pool receives the address conversion request sent by the first physical core, and converts the first VA into the first PA.
  • after receiving the address translation request sent by the first physical core, the MMU pool converts the first VA into the first PA through the address translation function.
  • in some embodiments, after receiving the address translation request, the MMU pool queries, from the memory page table, the page table entry in which the first VA is located to obtain the first PA corresponding to the first VA.
  • the memory page table records page table entries where all VAs of the memory are located. That is, the first physical core queries the TLB, and in the case of a TLB miss, the MMU pool performs page table traversal.
  • in other embodiments, the address translation request is sent by the first physical core to the MMU pool directly after obtaining the first VA, and the MMU pool provides both the functions corresponding to the TLB and the page table traversal function. In this case, after receiving the address translation request, the MMU pool queries the hierarchical TLB for the page table entry in which the first VA is located. If the page table entry in which the first VA is located is not cached in the hierarchical TLB, the MMU pool queries the memory page table for that page table entry to obtain the first PA corresponding to the first VA. If the hierarchical TLB caches the page table entry in which the first VA is located, the MMU pool acquires the first PA corresponding to the first VA from the hierarchical TLB.
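The pooled variant above, in which the MMU pool itself holds hierarchical TLB levels that are checked before any page table walk, can be illustrated with the following sketch. Class and attribute names are invented for the example; the patent does not specify the number of TLB levels or the fill policy.

```python
class HierarchicalMMUPool:
    def __init__(self, page_table, levels=2):
        self.page_table = page_table                  # {vpn: ppn}, the memory page table
        self.tlbs = [dict() for _ in range(levels)]   # pooled hierarchical TLB levels

    def lookup(self, vpn):
        # Check each pooled TLB level in order.
        for tlb in self.tlbs:
            if vpn in tlb:
                return tlb[vpn]
        # Miss in every level: walk the memory page table ...
        ppn = self.page_table[vpn]
        # ... and fill the TLB levels with the resolved entry.
        for tlb in self.tlbs:
            tlb[vpn] = ppn
        return ppn

pool = HierarchicalMMUPool({5: 9})
print(pool.lookup(5))  # 9 (first lookup walks the page table, then fills the TLBs)
```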
  • the MMU pool includes multiple MMUs. Then, after the MMU pool receives the address translation request, it needs to determine an MMU from the multiple MMUs to perform the address translation function.
  • the multiple MMUs correspond to a management module, and one implementation in which the MMU pool converts the first VA into the first PA is: the MMU pool selects, through the management module, an MMU from the multiple MMUs as the target MMU, and converts the first VA into the first PA through the target MMU.
  • the management module may select an MMU from the multiple MMUs according to a load balancing strategy or other methods. For example, the management module randomly selects an MMU from idle MMUs.
  • the MMU pool may also use other methods to determine a target MMU from the multiple MMUs, which is not limited in this embodiment of the present application.
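As a concrete illustration of the management module's selection step, the sketch below picks randomly among idle MMUs and falls back to the least-loaded one, mirroring the load-balancing example above. The patent leaves the exact strategy open, so this policy and all names here are assumptions.

```python
import random

class ManagedMMUPool:
    def __init__(self, num_mmus):
        self.load = [0] * num_mmus    # outstanding requests per MMU

    def select_target_mmu(self):
        idle = [i for i, l in enumerate(self.load) if l == 0]
        if idle:
            return random.choice(idle)    # e.g. pick randomly among idle MMUs
        # No idle MMU: fall back to the least-loaded one.
        return min(range(len(self.load)), key=self.load.__getitem__)

    def dispatch(self, va):
        mmu = self.select_target_mmu()
        self.load[mmu] += 1           # request now in flight on that MMU
        return mmu

pool = ManagedMMUPool(num_mmus=4)
targets = {pool.dispatch(va) for va in range(4)}
print(sorted(targets))  # [0, 1, 2, 3]: with four idle MMUs, each request lands on a different one
```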
  • the memory page table is a one-level page table or a multi-level page table.
  • through the target MMU, the MMU pool sends the first VA to the MC via the HA, so as to query the memory page table for the first PA corresponding to the first VA.
  • the MMU pool parses the first VA through the target MMU and obtains the indexes of each level in turn.
  • for each level, a query request is sent to the MC through the HA based on the obtained index, so as to query the corresponding information in the page table of that level in the memory page table, until the last query request is sent to the MC through the HA based on the finally parsed index (such as the address offset), and the first PA returned by the MC is obtained, or the MC returns the page table entry (PTE) in which the first VA is located.
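The multi-level walk just described (parse the VA into one index per level, issue one HA-to-MC query per level, read the PPN from the leaf entry) can be sketched as below. The 4 KiB pages, 9-bit indexes, and two levels are assumptions chosen to match common paging schemes, not parameters from the patent; each dict lookup stands in for one HA/MC round trip.

```python
PAGE_SHIFT = 12           # 4 KiB pages
INDEX_BITS = 9            # 512 entries per table, as in common multi-level schemes

def walk(root_table, va, levels=2):
    """Return the PA for `va` by walking a `levels`-deep page table."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    table = root_table
    for level in range(levels - 1, -1, -1):
        index = (va >> (PAGE_SHIFT + level * INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
        entry = table[index]          # one HA -> MC query per level
        table = entry                 # next-level table, or the PTE value at the leaf
    return (table << PAGE_SHIFT) | offset   # leaf entry holds the PPN

# Two-level example: top-level index 1 -> second-level table; index 2 -> PPN 42.
root = {1: {2: 42}}
va = (1 << (PAGE_SHIFT + INDEX_BITS)) | (2 << PAGE_SHIFT) | 0x34
print(hex(walk(root, va)))  # 0x2a034: PPN 42 shifted by 12 bits, plus offset 0x34
```

The loop makes the latency argument visible: a `levels`-deep table costs `levels` memory round trips per translation, which is why placing the walker near the HA/MC matters most for multi-level page tables.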
  • since the MMU pool is located in the HA or MC, that is, closer to the memory, address translation can be accelerated.
  • this solution can effectively reduce the delay of address translation.
  • the MMU is located far away from the HA (such as in the physical core), and the MMU needs to interact with the HA to access the memory to query the corresponding PA.
  • the interaction process between the MMU and the HA takes a long time, that is, , the memory access delay is large.
  • most of the current memory page tables are multi-level page tables. In the application of multi-level page tables, multiple interactions between the MMU and HA in related technologies are required to complete the pagetablewalk operation, and the memory access delay and other costs are relatively large.
  • Step 303: The MMU pool sends an address translation response to the first physical core, where the address translation response carries the first PA.
  • after the MMU pool converts the first VA into the first PA, it sends an address translation response to the first physical core, and the address translation response carries the first PA, or carries the mapping relationship between the first VA and the first PA.
  • the MMU pool sends an address translation response to the first physical core through the target MMU.
  • Step 304: The first physical core receives the address translation response sent by the MMU pool.
  • the first physical core receives the address translation response sent by the MMU pool, and the address translation response carries the first PA, or carries a mapping relationship between the first VA and the first PA.
  • the first physical core stores the mapping relationship between the first VA and the first PA in a corresponding TLB.
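The per-core flow of steps 301 to 304 can be condensed into a short sketch: query the core's own TLB first, and only on a miss send an address translation request to the MMU pool, then cache the returned VA-to-PA mapping in the TLB. The dict-based TLB and the `mmu_pool_translate` callback stand in for hardware and are assumptions for illustration.

```python
def translate_va(tlb, va, mmu_pool_translate):
    """Translate va to a PA, consulting the core's TLB before the MMU pool."""
    page = va >> 12                      # page-granular lookup, 4 KiB pages assumed
    if page in tlb:                      # TLB hit: no request to the MMU pool
        return tlb[page] | (va & 0xFFF)
    pa_page = mmu_pool_translate(page)   # address translation request/response
    tlb[page] = pa_page                  # store the new mapping in the core's TLB
    return pa_page | (va & 0xFFF)
```

Repeated accesses to the same page then hit the TLB and never reach the pool, which is exactly the caching behaviour the embodiment relies on.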
  • the first VA is used for data access. For example, the first VA is a VA through which an application program performs data access. In that case, after the first physical core obtains the first PA, it queries the data cache for the first data corresponding to the first PA and returns the first data to the application program.
  • if the data cache caches the first data corresponding to the first PA, the first physical core acquires the first data from the data cache.
  • if the first data is not cached in the data cache, the first physical core acquires the first data from the memory, or acquires the first data in other ways.
  • the data cache includes the level 1/level 2/level 3 (L1/L2/L3) caches and the like.
  • the MMU pool also provides address translation functions for peripherals of the processor.
  • in the related art, the peripheral device performs address translation through a system-level memory management unit (system MMU, SMMU).
  • SMMU includes a TLB and a page table traversal unit, and the SMMU is located next to the IIO of the processor.
  • the page table traversal unit in the SMMU is moved to the MMU pool, and the address translation function is provided for the peripherals through the MMU pool.
  • the virtual address acquired by the peripheral is called input/output VA (input/output VA, IOVA).
  • after a certain peripheral acquires the first IOVA, it queries the corresponding TLB for the page table entry where the first IOVA is located. If that page table entry is not cached in the TLB, the peripheral sends an address translation request to the MMU pool.
  • a page table entry includes a mapping relationship between an IOVA and a PA.
  • the MMU pool queries the page table entry where the first IOVA is located from the memory page table to obtain the PA corresponding to the first IOVA.
  • the MMU pool returns the PA corresponding to the first IOVA to the peripheral. If the TLB caches the page table entry where the first IOVA is located, the peripheral obtains the first PA corresponding to the first IOVA from that page table entry.
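The peripheral path mirrors the core path: the device keeps its own TLB of IOVA-to-PA entries and, on a miss, sends the request to the shared MMU pool (which now hosts the SMMU's page table traversal function). The function names and the 4 KiB page granularity below are assumptions for illustration.

```python
def translate_iova(device_tlb, iova, mmu_pool_walk):
    """Translate a peripheral's IOVA, consulting its TLB before the MMU pool."""
    page = iova >> 12
    if page not in device_tlb:                 # TLB miss: ask the MMU pool
        device_tlb[page] = mmu_pool_walk(page) # pool walks the memory page table
    return device_tlb[page] | (iova & 0xFFF)
```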
  • FIG. 4 is a schematic structural diagram of another processor provided by an embodiment of the present application.
  • the processor (such as a CPU) includes multiple physical cores (such as physical core 0 to physical core n), a TLB located in each physical core, an MMU pool located in the HA (including MMU0 to MMUm), an MC, and the like.
  • the MMU pool provides address translation functions including page table traversal for the multiple physical cores.
  • the processor further includes an interface for connecting peripherals, such as a network card, a graphics card, and the like.
  • the MMU pool also provides address translation functions for the processor's peripherals.
  • FIG. 5 is a schematic structural diagram of a processor in a related art provided by an embodiment of the present application.
  • the difference between the processor shown in FIG. 5 and the one in FIG. 4 is that each physical core of the processor in FIG. 5 corresponds to its own MMU, and the MMU corresponding to each physical core is located next to that physical core.
  • the processor also includes one or more peripheral processing units, such as an input/output unit (IOU), and the SMMU corresponding to a peripheral is located in the peripheral processing unit.
  • the page table traversal units in both the MMU and the SMMU are far away from the memory, and in the case of a TLB miss, the time delay for accessing the memory for address conversion by interacting with the HA and the MC is relatively large.
  • this solution moves the page table traversal unit of the MMU from the physical core or the SMMU into the HA or MC, and performs page table traversal closer to the memory, so as to reduce the memory access delay as much as possible.
  • in addition, moving the page table traversal unit into the HA or MC enables it to be shared among multiple physical cores (cores), without a one-to-one correspondence with the physical cores, which can save circuit resources of the processor chip in some embodiments.
  • FIG. 6 is a flow chart of another address translation method provided by an embodiment of the present application. The process shown in FIG. 6 is implemented on the basis of the processor shown in FIG. 4 .
  • FIG. 7 is a flow chart of an address translation method in the related art provided by the embodiment of the present application. The process shown in FIG. 7 is implemented on the basis of the processor shown in FIG. 5 .
  • the MMU pool is located in the HA or MC, and the MMU pool includes the page table traversal unit in FIG. 6 .
  • when the physical core needs to perform address translation, it first looks up the TLB, that is, it looks up the page table entry where the VA is located in the TLB; the TLB lookup takes about t1. If the TLB misses, the physical core sends an address translation request to the MMU pool, and the MMU pool performs the page table traversal through the HA and MC; the traversal takes about L*50 ns (nanoseconds), where L is the number of page table levels.
  • during the page table traversal, if the queried page is in the memory, the MMU pool requests to load the page table entry (PTE) where the VA is located from the memory (such as a dynamic random access memory (DRAM)); the request takes about 3 ns.
  • the MMU pool loads the PTE from the DRAM into the TLB for caching through the memory access modules of the HA and the MC, that is, it updates the TLB. Loading the PTE from the DRAM takes about 90 ns, and updating the TLB takes about t3. If the queried page is not in the memory, the MMU pool requests to load the PTE from the disk.
  • if the TLB hits, the physical core obtains the corresponding page table entry from the TLB, that is, it obtains the PA.
  • optionally, after obtaining the PA, the physical core performs a security check (such as a protection check). If the check fails, the physical core handles the corresponding error, for example by generating a segmentation fault signal (signal segment violation, SIGSEGV).
  • after the security check passes (access permitted), the physical core obtains the corresponding data from the data cache (such as the L1/L2/L3 caches) based on the PA; searching the data cache takes about t2. If the data cache hits, that is, the corresponding data is found, the physical core loads the data from the data cache, which takes about 1 ns. If the data cache misses, the physical core requests to load the data from the memory; the request takes about 20 ns, and loading the data from the memory takes about 90 ns.
  • the processing flow in FIG. 7 differs from that in FIG. 6 in that, in the related art shown in FIG. 7, physical cores correspond to MMUs one to one, and each MMU is located next to its physical core. If the TLB misses, the physical core performs the page table traversal through its own MMU, which takes about L*90 ns. In addition, in the related art, requesting to load the PTE from the memory takes about 20 ns.
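A back-of-the-envelope model makes the comparison between the two figures concrete, using the approximate values quoted above: a page table walk of L*50 ns (pool in the HA/MC) versus L*90 ns (per-core MMU), a 3 ns versus 20 ns request to load the PTE, and a 90 ns DRAM load in both cases. These are stated as rough empirical values, so the totals below are illustrative only.

```python
def tlb_miss_ns(levels, pooled):
    """Approximate TLB-miss handling time for an L-level page table."""
    walk = levels * (50 if pooled else 90)  # page table walk via HA/MC vs per-core MMU
    request = 3 if pooled else 20           # request to load the PTE from DRAM
    return walk + request + 90              # plus loading the PTE from DRAM

def savings_ns(levels):
    """Time saved per TLB miss by placing the MMU pool in the HA/MC."""
    return tlb_miss_ns(levels, pooled=False) - tlb_miss_ns(levels, pooled=True)
```

With a four-level page table the model gives 293 ns versus 470 ns per miss, and the gap widens as the number of levels grows, matching the observation below that deeper page tables save more.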
  • FIG. 6 and FIG. 7 show time overheads in four scenarios when physical cores access data corresponding to VAs.
  • Scenario 1 is a scenario in which both the TLB and the L1/L2/L3 caches miss;
  • Scenario 2 is a scenario in which both the TLB and the L1/L2/L3 caches hit;
  • Scenario 3 is a scenario in which the TLB misses but the L1/L2/L3 caches hit;
  • Scenario 4 is a scenario in which the TLB hits but the L1/L2/L3 caches miss.
  • in scenario 2 and scenario 4, that is, in the scenarios where the TLB hits, the time cost of accessing the data corresponding to the VA is basically the same in this solution and in the related art.
  • FIG. 8 and FIG. 9 respectively show the flow of accessing VA-corresponding data in this solution and the related technology in scenario 1.
  • FIG. 10 and FIG. 11 respectively show the flow of accessing VA-corresponding data in this solution and the related technology in scenario 3.
  • the processes shown in FIG. 8 to FIG. 11 are consistent with the related processes in FIG. 6 and FIG. 7 , and will not be repeated here.
  • Table 1 shows the comparison of the time cost of accessing the data corresponding to the VA in this solution and related technologies under the above scenarios 1 and 3. It can be seen that in the case of TLB miss, this solution can save a lot of time compared with related technologies.
  • the saving of time overhead is mainly reflected in the process of page table traversal, and the higher the number of page table levels, the greater the saving of time overhead.
  • the time consumption of each step shown in FIG. 6 to FIG. 11 and Table 1 is an empirical value obtained from experiments or actual conditions, that is, an approximate time, and is not intended to limit the embodiments of the present application. The time consumption of each step varies across experiments and actual situations.
  • the time overhead of accessing the data corresponding to the VA is smaller in this solution.
  • FIG. 12 is a flow chart of a method for accessing VA corresponding data provided by an embodiment of the present application.
  • the TLB is located in the physical core, and the MMU pool is located in the HA.
  • after the physical core obtains the VA through the address generation unit (AGU), it queries its corresponding TLB for the page table entry corresponding to the VA. If the TLB misses, the physical core sends an address translation request to the MMU pool, and the address translation request carries information such as the VA.
  • assuming the memory page table is a four-level page table, the MMU pool performs the page table traversal operation, interacting with the MC five times through the HA, and obtains the PA corresponding to the VA after the last interaction.
  • the MMU pool returns information such as the PA to the physical core, and the physical core updates the mapping relationship between the VA and the PA in the TLB.
  • the physical core can obtain corresponding data from the data cache (such as L1/L2/L3) based on the PA.
  • once the physical core acquires the PA corresponding to the VA, it can acquire the corresponding data according to a conventional process, which is not limited in this embodiment of the present application.
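The end-to-end FIG. 12 flow can be sketched compactly: the core obtains a VA (from the AGU), tries its TLB, on a miss asks the MMU pool to walk the four-level page table (five HA/MC interactions in the figure), updates the TLB, then reads the data cache with the resulting PA. All structures and counts here are illustrative assumptions.

```python
def access_va(va, tlb, walk, data_cache):
    """Return (data, HA/MC interactions spent on translation) for one access."""
    page = va >> 12
    interactions = 0
    if page not in tlb:
        pa_page, interactions = walk(page)  # MMU pool performs the page table walk
        tlb[page] = pa_page                 # core updates its TLB with the mapping
    pa = tlb[page] | (va & 0xFFF)
    return data_cache.get(pa), interactions
```

A second access to the same page costs zero translation interactions, which is the TLB-hit fast path of scenarios 2 and 4.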
  • Fig. 13 is a flowchart of a method for accessing VA corresponding data in the related art provided by the embodiment of the present application.
  • the MMU in the related art shown in FIG. 13 is located in the physical core, that is, far from the memory. It can be seen from FIG. 12 and FIG. 13 that the page table traversal in the related art takes longer than the page table traversal in this solution.
  • in this solution, the multiple physical cores of the processor share the MMU pool, that is, multiple MMUs provide the VA-to-PA address translation function for each physical core, instead of one physical core corresponding to one MMU.
  • in this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.
  • Fig. 14 is a schematic structural diagram of an address translation apparatus 1400 provided in an embodiment of the present application.
  • the address translation apparatus 1400 can be implemented by software, hardware, or a combination of the two as part or all of a processor.
  • the processor can be the processor shown in FIG. 1 or FIG. 4. That is, the apparatus 1400 is used in a processor, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores and the MMU pool are connected through an internal bus of the processor.
  • the apparatus 1400 is specifically used for a first physical core, and the first physical core is any physical core among the multiple physical cores.
  • the device 1400 includes: a sending module 1401 and a receiving module 1402 .
  • a sending module 1401, configured to send an address translation request to the MMU pool, where the address translation request carries the first VA to be translated;
  • the receiving module 1402 is configured to receive an address translation response sent by the MMU pool, where the address translation response carries the first PA corresponding to the first VA.
  • the device 1400 also includes:
  • the table lookup module is used to query the page table entry where the first VA is located from the TLB corresponding to the first physical core, and a page table entry includes a mapping relationship between a VA and a PA;
  • the triggering module is configured to trigger the sending module 1401 to execute the operation of sending the address translation request to the MMU pool if the page table entry where the first VA is located is not cached in the TLB.
  • in this solution, the multiple physical cores of the processor share the MMU pool, that is, multiple MMUs provide the VA-to-PA address translation function for each physical core, instead of one physical core corresponding to one MMU.
  • in this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.
  • when the address translation apparatus provided by the above embodiment performs address translation, the division into the above functional modules is merely used as an example. In practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above.
  • the address translation apparatus provided by the above embodiment and the embodiments of the address translation method belong to the same concept; for its specific implementation process, refer to the method embodiments, and details are not repeated here.
  • Fig. 15 is a schematic structural diagram of an address translation apparatus 1500 provided in an embodiment of the present application.
  • the address translation apparatus 1500 can be implemented by software, hardware, or a combination of the two as part or all of a processor.
  • the processor can be the processor shown in FIG. 1 or FIG. 4. That is, the apparatus 1500 is used in a processor, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores and the MMU pool are connected through an internal bus of the processor.
  • the apparatus 1500 is specifically used for the MMU pool.
  • the device 1500 includes: a receiving module 1501 , an address conversion module 1502 and a sending module 1503 .
  • the receiving module 1501 is configured to receive an address conversion request sent by the first physical core, where the address conversion request carries the first VA to be converted, and the first physical core is any one of multiple physical cores;
  • an address translation module 1502, configured to convert the first VA into the first PA;
  • a sending module 1503 configured to send an address translation response to the first physical core, where the address translation response carries the first PA.
  • the address conversion module 1502 is specifically configured to:
  • the memory page table records the page table entries of all VAs in the memory.
  • one page table entry includes a mapping relationship between one VA and one PA.
  • the multiple MMUs correspond to one management module; the address translation module 1502 is specifically used for:
  • in this solution, the multiple physical cores of the processor share the MMU pool, that is, multiple MMUs provide the VA-to-PA address translation function for each physical core, instead of one physical core corresponding to one MMU.
  • in this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.
  • when the address translation apparatus provided by the above embodiment performs address translation, the division into the above functional modules is merely used as an example. In practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above.
  • the address translation apparatus provided by the above embodiment and the embodiments of the address translation method belong to the same concept; for its specific implementation process, refer to the method embodiments, and details are not repeated here.
  • all or part may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or may be a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • it should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in this application are all authorized by the users or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.


Abstract

Embodiments of this application disclose a processor, an address translation method and apparatus, a storage medium, and a program product, belonging to the field of computer technology. In this solution, the multiple physical cores of the processor share an MMU pool, that is, multiple MMUs provide the VA-to-PA address translation function for each physical core, instead of one physical core corresponding to one MMU. In this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.

Description

Processor, address translation method and apparatus, storage medium, and program product
This application claims priority to Chinese Patent Application No. 202111417305.9, filed on November 25, 2021 and entitled "Address management method and processor system", which is incorporated herein by reference in its entirety. This application also claims priority to Chinese Patent Application No. 202210087387.3, filed on January 25, 2022 and entitled "Processor, address translation method and apparatus, storage medium, and program product", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to a processor, an address translation method and apparatus, a storage medium, and a program product.
Background
At present, most processors use virtual addresses (VA) for memory addressing; using VAs makes it possible to create an addressing space much larger than the actual physical address (PA) space. When using VAs for memory addressing, the processor needs to translate a received VA into a PA. How to improve the address translation efficiency of processors is a current research focus.
Summary
Embodiments of this application provide a processor, an address translation method and apparatus, a storage medium, and a program product, which can improve the address translation efficiency of a processor. The technical solutions are as follows:
According to a first aspect, a processor is provided. The processor includes multiple physical cores and a memory management unit (MMU) pool, the MMU pool includes multiple MMUs, the multiple physical cores and the MMU pool are connected through an internal bus of the processor, and the MMU pool provides the VA-to-PA address translation function for the multiple physical cores.
Simply put, in this solution the multiple physical cores of the processor share the MMU pool, instead of one physical core corresponding to one MMU. In this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.
Optionally, the MMU pool also provides the address translation function for the peripherals of the processor. That is, this solution can also improve the address translation efficiency when peripherals access memory. The peripherals of the processor include a physical network card, a graphics card, and the like outside the processor. Optionally, the peripherals of the processor also include chips or components inside the processor that provide functions such as those of a network card or graphics card.
Optionally, the MMU pool is located in the home agent (HA) or memory controller (MC) of the processor. That is, performing address translation (including page table traversal) close to the memory can effectively reduce the address translation delay; the efficiency gain is especially high in the case of multi-level page tables.
Optionally, a first physical core is configured to send an address translation request to the MMU pool, where the address translation request carries a first VA to be translated, and the first physical core is any one of the multiple physical cores. The MMU pool is configured to receive the address translation request, translate the first VA into a first PA, and send an address translation response to the first physical core, where the address translation response carries the first PA. The first physical core is further configured to receive the address translation response.
Optionally, the first physical core is configured to query the translation lookaside buffer (TLB) corresponding to the first physical core for the page table entry where the first VA is located, and to send the address translation request to the MMU pool if that page table entry is not cached in the TLB, where one page table entry includes a mapping relationship between one VA and one PA. The MMU pool is configured to query the memory page table for the page table entry where the first VA is located, so as to obtain the first PA corresponding to the first VA, where the memory page table records the page table entries of all VAs in the memory. That is, in one implementation, the TLB close to the physical core is looked up first to quickly obtain the corresponding page table entry from the cache; in the case of a TLB miss, the page table traversal is then performed by the MMU pool close to the memory to quickly obtain the corresponding page table entry from the memory page table.
Optionally, the memory page table is a one-level page table or a multi-level page table.
According to a second aspect, an address translation method is provided, applied to a processor. The processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores and the MMU pool are connected through an internal bus of the processor. The method includes:
a first physical core sends an address translation request to the MMU pool, where the address translation request carries a first VA to be translated, and the first physical core is any one of the multiple physical cores; and the first physical core receives an address translation response sent by the MMU pool, where the address translation response carries the first PA corresponding to the first VA.
That is, the MMU pool provides the VA-to-PA address translation function for the multiple physical cores. Simply put, in this solution the multiple physical cores of the processor share the MMU pool, instead of one physical core corresponding to one MMU. In this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.
Optionally, before the first physical core sends the address translation request to the MMU pool, the method further includes: the first physical core queries the TLB corresponding to the first physical core for the page table entry where the first VA is located, where one page table entry includes a mapping relationship between one VA and one PA; and if that page table entry is not cached in the TLB, the first physical core performs the operation of sending the address translation request to the MMU pool.
According to a third aspect, an address translation method is provided, applied to a processor. The processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores and the MMU pool are connected through an internal bus of the processor. The method includes:
the MMU pool receives an address translation request sent by a first physical core, where the address translation request carries a first VA to be translated, and the first physical core is any one of the multiple physical cores; the MMU pool translates the first VA into a first PA; and the MMU pool sends an address translation response to the first physical core, where the address translation response carries the first PA. That is, the MMU pool provides the VA-to-PA address translation function for the multiple physical cores. Simply put, in this solution the multiple physical cores of the processor share the MMU pool, instead of one physical core corresponding to one MMU.
Optionally, the MMU pool translating the first VA into the first PA includes: the MMU pool queries the memory page table for the page table entry where the first VA is located, so as to obtain the first PA corresponding to the first VA, where the memory page table records the page table entries of all VAs in the memory, and one page table entry includes a mapping relationship between one VA and one PA.
Optionally, the multiple MMUs correspond to one management module; the MMU pool translating the first VA into the first PA includes: the MMU pool selects an MMU from the multiple MMUs as a target MMU through the management module, and the MMU pool translates the first VA into the first PA through the target MMU.
According to a fourth aspect, an address translation apparatus is provided. The address translation apparatus has the function of implementing the behavior of the address translation method in the second aspect above. The address translation apparatus includes one or more modules, and the one or more modules are configured to implement the address translation method provided in the second aspect above.
That is, an address translation apparatus is provided, and the apparatus is used in a processor, which may be the processor provided in the first aspect above. That is, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores and the MMU pool are connected through an internal bus of the processor. The apparatus is specifically used in a first physical core, and the first physical core is any one of the multiple physical cores. The apparatus includes:
a sending module, configured to send an address translation request to the MMU pool, where the address translation request carries a first VA to be translated; and
a receiving module, configured to receive an address translation response sent by the MMU pool, where the address translation response carries the first PA corresponding to the first VA.
Optionally, the apparatus further includes:
a table lookup module, configured to query the TLB corresponding to the first physical core for the page table entry where the first VA is located, where one page table entry includes a mapping relationship between one VA and one PA; and
a triggering module, configured to trigger the sending module to perform the operation of sending the address translation request to the MMU pool if the page table entry where the first VA is located is not cached in the TLB.
According to a fifth aspect, an address translation apparatus is provided. The address translation apparatus has the function of implementing the behavior of the address translation method in the third aspect above. The address translation apparatus includes one or more modules, and the one or more modules are configured to implement the address translation method provided in the third aspect above.
That is, an address translation apparatus is provided, and the apparatus is used in a processor, which may be the processor provided in the first aspect above. That is, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, and the multiple physical cores and the MMU pool are connected through an internal bus of the processor. The apparatus is specifically used in the MMU pool, and the apparatus includes:
a receiving module, configured to receive an address translation request sent by a first physical core, where the address translation request carries a first VA to be translated, and the first physical core is any one of the multiple physical cores;
an address translation module, configured to translate the first VA into a first PA; and
a sending module, configured to send an address translation response to the first physical core, where the address translation response carries the first PA.
Optionally, the address translation module is specifically configured to:
query the memory page table for the page table entry where the first VA is located, so as to obtain the first PA corresponding to the first VA, where the memory page table records the page table entries of all VAs in the memory, and one page table entry includes a mapping relationship between one VA and one PA.
Optionally, the multiple MMUs correspond to one management module; the address translation module is specifically configured to:
select an MMU from the multiple MMUs as a target MMU through the management module; and
translate the first VA into the first PA through the target MMU.
According to a sixth aspect, a computer device is provided. The computer device includes a processor and a memory, where the processor is the processor provided in the first aspect above, and the memory is configured to store a program for executing the address translation methods provided in the second and third aspects above, and to store data involved in implementing the address translation methods provided in the second and third aspects above. The processor is configured to execute the program stored in the memory. The computer device may further include a communication bus, and the communication bus is used to establish a connection between the processor and the memory.
According to a seventh aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the address translation method described in the second aspect and/or the third aspect above.
According to an eighth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, it causes the computer to perform the address translation method described in the second aspect and/or the third aspect above.
The technical effects obtained by the fourth, fifth, sixth, seventh, and eighth aspects above are similar to those obtained by the corresponding technical means in the first, second, or third aspect, and are not repeated here.
The technical solutions provided in the embodiments of this application can bring at least the following beneficial effects:
In this solution, the multiple physical cores of the processor share the MMU pool, that is, multiple MMUs provide the VA-to-PA address translation function for each physical core, instead of one physical core corresponding to one MMU. In this way, even if a physical core has a large address translation demand, for example when memory is accessed concurrently, multiple MMUs can serve that physical core instead of a single MMU, thereby improving address translation efficiency and speeding up memory access.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a processor provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of a computer device provided by an embodiment of this application;
FIG. 3 is a flowchart of an address translation method provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of another processor provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a processor in the related art provided by an embodiment of this application;
FIG. 6 is a flowchart of another address translation method provided by an embodiment of this application;
FIG. 7 is a flowchart of an address translation method in the related art provided by an embodiment of this application;
FIG. 8 is a flowchart of accessing the data corresponding to a VA in this solution under scenario 1, provided by an embodiment of this application;
FIG. 9 is a flowchart of accessing the data corresponding to a VA in the related art under scenario 1, provided by an embodiment of this application;
FIG. 10 is a flowchart of accessing the data corresponding to a VA in this solution under scenario 3, provided by an embodiment of this application;
FIG. 11 is a flowchart of accessing the data corresponding to a VA in the related art under scenario 3, provided by an embodiment of this application;
FIG. 12 is a flowchart of another method for accessing the data corresponding to a VA in this solution, provided by an embodiment of this application;
FIG. 13 is a flowchart of another method for accessing the data corresponding to a VA in the related art, provided by an embodiment of this application;
FIG. 14 is a schematic structural diagram of an address translation apparatus provided by an embodiment of this application;
FIG. 15 is a schematic structural diagram of another address translation apparatus provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the implementations of this application are further described in detail below with reference to the accompanying drawings.
It should be noted first that the system architectures and service scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not constitute a limitation on them. A person of ordinary skill in the art knows that, with the evolution of system architectures and the emergence of new service scenarios, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
FIG. 1 is a schematic structural diagram of a processor provided by an embodiment of this application. Referring to FIG. 1, the processor includes multiple physical cores (cores) and an MMU pool, and the MMU pool includes multiple MMUs. The multiple physical cores include physical core 0 to physical core n, and the multiple MMUs include MMU 0 to MMU m. The multiple physical cores and the MMU pool are connected through an internal bus of the processor. The MMU pool provides the VA-to-PA address translation function for the multiple physical cores.
Optionally, the address translation function includes a page table traversal function based on the memory page table, such as a page table walk function. To improve the page table traversal efficiency, especially in the case of multi-level page tables, in one implementation the MMU pool is located in the HA or MC of the processor. That is, performing the page table traversal close to the memory can effectively reduce the memory access delay.
Optionally, the MMU pool also provides the address translation function for the peripherals of the processor. The peripherals include a physical network card, a graphics card, and the like outside the processor. Optionally, the peripherals also include chips or components inside the processor that provide functions such as those of a network card or graphics card.
Refer to FIG. 2, which is a schematic structural diagram of a computer device according to an embodiment of this application. Optionally, the computer device includes one or more processors 201, a bus 202, a memory 203, and one or more interfaces 204. Optionally, the one or more processors 201 include the processor shown in FIG. 1.
The processor 201 is a general-purpose central processing unit (CPU), a network processor (NP), a microprocessor, or one or more integrated circuits for implementing the solutions of this application, for example, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. Optionally, the PLD is a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The bus 202 is used to transfer information between the above components. Optionally, the bus 202 is divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. Optionally, the bus is also called a communication bus.
Optionally, the memory 203 is a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), an optical disc (including a compact disc read-only memory (CD-ROM), a compact disc, a laser disc, a digital versatile disc, a Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 203 exists independently and is connected to the processor 201 through the bus 202, or the memory 203 is integrated with the processor 201.
The interface 204 uses any transceiver-like device for communicating with other devices or a communication network. Optionally, the interface is also called a communication interface. The interface 204 includes a wired communication interface and, optionally, a wireless communication interface. The wired communication interface is, for example, an Ethernet interface. Optionally, the Ethernet interface is an optical interface, an electrical interface, or a combination thereof. The wireless communication interface is a wireless local area network (WLAN) interface, a cellular network communication interface, a combination thereof, or the like.
Optionally, in some embodiments, the computer device includes multiple processors, such as the processor 201 and the processor 205 shown in FIG. 2. Each of these processors is a single-core processor or a multi-core processor. Optionally, a processor here refers to one or more devices, circuits, and/or processing cores for processing data (such as computer program instructions).
In a specific implementation, as an embodiment, the computer device further includes an output device 206 and an input device 207. The output device 206 communicates with the processor 201 and can display information in multiple ways. For example, the output device 206 is a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like. The input device 207 communicates with the processor 201 and can receive user input in multiple ways. For example, the input device 207 is a mouse, a keyboard, a touchscreen device, a sensing device, or the like.
In some embodiments, the memory 203 is used to store the program code 210 for executing the solutions of this application, and the processor 201 can execute the program code 210 stored in the memory 203. The program code includes one or more software modules, and the computer device can implement the address translation method provided in the embodiment of FIG. 3 below through the processor 201 and the program code 210 in the memory 203.
FIG. 3 is a flowchart of an address translation method provided by an embodiment of this application; the method is applied to a processor. Referring to FIG. 3, the method includes the following steps.
Step 301: The first physical core sends an address translation request to the MMU pool, where the address translation request carries the first VA to be translated.
As described above, in this embodiment of this application, the processor includes multiple physical cores and an MMU pool, the MMU pool includes multiple MMUs, the multiple physical cores and the MMU pool are connected through an internal bus of the processor, and the MMU pool provides the VA-to-PA address translation function for the multiple physical cores. Any one of the multiple physical cores can perform address translation through the MMU pool. The following takes the first physical core performing address translation through the MMU pool as an example, where the first physical core is any one of the multiple physical cores.
In this embodiment, the first physical core sends an address translation request to the MMU pool, and the address translation request carries the first VA to be translated. Optionally, before sending the address translation request, the first physical core obtains the first VA and generates the address translation request carrying the first VA. For example, the first physical core obtains a virtual address sent by an application program for data access, takes that VA as the first VA, and generates the address translation request carrying the first VA.
Optionally, the first physical core has a corresponding TLB. Before sending the address translation request to the MMU pool, the first physical core queries that TLB for the page table entry where the first VA is located. If that page table entry is not cached in the TLB, the first physical core sends the address translation request to the MMU pool; one page table entry includes a mapping relationship between one VA and one PA. If the TLB caches the page table entry where the first VA is located, the first physical core obtains the first PA corresponding to the first VA from that page table entry. It should be noted that the TLB caches the mappings between the virtual addresses and physical addresses of the memory most recently accessed by the processor, while the memory page table stores the mappings between all virtual addresses and physical addresses of the memory. That is, the memory page table records the page table entries of all virtual addresses of the memory, and the TLB stores some of the page table entries of the memory page table; in some cases, the TLB may also cache all of the page table entries of the memory page table. Simply put, during address translation, if the TLB hits, the first physical core directly obtains the physical address from the TLB; if the TLB misses, the first physical core performs address translation through the MMU pool.
Optionally, the address translation function provided by the MMU pool includes a page table traversal function based on the memory page table, such as a page table walk function. To improve the page table traversal efficiency, especially in the case of multi-level page tables, the MMU pool is located in the HA or MC of the processor; that is, performing the page table traversal close to the memory can effectively reduce the memory access delay. In addition, in this embodiment, each physical core corresponds to one TLB, and the TLB is located next to the physical core, that is, close to the physical core, so that the corresponding page table entry can be obtained quickly from the cache.
In other embodiments, the TLB can be pooled hierarchically, and the address translation function provided by the MMU pool includes both the above page table traversal function and the function corresponding to the TLB. Simply put, the MMU pool includes both multiple page table traversal units (that is, the above multiple MMUs) and multiple hierarchical TLBs. Accordingly, after obtaining the first VA, the first physical core sends the address translation request to the MMU pool to instruct the MMU pool to perform address translation through the function corresponding to the TLB and the page table traversal function.
Step 302: The MMU pool receives the address translation request sent by the first physical core and translates the first VA into the first PA.
In this embodiment, after receiving the address translation request sent by the first physical core, the MMU pool translates the first VA into the first PA through the address translation function.
Optionally, if the address translation request is received when the page table entry where the first VA is located is not cached in the TLB corresponding to the first physical core, and the MMU pool provides the page table traversal function, then after receiving the address translation request, the MMU pool queries the memory page table for the page table entry where the first VA is located, so as to obtain the first PA corresponding to the first VA; the memory page table records the page table entries of all VAs in the memory. That is, the first physical core queries the TLB, and in the case of a TLB miss, the MMU pool performs the page table traversal.
In other embodiments, the address translation request is sent by the first physical core to the MMU pool after it obtains the first VA, and the MMU pool provides both the function corresponding to the TLB and the page table traversal function. In that case, after receiving the address translation request, the MMU pool queries the hierarchical TLB for the page table entry where the first VA is located. If that entry is not cached in the hierarchical TLB, the MMU pool queries the memory page table for it to obtain the first PA corresponding to the first VA; if the hierarchical TLB caches it, the MMU pool obtains the first PA from the hierarchical TLB.
As described above, the MMU pool includes multiple MMUs; therefore, after receiving the address translation request, the MMU pool needs to select one MMU from the multiple MMUs to perform the address translation function. Optionally, the multiple MMUs correspond to one management module, and one implementation in which the MMU pool translates the first VA into the first PA is: the MMU pool selects an MMU from the multiple MMUs as the target MMU through the management module, and translates the first VA into the first PA through the target MMU. The management module may select an MMU from the multiple MMUs according to a load balancing strategy or in another manner; for example, the management module randomly selects one of the idle MMUs. Of course, the MMU pool may also determine the target MMU from the multiple MMUs in other ways, which is not limited in this embodiment of this application.
Optionally, in this embodiment, the memory page table is a one-level page table or a multi-level page table. In an application where the memory page table is a one-level page table, the MMU pool sends the first VA to the MC through the HA by means of the target MMU, so as to query the first PA corresponding to the first VA from the memory page table. In an application where the memory page table is a multi-level page table, the MMU pool parses the first VA through the target MMU to obtain the index of each level in turn; each time an index is obtained, a query request is sent to the MC through the HA based on that index to query the corresponding information from the page table of the corresponding level in the memory page table, until the last query request is sent to the MC through the HA based on the finally parsed index (such as the address offset), and the first PA returned by the MC, or the page table entry (PTE) where the first VA is located, is obtained. It should be noted that, for the specific implementation of the page table traversal function, reference may also be made to the related art, which is not detailed in this embodiment.
It can be seen that, because the MMU pool in this solution is located in the HA or MC, that is, close to the memory, address translation is faster. Especially in multi-level page table applications, this solution can effectively reduce the address translation delay. In the related art, by contrast, the MMU is located far from the HA (for example, in the physical core) and needs to interact with the HA to access the memory to query the corresponding PA; this interaction takes a long time, that is, the memory access delay is large. Moreover, most current memory page tables are multi-level page tables, and in multi-level page table applications the MMU and the HA in the related art need to interact multiple times to complete the page table walk operation, so the memory access delay and other overheads are relatively large.
Step 303: The MMU pool sends an address translation response to the first physical core, where the address translation response carries the first PA.
In this embodiment, after translating the first VA into the first PA, the MMU pool sends an address translation response to the first physical core; the response carries the first PA, or carries the mapping relationship between the first VA and the first PA. Optionally, the MMU pool sends the address translation response to the first physical core through the target MMU.
Step 304: The first physical core receives the address translation response sent by the MMU pool.
In this embodiment, the first physical core receives the address translation response sent by the MMU pool; the response carries the first PA, or carries the mapping relationship between the first VA and the first PA. Optionally, the first physical core stores the mapping relationship between the first VA and the first PA in the corresponding TLB.
Optionally, the first VA is used for data access; for example, the first VA is a VA through which an application program performs data access. In that case, after obtaining the first PA, the first physical core queries the data cache for the first data corresponding to the first PA and returns the first data to the application program. If the data cache caches the first data corresponding to the first PA, the first physical core acquires the first data from the data cache; if the first data is not cached in the data cache, the first physical core acquires the first data from the memory, or acquires the first data in other ways. The data cache includes the level 1/level 2/level 3 (L1/L2/L3) caches and the like.
Optionally, the MMU pool also provides the address translation function for the peripherals of the processor. In the related art, a peripheral performs address translation through a system-level memory management unit (system MMU, SMMU); the SMMU includes a TLB and a page table traversal unit and is located next to the IIO of the processor. In this solution, the page table traversal unit of the SMMU is moved into the MMU pool, and the MMU pool provides the address translation function for the peripherals.
In this embodiment, a virtual address acquired by a peripheral is called an input/output VA (IOVA). After a peripheral acquires a first IOVA, it queries its corresponding TLB for the page table entry where the first IOVA is located. If that entry is not cached in the TLB, the peripheral sends an address translation request to the MMU pool; one page table entry includes a mapping relationship between one IOVA and one PA. The MMU pool queries the memory page table for the page table entry where the first IOVA is located to obtain the PA corresponding to the first IOVA, and returns that PA to the peripheral. If the TLB caches the page table entry where the first IOVA is located, the peripheral obtains the first PA corresponding to the first IOVA from that page table entry.
FIG. 4 is a schematic structural diagram of another processor provided by an embodiment of this application. Referring to FIG. 4, the processor (such as a CPU) includes multiple physical cores (such as physical core 0 to physical core n), a TLB located in each physical core, an MMU pool located in the HA (including MMU0 to MMUm), an MC, and the like. The MMU pool provides the address translation function, including page table traversal, for the multiple physical cores. Optionally, as shown in FIG. 4, the processor further includes interfaces for connecting peripherals, such as a network card or a graphics card; accordingly, the MMU pool also provides the address translation function for the peripherals of the processor.
FIG. 5 is a schematic structural diagram of a processor in the related art provided by an embodiment of this application. The processor shown in FIG. 5 differs from that in FIG. 4 in that each physical core of the processor in FIG. 5 corresponds to one MMU, and the MMU corresponding to each physical core is located next to that physical core. The processor also includes one or more peripheral processing units, such as an input/output unit (IOU), and the SMMU corresponding to a peripheral is located in the peripheral processing unit. It can be seen from FIG. 5 that, in the related art, physical cores cannot share MMUs; if a physical core has a large address translation demand, its corresponding MMU cannot meet the need, resulting in low address translation efficiency. The same applies to the SMMU. In addition, in the related art, the page table traversal units in both the MMU and the SMMU are far from the memory, and in the case of a TLB miss, the delay of accessing the memory for address translation by interacting with the HA and the MC is large.
As can be seen from FIG. 4 and the foregoing, this solution moves the page table traversal unit of the MMU from the physical core or the SMMU into the HA or MC and performs page table traversal closer to the memory, so as to reduce the memory access delay as much as possible, improve the VA-to-PA address translation efficiency, and thus improve the data access efficiency. In addition, because the HA and MC are shared by the physical cores of the processor, moving the page table traversal unit into the HA or MC enables it to be shared among multiple physical cores (cores), without a one-to-one correspondence with the physical cores, which can save circuit resources of the processor chip in some embodiments.
图6是本申请实施例提供的另一种地址转换的方法的流程图。图6所示流程是在图4所示处理器的基础上实现的。图7是本申请实施例提供的相关技术中地址转换方法的流程图,图7所示流程是在图5所示处理器的基础上实现的。
Referring to FIG. 6, assume the in-memory page table has L levels, the MMU pool is located at the HA or MC, and the MMU pool includes the page table walk unit in FIG. 6. In this solution, when a physical core needs to perform address translation, it first looks up the TLB, that is, queries the TLB for the page table entry containing the VA; the TLB lookup takes about t1. On a TLB miss, the physical core sends an address translation request to the MMU pool, and the MMU pool performs the page table walk through the HA and MC; the walk takes about L*50 ns (nanoseconds). During the walk, if the queried page is in memory, the MMU pool requests loading the page table entry (PTE) containing the VA from memory (for example, dynamic random access memory (DRAM)); the request takes about 3 ns. The MMU pool loads the PTE from DRAM into the TLB for caching via the memory access modules of the HA and MC, that is, updates the TLB. Loading the PTE from DRAM takes about 90 ns, and updating the TLB takes about t3. If the queried page is not in memory, the MMU pool requests loading the PTE from disk. On a TLB hit, the physical core obtains the corresponding page table entry from the TLB, that is, obtains the PA. Optionally, after obtaining the PA, the physical core performs a protection check. If the check fails, the physical core handles the corresponding error, for example by generating a segmentation fault signal (signal segment violation, SIGSEGV). After the check passes (access permitted), the physical core fetches the corresponding data from the data cache (for example, the L1/L2/L3 caches) based on the PA; searching the data cache takes about t2. On a data cache hit, that is, when the corresponding data is found, the physical core loads the data from the data cache, which takes about 1 ns. On a data cache miss, the physical core requests loading the data from memory; the request takes about 20 ns, and loading the data from memory takes about 90 ns.
FIG. 7 differs from FIG. 6 in that, in the related art shown in FIG. 7, physical cores and MMUs correspond one-to-one, with each MMU located beside its physical core. On a TLB miss, the physical core performs the page table walk through its corresponding MMU, which takes about L*90 ns. In addition, in the related art, requesting a PTE load from memory takes about 20 ns.
FIGS. 6 and 7 show the time overhead of a physical core accessing the data at a VA in four scenarios. In scenario 1, both the TLB and L1/L2/L3 miss; in scenario 2, both the TLB and L1/L2/L3 hit; in scenario 3, the TLB misses but L1/L2/L3 hits; in scenario 4, the TLB hits but L1/L2/L3 misses. In scenarios 2 and 4, that is, the TLB-hit scenarios, the time overhead of accessing the data at a VA is roughly the same in this solution and in the related art. In scenarios 1 and 3, that is, the TLB-miss scenarios, this solution saves considerable time compared with the related art. FIGS. 8 and 9 show the flows of accessing the data at a VA in scenario 1 under this solution and under the related art, respectively. FIGS. 10 and 11 show the corresponding flows for scenario 3. The flows in FIGS. 8 to 11 are consistent with the relevant flows in FIGS. 6 and 7 and are not repeated here.
Table 1 compares the time overhead of accessing the data at a VA in scenarios 1 and 3 under this solution and under the related art. As can be seen, on a TLB miss this solution saves considerable time compared with the related art. The savings arise mainly in the page table walk, and the greater the number of page table levels, the larger the savings.
Table 1
Figure PCTCN2022110069-appb-000001
It should be noted that the step durations shown in FIGS. 6 to 11 and Table 1 are empirical values derived from experiments or actual conditions, that is, approximate times, and are not intended to limit embodiments of this application. The duration of each step will vary across experiments and real-world conditions. Overall, this solution incurs less time overhead for accessing the data at a VA than the related art.
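The scenario-1 comparison can be tallied from the approximate per-step figures given for FIGS. 6 and 7. The sketch below is a back-of-the-envelope model, not a reproduction of Table 1; the lookup times t1/t2/t3 appear identically in both schemes and cancel in the comparison, so they are set aside here.

```python
# Rough latency model for scenario 1 (TLB miss and data cache miss), using the
# approximate per-step times from FIGS. 6 and 7. The t1/t2/t3 terms are common
# to both schemes and cancel in the comparison, so they are omitted.

def tlb_miss_cost(levels, per_level_ns, pte_request_ns):
    walk = levels * per_level_ns      # page table walk, one query per level
    pte_load = pte_request_ns + 90    # PTE load request + ~90 ns load from DRAM
    return walk + pte_load

def scenario1_cost(levels, pooled):
    if pooled:                        # MMU pool at the HA/MC (this solution)
        miss = tlb_miss_cost(levels, per_level_ns=50, pte_request_ns=3)
    else:                             # per-core MMU (related art)
        miss = tlb_miss_cost(levels, per_level_ns=90, pte_request_ns=20)
    data = 20 + 90                    # data cache miss: request + load from memory
    return miss + data

saving = scenario1_cost(4, pooled=False) - scenario1_cost(4, pooled=True)
print(saving)  # for a 4-level table: 4*(90-50) + (20-3) = 177 ns
```

Consistent with the text, the saving grows linearly with the number of page table levels, since each level of the walk is cheaper when performed close to memory.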
FIG. 12 is a flowchart of a method for accessing the data at a VA provided by an embodiment of this application. As shown in FIG. 12, in one implementation of this solution, the TLB is located at the physical core and the MMU pool at the HA. After obtaining a VA through the address generation unit (address generate unit, AGU), the physical core queries the corresponding TLB for the page table entry for that VA. On a TLB miss, the physical core sends an address translation request carrying the VA and other information to the MMU pool. Assuming the in-memory page table is a four-level page table, the MMU pool performs a page table walk, interacting with the MC through the HA five times and obtaining the PA corresponding to the VA after the last interaction. The MMU pool returns the PA and other information to the physical core, and the physical core updates the VA-to-PA mapping in the TLB. Thereafter, the physical core can fetch the corresponding data from the data cache (for example, L1/L2/L3) based on the PA. In short, after obtaining the PA corresponding to a VA, the physical core can fetch the corresponding data through the conventional flow, which embodiments of this application do not limit.
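The core-side flow of FIG. 12 can be sketched end to end: a TLB lookup at the core, an address translation request to the MMU pool on a miss, and a TLB refill so that the next access to the same VA hits. This is a simplified software-only model; the class names and the flat page-mapping dictionary are assumptions of the sketch, not the patent's interfaces.

```python
# Illustrative end-to-end model of the FIG. 12 flow (simplified): the MMU pool
# hides the page table walk behind translate(); the core keeps a private TLB.

class MMUPool:
    """Stands in for the MMU pool at the HA; the walk is abstracted to a dict."""
    def __init__(self, page_map):
        self.page_map = page_map          # VA page number -> PA frame number

    def translate(self, va):
        page, offset = va >> 12, va & 0xFFF
        return (self.page_map[page] << 12) | offset

class Core:
    def __init__(self, pool):
        self.tlb = {}                     # per-core TLB: VA page -> PA frame
        self.pool = pool

    def to_pa(self, va):
        page, offset = va >> 12, va & 0xFFF
        if page in self.tlb:              # TLB hit: no MMU pool interaction
            return (self.tlb[page] << 12) | offset
        pa = self.pool.translate(va)      # TLB miss: ask the shared MMU pool
        self.tlb[page] = pa >> 12         # refill the TLB with the new mapping
        return pa
```

Because the pool is shared, any number of `Core` instances can hold a reference to the same `MMUPool`, which is the sharing property the embodiment emphasizes.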
FIG. 13 is a flowchart of a related-art method for accessing the data at a VA provided by an embodiment of this application. Unlike this solution shown in FIG. 12, the MMU in the related art shown in FIG. 13 is located at the physical core, that is, far from memory. As FIGS. 12 and 13 show, the page table walk in the related art takes longer than the page table walk in this solution.
In summary, in this solution, the multiple physical cores of the processor share the MMU pool; that is, multiple MMUs provide the VA-to-PA address translation function for the cores, rather than one MMU per core. Thus, even when a physical core has a high address translation demand, for example during concurrent memory accesses, this solution allows multiple MMUs to serve that core instead of restricting it to a single MMU, which improves address translation efficiency and speeds up memory access.
FIG. 14 is a schematic structural diagram of an address translation apparatus 1400 provided by an embodiment of this application. The apparatus 1400 may be implemented as part or all of a processor by software, hardware, or a combination of the two, where the processor may be the processor shown in FIG. 1 or FIG. 4. That is, the apparatus 1400 is used in a processor that includes multiple physical cores and one MMU pool, the MMU pool including multiple MMUs, with the physical cores connected to the MMU pool through the processor's internal bus. The apparatus 1400 is specifically used in a first physical core, which is any one of the multiple physical cores. Referring to FIG. 14, the apparatus 1400 includes a sending module 1401 and a receiving module 1402.
The sending module 1401 is configured to send an address translation request to the MMU pool, the request carrying a first VA to be translated.
The receiving module 1402 is configured to receive an address translation response sent by the MMU pool, the response carrying the first PA corresponding to the first VA.
Optionally, the apparatus 1400 further includes:
a lookup module, configured to query the TLB corresponding to the first physical core for the page table entry containing the first VA, where a page table entry contains a mapping between one VA and one PA; and
a trigger module, configured to, if the page table entry containing the first VA is not cached in the TLB, trigger the sending module 1401 to perform the operation of sending the address translation request to the MMU pool.
In embodiments of this application, the multiple physical cores of the processor share the MMU pool; that is, multiple MMUs provide the VA-to-PA address translation function for the cores, rather than one MMU per core. Thus, even when a physical core has a high address translation demand, for example during concurrent memory accesses, this solution allows multiple MMUs to serve that core instead of restricting it to a single MMU, which improves address translation efficiency and speeds up memory access.
It should be noted that, when the address translation apparatus provided in the foregoing embodiments performs address translation, the division into the functional modules above is merely illustrative. In practice, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the address translation apparatus provided in the foregoing embodiments belongs to the same concept as the address translation method embodiments; for its specific implementation process, see the method embodiments, which are not repeated here.
FIG. 15 is a schematic structural diagram of an address translation apparatus 1500 provided by an embodiment of this application. The apparatus 1500 may be implemented as part or all of a processor by software, hardware, or a combination of the two, where the processor may be the processor shown in FIG. 1 or FIG. 4. That is, the apparatus 1500 is used in a processor that includes multiple physical cores and one MMU pool, the MMU pool including multiple MMUs, with the physical cores connected to the MMU pool through the processor's internal bus. The apparatus 1500 is specifically used in the MMU pool. Referring to FIG. 15, the apparatus 1500 includes a receiving module 1501, an address translation module 1502, and a sending module 1503.
The receiving module 1501 is configured to receive an address translation request sent by a first physical core, the request carrying a first VA to be translated, the first physical core being any one of the multiple physical cores.
The address translation module 1502 is configured to translate the first VA into a first PA.
The sending module 1503 is configured to send an address translation response to the first physical core, the response carrying the first PA.
Optionally, the address translation module 1502 is specifically configured to:
query the in-memory page table for the page table entry containing the first VA to obtain the first PA corresponding to the first VA, where the in-memory page table records the page table entries for all VAs in memory and a page table entry contains a mapping between one VA and one PA.
Optionally, the multiple MMUs correspond to one management module, and the address translation module 1502 is specifically configured to:
select one MMU from the multiple MMUs as the target MMU through the management module; and
translate the first VA into the first PA through the target MMU.
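The embodiments above do not pin down how the management module picks the target MMU from the pool. Purely as an assumption for illustration, one plausible policy is to pick the MMU with the fewest outstanding requests, sketched here; the class and method names are hypothetical.

```python
# Hypothetical sketch of the management module selecting a target MMU from the
# pool. The embodiments fix no policy; least-outstanding-requests is one
# plausible choice, shown here purely as an assumption.

class ManagementModule:
    def __init__(self, num_mmus):
        self.outstanding = [0] * num_mmus   # in-flight requests per MMU

    def pick_target(self):
        """Return the index of the least-loaded MMU and account for the request."""
        target = min(range(len(self.outstanding)), key=self.outstanding.__getitem__)
        self.outstanding[target] += 1
        return target

    def done(self, target):
        self.outstanding[target] -= 1       # request completed, release the slot
```

Under such a policy, a core with a burst of translation requests is naturally spread across several MMUs, which is the sharing behavior the embodiments rely on.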
In embodiments of this application, the multiple physical cores of the processor share the MMU pool; that is, multiple MMUs provide the VA-to-PA address translation function for the cores, rather than one MMU per core. Thus, even when a physical core has a high address translation demand, for example during concurrent memory accesses, this solution allows multiple MMUs to serve that core instead of restricting it to a single MMU, which improves address translation efficiency and speeds up memory access.
It should be noted that, when the address translation apparatus provided in the foregoing embodiments performs address translation, the division into the functional modules above is merely illustrative. In practice, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the address translation apparatus provided in the foregoing embodiments belongs to the same concept as the address translation method embodiments; for its specific implementation process, see the method embodiments, which are not repeated here.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), among others. It is worth noting that the computer-readable storage medium mentioned in embodiments of this application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that "at least one" herein means one or more, and "multiple" means two or more. In the description of embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, to describe the technical solutions of embodiments of this application clearly, terms such as "first" and "second" are used in embodiments of this application to distinguish identical or similar items with substantially the same functions and effects. Those skilled in the art will understand that "first", "second", and the like do not limit quantity or execution order, nor do they require that the items be different.
It should be noted that the information (including but not limited to user equipment information and users' personal information), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in embodiments of this application are all authorized by the users or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The foregoing are embodiments provided by this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (18)

  1. A processor, wherein the processor comprises a plurality of physical cores and one memory management unit (MMU) pool, the MMU pool comprises a plurality of MMUs, the plurality of physical cores are connected to the MMU pool through an internal bus of the processor, and the MMU pool provides the plurality of physical cores with an address translation function from virtual address (VA) to physical address (PA).
  2. The processor according to claim 1, wherein the MMU pool further provides the address translation function for peripherals of the processor.
  3. The processor according to claim 1 or 2, wherein the MMU pool is located at a home agent (HA) or a memory controller (MC) of the processor.
  4. The processor according to any one of claims 1 to 3, wherein:
    a first physical core is configured to send an address translation request to the MMU pool, the address translation request carrying a first VA to be translated, the first physical core being any one of the plurality of physical cores;
    the MMU pool is configured to receive the address translation request, translate the first VA into a first PA, and send an address translation response to the first physical core, the address translation response carrying the first PA; and
    the first physical core is further configured to receive the address translation response.
  5. The processor according to claim 4, wherein:
    the first physical core is configured to query a translation lookaside buffer (TLB) corresponding to the first physical core for a page table entry containing the first VA, and if the page table entry containing the first VA is not cached in the TLB, send the address translation request to the MMU pool, wherein one page table entry contains a mapping between one VA and one PA; and
    the MMU pool is configured to query an in-memory page table for the page table entry containing the first VA to obtain the first PA corresponding to the first VA, wherein the in-memory page table records the page table entries for all VAs in memory.
  6. The processor according to claim 5, wherein the in-memory page table is a single-level page table or a multi-level page table.
  7. An address translation method, wherein a processor comprises a plurality of physical cores and one memory management unit (MMU) pool, the MMU pool comprises a plurality of MMUs, and the plurality of physical cores are connected to the MMU pool through an internal bus of the processor; the method comprising:
    sending, by a first physical core, an address translation request to the MMU pool, the address translation request carrying a first virtual address (VA) to be translated, the first physical core being any one of the plurality of physical cores; and
    receiving, by the first physical core, an address translation response sent by the MMU pool, the address translation response carrying a first physical address (PA) corresponding to the first VA.
  8. The method according to claim 7, wherein before the first physical core sends the address translation request to the MMU pool, the method further comprises:
    querying, by the first physical core, a translation lookaside buffer (TLB) corresponding to the first physical core for a page table entry containing the first VA, wherein one page table entry contains a mapping between one VA and one PA; and
    if the page table entry containing the first VA is not cached in the TLB, performing, by the first physical core, the operation of sending the address translation request to the MMU pool.
  9. An address translation method, wherein a processor comprises a plurality of physical cores and one memory management unit (MMU) pool, the MMU pool comprises a plurality of MMUs, and the plurality of physical cores are connected to the MMU pool through an internal bus of the processor; the method comprising:
    receiving, by the MMU pool, an address translation request sent by a first physical core, the address translation request carrying a first virtual address (VA) to be translated, the first physical core being any one of the plurality of physical cores;
    translating, by the MMU pool, the first VA into a first physical address (PA); and
    sending, by the MMU pool, an address translation response to the first physical core, the address translation response carrying the first PA.
  10. The method according to claim 9, wherein translating, by the MMU pool, the first VA into the first physical address PA comprises:
    querying, by the MMU pool, an in-memory page table for a page table entry containing the first VA to obtain the first PA corresponding to the first VA, wherein the in-memory page table records the page table entries for all VAs in memory, and one page table entry contains a mapping between one VA and one PA.
  11. The method according to claim 9, wherein the plurality of MMUs correspond to one management module; and
    translating, by the MMU pool, the first VA into the first physical address PA comprises:
    selecting, by the MMU pool through the management module, one MMU from the plurality of MMUs as a target MMU; and
    translating, by the MMU pool through the target MMU, the first VA into the first PA.
  12. An address translation apparatus, wherein a processor comprises a plurality of physical cores and one memory management unit (MMU) pool, the MMU pool comprises a plurality of MMUs, and the plurality of physical cores are connected to the MMU pool through an internal bus of the processor; the apparatus is used in a first physical core, the first physical core being any one of the plurality of physical cores;
    the apparatus comprising:
    a sending module, configured to send an address translation request to the MMU pool, the address translation request carrying a first virtual address (VA) to be translated; and
    a receiving module, configured to receive an address translation response sent by the MMU pool, the address translation response carrying a first physical address (PA) corresponding to the first VA.
  13. The apparatus according to claim 12, wherein the apparatus further comprises:
    a lookup module, configured to query a translation lookaside buffer (TLB) corresponding to the first physical core for a page table entry containing the first VA, wherein one page table entry contains a mapping between one VA and one PA; and
    a trigger module, configured to, if the page table entry containing the first VA is not cached in the TLB, trigger the sending module to perform the operation of sending the address translation request to the MMU pool.
  14. An address translation apparatus, wherein a processor comprises a plurality of physical cores and one memory management unit (MMU) pool, the MMU pool comprises a plurality of MMUs, and the plurality of physical cores are connected to the MMU pool through an internal bus of the processor; the apparatus is used in the MMU pool, the apparatus comprising:
    a receiving module, configured to receive an address translation request sent by a first physical core, the address translation request carrying a first virtual address (VA) to be translated, the first physical core being any one of the plurality of physical cores;
    an address translation module, configured to translate the first VA into a first physical address (PA); and
    a sending module, configured to send an address translation response to the first physical core, the address translation response carrying the first PA.
  15. The apparatus according to claim 14, wherein the address translation module is specifically configured to:
    query an in-memory page table for a page table entry containing the first VA to obtain the first PA corresponding to the first VA, wherein the in-memory page table records the page table entries for all VAs in memory, and one page table entry contains a mapping between one VA and one PA.
  16. The apparatus according to claim 14, wherein the plurality of MMUs correspond to one management module; and
    the address translation module is specifically configured to:
    select, through the management module, one MMU from the plurality of MMUs as a target MMU; and
    translate, through the target MMU, the first VA into the first PA.
  17. A computer-readable storage medium, wherein the storage medium stores a computer program that, when executed by a computer, implements the steps of the method according to either of claims 7 and 8, or implements the steps of the method according to any one of claims 9 to 11.
  18. A computer program product, wherein the computer program product stores computer instructions that, when executed by a computer, implement the steps of the method according to either of claims 7 and 8, or implement the steps of the method according to any one of claims 9 to 11.
PCT/CN2022/110069 2021-11-25 2022-08-03 Processor, address translation method and apparatus, storage medium, and program product WO2023093122A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22897217.0A EP4418133A4 (en) 2021-11-25 2022-08-03 PROCESSOR, ADDRESS TRANSLATION METHOD AND APPARATUS, STORAGE MEDIUM AND PROGRAM PRODUCT
US18/673,967 US20240330202A1 (en) 2021-11-25 2024-05-24 Processor, Address Translation Method and Apparatus, Storage Medium, and Program Product

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111417305.9 2021-11-25
CN202111417305 2021-11-25
CN202210087387.3A CN116166577A (zh) 2022-01-25 Processor, address translation method and apparatus, storage medium, and program product
CN202210087387.3 2022-01-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/673,967 Continuation US20240330202A1 (en) 2021-11-25 2024-05-24 Processor, Address Translation Method and Apparatus, Storage Medium, and Program Product

Publications (1)

Publication Number Publication Date
WO2023093122A1 true WO2023093122A1 (zh) 2023-06-01

Family

ID=86411897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110069 WO2023093122A1 (zh) 2022-08-03 Processor, address translation method and apparatus, storage medium, and program product

Country Status (4)

Country Link
US (1) US20240330202A1 (zh)
EP (1) EP4418133A4 (zh)
CN (1) CN116166577A (zh)
WO (1) WO2023093122A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983332A (en) * 1996-07-01 1999-11-09 Sun Microsystems, Inc. Asynchronous transfer mode (ATM) segmentation and reassembly unit virtual address translation unit architecture
CN103914405A (zh) * 2013-01-07 2014-07-09 Samsung Electronics Co., Ltd. System-on-chip including a memory management unit and memory address translation method thereof
CN105302765A (zh) * 2014-07-22 2016-02-03 China Academy of Telecommunications Technology System-on-chip and memory access management method thereof
CN112748848A (zh) * 2019-10-29 2021-05-04 EMC IP Holding Company LLC Method, device and computer program product for storage management

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9405700B2 (en) * 2010-11-04 2016-08-02 Sonics, Inc. Methods and apparatus for virtualization in an integrated circuit
US10114760B2 (en) * 2014-01-14 2018-10-30 Nvidia Corporation Method and system for implementing multi-stage translation of virtual addresses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4418133A4

Also Published As

Publication number Publication date
US20240330202A1 (en) 2024-10-03
CN116166577A (zh) 2023-05-26
EP4418133A1 (en) 2024-08-21
EP4418133A4 (en) 2024-09-11


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897217

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022897217

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022897217

Country of ref document: EP

Effective date: 20240517

NENP Non-entry into the national phase

Ref country code: DE