WO2023155694A1 - Memory paging method, system, and storage medium - Google Patents

Memory paging method, system, and storage medium

Info

Publication number
WO2023155694A1
WO2023155694A1 (PCT/CN2023/074406)
Authority
WO
WIPO (PCT)
Prior art keywords
physical address
hpa
page table
host
virtual machine
Prior art date
Application number
PCT/CN2023/074406
Other languages
English (en)
French (fr)
Inventor
徐云
Original Assignee
阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)
Priority to EP23755697.2A (EP4375836A1)
Publication of WO2023155694A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1081Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12Replacement control
    • G06F12/121Replacement control using replacement algorithms
    • G06F12/122Replacement control using replacement algorithms of the least frequently used [LFU] type, e.g. with individual count value
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45583Memory management, e.g. access or allocation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/15Use in a specific computing environment
    • G06F2212/151Emulated environment, e.g. virtual machine
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/65Details of virtual memory and virtual address translation
    • G06F2212/651Multi-level translation tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/65Details of virtual memory and virtual address translation
    • G06F2212/657Virtual address space management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the technical field of computers, and in particular to a memory paging method, system and storage medium.
  • Direct pass-through introduces the Input/Output Memory Management Unit (IOMMU), which dedicates a device to a single guest, sacrificing the device's sharing capability in order to achieve full functionality and optimal performance.
  • the IOMMU connects the direct memory access (DMA) I/O bus to the memory of the host machine.
  • the IOMMU can translate the virtual address accessed by the pass-through device into a physical address, enabling the pass-through device to access host memory.
  • IOMMU page table replacement is required.
  • for an IOMMU that supports page fault interrupts, the page fault mechanism can be used to replace entries in the IOMMU page table.
  • for an IOMMU that does not support page fault interrupts, however, this method cannot be used. How to implement page table replacement for such an IOMMU has therefore become an urgent technical problem in the art.
  • aspects of the present application provide a memory paging method, system, and storage medium for implementing page table replacement for an IOMMU that does not support Page fault.
  • An embodiment of the present application provides a memory paging method, including:
  • the embodiment of the present application also provides a computing system, including: a host machine and a virtual machine management node;
  • a virtual machine is deployed on the host machine, and a pass-through device for direct use by the virtual machine is mounted on it; the host machine also includes an IOMMU; the page table stored in the IOMMU records the correspondence between virtual machine physical addresses and host machine physical addresses; the pass-through device accesses the memory of the host machine based on this correspondence;
  • the virtual machine management node is configured to: determine a virtual machine physical address requiring a page table update; determine, from the page table stored in the IOMMU, the first host machine physical address corresponding to that virtual machine physical address; determine a second host machine physical address from the free physical addresses of the host machine memory; copy the data stored at the first host machine physical address to the second host machine physical address; and update the host machine physical address that the page table records for that virtual machine physical address to the second host machine physical address.
  • the embodiment of the present application also provides a computer-readable storage medium storing computer instructions; when the computer instructions are executed by one or more processors, the one or more processors are caused to perform the steps of the above-mentioned memory paging method.
  • An embodiment of the present application further provides a computer program product, including: a computer program; when the computer program is executed by a processor, the processor is caused to execute the steps in the above memory paging method.
  • the VMM can determine the GPA that requires a page table update and determine, from the page table stored in the IOMMU of the host machine, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host machine memory, a second HPA to replace the first HPA; copy the data stored at the first HPA to the second HPA; and update the HPA that the IOMMU page table records for that GPA to the second HPA. This implements page table replacement for IOMMUs that do not support page faults.
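The replacement flow summarized above can be sketched in a few lines. The following is an illustrative Python simulation, not the patent's implementation; all names (replace_page, free_hpas, and the toy dict-based page table) are hypothetical.

```python
# Minimal sketch of page-table replacement without an IOMMU page fault:
# find the first HPA (Pa), pick a second HPA (Pr) from free memory,
# copy the data, then repoint the IOMMU page-table entry.

def replace_page(iommu_page_table, memory, free_hpas, gpa):
    """Replace the HPA backing `gpa` without relying on an IOMMU page fault."""
    pa = iommu_page_table[gpa]           # first HPA: current backing page
    pr = free_hpas.pop()                 # second HPA: picked from free memory
    memory[pr] = bytearray(memory[pa])   # copy the data from Pa to Pr
    iommu_page_table[gpa] = pr           # update the IOMMU page-table entry
    return pa, pr

# GPA 0xA000 is initially backed by HPA 0x1000 holding b"data".
memory = {0x1000: bytearray(b"data")}
table = {0xA000: 0x1000}
pa, pr = replace_page(table, memory, [0x2000], 0xA000)
assert table[0xA000] == 0x2000 and memory[0x2000] == b"data"
```

After the swap, the GPA resolves to the second HPA and the data has followed it; the first HPA can then be reclaimed.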
  • FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present application.
  • FIG. 2 and FIG. 3 are schematic flow charts of a memory paging method provided in an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present application. As shown in FIG. 1 , the computing system includes: a host machine 10 and a virtual machine management node (VMM) 20 .
  • the host machine 10 refers to a computer device with computing, storage and communication functions, such as a server device.
  • the host computer 10 may be a single server device, or may be a cloud-based server array or the like.
  • the host machine 10 may also be a terminal device such as a mobile phone, a tablet computer, a personal computer, or a wearable device.
  • the host machine 10 is deployed with a virtual machine (Virtual Machine, VM) 101.
  • the virtual machine 101 may also be referred to as a virtual machine (Guest) of the host machine 10 .
  • VM 101 has independent CPU, memory, network and disk.
  • the CPU corresponding to the virtual machine 101 may also be called the virtual machine CPU, that is, the vCPU 101a shown in FIG. 1 .
  • the VMM 20 refers to a logical node for managing virtual machines, which can be deployed on the host machine 10, or can be deployed on other physical machines communicating with the host machine 10.
  • the VMM 20 can run on the CPU of the host machine 10 (not shown in FIG. 1 ).
  • the VMM 20 can perform task scheduling, load balancing, status monitoring, etc. for the VM 101.
  • the VM 101 can directly access the I/O hardware mounted on the host machine 10 through the VMM 20, so that the I/O path of the VM 101 is almost the same as in a non-virtualized environment.
  • the I/O hardware is the pass-through device 102 .
  • the pass-through device 102 refers to an I/O hardware device mounted on the host machine 10, and may include: a network card, a storage medium, and the like.
  • the storage medium may be a persistent storage medium such as a disk or a hard disk.
  • for the pass-through device 102, the memory of the host machine 10, that is, the host memory 103, can be accessed through direct memory access (DMA). In DMA mode there is a data path between the pass-through device 102 and the host memory 103, enabling direct data transfer between them; the CPU of the host machine 10 does not need to participate in this transfer.
  • the operating system (guest OS) running on the VM 101 usually does not know the host physical memory address it accesses, also called the host physical address (HPA); performing a DMA operation with such an address risks corrupting memory. The pass-through device 102 only knows the virtual machine physical memory address, also called the guest physical address (GPA), and does not know the mapping between GPA and HPA. Therefore, the IOMMU 104 is introduced in the pass-through technique.
  • the IOMMU 104 can be communicatively connected between the pass-through device 102 and the host memory 103.
  • the IOMMU 104 can communicate with the pass-through device 102 and the host memory 103 through a serial interface bus.
  • the serial interface bus may be a PCI interface bus, a PCIe interface bus, or the like.
  • the IOMMU 104 can ensure that the pass-through device 102 can access the host memory 103 when performing DMA operations.
  • the pass-through device 102 can access all memory address spaces of the host 10 through DMA.
  • the virtual machine CPU 101a can be assigned a GPA accessible to the pass-through device 102, and the VMM 20 assigns a corresponding HPA to that GPA.
  • IOMMU 104 can maintain a mapping table between GPA and HPA, which can also be called a page table. The page table records the mapping relationship between GPA and HPA.
  • the VMM 20 can capture the DMA request sent by the pass-through device 102 and transparently transmit the DMA request to the IOMMU 104 .
  • the IOMMU 104 can obtain the GPA to be accessed from the DMA request; then, match the GPA to be accessed in the page table stored in the IOMMU 104 to obtain the HPA corresponding to the GPA to be accessed.
  • the memory space corresponding to the host memory 103 can be accessed through the HPA corresponding to the GPA to be accessed.
  • the page table stored by the IOMMU 104 has a page table replacement requirement. For example, when the HPA corresponding to one or more GPAs changes, the page table stored by the IOMMU 104 needs to be updated. For example, in a hot/cold memory page replacement scenario, a hot page that has not been used for a long time can be converted to a cold page and allocated to a DMA request: DMA requests require contiguous memory pages, so cold pages need to be allocated. Here, a cold page is a free page that is no longer in the cache, and a hot page is a free page that is still in the cache.
  • the VMM 20 can reclaim physical memory from the page table stored in the IOMMU when necessary, that is, reclaim the memory corresponding to an HPA, and store the data held in that physical memory in the paging file (Paging File) on the hard disk; this ensures the data is not lost, and the freed physical memory pages can be used by other processes.
  • when the data is needed again, the VMM 20 can look up the data in the Paging File, allocate a free physical memory page, write the data into that page, map the HPA of the new physical memory page into the virtual space corresponding to the GPA the process needs to operate on, and write the mapping into the page table stored in the IOMMU, so that the required data and memory space become accessible. Adding an HPA corresponding to a physical memory page to the page table stored in the IOMMU in this way is a page fault.
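The conventional page-fault path described above can be sketched as follows. This is a hedged, illustrative simulation (handle_page_fault and all of its arguments are invented names), not how a real OS or IOMMU driver is implemented.

```python
# Toy model of a page fault: swap data back in from the paging file,
# allocate a free physical page, and record the new GPA -> HPA mapping.

def handle_page_fault(gpa, paging_file, free_pages, memory, page_table):
    data = paging_file.pop(gpa)   # locate the swapped-out data for this GPA
    hpa = free_pages.pop()        # allocate a free physical memory page
    memory[hpa] = data            # write the data back into physical memory
    page_table[gpa] = hpa         # map GPA -> new HPA in the page table
    return hpa

paging_file = {0xB000: b"swapped"}
memory, table = {}, {}
hpa = handle_page_fault(0xB000, paging_file, [0x3000], memory, table)
assert table[0xB000] == 0x3000 and memory[0x3000] == b"swapped"
```

The point of the patent is that an IOMMU without page-fault support never enters this path, so the page-table entry must be replaced proactively instead.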
  • the embodiment of this application provides a new memory paging method.
  • the main implementation methods are as follows:
  • the VMM 20 can determine the GPA that needs page updating.
  • the specific implementation manner in which the VMM 20 determines the GPA(A) that needs to be updated in the page table is not limited.
  • VMM 20 may obtain page table update requests.
  • the page table update request includes GPA(A) requiring page table update.
  • the page table update request obtained by the VMM 20 may be a page table update request provided by the client corresponding to the VM 101.
  • the client is the computing device on the user side that applies for the virtual machine.
  • the page table update request may also be a page table update request sent by the operation and maintenance side device of the cloud server; and so on.
  • the VMM 20 can obtain the GPA (denoted as A) that requires page table update from the page table update request. Further, the VMM 20 can determine the HPA corresponding to the GPA (A) that needs to be updated from the page table stored in the IOMMU 104, denoted as Pa. Specifically, the VMM 20 can match the GPA(A) that needs page table updating in the page table stored in the IOMMU 104, so as to obtain the HPA(Pa) corresponding to the GPA(A) that needs page table updating.
  • the VMM 20 can also determine another HPA from the free physical addresses of the host memory 103, denoted as Pr.
  • the VMM 20 may select a free physical address from the free physical addresses of the host memory 103 as the second HPA (Pr).
  • the VMM 20 may determine the second HPA (Pr) from the free physical addresses of the host memory 103 according to a set memory page replacement algorithm.
  • Memory page replacement algorithms include, but are not limited to: Optimal Replacement Algorithm (OPT), First In First Out Algorithm (FIFO), Least Recently Used Algorithm (LRU), Clock Replacement Algorithm (CLOCK) or Improved Clock Replacement Algorithm.
  • the optimal replacement algorithm selects the page that will never be used again, or the memory page that will not be accessed for the longest time, and uses the HPA corresponding to that memory page as the second HPA (Pr).
  • the first-in-first-out algorithm refers to arranging the memory pages transferred into the memory into a queue according to the order of transfer, and selecting the HPA corresponding to the memory page first entering the memory as the second HPA (Pr).
  • the least recently used algorithm means that the selected memory page is a memory page that has not been used recently; and the HPA corresponding to the selected memory page is determined as the second HPA (Pr).
  • an access field records the time elapsed since the memory page was last accessed; when a page needs to be replaced, the memory page with the largest time value among the free memory pages is selected as the least recently used page.
  • the clock replacement algorithm links the memory pages in memory into a circular queue via link pointers and adds an access-bit field to each page.
  • when a page is loaded into memory, its access bit is set to 1; when the page is accessed subsequently, the access bit is set to 1 again.
  • when a page needs to be replaced, the operating system scans the buffer for a memory page whose access bit is 0, and uses the HPA corresponding to the first such page found as the second HPA (Pr).
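As one example, the clock replacement algorithm just described can be sketched as follows. The second-chance clearing of access bits during the sweep is standard CLOCK behavior; the function name and list-based representation are illustrative, not from the patent.

```python
def clock_select(access_bits, hand=0):
    """CLOCK policy: return the index of the first page whose access bit
    is 0; pages with bit 1 get a second chance (bit cleared to 0) as the
    hand sweeps the circular queue."""
    n = len(access_bits)
    while True:
        if access_bits[hand] == 0:
            return hand
        access_bits[hand] = 0       # second chance: clear the bit, move on
        hand = (hand + 1) % n

bits = [1, 1, 0, 1]
victim = clock_select(bits)
assert victim == 2          # first page found with access bit 0
assert bits == [0, 0, 0, 1] # swept pages had their bits cleared
```

The HPA of the selected victim page would then serve as the second HPA (Pr).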
  • the VMM 20 can copy the data stored in the first HPA (Pa) to the second HPA (Pr), and update the HPA corresponding to the GPA (A) that needs the page table update to the second HPA (Pr).
  • in this way, the VMM can determine the GPA that requires a page table update, determine the first HPA corresponding to that GPA from the page table stored in the IOMMU of the host machine, determine from the free physical addresses of the host memory a second HPA to replace the first HPA, copy the data stored at the first HPA to the second HPA, and update the HPA that the IOMMU page table records for that GPA to the second HPA, thereby implementing page table replacement for IOMMUs that do not support page faults.
  • the page table of the IOMMU 104 can also be refreshed to the input/output translation lookaside buffer (Input/Output Translation Lookaside Buffer, IOTLB).
  • the IOTLB has a fixed number of slots for storing page-table entries that map virtual addresses to physical addresses, that is, entries of the above-mentioned page table of the IOMMU 104.
  • in an IOTLB lookup, the search key is the virtual machine physical address (GPA) and the search result is the host machine physical address (HPA). If the GPA of a DMA request is present in the IOTLB, the address translation rate is increased, and the obtained HPA can then be used to access host memory.
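The IOTLB behaves like a small translation cache in front of the IOMMU page table. The sketch below models it as an LRU cache in Python; the class and its interface are invented for illustration and do not correspond to any hardware or driver API.

```python
from collections import OrderedDict

class IOTLB:
    """Toy IOTLB: a fixed-capacity LRU cache of GPA -> HPA translations
    backed by the IOMMU page table."""
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table        # backing IOMMU page table
        self.cache = OrderedDict()

    def translate(self, gpa):
        if gpa in self.cache:               # hit: fast path
            self.cache.move_to_end(gpa)
            return self.cache[gpa]
        hpa = self.page_table[gpa]          # miss: walk the page table
        self.cache[gpa] = hpa               # fill the IOTLB
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recent entry
        return hpa

tlb = IOTLB(2, {0xA000: 0x1000, 0xB000: 0x2000, 0xC000: 0x3000})
assert tlb.translate(0xA000) == 0x1000
assert tlb.translate(0xB000) == 0x2000
tlb.translate(0xC000)                       # evicts 0xA000 (least recent)
assert 0xA000 not in tlb.cache
```

A real IOTLB would also need to be invalidated when the IOMMU page table is replaced, which is why the patent refreshes it after the update.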
  • the host machine 10 not only supports access by the pass-through device 102 to the host memory, but also supports access by the virtual machine CPU (vCPU 101a) to the host memory.
  • the vCPU 101a can apply for virtual machine address space (Virtual Address Space, VAS), that is, GPAs. Since the VAS is not real physical memory space, the operating system must map the VAS to physical memory space so that a process can store its process context.
  • a memory management unit (MMU) 105 may be provided for the host computer 10 .
  • the page table maintained by the MMU 105 records the correspondence between the physical address of the virtual machine and the physical address of the host machine.
  • the MMU 105 can convert the GPA accessed by the vCPU 101a into an HPA based on the correspondence between the physical address of the virtual machine recorded in the page table and the physical address of the host machine, so that the vCPU 101a can access the memory space corresponding to the HPA.
  • the page table of the MMU 105 needs to be updated synchronously. If the HPA in the IOMMU 104 page table is updated but the page table in the MMU 105 is not updated synchronously, the GPA (A) being updated will correspond to different HPAs in the IOMMU 104 and MMU 105 page tables. After the IOMMU 104 updates the HPA in its page table, the data stored at the original HPA (the first HPA) has been moved to the replacement HPA (the second HPA), and the original HPA has been released or stores other data; the MMU 105 would then still access the first HPA (Pa) when accessing the GPA being updated, resulting in missing pages or errors.
  • after the VMM 20 determines the GPA (A) that needs the page table update, it can also delete that GPA from the page table of the MMU 105, thereby blocking the vCPU 101a's access to the GPA (A) being updated.
  • the VMM 20 can block the vCPU 101a's access to the GPA (A) that needs the page table update by means of the page fault (page table interrupt) flow. Specifically, when the VMM 20 obtains an access request from the vCPU 101a for the GPA (A) being updated, it queries the page table of the MMU 105 with that GPA. Since the GPA (A) has been deleted from the MMU 105 page table, no corresponding HPA can be found there, which triggers the MMU 105 page table interrupt flow.
  • page table replacement in the MMU 105 can be locked while the page table interrupt flow executes, thereby blocking the vCPU 101a's access to the GPA (A) that needs the page table update. If MMU 105 page table replacement were not locked, the MMU 105 would execute the page table interrupt flow and reallocate an HPA for the GPA recorded in its page table, making it inconsistent with the corresponding HPA in the page table of the IOMMU 104.
  • after the IOMMU page table update completes, the GPA being updated can be added back to the page table of the MMU 105, with its corresponding HPA set to the second HPA (Pr). Further, the vCPU 101a's access to that GPA can be restored.
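The MMU-side synchronization described above (delete the GPA's entry so vCPU accesses fault, hold the fault path locked while the IOMMU table is updated, then restore the mapping pointing at Pr) can be sketched as follows; the lock-based model and function names are illustrative assumptions, not the patent's interface.

```python
import threading

mmu_lock = threading.Lock()   # stands in for the MMU page-fault/replace lock

def begin_update(mmu_table, gpa):
    mmu_lock.acquire()          # lock the MMU's page-table-replacement path
    mmu_table.pop(gpa, None)    # vCPU access to gpa now faults and blocks

def finish_update(mmu_table, gpa, pr):
    mmu_table[gpa] = pr         # restore the mapping, now pointing at Pr
    mmu_lock.release()          # vCPU access can proceed again

mmu_table = {0xA000: 0x1000}
begin_update(mmu_table, 0xA000)
assert 0xA000 not in mmu_table      # accesses in this window would fault
# ... IOMMU page table is updated from Pa to Pr here ...
finish_update(mmu_table, 0xA000, 0x2000)
assert mmu_table[0xA000] == 0x2000
```

Holding the lock across the window is what prevents the MMU's own fault handler from allocating a fresh HPA that would diverge from the IOMMU's table.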
  • during the page table update process of the IOMMU, the data stored at the HPA corresponding to the GPA (A) that needs the page table update may itself be updated.
  • the VMM 20 is unaware of updates to the data stored at the HPA corresponding to the GPA (A) being updated; therefore, the VMM 20 cannot determine whether a data update occurred after the IOMMU page table update or during it.
  • if the data stored at the HPA corresponding to the GPA (A) being updated is modified after the data stored at the first HPA (Pa) has been copied to the second HPA (Pr), but before the HPA that the IOMMU 104 page table records for the GPA (A) is updated to the second HPA (Pr), then the modification lands in the first HPA (Pa); the data stored at the second HPA (Pr) is then the pre-update data, and synchronous data update is not achieved.
  • a third HPA (Ps) for temporarily storing a snapshot of the data in the second HPA (Pr) may also be requested.
  • the third HPA (Ps) can be determined from the free physical addresses of the host memory; after the data stored at the first HPA (Pa) has been copied to the second HPA (Pr), the data stored at the second HPA (Pr) is copied to the third HPA (Ps), i.e. the data stored at the third HPA (Ps) is a snapshot of the data stored at the second HPA (Pr).
  • it may then be checked whether the data stored at the first HPA (Pa) is the same as the data stored at the second HPA (Pr).
  • a byte-by-byte comparison may be used to check whether the data stored at the first HPA (Pa) is the same as the data stored at the second HPA (Pr). If they are identical, no update to the data at the HPA corresponding to the GPA (A) occurred during the IOMMU page table update.
  • in that case, the memory space corresponding to the third HPA (Ps) can be released. Further, the GPA being updated can be added back to the page table of the MMU 105 with its HPA set to the second HPA (Pr), and the vCPU 101a's access to that GPA restored.
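The byte-by-byte comparison can be sketched as a simple scan that returns the first differing logical address; first_diff is an invented name for illustration.

```python
def first_diff(pa_data, pr_data):
    """Byte-by-byte comparison: return the first logical address at which
    the two buffers differ, or None if they are identical."""
    for b, (x, y) in enumerate(zip(pa_data, pr_data)):
        if x != y:
            return b
    return None

assert first_diff(b"data", b"data") is None   # no concurrent update: free Ps
assert first_diff(b"dXta", b"data") == 1      # an update occurred at address 1
```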
  • if the data stored at the first HPA (Pa) differs from the data stored at the second HPA (Pr), then the data at the HPA corresponding to the GPA (A) was updated during the IOMMU page table update. Since the VMM 20 cannot perceive when that data was updated, the embodiment of the present application must also determine the timing of the update.
  • if the data update occurred after the page table record of the IOMMU 104 was changed from the first HPA (Pa) to the second HPA (Pr), the HPA where the update landed is the second HPA (Pr). Since the IOMMU 104 page table already records the second HPA (Pr) for the GPA (A), the pass-through device 102 accesses the memory space of the second HPA (Pr), and therefore the updated data, when it accesses the GPA (A); access accuracy is high.
  • if the data update occurred after the data of the first HPA (Pa) was copied to the second HPA (Pr), but before the page table record of the IOMMU 104 was changed from the first HPA (Pa) to the second HPA (Pr), the HPA where the update landed is the first HPA (Pa), and the data stored at the second HPA (Pr) is the pre-update data.
  • in that case, when the pass-through device 102 accesses the GPA (A) that needs the page table update, it accesses the memory space of the second HPA (Pr) and thus the pre-update data; access accuracy is low.
  • to distinguish these cases, the logical address of the differing data can be determined. The logical address is the relative address of the differing data within the HPA; for example, if the data differs at the Nth byte, the logical address is N.
  • for the first logical address b at which the data differs, it can be judged whether the data Pr(b) stored at logical address b in the second HPA (Pr) is the same as the data Ps(b) stored at logical address b in the third HPA (Ps). If not, i.e. Pr(b) is not equal to Ps(b), the second HPA (Pr) has undergone a data update.
  • since the pass-through device 102 then accesses the memory space of the second HPA (Pr), and thus the updated data, when it accesses the GPA (A) that needs the page table update, the memory space corresponding to the third HPA (Ps) may be released when Pr(b) is not equal to Ps(b). Further, the GPA being updated can be added back to the page table of the MMU 105 with its HPA set to the second HPA (Pr), and the vCPU 101a's access to that GPA restored.
  • otherwise, i.e. if Pr(b) equals Ps(b), the data update landed in the first HPA (Pa). In that case, the data at the first logical address b among the data stored at the second HPA (Pr) is updated to the data at the first logical address b among the data stored at the first HPA (Pa), that is, Pr(b) is updated to Pa(b).
  • further, the memory space corresponding to the third HPA (Ps) may be released.
  • the above-mentioned GPA that needs to be updated can also be added to the page table of the MMU 105, and the HPA corresponding to the GPA that needs to be updated can be updated to the second HPA (Pr). Further, the access of the vCPU 101a to the above-mentioned GPA that needs to be updated can be restored.
  • the logical steps performed can be completed by atomic instructions.
  • an atomic operation will not be interrupted by the thread scheduling mechanism; once it starts, it runs to completion without any context switch (switching to another thread). In this way it is guaranteed that, when the data stored at the first HPA (Pa) differs from the data stored at the second HPA (Pr), execution of the logical steps will not be interrupted, and the data in the second HPA (Pr) will not be updated again while those steps are performed.
  • the atomic instruction can ensure the page table modification timing when the IOMMU page table is updated, and when the DMA write request of the pass-through device 102 is concurrent, it can still ensure that the data stored in the first HPA (Pa) is different from the data stored in the second HPA (Pr) In the case of data, the atomicity of the logical steps performed.
  • the embodiment of the present application may also define that the pass-through device 102 The HPA can only be written at most once.
  • a data update occurs after the data of the first HPA (Pa) is copied to the second HPA (Pr), but before the first HPA (Pa) recorded in the page table of the IOMMU 104 is updated to the second HPA (Pr) , updated is the data of the first HPA (Pa); 2 data updates after the first HPA (Pa) of the page table record of the IOMMU 104 is updated to the second HPA (Pr), and the updated is the second HPA ( Pr) data.
  • all three data updates update data at logical address c.
  • Pr(c) is updated to Pa(c)
  • an error will occur in Pr(c), because the data update of the logical address c of the second HPA(Pr) is compared with the logical address of the first HPA(Pa) c is late for data update. Therefore, in the embodiment of the present application, it is limited that the pass-through device 102 can only write to each HPA once at most. In this way, the above-mentioned problem of data update error can be prevented.
  • the embodiments of the present application also provide a corresponding memory paging method.
  • An exemplary description will be given below in conjunction with specific embodiments.
  • FIG. 2 is a schematic flowchart of a memory paging method provided by an embodiment of the present application. As shown in Figure 2, the memory paging method includes:
  • In step S201, the GPA that needs a page table update can be determined.
  • the specific implementation manner of determining the GPA(A) that needs to be updated in the page table is not limited.
  • a page table update request may be obtained.
  • the GPA (denoted as A) requiring page table update may be obtained from the page table update request.
  • In step S202, the first HPA corresponding to the GPA (A) that requires a page table update, denoted Pa, may be determined from the page table stored in the IOMMU.
  • In step S203, the second HPA, denoted Pr, may also be determined from the free physical addresses of the host memory.
  • Pr: the second HPA, determined from the free physical addresses of the host memory.
  • After the second HPA (Pr) to replace the first HPA (Pa) is determined, in step S204 the data stored in the first HPA (Pa) can be copied to the second HPA (Pr); and in step S205, the HPA corresponding to the GPA (A) that needs a page table update, as recorded in the IOMMU's page table, is updated to the second HPA (Pr).
  • For an IOMMU that does not support page fault, the VMM can determine the GPA that needs a page update; determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host memory, the second HPA that replaces the first HPA; then copy the data stored in the first HPA to the second HPA; and update the HPA corresponding to that GPA in the IOMMU's page table to the second HPA, thereby achieving page table replacement for an IOMMU that does not support page fault.
  • the IOMMU page table can also be refreshed to the IOTLB.
  • the DMA request sent by the pass-through device can be preferentially matched in the IOTLB. If the virtual machine physical address (GPA) requested by the DMA exists in the IOTLB, the address translation rate can be increased, and then the obtained host physical address (HPA) can be used to access the host memory.
  • GPA: virtual machine (guest) physical address
  • HPA: host physical address
  • the host computer not only supports the access of the passthrough device to the memory of the host computer, but also supports the access of the CPU (vCPU) of the virtual machine to the memory of the host computer.
  • a memory management unit can be set for the host computer.
  • the page table maintained by the MMU records the correspondence between the physical address of the virtual machine and the physical address of the host.
  • the MMU can convert the GPA accessed by the vCPU into an HPA based on the correspondence between the physical address of the virtual machine recorded in the page table and the physical address of the host, so that the vCPU can access the memory space corresponding to the HPA.
  • the page table of the MMU needs to be updated synchronously. Therefore, after step S201, the GPA that needs to be updated can also be deleted from the page table of the MMU; and the access of the vCPU to the GPA (A) that needs to be updated is blocked.
  • Since the MMU supports page fault, the vCPU's access to the GPA (A) that needs a page table update can be blocked while the page fault (page table interrupt) process is executed. Specifically, when the VMM obtains the vCPU's access request for the GPA (A) that needs a page table update, it can query the MMU's page table using that GPA. Since the GPA (A) has been deleted from the MMU's page table, the corresponding HPA cannot be found there, which can trigger the MMU's page table interrupt process.
  • That is, the page table interrupt process is executed when no HPA corresponding to the GPA (A) that needs a page table update is found in the MMU's page table.
  • The page table replacement of the MMU can be locked while the page table interrupt process is being executed, thereby blocking the vCPU's access to the GPA (A) that needs a page table update.
  • After the page table update of the IOMMU is completed, the above-mentioned GPA may be added to the page table of the MMU, and the HPA corresponding to that GPA may be updated to the second HPA (Pr). Further, the access of the vCPU to that GPA may be restored.
  • During the page table update process of the IOMMU, the data stored in the HPA corresponding to the GPA (A) that needs a page table update may be updated.
  • The VMM is unaware of the user's update of the data stored in the HPA corresponding to the GPA (A) that needs a page table update. Therefore, the VMM 20 cannot determine whether the data update occurred after the page table update of the IOMMU or during it.
  • If the update of the data stored in the HPA corresponding to the GPA (A) that needs a page table update occurs after the data stored in the first HPA (Pa) has been copied to the second HPA (Pr), but before the HPA corresponding to that GPA in the IOMMU's page table has been updated to the second HPA (Pr), then the update is applied to the data stored in the first HPA (Pa). As a result, the data stored in the second HPA (Pr) is the pre-update data, and synchronous data updating cannot be achieved.
  • A third HPA (Ps) for temporarily storing a snapshot of the data in the second HPA (Pr) may also be applied for.
  • The third HPA (Ps) can be determined from the free physical addresses of the host memory; and after the data stored in the first HPA (Pa) is copied to the second HPA (Pr), the data stored in the second HPA (Pr) is copied to the third HPA (Ps); that is, the data stored in the third HPA (Ps) is a snapshot of the data stored in the second HPA (Pr).
  • It may then be compared whether the data stored in the first HPA (Pa) is the same as the data stored in the second HPA (Pr).
  • The comparison may be performed byte by byte. If the data stored in the first HPA (Pa) is the same as the data stored in the second HPA (Pr), it means that, during the IOMMU page table update, the data stored in the HPA corresponding to the GPA (A) that needs a page table update was not updated.
  • In that case, the memory space corresponding to the third host physical address can be released.
  • The above-mentioned GPA that needs updating may also be added to the page table of the MMU, and the HPA corresponding to that GPA is updated to the second HPA (Pr). Further, the access of the vCPU to that GPA may be restored.
  • If the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), it means that, during the IOMMU page table update, the data stored in the HPA corresponding to the GPA (A) that needs a page table update was updated.
  • The logical addresses at which the data differ may then be determined.
  • A logical address refers to the relative address of the differing data within the HPA; for example, the data of the Nth byte being unequal.
  • For a first logical address b at which the data differ, it can be judged whether the data Pr(b) at the first logical address b of the data stored in the second HPA (Pr) is the same as the data Ps(b) at the first logical address b of the data stored in the third HPA (Ps). If the judgment result is negative, that is, Pr(b) is not equal to Ps(b), it means that the data update occurred in the second HPA (Pr).
  • Since the pass-through device 102, when accessing the GPA (A) that requires a page table update, accesses the memory space of the second HPA (Pr) and therefore the updated data, the memory space corresponding to the third HPA (Ps) may be released when Pr(b) is not equal to Ps(b). Further, the above-mentioned GPA that needs updating may also be added to the page table of the MMU, and the HPA corresponding to that GPA is updated to the second HPA (Pr). Further, the access of the vCPU to that GPA may be restored.
  • Otherwise, that is, when Pr(b) equals Ps(b), the data at the first logical address b of the data stored in the second HPA (Pr) may be updated to the data at the first logical address b of the data stored at the first host physical address; that is, Pr(b) is updated to Pa(b).
  • Further, the memory space corresponding to the third HPA (Ps) may be released.
  • The above-mentioned GPA that needs updating may also be added to the page table of the MMU, and the HPA corresponding to that GPA is updated to the second HPA (Pr). Further, the access of the vCPU to that GPA may be restored.
  • the memory paging method mainly includes:
  • In step S9, compare byte by byte whether the data stored in the first HPA (Pa) is the same as the data stored in the third HPA (Ps). If they are the same, go to step S12. If some data differ, execute step S10.
  • The embodiment of the present application may also stipulate that the pass-through device can write to each HPA at most once. For the specific cause analysis, reference may be made to the relevant content of the foregoing system embodiments, which is not repeated here.
  • the subject of execution of each step of the method provided in the foregoing embodiments may be the same device, or the method may also be executed by different devices.
  • the execution subject of steps S201 and S202 may be device A;
  • the execution subject of step S201 may be device A, and the execution subject of step S202 may be device B; and so on.
  • The embodiment of the present application also provides a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps in the above-mentioned memory paging method.
  • An embodiment of the present application further provides a computer program product, including: a computer program; when the computer program is executed by a processor, the processor is caused to execute the steps in the memory paging method.
  • a computer program product may be implemented as a hypervisor.
  • the virtual machine management program may run on the CPU of the host machine of the virtual machine.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • computer-usable storage media including but not limited to disk storage, CD-ROM, optical storage, etc.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer readable media, in the form of random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer readable media.
  • the storage medium of the computer is a readable storage medium, which may also be referred to as a readable medium.
  • Readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, A magnetic tape cartridge, disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Embodiments of the present application provide a memory paging method, system, and storage medium. In the embodiments of the present application, for an IOMMU that does not support page fault, the VMM can determine the GPA that needs a page update; determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host memory, the second HPA that replaces the first HPA; then copy the data stored in the first HPA to the second HPA; and update the HPA corresponding to the GPA that needs updating, as recorded in the IOMMU's page table, to the second HPA, thereby achieving page table replacement for an IOMMU that does not support page fault.

Description

Memory paging method, system, and storage medium
This application claims priority to Chinese Patent Application No. 202210150479.1, filed with the China Patent Office on February 18, 2022 and entitled "Memory paging method, system, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a memory paging method, system, and storage medium.
Background
In the computer field, direct pass-through technology introduces the Input/Output Memory Management Unit (IOMMU). By sacrificing the sharing capability of a device and dedicating it to a single guest, the full functionality and optimal performance of the device can be achieved. The IOMMU can connect the Direct Memory Access (DMA) I/O bus with the memory of the host. The IOMMU can translate the virtual addresses accessed by a pass-through device into physical addresses, enabling the pass-through device to access the host memory.
In some application scenarios, such as hot/cold memory page replacement, the IOMMU page table needs to be replaced. In the device pass-through scenario, for an IOMMU that supports the page fault function, the IOMMU page table can be replaced by means of page fault. For an IOMMU that does not support the page fault function, the page table cannot be replaced in this way. Therefore, how to achieve page table replacement for an IOMMU that does not support page fault has become a technical problem to be solved urgently in this field.
Summary
Various aspects of the present application provide a memory paging method, system, and storage medium, for achieving page table replacement for an IOMMU that does not support page fault.
An embodiment of the present application provides a memory paging method, including:
determining a virtual machine physical address that needs a page table update;
determining, from a page table stored in an IOMMU of a host, a first host physical address corresponding to the virtual machine physical address;
determining a second host physical address from free physical addresses of the host memory;
copying data stored at the first host physical address to the second host physical address; and
updating the host physical address corresponding to the virtual machine physical address recorded in the page table of the IOMMU to the second host physical address.
An embodiment of the present application further provides a computing system, including: a host and a virtual machine management node;
the host is deployed with a virtual machine and mounted with a pass-through device passed through to the virtual machine; the host further includes an IOMMU; the page table stored in the IOMMU records the correspondence between virtual machine physical addresses of the virtual machine and host physical addresses; the pass-through device accesses the memory of the host based on the correspondence between the virtual machine physical addresses of the virtual machine and the host physical addresses;
the virtual machine management node is configured to: determine a virtual machine physical address that needs a page table update; determine, from the page table stored in the IOMMU, a first host physical address corresponding to the virtual machine physical address; determine a second host physical address from free physical addresses of the host memory; copy data stored at the first host physical address to the second host physical address; and update the host physical address corresponding to the virtual machine physical address recorded in the page table of the IOMMU to the second host physical address.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps in the above memory paging method.
An embodiment of the present application further provides a computer program product, including a computer program which, when executed by a processor, causes the processor to perform the steps in the above memory paging method.
In the embodiments of the present application, for an IOMMU that does not support page fault, the VMM can determine the GPA that needs a page update; determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host memory, the second HPA that replaces the first HPA; then copy the data stored in the first HPA to the second HPA; and update the HPA corresponding to the GPA that needs updating, as recorded in the IOMMU's page table, to the second HPA, thereby achieving page table replacement for an IOMMU that does not support page fault.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:
FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present application;
FIG. 2 and FIG. 3 are schematic flowcharts of memory paging methods provided by embodiments of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the embodiments of the present application, for an IOMMU that does not support page fault, the VMM can determine the GPA that needs a page update; determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host memory, the second HPA that replaces the first HPA; then copy the data stored in the first HPA to the second HPA; and update the HPA corresponding to the GPA that needs updating, as recorded in the IOMMU's page table, to the second HPA, thereby achieving page table replacement for an IOMMU that does not support page fault.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
It should be noted that the same reference numerals denote the same objects in the following drawings and embodiments; therefore, once an object is defined in one drawing or embodiment, it does not need to be discussed further in subsequent drawings and embodiments.
FIG. 1 is a schematic structural diagram of a computing system provided by an embodiment of the present application. As shown in FIG. 1, the computing system includes a host 10 and a virtual machine management node (VMM) 20.
In this embodiment, the host 10 refers to a computer device having computing, storage, and communication capabilities, and may be, for example, a server-side device. For example, the host 10 may be a single server device, or a cloud-based server array, etc. Of course, the host 10 may also be a terminal device such as a mobile phone, tablet computer, personal computer, or wearable device.
In this embodiment, the host 10 is deployed with a virtual machine (VM) 101. The virtual machine 101 may also be referred to as a guest of the host 10. The VM 101 has its own CPU, memory, network, disk, etc. The CPU corresponding to the virtual machine 101 may also be referred to as a virtual machine CPU, i.e., the vCPU 101a shown in FIG. 1.
In this embodiment, the VMM 20 refers to a logical node that manages virtual machines; it may be deployed on the host 10, or on another physical machine communicating with the host 10. For embodiments in which the VMM 20 is deployed on the host 10, the VMM 20 may run on the CPU of the host 10 (not shown in FIG. 1). The VMM 20 may perform task scheduling, load balancing, status monitoring, etc., for the VM 101.
In pass-through technology, the VM 101 can directly access the I/O hardware mounted on the host 10 through the VMM 20, so that the I/O operation path of the VM 101 is almost the same as the I/O path in a non-virtualized environment. The I/O hardware is the pass-through device 102. The pass-through device 102 refers to an I/O hardware device mounted on the host 10 and may include a network card, a storage medium, etc. The storage medium may be a persistent storage medium such as a magnetic disk or hard disk.
The pass-through device 102 can access the memory of the host 10, i.e., the host memory 103, by direct memory access (DMA). In the DMA mode, there is a data path between the pass-through device 102 and the host memory 103, enabling direct data transfer between them. During data transfer between the pass-through device 102 and the host memory 103, the CPU of the host 10 does not need to participate.
In virtualization technology, the operating system (guest OS) running on the VM 101 usually does not know the physical memory address of the host it accesses, also referred to as the host physical address (HPA). If DMA operations were performed directly, memory might be corrupted, because the pass-through device 102 only knows the physical memory address of the virtual machine, also referred to as the guest physical address (GPA), and does not know the mapping between GPA and HPA. Therefore, the IOMMU 104 is introduced in pass-through technology.
The IOMMU 104 may be communicatively connected between the pass-through device 102 and the host memory 103. Optionally, the IOMMU 104 may be communicatively connected with the pass-through device 102 and the host memory 103 via a serial interface bus. The serial interface bus may be a PCI interface bus, a PCIe interface bus, etc. The IOMMU 104 can ensure that the pass-through device 102 can access the host memory 103 when performing DMA operations.
For a host without the IOMMU 104, the pass-through device 102 can access the entire memory address space of the host 10 by DMA. For a host provided with the IOMMU 104, the vCPU 101a of the virtual machine can allocate GPAs accessible to the pass-through device 102, and the VMM 20 allocates corresponding HPAs for the GPAs. The IOMMU 104 can maintain a mapping table between GPAs and HPAs, which may also be called a page table. The page table records the mapping between GPAs and HPAs. When the pass-through device 102 accesses the memory of the host 10 by DMA, the VMM 20 can capture the DMA request issued by the pass-through device 102 and pass it through to the IOMMU 104. The IOMMU 104 can obtain the GPA to be accessed from the DMA request, and then match that GPA against the page table stored in the IOMMU 104 to obtain the corresponding HPA. The pass-through device 102 can then access the corresponding memory space of the host memory 103 through the HPA corresponding to the GPA to be accessed.
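The GPA-to-HPA translation described above can be sketched as follows. This is a minimal, illustrative model only: the dict-based page table, the function name, and the addresses are assumptions for illustration; a real IOMMU performs this walk in hardware at page granularity.

```python
# Minimal sketch of IOMMU-style address translation: a page table maps a
# guest physical page number to a host physical page number; the in-page
# offset is carried over unchanged.

PAGE_SIZE = 4096

def translate(page_table, gpa):
    """Translate a guest physical address (GPA) to a host physical address."""
    page, offset = divmod(gpa, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"no mapping for guest page {page}")
    return page_table[page] * PAGE_SIZE + offset

# A DMA request for GPA 0x1234, with guest page 1 mapped to host page 7:
iommu_page_table = {1: 7}
hpa = translate(iommu_page_table, 0x1234)
```

An unmapped GPA raises an error here, which is the software analogue of the translation failure that a page-fault-capable IOMMU would report.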
In some application scenarios, the page table stored in the IOMMU 104 needs to be replaced. For example, when the HPA corresponding to one or more GPAs changes, the page table stored in the IOMMU 104 needs to be updated. For instance, in the hot/cold memory page replacement scenario, a hot page that has not been used for a long time can be turned into a cold page and allocated to DMA requests. This is because DMA requests require contiguous memory pages, so cold pages need to be allocated. A cold page here means a free page that is no longer in the cache, while a hot page means a free page that is still in the cache.
For scenarios requiring page table replacement of the IOMMU 104, an IOMMU that supports page fault can replace the IOMMU page table by means of page fault. The VMM 20 can, when necessary, reclaim physical memory from the page table stored in the IOMMU, i.e., reclaim the memory corresponding to an HPA, and store the data held in that physical memory in a paging file on disk. This ensures that the data is not lost while the released physical memory pages are made available to other processes.
When the pass-through device 102 needs to access the reclaimed memory again, the VMM 20 can look up the data in the paging file, allocate a free physical memory page, write the data into it, then map the HPA of the new physical memory page into the virtual space corresponding to the GPA the process needs to operate on, and write it into the page table stored in the IOMMU, thereby obtaining the data and memory space needed for the access. Adding the HPA corresponding to a physical memory page to the page table stored in the IOMMU is one page fault.
For an IOMMU that does not support page fault, an embodiment of the present application provides a new memory paging method, whose main implementation is as follows:
In the embodiment of the present application, the VMM 20 can determine the GPA that needs a page update. The embodiment of the present application does not limit the specific implementation by which the VMM 20 determines the GPA (A) that needs a page table update. In some embodiments, the VMM 20 can obtain a page table update request, which contains the GPA (A) that needs a page table update. The page table update request obtained by the VMM 20 may be provided by the user side corresponding to the VM 101. For example, in the cloud computing field, a user may apply to the cloud server side for virtual machine resources, and the VM 101 is a virtual machine allocated to the user by the cloud server side. Correspondingly, the user side is the computing device on the side of the user who applied for the virtual machine. Alternatively, the page table update request may also be sent by an operation and maintenance device on the cloud server side; and so on.
Further, the VMM 20 can obtain, from the page table update request, the GPA (denoted A) that needs a page table update. The VMM 20 can then determine, from the page table stored in the IOMMU 104, the HPA corresponding to the GPA (A), denoted Pa. Specifically, the VMM 20 can match the GPA (A) against the page table stored in the IOMMU 104 to obtain the HPA (Pa) corresponding to the GPA (A).
Since the HPA corresponding to the GPA (A) that needs a page table update is the HPA to be replaced, the VMM 20 can also determine another HPA, denoted Pr, from the free physical addresses of the host memory 103. In the embodiment of the present application, for ease of description and distinction, the HPA corresponding to the GPA (A) that needs a page table update is defined as the first HPA (Pa), and the other HPA determined to replace the first HPA is defined as the second HPA (denoted Pr).
Optionally, the VMM 20 can select any free physical address from the free physical addresses of the host memory 103 as the second HPA (Pr). Alternatively, the VMM 20 can determine the second HPA (Pr) from the free physical addresses of the host memory 103 according to a configured memory page replacement algorithm. Memory page replacement algorithms include, but are not limited to: the optimal replacement algorithm (OPT), first-in-first-out (FIFO), least recently used (LRU), the clock replacement algorithm (CLOCK), or improved clock replacement algorithms.
The optimal replacement algorithm (OPT) selects a memory page that will never be used again, or will not be accessed for the longest time, and takes the HPA corresponding to that page as the second HPA (Pr).
The first-in-first-out algorithm (FIFO) arranges the memory pages loaded into memory into a queue in the order in which they were loaded, selects the memory page that entered memory first, and takes its HPA as the second HPA (Pr).
The least recently used algorithm (LRU) selects the memory page that has not been used most recently, and determines the HPA corresponding to that page as the second HPA (Pr). Optionally, an access field can record the time elapsed since the page was last accessed; when a page table replacement is required, the free memory page with the largest time value is selected as the least recently used page.
In the clock replacement algorithm (CLOCK), memory pages in memory are linked into a circular queue via link pointers, and an access-bit field is added. When a memory page first enters memory, its access bit is set to 1; when the page is subsequently accessed, the access bit is also set to 1. In this method, when memory paging is required, the operating system scans the buffer for memory pages whose access bit is 0, and takes the HPA corresponding to the first scanned page whose access bit is 0 as the second HPA (Pr).
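The CLOCK scan just described can be sketched as follows. This is a simplified model under stated assumptions: the list-of-pairs structure and function name are illustrative, and real implementations operate on kernel page frames rather than Python lists.

```python
# Simplified CLOCK scan: pages sit in a circular list with an access bit.
# The scan clears access bits as it passes (a "second chance") and selects
# the first page whose bit is already 0, returning that page's HPA.

def clock_select(pages, hand=0):
    """pages: list of [hpa, access_bit]; returns (selected_hpa, new_hand)."""
    n = len(pages)
    while True:
        hpa, bit = pages[hand]
        if bit == 0:
            return hpa, (hand + 1) % n
        pages[hand][1] = 0          # clear the bit: give a second chance
        hand = (hand + 1) % n

frames = [[0x1000, 1], [0x2000, 0], [0x3000, 1]]
selected, hand = clock_select(frames)   # skips 0x1000, selects 0x2000
```

The scan is guaranteed to terminate: after one full revolution every access bit has been cleared, so some page is always selectable.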
The ways of determining the HPA listed in the above embodiments are merely illustrative and do not constitute a limitation.
After determining the second HPA (Pr) to replace the first HPA (Pa), the VMM 20 can copy the data stored in the first HPA (Pa) to the second HPA (Pr), and update the HPA corresponding to the GPA (A) that needs a page table update, as recorded in the page table of the IOMMU 104, to the second HPA (Pr).
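The basic replacement flow just described — copy the data from Pa to the newly chosen Pr, then repoint the IOMMU page-table entry — can be sketched as follows. The dict-based memory model and all names are illustrative assumptions, not from the patent.

```python
# Basic page swap for an IOMMU without page fault: copy Pa's data to a
# free address Pr, then repoint the page-table entry for GPA A to Pr.

def swap_page(iommu_page_table, memory, free_addresses, gpa_a):
    pa = iommu_page_table[gpa_a]          # first HPA (Pa), to be replaced
    pr = free_addresses.pop()             # second HPA (Pr), from free memory
    memory[pr] = bytearray(memory[pa])    # copy the data Pa -> Pr
    iommu_page_table[gpa_a] = pr          # repoint the IOMMU entry to Pr
    return pa, pr

memory = {0xA000: bytearray(b"page-data"), 0xB000: bytearray(8)}
table = {0x42: 0xA000}
pa, pr = swap_page(table, memory, [0xB000], 0x42)
```

After the call, DMA lookups for GPA 0x42 resolve to the new location while the old data remains intact for the later consistency check.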
In the computing system provided by this embodiment, for an IOMMU that does not support page fault, the VMM can determine the GPA that needs a page update; determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host memory, the second HPA that replaces the first HPA; then copy the data stored in the first HPA to the second HPA; and update the HPA corresponding to the GPA that needs updating, as recorded in the IOMMU's page table, to the second HPA, thereby achieving page table replacement for an IOMMU that does not support page fault.
Further, after the IOMMU page table is replaced, the page table of the IOMMU 104 can also be flushed to the Input/Output Translation Lookaside Buffer (IOTLB).
The IOTLB has a fixed number of slots for storing page table entries that map virtual addresses to physical addresses, i.e., entries of the page table of the IOMMU 104 above. Its search key is the physical memory address of the virtual machine, and its search result is the physical address of the host (HPA). If the virtual machine physical address (GPA) requested by a DMA request exists in the IOTLB, the address translation rate can be increased, after which the obtained host physical address (HPA) can be used to access the host memory.
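The IOTLB lookup can be modeled as a small bounded cache consulted before the full page-table walk. This is a sketch under assumptions: the class name, the FIFO-style eviction, and the slot count are illustrative choices, not how any particular hardware IOTLB behaves.

```python
# IOTLB as a bounded GPA -> HPA cache consulted before the IOMMU page table.
from collections import OrderedDict

class IOTLB:
    def __init__(self, slots=4):
        self.slots = slots
        self.entries = OrderedDict()

    def lookup(self, gpa, page_table):
        """Return (hpa, hit): hit path avoids the page-table walk."""
        if gpa in self.entries:
            return self.entries[gpa], True
        hpa = page_table[gpa]                 # miss: walk the page table
        if len(self.entries) >= self.slots:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[gpa] = hpa
        return hpa, False

    def flush(self):
        """Invalidate all entries, e.g. after a page-table replacement."""
        self.entries.clear()

tlb = IOTLB(slots=2)
table = {1: 0x1000, 2: 0x2000}
hpa1, hit1 = tlb.lookup(1, table)   # miss: fills the IOTLB
hpa2, hit2 = tlb.lookup(1, table)   # hit: served from the cache
```

The `flush` method mirrors the refresh step above: after the IOMMU entry is repointed, stale cached translations must be invalidated before the next DMA lookup.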
In the pass-through technology field, as shown in FIG. 1, the host 10 supports not only access to the host memory by the pass-through device 102 but also access to the host memory by the CPU of the virtual machine (vCPU 101a). The vCPU 101a can apply for the virtual machine's address space (Virtual Address Space, VAS), i.e., GPAs. Since the VAS is not real physical memory space, the operating system must map the VAS to physical memory space before a process can store its process context.
A memory management unit (MMU) 105 can be provided for the host 10. The page table maintained by the MMU 105 records the correspondence between virtual machine physical addresses and host physical addresses. Based on this correspondence, the MMU 105 can translate the GPA accessed by the vCPU 101a into an HPA, so that the vCPU 101a can access the memory space corresponding to the HPA.
In the embodiment of the present application, when the IOMMU 104 updates its page table, the page table of the MMU 105 needs to be updated synchronously. This is because, if the HPA in the page table of the IOMMU 104 is updated but the page table in the MMU 105 is not updated synchronously, the GPA (A) that needs a page table update would correspond to different HPAs in the page table of the IOMMU 104 and the page table of the MMU 105. After the HPA in the IOMMU 104 page table is updated, the data stored at the original HPA (the first HPA) has been moved to the replacement HPA (the second HPA), and the original HPA has been released or stores other data. As a result, when the MMU 105 accesses the GPA that needs a page update, it would still access the first HPA (Pa), causing missing pages or access errors.
Based on the above analysis, in the embodiment of the present application, after determining the GPA (A) that needs a page table update, the VMM 20 can also delete the GPA that needs updating from the page table of the MMU 105, and block access by the vCPU 101a to the GPA (A) that needs a page table update.
Since the MMU 105 supports page fault, the VMM 20 can block access by the vCPU 101a to the GPA (A) that needs a page table update while executing the page fault (page table interrupt) process. Specifically, when the VMM 20 obtains an access request from the vCPU 101a for the GPA (A) that needs a page table update, it can query the page table of the MMU 105 using that GPA. Since the GPA (A) has been deleted from the page table of the MMU 105, the HPA corresponding to it cannot be found there, which can trigger the page table interrupt process of the MMU 105. That is, the page table interrupt process is executed when no HPA corresponding to the GPA (A) that needs a page table update is found in the page table of the MMU 105. In the embodiment of the present application, in order to block access by the vCPU 101a to the GPA (A), the page table replacement of the MMU 105 can be locked while the page table interrupt process is being executed, thereby blocking access by the vCPU 101a to the GPA (A). If the page table replacement of the MMU 105 were not locked, the MMU 105 would execute the page table interrupt process and reallocate an HPA for the GPA that needs updating recorded in the MMU 105 page table, which would easily cause the same GPA to correspond to inconsistent HPAs in the MMU 105 page table and the IOMMU 104 page table.
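The blocking behavior described above — delete the entry so the next vCPU access faults, and hold a lock until the replacement finishes — can be sketched with a plain lock. This is purely illustrative: the real mechanism is the MMU's page-fault path inside the hypervisor, not a Python lock.

```python
# Sketch: a vCPU access to a GPA under replacement hits a "page fault"
# (missing entry), blocks on the replacement lock, then sees the new HPA.
import threading

mmu_page_table = {0x42: 0xA000}
replace_lock = threading.Lock()
results = []

def vcpu_access(gpa):
    if gpa not in mmu_page_table:        # entry deleted -> fault path
        with replace_lock:               # blocks until replacement is done
            pass
    results.append(mmu_page_table[gpa])  # resolves to the new mapping

replace_lock.acquire()                   # begin replacement: lock, then unmap
del mmu_page_table[0x42]
t = threading.Thread(target=vcpu_access, args=(0x42,))
t.start()
mmu_page_table[0x42] = 0xB000            # re-add with the second HPA (Pr)
replace_lock.release()                   # replacement done; unblock the vCPU
t.join()
```

Whichever way the thread interleaves with the update, it only ever observes the new mapping, which is the property the locking step is meant to guarantee.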
Further, after the page table update of the IOMMU 104 is completed, the above GPA that needs updating can be added to the page table of the MMU 105, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU 101a to that GPA can be restored.
In some embodiments, during the page table update process of the IOMMU, the data stored in the HPA corresponding to the GPA (A) that needs a page table update may be updated. The VMM 20 is unaware of the user's updates to this data. Therefore, the VMM 20 cannot determine whether the data update occurred after the page table update of the IOMMU or during it. If the data stored in the HPA corresponding to the GPA (A) is updated after the data stored in the first HPA (Pa) has been copied to the second HPA (Pr), but before the HPA corresponding to the GPA (A) recorded in the page table of the IOMMU 104 is updated to the second HPA (Pr), then the update is applied to the data stored in the first HPA (Pa), so that the data stored in the second HPA (Pr) is the pre-update data, and synchronous data updating cannot be achieved.
To solve the above problem, in some embodiments of the present application, a third HPA (Ps) for temporarily storing a snapshot of the data in the second HPA (Pr) can also be applied for. Specifically, the third HPA (Ps) can be determined from the free physical addresses of the host memory; and after the data stored in the first HPA (Pa) is copied to the second HPA (Pr), the data stored in the second HPA (Pr) is copied to the third HPA (Ps); that is, the data stored in the third HPA (Ps) is a snapshot of the data stored in the second HPA (Pr).
Further, it can be compared whether the data stored in the first HPA (Pa) is the same as the data stored in the second HPA (Pr). Optionally, the comparison can be made byte by byte. If the data stored in the first HPA (Pa) is the same as the data stored in the second HPA (Pr), it indicates that the data stored in the HPA corresponding to the GPA (A) that needs a page table update was not updated during the IOMMU page table update. Therefore, in this case, the memory space corresponding to the third host physical address can be released. Further, the above GPA that needs updating can be added to the page table of the MMU 105, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU 101a to that GPA can be restored.
If the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), it indicates that the data stored in the HPA corresponding to the GPA (A) was updated during the IOMMU page table update. Since the VMM 20 cannot perceive when the data stored in an HPA is updated, the embodiment of the present application also needs to determine when that update occurred. If the data update occurred after the first HPA (Pa) recorded in the page table of the IOMMU 104 was updated to the second HPA (Pr), the HPA on which the data update occurred is the second HPA (Pr). Since the HPA corresponding to the GPA (A) recorded in the IOMMU 104 page table is already the second HPA (Pr), the pass-through device 102, when accessing the GPA (A), accesses the memory space of the second HPA (Pr), i.e., the updated data, so the access accuracy is high.
Correspondingly, if the data update occurred after the data of the first HPA (Pa) was copied to the second HPA (Pr) but before the first HPA (Pa) recorded in the page table of the IOMMU 104 was updated to the second HPA (Pr), the HPA on which the data update occurred is the first HPA (Pa), and the data stored in the second HPA (Pr) is the pre-update data. In that case, after the first HPA (Pa) recorded in the IOMMU 104 page table is updated to the second HPA (Pr), the pass-through device 102, when accessing the GPA (A), accesses the memory space of the second HPA (Pr), i.e., the pre-update data, so the access accuracy is low.
Based on the above analysis, in the embodiment of the present application, when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), the logical addresses at which the data differ can be determined. A logical address refers to the relative address within the HPA of the differing data, for example, the data of the Nth byte being unequal.
In this embodiment, for a first logical address b at which the data differ, it can be judged whether the data Pr(b) at the first logical address b of the data stored in the second HPA (Pr) is the same as the data Ps(b) at the first logical address b of the data stored in the third HPA (Ps). If the judgment result is negative, i.e., Pr(b) is not equal to Ps(b), the data update occurred in the second HPA (Pr). Since the pass-through device 102, when accessing the GPA (A) that needs a page table update, accesses the memory space of the second HPA (Pr), i.e., the updated data, the memory space corresponding to the third HPA (Ps) can be released when Pr(b) is not equal to Ps(b). Further, the above GPA that needs updating can be added to the page table of the MMU 105, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU 101a to that GPA can be restored.
Correspondingly, when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), if Pr(b) equals Ps(b), then, since Pa(b) is not equal to Ps(b), the data update occurred in the first HPA (Pa); that is, the data update occurred after the data of the first HPA (Pa) was copied to the second HPA (Pr) but before the first HPA (Pa) recorded in the page table of the IOMMU 104 was updated to the second HPA (Pr). Therefore, in this case, the data at the first logical address b of the data stored in the second HPA (Pr) can be updated to the data at the first logical address b of the data stored at the first host physical address; that is, Pr(b) is updated to Pa(b). Further, the memory space corresponding to the third HPA (Ps) can be released. Further, the above GPA that needs updating can be added to the page table of the MMU 105, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU 101a to that GPA can be restored.
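The snapshot-based comparison and patch-up logic described above can be sketched as follows. This byte-granularity model is illustrative; in the patent these steps run under atomic instructions, which this sketch does not reproduce.

```python
# Reconciliation after the swap: Ps holds a snapshot of Pr taken right after
# the Pa -> Pr copy. For each byte b where Pa differs from the snapshot Ps:
#   Pr[b] == Ps[b] -> the concurrent write landed in Pa: propagate Pa[b] to Pr
#   Pr[b] != Ps[b] -> the write landed in Pr (after the repoint): keep Pr[b]

def reconcile(pa, pr, ps):
    for b in range(len(pa)):
        if pa[b] != ps[b] and pr[b] == ps[b]:
            pr[b] = pa[b]   # late write hit Pa; propagate it to Pr
    return pr

# Case 1: the write landed in Pa after the copy; Pr still holds stale data.
pa1, pr1, ps1 = bytearray(b"aXc"), bytearray(b"abc"), bytearray(b"abc")
reconcile(pa1, pr1, ps1)    # Pr picks up the late write

# Case 2: the write landed in Pr after the repoint; Pr must be kept as-is.
pa2, pr2, ps2 = bytearray(b"abc"), bytearray(b"aYc"), bytearray(b"abc")
reconcile(pa2, pr2, ps2)    # Pr is left unchanged
```

The snapshot Ps is what lets the two cases be distinguished: without it, a byte-level difference between Pa and Pr could not be attributed to either side.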
The logical steps performed when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr) can be completed by atomic instructions. During the execution of an atomic instruction, the operation will not be interrupted by the thread scheduling mechanism; once such an operation starts, it runs until the end without any context switch (switching to another thread) in the middle. This guarantees that the execution of the logical steps will not be interrupted, and that the data in the second HPA (Pr) will not be updated again while the logical steps are being performed. Atomic instructions can guarantee the page-table modification timing when the IOMMU page table is updated; even when DMA write requests of the pass-through device 102 are concurrent, the atomicity of the logical steps performed when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr) can still be guaranteed.
To ensure the accuracy of the logical steps performed when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), the embodiment of the present application can also stipulate that the pass-through device 102 can write to each HPA at most once. This is because, if the pass-through device 102 could write to an HPA multiple times, the data stored in the HPA corresponding to the GPA (A) that needs a page table update might be updated multiple times. Suppose one update occurs after the data of the first HPA (Pa) is copied to the second HPA (Pr) but before the first HPA (Pa) recorded in the page table of the IOMMU 104 is updated to the second HPA (Pr), and two updates occur after the first HPA (Pa) recorded in the IOMMU 104 page table has been updated to the second HPA (Pr). The first update modifies the data of the first HPA (Pa); the two later updates modify the data of the second HPA (Pr). Suppose all three updates modify the data at logical address c, that the updated Pa(c) = 3, and that the data at logical address c in the second HPA (Pr) returns to its original value after the two updates. For example, the original data Pr(c) = 1; after the first update Pr(c) = 2; after the second update Pr(c) = 1. The data stored in the third HPA (Ps) is a snapshot of the original data stored in the second HPA (Pr), i.e., Ps(c) = 1. Thus Ps(c) is not equal to Pa(c), but Ps(c) = Pr(c), so the step of updating Pr(c) to Pa(c) would be executed. Updating Pr(c) to Pa(c) would make Pr(c) erroneous, because the data update at logical address c of the second HPA (Pr) occurred later than the data update at logical address c of the first HPA (Pa). Therefore, in the embodiment of the present application, the pass-through device 102 is limited to writing to each HPA at most once, which prevents the above data update error.
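The numerical hazard scenario above — three writes to the same logical address c — can be replayed directly, showing why the Ps(c) == Ps(c)-style snapshot test misfires when multiple writes are allowed. The concrete values are taken from the text; the variable names are illustrative.

```python
# Replay of the hazard: Pr(c) is written twice after the repoint and returns
# to its original value, so the snapshot test Ps(c) == Pr(c) wrongly concludes
# the update landed in Pa and overwrites the newer Pr(c) with the stale Pa(c).

pa_c = 1          # original data at logical address c
pr_c = pa_c       # S7: data copied Pa -> Pr
ps_c = pr_c       # S7: snapshot Pr -> Ps

pa_c = 3          # write 1: lands in Pa (before the IOMMU repoint)
pr_c = 2          # write 2: lands in Pr (after the repoint)
pr_c = 1          # write 3: lands in Pr, restoring the original value

# Reconciliation sees Ps(c) != Pa(c) but Ps(c) == Pr(c), so it "fixes" Pr(c):
if ps_c != pa_c and ps_c == pr_c:
    pr_c = pa_c   # wrong: clobbers the later writes that landed in Pr
```

The final value of `pr_c` is the stale 3 rather than the last value actually written to Pr, which is exactly the error the write-at-most-once restriction rules out.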
In addition to the above system embodiments, the embodiments of the present application also provide corresponding memory paging methods. An illustrative description is given below with reference to specific embodiments.
FIG. 2 is a schematic flowchart of a memory paging method provided by an embodiment of the present application. As shown in FIG. 2, the memory paging method includes:
S201. Determine the GPA (A) that needs a page table update.
S202. Determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to the GPA (A).
S203. Determine the second HPA from the free physical addresses of the host memory.
S204. Copy the data stored in the first HPA to the second HPA.
S205. Update the HPA corresponding to the GPA (A) recorded in the page table of the IOMMU to the second HPA.
In this embodiment, for an IOMMU that does not support page fault, the GPA that needs a page update can be determined in step S201. The embodiment of the present application does not limit the specific implementation of determining the GPA (A) that needs a page table update. In some embodiments, a page table update request can be obtained, and the GPA (denoted A) that needs a page table update can be obtained from the request.
Further, in step S202, the first HPA corresponding to the GPA (A) that needs a page table update, denoted Pa, can be determined from the page table stored in the IOMMU.
Since the HPA corresponding to the GPA (A) that needs a page table update is the HPA to be replaced, in step S203 the second HPA, denoted Pr, can also be determined from the free physical addresses of the host memory. For the specific implementation of step S203, reference may be made to the relevant content of the above system embodiments, which is not repeated here.
After the second HPA (Pr) to replace the first HPA (Pa) is determined, in step S204 the data stored in the first HPA (Pa) can be copied to the second HPA (Pr); and in step S205, the HPA corresponding to the GPA (A) that needs a page table update, as recorded in the page table of the IOMMU, is updated to the second HPA (Pr).
In this embodiment, for an IOMMU that does not support page fault, the VMM can determine the GPA that needs a page update; determine, from the page table stored in the IOMMU of the host, the first HPA corresponding to that GPA; determine, from the free physical addresses of the host memory, the second HPA that replaces the first HPA; then copy the data stored in the first HPA to the second HPA; and update the HPA corresponding to the GPA that needs updating, as recorded in the IOMMU's page table, to the second HPA, thereby achieving page table replacement for an IOMMU that does not support page fault.
Further, after the IOMMU page table is replaced, the IOMMU's page table can also be flushed to the IOTLB. In this way, when the pass-through device accesses the host memory, a DMA request issued by the pass-through device can first be matched in the IOTLB. If the virtual machine physical address (GPA) requested by the DMA exists in the IOTLB, the address translation rate can be increased, after which the obtained host physical address (HPA) can be used to access the host memory.
In the pass-through technology field, the host supports not only access to the host memory by the pass-through device but also access by the CPU of the virtual machine (vCPU). A memory management unit (MMU) can be provided for the host. The page table maintained by the MMU records the correspondence between virtual machine physical addresses and host physical addresses. Based on this correspondence, the MMU can translate the GPA accessed by the vCPU into an HPA, so that the vCPU can access the memory space corresponding to the HPA.
In the embodiment of the present application, when the IOMMU updates its page table, the page table of the MMU needs to be updated synchronously. Therefore, after step S201, the GPA that needs updating can also be deleted from the page table of the MMU, and access by the vCPU to the GPA (A) that needs a page table update can be blocked.
Since the MMU supports page fault, access by the vCPU to the GPA (A) that needs a page table update can be blocked while the page fault (page table interrupt) process is being executed. Specifically, when the VMM obtains an access request from the vCPU for the GPA (A), it can query the page table of the MMU using that GPA. Since the GPA (A) has been deleted from the MMU page table, the corresponding HPA cannot be found there, which triggers the MMU's page table interrupt process. That is, the page table interrupt process is executed when no HPA corresponding to the GPA (A) is found in the page table of the MMU. In the embodiment of the present application, in order to block access by the vCPU to the GPA (A), the page table replacement of the MMU can be locked while the page table interrupt process is being executed, thereby blocking access by the vCPU to the GPA (A).
Further, after the page table update of the IOMMU is completed, the above GPA that needs updating can be added to the page table of the MMU, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU to that GPA can be restored.
In some embodiments, during the page table update of the IOMMU, the data stored in the HPA corresponding to the GPA (A) that needs a page table update may be updated. The VMM is unaware of the user's updates to this data. Therefore, the VMM cannot determine whether the data update occurred after the page table update of the IOMMU or during it. If the data is updated after the data stored in the first HPA (Pa) has been copied to the second HPA (Pr), but before the HPA corresponding to the GPA (A) recorded in the page table of the IOMMU is updated to the second HPA (Pr), the update is applied to the data stored in the first HPA (Pa), so that the data stored in the second HPA (Pr) is the pre-update data, and synchronous data updating cannot be achieved.
To solve the above problem, in some embodiments of the present application, a third HPA (Ps) for temporarily storing a snapshot of the data in the second HPA (Pr) can also be applied for. Specifically, the third HPA (Ps) can be determined from the free physical addresses of the host memory; and after the data stored in the first HPA (Pa) is copied to the second HPA (Pr), the data stored in the second HPA (Pr) is copied to the third HPA (Ps); that is, the data stored in the third HPA (Ps) is a snapshot of the data stored in the second HPA (Pr).
Further, it can be compared whether the data stored in the first HPA (Pa) is the same as the data stored in the second HPA (Pr). Optionally, the comparison can be made byte by byte. If the data are the same, it indicates that the data stored in the HPA corresponding to the GPA (A) was not updated during the IOMMU page table update. Therefore, in this case, the memory space corresponding to the third host physical address can be released. Further, the above GPA that needs updating can be added to the page table of the MMU, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU to that GPA can be restored.
If the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), it indicates that the data stored in the HPA corresponding to the GPA (A) was updated during the IOMMU page table update. In this case, the logical addresses at which the data differ can be determined. A logical address refers to the relative address within the HPA of the differing data, for example, the data of the Nth byte being unequal.
In this embodiment, for a first logical address b at which the data differ, it can be judged whether the data Pr(b) at the first logical address b of the data stored in the second HPA (Pr) is the same as the data Ps(b) at the first logical address b of the data stored in the third HPA (Ps). If the judgment result is negative, i.e., Pr(b) is not equal to Ps(b), the data update occurred in the second HPA (Pr). Since the pass-through device, when accessing the GPA (A), accesses the memory space of the second HPA (Pr), i.e., the updated data, the memory space corresponding to the third HPA (Ps) can be released when Pr(b) is not equal to Ps(b). Further, the above GPA that needs updating can be added to the page table of the MMU, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU to that GPA can be restored.
Correspondingly, when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), if Pr(b) equals Ps(b), then, since Pa(b) is not equal to Ps(b), the data update occurred in the first HPA (Pa); that is, the data update occurred after the data of the first HPA (Pa) was copied to the second HPA (Pr) but before the first HPA (Pa) recorded in the page table of the IOMMU was updated to the second HPA (Pr). Therefore, in this case, the data at the first logical address b of the data stored in the second HPA (Pr) can be updated to the data at the first logical address b of the data stored at the first host physical address; that is, Pr(b) is updated to Pa(b). Further, the memory space corresponding to the third HPA (Ps) can be released. Further, the above GPA that needs updating can be added to the page table of the MMU, and the HPA corresponding to that GPA can be updated to the second HPA (Pr). Further, access by the vCPU to that GPA can be restored.
To clearly explain the specific implementation of the above memory paging method, it is illustrated below with reference to the specific embodiment shown in FIG. 3. As shown in FIG. 3, the memory paging method mainly includes:
S1. Determine the GPA (A) that needs a page table update.
S2. Delete the GPA (A) that needs a page table update from the page table of the MMU of the host.
S3. Block access by the CPU of the virtual machine to the GPA (A).
S4. Determine, from the page table stored in the IOMMU of the host, the first HPA (Pa) corresponding to the GPA (A).
S5. Determine the second HPA (Pr) from the free physical addresses of the host memory.
S6. Determine the third HPA (Ps) from the free physical addresses of the host memory.
S7. Copy the data stored in the first HPA (Pa) to the second HPA (Pr), and copy the data stored in the second HPA (Pr) to the third HPA (Ps).
S8. Update the HPA corresponding to the GPA (A) recorded in the page table of the IOMMU to the second HPA (Pr); and flush the IOTLB.
S9. Compare byte by byte whether the data stored in the first HPA (Pa) is the same as the data stored in the third HPA (Ps). If they are the same, execute step S12. If some data differ, execute step S10.
S10. For a first logical address b at which the data differ, judge whether the data Pr(b) at the first logical address b of the data stored in the second HPA (Pr) is the same as the data Ps(b) at the first logical address b of the data stored in the third HPA (Ps); that is, judge whether Pr(b) and Ps(b) are the same. If not, execute step S12. If so, execute step S11.
S11. Update the data Pr(b) at the first logical address b of the data stored in the second HPA to the data Pa(b) at the first logical address b of the data stored in the first HPA; that is, Pr(b) = Pa(b). Then execute step S12.
S12. Release the memory space corresponding to the third HPA (Ps).
S13. Add the GPA (A) to the page table of the MMU, and update the HPA corresponding to the GPA (A) recorded in the MMU's page table to the second HPA (Pr).
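The thirteen steps above can be combined into a single sketch. This is a dict-based model with hypothetical names; locking, IOTLB flushing, and the atomicity of S10/S11 are deliberately elided.

```python
# End-to-end sketch of steps S1-S13: swap GPA A from Pa to a free Pr, keep a
# snapshot Ps, reconcile any concurrent write, then resync the MMU table.

def memory_page_swap(gpa_a, iommu_pt, mmu_pt, memory, free):
    mmu_pt.pop(gpa_a, None)                    # S2/S3: unmap, blocking vCPU
    pa = iommu_pt[gpa_a]                       # S4: first HPA (Pa)
    pr, ps = free.pop(), free.pop()            # S5/S6: second and third HPA
    memory[pr] = bytearray(memory[pa])         # S7: copy Pa -> Pr ...
    memory[ps] = bytearray(memory[pr])         # ... and snapshot Pr -> Ps
    iommu_pt[gpa_a] = pr                       # S8: repoint the IOMMU entry
    for b in range(len(memory[pa])):           # S9-S11: reconcile per byte
        if memory[pa][b] != memory[ps][b] and memory[pr][b] == memory[ps][b]:
            memory[pr][b] = memory[pa][b]
    del memory[ps]                             # S12: release Ps
    mmu_pt[gpa_a] = pr                         # S13: resync the MMU entry
    return pr

memory = {0xA: bytearray(b"data"), 0xB: None, 0xC: None}
iommu_pt, mmu_pt = {7: 0xA}, {7: 0xA}
pr = memory_page_swap(7, iommu_pt, mmu_pt, memory, [0xB, 0xC])
```

After the call, both page tables agree on the new HPA for GPA 7, the data has moved intact, and the temporary snapshot address has been released.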
The logical steps performed when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), i.e., steps S10 and S11, can be completed by atomic instructions. For the specific reasons, reference may be made to the relevant content of the above system embodiments, which is not repeated here.
To ensure the accuracy of the logical steps S10 and S11 performed when the data stored in the first HPA (Pa) differs from the data stored in the second HPA (Pr), the embodiment of the present application can also stipulate that the pass-through device can write to each HPA at most once. For the specific cause analysis, reference may be made to the relevant content of the above system embodiments, which is not repeated here.
It should be noted that the execution subject of each step of the methods provided in the above embodiments may be the same device, or the methods may be executed by different devices. For example, the execution subject of steps S201 and S202 may be device A; or the execution subject of step S201 may be device A and that of step S202 may be device B; and so on.
In addition, some of the flows described in the above embodiments and drawings contain multiple operations appearing in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel. Sequence numbers of operations, such as S201 and S202, are merely used to distinguish different operations, and the numbers themselves do not represent any execution order. In addition, these flows may include more or fewer operations, and these operations may be executed sequentially or in parallel.
Correspondingly, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps in the above memory paging method.
An embodiment of the present application further provides a computer program product, including a computer program which, when executed by a processor, causes the processor to perform the steps in the above memory paging method. The embodiment of the present application does not limit the implementation form of the computer program. In some embodiments, the computer program product may be implemented as a hypervisor. Optionally, the hypervisor may run on the CPU of the host of the virtual machine.
It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequential order, nor do they limit "first" and "second" to different types.
Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
The storage medium of a computer is a readable storage medium, which may also be called a readable medium. Readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, etc., made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (11)

  1. A memory paging method, comprising:
    determining a virtual machine physical address requiring a page table update;
    determining, from a page table stored in an IOMMU of a host, a first host physical address corresponding to the virtual machine physical address;
    determining a second host physical address from free physical addresses of host memory;
    copying data stored at the first host physical address to the second host physical address; and
    updating the host physical address corresponding to the virtual machine physical address recorded in the page table of the IOMMU to the second host physical address.
  2. The method according to claim 1, further comprising:
    deleting the virtual machine physical address from a page table of an MMU of the host; and
    blocking access by a CPU of the virtual machine to the virtual machine physical address.
  3. The method according to claim 2, wherein the blocking access by the CPU to the virtual machine physical address comprises:
    obtaining an access request of the CPU for the virtual machine physical address;
    querying the page table of the MMU using the virtual machine physical address;
    when no host physical address corresponding to the virtual machine physical address is found in the page table of the MMU, executing a page fault procedure; and
    during execution of the page fault procedure, locking the page table replacement process of the MMU to block access by the CPU to the virtual machine physical address.
  4. The method according to claim 2, further comprising:
    determining a third host physical address from the free physical addresses of the host memory;
    after copying the data stored at the first host physical address to the second host physical address, copying the data stored at the second host physical address to the third host physical address;
    when the data stored at the first host physical address is identical to the data stored at the third host physical address, releasing the memory space corresponding to the third host physical address; and
    updating the host physical address corresponding to the virtual machine physical address recorded in the page table of the MMU to the second host physical address.
  5. The method according to claim 4, further comprising:
    when the data stored at the first host physical address differs from the data stored at the third host physical address, determining, for a first logical address at which the data differs, whether the data at the first logical address within the data stored at the second host physical address is identical to the data at the first logical address within the data stored at the third host physical address;
    if the determination result is no, releasing the memory space corresponding to the third host physical address; and
    updating the host physical address corresponding to the virtual machine physical address recorded in the page table of the MMU to the second host physical address.
  6. The method according to claim 5, further comprising:
    if the determination result is yes, updating the data at the first logical address within the data stored at the second host physical address to the data at the first logical address within the data stored at the first host physical address;
    releasing the memory space corresponding to the third host physical address; and
    updating the host physical address corresponding to the virtual machine physical address recorded in the page table of the MMU to the second host physical address.
  7. The method according to claim 4, further comprising:
    comparing, byte by byte, the data stored at the first host physical address with the data stored at the third host physical address.
  8. The method according to claim 5 or 6, further comprising:
    after updating the host physical address corresponding to the virtual machine physical address recorded in the page table of the MMU to the second host physical address, resuming access by the CPU of the virtual machine to the virtual machine physical address.
  9. The method according to claim 1, wherein the determining a virtual machine physical address requiring a page table update comprises:
    obtaining a page table update request; and obtaining, from the page table update request, the virtual machine physical address requiring the page table update.
  10. A computing system, comprising a host and a virtual machine management node, wherein:
    the host is deployed with a virtual machine and is mounted with a passthrough device passed through to the virtual machine; the host further comprises an IOMMU; a page table stored in the IOMMU records correspondences between virtual machine physical addresses of the virtual machine and host physical addresses; and the passthrough device accesses the memory of the host based on the correspondences between the virtual machine physical addresses of the virtual machine and the host physical addresses; and
    the virtual machine management node is configured to: determine a virtual machine physical address requiring a page table update; determine, from the page table stored in the IOMMU, a first host physical address corresponding to the virtual machine physical address; determine a second host physical address from free physical addresses of host memory; copy data stored at the first host physical address to the second host physical address; and update the host physical address corresponding to the virtual machine physical address recorded in the page table of the IOMMU to the second host physical address.
  11. A computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of any one of claims 1 to 9.
PCT/CN2023/074406 2022-02-18 2023-02-03 Memory paging method and system, and storage medium WO2023155694A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP23755697.2A EP4375836A1 (en) 2022-02-18 2023-02-03 Memory paging method and system, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210150479.1 2022-02-18
CN202210150479.1A CN114201269B (zh) 2022-02-18 2022-02-18 Memory paging method and system, and storage medium

Publications (1)

Publication Number Publication Date
WO2023155694A1 true WO2023155694A1 (zh) 2023-08-24

Family

ID=80645531

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074406 WO2023155694A1 (zh) Memory paging method and system, and storage medium

Country Status (3)

Country Link
EP (1) EP4375836A1 (zh)
CN (1) CN114201269B (zh)
WO (1) WO2023155694A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201269B (zh) * 2022-02-18 2022-08-26 阿里云计算有限公司 内存换页方法、系统及存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103597451A (zh) * 2011-03-31 2014-02-19 Memory mirroring and redundancy generation for high availability
CN111966468A (zh) * 2020-08-28 2020-11-20 Method, system, security processor and storage medium for a passthrough device
CN112241310A (zh) * 2020-10-21 2021-01-19 Page table management and information acquisition methods, processor, chip, device and medium
CN112925606A (zh) * 2019-12-06 2021-06-08 Memory management method, apparatus and device
CN114201269A (zh) * 2022-02-18 2022-03-18 Memory paging method and system, and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US7868897B2 (en) * 2006-06-30 2011-01-11 Intel Corporation Apparatus and method for memory address re-mapping of graphics data
CN101751345B (zh) * 2008-12-10 2012-04-11 国际商业机器公司 Simulator and simulation method for running a guest program in a host
US8386745B2 (en) * 2009-07-24 2013-02-26 Advanced Micro Devices, Inc. I/O memory management unit including multilevel address translation for I/O and computation offload
US9535849B2 (en) * 2009-07-24 2017-01-03 Advanced Micro Devices, Inc. IOMMU using two-level address translation for I/O and computation offload devices on a peripheral interconnect
CN105095094B (zh) * 2014-05-06 2018-11-30 华为技术有限公司 Memory management method and device
US9870324B2 (en) * 2015-04-09 2018-01-16 Vmware, Inc. Isolating guest code and data using multiple nested page tables
US9842065B2 (en) * 2015-06-15 2017-12-12 Intel Corporation Virtualization-based platform protection technology
CN107193759A (zh) * 2017-04-18 2017-09-22 上海交通大学 Virtualization method for a device memory management unit
US10691365B1 (en) * 2019-01-30 2020-06-23 Red Hat, Inc. Dynamic memory locality for guest memory
CN111190752B (zh) * 2019-12-30 2023-04-07 海光信息技术股份有限公司 Method and apparatus for virtual machines to share kernel memory
CN112363824B (zh) * 2020-10-12 2022-07-22 北京大学 Memory virtualization method and system under the Sunway architecture

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN103597451A (zh) * 2011-03-31 2014-02-19 Memory mirroring and redundancy generation for high availability
CN112925606A (zh) * 2019-12-06 2021-06-08 Memory management method, apparatus and device
CN111966468A (zh) * 2020-08-28 2020-11-20 Method, system, security processor and storage medium for a passthrough device
CN112241310A (zh) * 2020-10-21 2021-01-19 Page table management and information acquisition methods, processor, chip, device and medium
CN114201269A (zh) * 2022-02-18 2022-03-18 Memory paging method and system, and storage medium

Non-Patent Citations (1)

Title
"Master's Thesis", 10 March 2018, SHANGHAI JIAOTONG UNIVERSITY, CN, article XU, YU: "Optimization of Mediated Pass-Through for On-Device MMU Virtualization", pages: 1 - 69, XP009548264, DOI: 10.27307/d.cnki.gsjtu.2018.003099 *

Also Published As

Publication number Publication date
CN114201269B (zh) 2022-08-26
CN114201269A (zh) 2022-03-18
EP4375836A1 (en) 2024-05-29

Similar Documents

Publication Publication Date Title
US11500689B2 (en) Communication method and apparatus
US10289555B1 (en) Memory read-ahead using learned memory access patterns
US10552337B2 (en) Memory management and device
US10198377B2 (en) Virtual machine state replication using DMA write records
US9582198B2 (en) Compressed block map of densely-populated data structures
US11620233B1 (en) Memory data migration hardware
US10521354B2 (en) Computing apparatus and method with persistent memory
US11392363B2 (en) Implementing application entrypoints with containers of a bundled application
WO2018176911A1 (zh) 一种虚拟磁盘文件格式转换方法和装置
WO2019061352A1 (zh) 数据加载方法及装置
WO2023165400A1 (zh) 计算系统、内存缺页处理方法及存储介质
US10620871B1 (en) Storage scheme for a distributed storage system
CN111290827A (zh) 数据处理的方法、装置和服务器
WO2023155694A1 (zh) 内存换页方法、系统及存储介质
US8898413B2 (en) Point-in-time copying of virtual storage
US11582168B2 (en) Fenced clone applications
US11288238B2 (en) Methods and systems for logging data transactions and managing hash tables
US8892838B2 (en) Point-in-time copying of virtual storage and point-in-time dumping
US11531481B1 (en) Optimal method for deleting sub-blocks of a pointer block that do not have on-disk metadata headers for addresses
US11435935B2 (en) Shrinking segment cleaning algorithm in an object storage
US12013799B2 (en) Non-interrupting portable page request interface
US11847100B2 (en) Distributed file system servicing random-access operations
WO2023231572A1 (zh) 一种容器的创建方法、装置及存储介质
US20230251967A1 (en) Optimizing instant clones through content based read cache
US20240160373A1 (en) Coordinated Persistent Memory Data Mirroring

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23755697

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023755697

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 18685060

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2023755697

Country of ref document: EP

Effective date: 20240219