CN112363824B - Memory virtualization method and system under Shenwei architecture - Google Patents


Info

Publication number
CN112363824B
CN112363824B (application number CN202011084199.2A)
Authority
CN
China
Prior art keywords
page table
tlb
address
client
shadow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011084199.2A
Other languages
Chinese (zh)
Other versions
CN112363824A (en)
Inventor
沙赛
罗英伟
汪小林
张毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Advanced Technology Research Institute
Peking University
Original Assignee
Wuxi Advanced Technology Research Institute
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Advanced Technology Research Institute and Peking University
Priority to CN202011084199.2A
Publication of CN112363824A
Application granted
Publication of CN112363824B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45583Memory management, e.g. access or allocation

Abstract

The invention relates to a memory virtualization method and system under the Shenwei architecture. The method comprises the following steps: establishing a buffer for storing shadow page table base addresses; when the CPU queries the TLB and a TLB miss occurs, the CPU accesses the buffer to obtain the shadow page table base address of the current process, loads it into the memory management unit, and starts a page table walk; when a mapping is missing during the page table walk, the CPU switches from the guest context to the host context to handle the page fault; the virtual-to-physical address translation mapping obtained from page fault handling is filled directly into the corresponding TLB, realizing TLB prefetching; the CPU then queries the TLB again to complete the address translation from the guest virtual address to the host physical address. Based on the software-managed TLB of the Shenwei architecture, the invention refreshes the shadow page table and the TLB simultaneously, thereby keeping the shadow page table synchronized with the guest process page table.

Description

Memory virtualization method and system under Shenwei architecture
Technical Field
The invention relates to the field of Shenwei architecture virtualization, in particular to a method and a system for realizing efficient memory virtualization under the Shenwei architecture.
Background
The development of the Shenwei (Sunway) family of processors, representative of domestic Chinese processors, has attracted wide attention. The great success of the Sunway TaihuLight supercomputer established Shenwei's important position among domestic processors. In security-sensitive sectors that require autonomy and controllability, such as government, Shenwei servers are favored and are mainly used for desktop office systems. The first-generation Shenwei instruction set was derived from the Alpha instruction set and, through continuous improvement, has since developed into the independent Shenwei instruction set.
Compared with mainstream international processor architectures such as x86, the Shenwei architecture still has a significant gap in functionality and performance. With the continuous development of information technology, the Shenwei processor is no longer limited to desktop office systems but, following the trend of the times, is oriented toward broader cloud service systems. Virtualization is one of the main supporting technologies for cloud services. Virtualization turns one physical computer system into one or more virtual computer systems (virtual machines). Each virtual machine has its own virtual hardware (CPU, memory, etc.) and provides an independent, complete execution environment. Virtualization mainly targets three physical resources: CPU virtualization, memory virtualization, and I/O virtualization. Memory virtualization is the most complex of the three, and its quality is often the bottleneck of virtual machine performance.
From the perspective of the operating system, there are two fundamental assumptions about physical memory: physical addresses start at zero, and memory addresses are contiguous. A virtual machine runs on the host as an ordinary process, so these two conditions are hard to satisfy directly. Virtualization introduces a new layer of system software, called the virtual machine monitor (or hypervisor), which controls access to physical resources by the guest operating systems. To satisfy the two conditions, the hypervisor introduces a new address space called the guest physical address space. In a computer system, a CPU memory access consists of two steps: virtual-to-physical address translation, and accessing memory data at the physical address. Virtual-to-physical address translation converts the virtual addresses used by a program into actual physical addresses.
In a virtualized environment, address translation involves two layers: guest virtual address -> guest physical address -> host physical address. The task of memory virtualization is to complete this two-layer address translation efficiently. The translation overhead falls mainly into three parts: TLB lookup, page table walk, and page fault handling.
Existing memory virtualization solutions on mainstream architectures fall into two categories: software memory virtualization, represented by the traditional shadow page table, and hardware-assisted virtualization, represented by the extended page table. Both are implemented on mainstream processor architectures such as x86, but neither is suitable for Shenwei processors. The extended page table model essentially relies on hardware to complete the two-layer address translation efficiently, but the Shenwei architecture lacks such hardware support, and a pure software implementation of the model could not meet practical performance requirements. Furthermore, while the extended page table model reduces page fault handling overhead compared with the traditional shadow page table, it introduces additional page table walk overhead.
As for the traditional shadow page table model, on the one hand its write-protection-based synchronization mechanism makes the implementation extremely complex and inefficient; on the other hand, it cannot exploit the software flexibility unique to the Shenwei architecture. The Shenwei architecture has distinctive virtualization advantages over x86. First, the Shenwei architecture uses a software-managed Translation Lookaside Buffer (TLB), which provides the necessary conditions for memory virtualization optimization. The TLB is a small hardware structure that directly stores virtual-to-physical address mappings. It is the address translation unit closest to the CPU: on every address translation, the CPU first looks up the virtual-to-physical mapping in the TLB. Second, the Shenwei architecture has a hardware mode above kernel mode and a unique programmable software interface, HMcode. This interface runs in hardware mode with the highest system privilege and can directly access registers, memory, and devices such as the TLB. This gives the Shenwei architecture extremely high low-level software flexibility and allows it to provide rich and diverse support for virtualization.
The Shenwei architecture has unique virtualization advantages. In addition to user mode and kernel mode, it has a highest-privilege mode called hardware mode. Hosts and guests under the Shenwei architecture thus have three orthogonal privilege levels: user mode, kernel mode, and hardware mode, which is similar to the Intel VMX operating modes. Shenwei HMcode is a programmable interface between the kernel layer and the hardware; it runs in hardware mode to execute privileged instructions. The HMcode interface is transparent to the user layer and even the kernel layer and can directly access registers and memory by physical address. The operating system traps into hardware mode through system calls. For example, HMcode provides a TLB flush interface for the kernel, called TBI. Similar to VPID (Virtual Processor Identifier) and PCID (Process Context Identifier) in the x86 TLB, the VPN (Virtual Processor Number) and UPN (User Process Number) in the Sunway TLB distinguish different virtual processors and processes, respectively. The HMcode interface provides software flexibility and can also help verify virtualized hardware support.
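For illustration only, the role of these tags can be modeled in C as follows; the field names and widths are assumptions drawn from the description above, not the actual Shenwei TLB entry format:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of a software-managed TLB entry tagged with a
 * virtual processor number (VPN) and a user process number (UPN).
 * Field names and widths are assumptions for exposition only. */
struct tlb_entry {
    uint64_t vpage;     /* virtual page number of the mapping      */
    uint64_t pframe;    /* physical frame number of the mapping    */
    uint8_t  vpn;       /* virtual processor that owns the entry   */
    uint8_t  upn;       /* process that owns the entry (8-bit)     */
    bool     valid;
};

/* An entry hits only when the address, the virtual processor and the
 * process all match, so entries of different guests and processes can
 * coexist in the TLB without flushing each other. */
static bool tlb_entry_hits(const struct tlb_entry *e,
                           uint64_t vpage, uint8_t vpn, uint8_t upn)
{
    return e->valid && e->vpage == vpage && e->vpn == vpn && e->upn == upn;
}
```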
Disclosure of Invention
The aim of the invention is to implement a memory virtualization system on a Shenwei server by fully exploiting the advantages of the Shenwei architecture, in particular its software-managed TLB. Specifically, the invention builds on a shadow-page-table memory virtualization model and makes full use of features such as the Shenwei programmable interface HMcode and the software-managed TLB to implement a memory virtualization system on the Shenwei 1621 server. The core idea of the invention is to use the software-managed TLB of the Shenwei architecture to refresh the shadow page table and the TLB simultaneously, thereby keeping the shadow page table synchronized with the guest process page table.
The technical scheme adopted by the invention is as follows:
a memory virtualization method under Shenwei architecture comprises the following steps:
establishing a buffer for storing shadow page table base addresses;
when the CPU queries the TLB and a TLB miss occurs, the CPU, using the software-managed TLB of the Shenwei architecture, accesses the buffer to obtain the shadow page table base address of the current process, loads it into a memory management unit, and starts a page table walk;
when a mapping is missing during the page table walk, the CPU switches from the guest context to the host context to handle the page fault;
using the software TLB fill capability of the Shenwei architecture, filling the virtual-to-physical address translation mapping obtained from page fault handling directly into the corresponding TLB, thereby realizing TLB prefetching;
the CPU queries the TLB again to complete the address translation from the guest virtual address to the host physical address (the overall flow is sketched below).
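The following C sketch outlines this flow under stated assumptions: every helper (tlb_lookup, sptbr_buffer_lookup, mmu_walk_shadow, handle_shadow_page_fault, tlb_fill) is a hypothetical name introduced for exposition and is declared but not implemented here; none of them is part of the actual HMcode or KVM interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers standing in for the hardware/HMcode/hypervisor
 * pieces described in the text; names and signatures are illustrative. */
bool     tlb_lookup(uint64_t gva, uint64_t *hpa);          /* step 1 */
uint64_t sptbr_buffer_lookup(uint8_t vpn, uint8_t upn);    /* step 2 */
bool     mmu_walk_shadow(uint64_t spt_base, uint64_t gva,
                         uint64_t *hpa);                   /* step 2 */
uint64_t handle_shadow_page_fault(uint64_t gva);           /* step 3 */
void     tlb_fill(uint64_t gva, uint64_t hpa);             /* step 4 */

/* Translate a guest virtual address to a host physical address,
 * following the five steps of the method. */
uint64_t translate_gva(uint64_t gva, uint8_t vpn, uint8_t upn)
{
    uint64_t hpa;

    if (tlb_lookup(gva, &hpa))                 /* 1. TLB query         */
        return hpa;

    /* 2. TLB miss: fetch the shadow page table base of the current
     *    process from the buffer and let the MMU walk the shadow
     *    page table. */
    uint64_t spt_base = sptbr_buffer_lookup(vpn, upn);
    if (!mmu_walk_shadow(spt_base, gva, &hpa)) {
        /* 3. Mapping missing: switch to the host context and handle
         *    the shadow page fault, which builds the mapping. */
        hpa = handle_shadow_page_fault(gva);
    }

    tlb_fill(gva, hpa);                        /* 4. TLB prefetch      */

    /* 5. The CPU queries the TLB again; it now hits. */
    tlb_lookup(gva, &hpa);
    return hpa;
}
```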
Further, the buffer uses 16 physical pages, each storing 1024 entries; the buffer is indexed by a combination of the 2-bit VPN and the 8-bit UPN, and each entry contains the 64-bit shadow page table base address of one process.
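A minimal sketch of such a buffer, assuming the layout just described (2-bit VPN, 8-bit UPN, 1024 entries of 64 bits, i.e. one 8 KB page per physical core); the type and function names are illustrative, not the actual implementation:

```c
#include <stdint.h>

#define SPT_BUF_ENTRIES 1024              /* 4 VPNs x 256 UPNs per core  */

/* Per-core buffer of shadow page table base addresses.  1024 entries of
 * 8 bytes fill exactly one 8 KB page, matching the description above.
 * (The valid bit mentioned later in the text is not modeled here.)     */
struct spt_base_buffer {
    uint64_t entry[SPT_BUF_ENTRIES];      /* 64-bit shadow page table base */
};

/* Index = (2-bit VPN << 8) | 8-bit UPN. */
static inline unsigned spt_index(uint8_t vpn, uint8_t upn)
{
    return ((unsigned)(vpn & 0x3) << 8) | upn;
}

static inline uint64_t spt_base_lookup(const struct spt_base_buffer *buf,
                                       uint8_t vpn, uint8_t upn)
{
    return buf->entry[spt_index(vpn, upn)];
}

static inline void spt_base_set(struct spt_base_buffer *buf,
                                uint8_t vpn, uint8_t upn, uint64_t base)
{
    buf->entry[spt_index(vpn, upn)] = base;
}
```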
Further, the page table walk uses a 4-level shadow page table structure: the memory management unit traverses the 4-level shadow page table to obtain the mapping from the guest virtual address to the host physical address; if the walk succeeds, the mapping is filled into the TLB and the CPU queries the TLB again to complete the address translation; a missing mapping at any level of the page table triggers a page fault (see the sketch below).
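A simplified software model of the 4-level walk follows; it assumes a conventional radix page table with an 8 KB page size and an illustrative 10-bit index per level, since the actual Shenwei shadow page table entry layout is not specified here. The phys_to_virt helper is hypothetical and left undefined.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID      0x1ULL
#define PTE_ADDR_MASK  (~0x1fffULL)     /* assume 8 KB pages/frames       */
#define LEVEL_BITS     10               /* index width per level          */
#define PAGE_SHIFT     13               /* (both values illustrative)     */

/* Hypothetical helper: map a physical page-table frame to a pointer the
 * walker can dereference. */
uint64_t *phys_to_virt(uint64_t paddr);

/* Walk a 4-level shadow page table.  Returns true and the host physical
 * address on success; a missing entry at any level means a page fault
 * must be raised. */
static bool shadow_walk(uint64_t spt_base, uint64_t gva, uint64_t *hpa)
{
    uint64_t table = spt_base;

    for (int level = 3; level >= 0; level--) {
        unsigned idx = (gva >> (PAGE_SHIFT + level * LEVEL_BITS))
                       & ((1u << LEVEL_BITS) - 1);
        uint64_t pte = phys_to_virt(table)[idx];

        if (!(pte & PTE_VALID))
            return false;               /* mapping miss -> page fault    */

        table = pte & PTE_ADDR_MASK;    /* next level (or final frame)   */
    }

    *hpa = table | (gva & ~PTE_ADDR_MASK);  /* frame + page offset       */
    return true;
}
```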
Further, the page fault handling comprises: a guest process page table walk, a host process page table walk, and shadow page table construction and filling.
Further, the page fault handling comprises the following steps (sketched below):
walking the guest process page table to translate the guest virtual address into a guest physical address, and, if the guest process page table is missing or incompletely mapped, re-entering the virtual machine so that the guest completes its page table;
converting the guest physical address into a host virtual address, and walking the host process page table to translate the host virtual address into a host physical address;
using the guest-virtual-to-host-physical mapping obtained from the two walks, building a four-level page table according to the shadow page table organization and filling in the mapping.
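An illustrative sketch of this handler, assuming a fixed linear offset between guest physical and host virtual addresses as stated in the description; all helper names (guest_walk, host_walk, shadow_map, reenter_guest_to_fix_page_table) are hypothetical and left unimplemented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for the three parts of page fault handling. */
bool guest_walk(uint64_t gva, uint64_t *gpa);        /* guest page table  */
bool host_walk(uint64_t hva, uint64_t *hpa);         /* host page table   */
void shadow_map(uint64_t spt_base, uint64_t gva, uint64_t hpa);
void reenter_guest_to_fix_page_table(uint64_t gva);  /* let guest map it  */

/* Fixed linear offset between guest physical and host virtual space, as
 * assumed in the description; the constant is illustrative. */
#define GPA_TO_HVA_OFFSET 0x0ULL

static bool shadow_page_fault(uint64_t spt_base, uint64_t gva, uint64_t *hpa)
{
    uint64_t gpa;

    /* a) Guest virtual -> guest physical via the guest process page
     *    table; if the guest mapping is absent, let the guest build it. */
    if (!guest_walk(gva, &gpa)) {
        reenter_guest_to_fix_page_table(gva);
        return false;                   /* retried after the guest runs  */
    }

    /* b) Guest physical -> host virtual is a direct linear mapping, then
     *    host virtual -> host physical via the host process page table. */
    uint64_t hva = gpa + GPA_TO_HVA_OFFSET;
    if (!host_walk(hva, hpa))
        return false;

    /* c) Build the four-level shadow page table entry for gva -> hpa. */
    shadow_map(spt_base, gva, *hpa);
    return true;
}
```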
Further, synchronization between the shadow page table and the guest process page table is maintained as follows (sketched below):
the guest operating system flushes the TLB entries of the current process directly through a system call, without exiting the virtual machine;
a shadow page table flusher is implemented in the HMcode interface to monitor the software-managed TLB interface;
the shadow page table flusher decodes the captured TLB flush instruction and invalidates the corresponding shadow page table entries, thereby flushing the TLB and the shadow page table simultaneously.
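A schematic of the flusher logic; the flush-request structure, the granularity enumeration and the helper names are assumptions made for illustration and do not reflect the real TBI encoding or HMcode code:

```c
#include <stdint.h>

/* Illustrative flush granularities; the real TBI encoding may differ. */
enum tlb_flush_kind {
    FLUSH_SINGLE_ENTRY,   /* one guest virtual address of one process */
    FLUSH_WHOLE_PROCESS,  /* every entry of one process (one UPN)     */
};

struct tlb_flush_req {
    enum tlb_flush_kind kind;
    uint8_t  vpn;
    uint8_t  upn;
    uint64_t gva;         /* only meaningful for FLUSH_SINGLE_ENTRY   */
};

/* Hypothetical helpers: the hardware TLB flush itself and the matching
 * shadow page table invalidations. */
void hw_tlb_flush(const struct tlb_flush_req *req);
void shadow_invalidate_entry(uint8_t vpn, uint8_t upn, uint64_t gva);
void shadow_invalidate_process(uint8_t vpn, uint8_t upn);

/* Shadow page table flusher: every guest TLB flush trapped by HMcode is
 * decoded and mirrored onto the shadow page table, so the two stay in
 * sync without write-protecting the guest page tables. */
void shadow_page_table_flusher(const struct tlb_flush_req *req)
{
    hw_tlb_flush(req);                            /* flush the TLB      */

    if (req->kind == FLUSH_SINGLE_ENTRY)          /* ...and mirror it   */
        shadow_invalidate_entry(req->vpn, req->upn, req->gva);
    else
        shadow_invalidate_process(req->vpn, req->upn);
}
```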
Further, TLB flush requests are issued through two interfaces in the Shenwei operating system: one is the operating system's process page fault handling function, and the other is the process context switch handling function (illustrated below).
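Purely as an illustration of the two call sites, the guest kernel paths might look like the sketch below; tbi_flush_entry and tbi_flush_all are hypothetical wrappers around the TBI call, not actual Shenwei kernel functions:

```c
#include <stdint.h>

/* Hypothetical wrappers around the HMcode TBI call. */
void tbi_flush_entry(uint64_t va);  /* invalidate one entry of this process */
void tbi_flush_all(void);           /* invalidate all entries of this
                                     * virtual processor                    */

/* Call site 1: the guest page fault handler.  After the guest kernel
 * updates a page table entry it must invalidate the stale TLB entry with
 * the same guest virtual address. */
void guest_page_fault_update(uint64_t va, uint64_t new_pte)
{
    /* ... write new_pte into the guest process page table ... */
    (void)new_pte;
    tbi_flush_entry(va);
}

/* Call site 2: process context switch.  If the UPN space is exhausted and
 * must be rotated, every TLB entry of the virtual processor is flushed
 * before UPNs are reassigned. */
void guest_context_switch(int need_upn_rotation)
{
    if (need_upn_rotation)
        tbi_flush_all();
    /* ... switch page table base, restore registers, etc. ... */
}
```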
Further, obsolete shadow page tables are reclaimed in time by the following steps (sketched below):
when the TLB flush instruction captured by the shadow page table flusher flushes the TLB entries of a whole process, the corresponding process entry stored in the shadow page table base address buffer is immediately invalidated;
during shadow page fault handling, before the virtual machine monitor builds a shadow page table mapping, it first checks the valid bit of the current process's base address in the buffer; if the bit is invalid, all shadow page tables of the current process are reclaimed directly.
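A sketch of the reclamation path, modeling the valid bit as a zero base address in the buffer (an assumption made for exposition); the helpers, including spt_index from the earlier sketch, are hypothetical:

```c
#include <stdint.h>

#define SPT_BUF_ENTRIES 1024
#define SPT_INVALID     0x0ULL          /* 0 = no valid base cached     */

struct spt_base_buffer {
    uint64_t entry[SPT_BUF_ENTRIES];
};

/* Hypothetical helpers. */
unsigned spt_index(uint8_t vpn, uint8_t upn);
void     free_shadow_page_tables(uint8_t vpn, uint8_t upn);
uint64_t alloc_shadow_root(void);

/* Step 1: a whole-process TLB flush invalidates the buffered base. */
void on_whole_process_flush(struct spt_base_buffer *buf,
                            uint8_t vpn, uint8_t upn)
{
    buf->entry[spt_index(vpn, upn)] = SPT_INVALID;
}

/* Step 2: before building a new shadow mapping, the hypervisor checks the
 * buffered base; if it was invalidated, the old shadow page tables of this
 * process are reclaimed and a fresh root is installed. */
uint64_t get_shadow_root(struct spt_base_buffer *buf,
                         uint8_t vpn, uint8_t upn)
{
    uint64_t *slot = &buf->entry[spt_index(vpn, upn)];

    if (*slot == SPT_INVALID) {
        free_shadow_page_tables(vpn, upn);   /* recycle obsolete tables  */
        *slot = alloc_shadow_root();         /* new 4-level root         */
    }
    return *slot;
}
```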
A memory virtualization system under the Shenwei architecture using the above method comprises:
a TLB query module, used for the CPU to query the TLB and obtain a virtual-to-physical address translation mapping;
a page table query module, used for the CPU, on a TLB miss, to access the pre-established buffer storing shadow page table base addresses via the software-managed TLB of the Shenwei architecture, obtain the shadow page table base address of the current process, load it into the memory management unit, and start the page table walk;
a page fault handling module, used for the CPU to switch from the guest context to the host context and handle the page fault when a mapping is missing during the page table walk;
a TLB prefetch module, used for filling the virtual-to-physical address translation mapping obtained from page fault handling directly into the corresponding TLB via the software TLB fill capability of the Shenwei architecture, thereby realizing TLB prefetching, so that the translation from the guest virtual address to the host physical address completes when the CPU queries the TLB again.
The invention also provides a virtual machine based on the Shenwei architecture, which performs memory virtualization by the above method.
The invention provides a novel memory virtualization method and system under the Shenwei architecture, built on the characteristics of the Shenwei architecture, in particular its software-managed TLB mechanism. On the one hand, the method is based on the shadow page table: it inherits the efficient page table walk of the traditional shadow page table model while eliminating the page fault handling overhead caused by its write-protection synchronization. On the other hand, it requires no complex hardware support and, unlike the extended page table model, introduces no extra page table walk overhead.
Drawings
Fig. 1 is a diagram of the implementation interfaces of the memory virtualization model on the Shenwei architecture.
FIG. 2 is a diagram of the Shenwei memory virtualization overhead results using the SPEC CPU2006 test set.
FIG. 3 is a graph of x86 memory virtualization overhead results using the SPEC CPU2006 test set.
FIG. 4 is a comparison graph of Shenwei and x86 memory virtualization overhead using the SPEC CPU2017 large working set program.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in further detail with reference to the following specific embodiments and the accompanying drawings.
The "software managed TLB" described in the present disclosure refers to an architecture that exposes a TLB software management interface for an operating system to flush/fill TLB entries.
The shadow page table directly caches the mapping from guest virtual addresses to host physical addresses in a virtualized environment. It is an important means of accelerating the two-layer address translation.
Page table synchronization refers to the fact that, in a shadow page table memory virtualization model, the shadow page table must remain consistent with the guest process page table; the process of maintaining this consistency is called page table synchronization.
The invention provides a method for refreshing the TLB and the shadow page table simultaneously: based on the TLB software management interface of the Shenwei architecture, TLB flush instructions are monitored and captured, and the corresponding shadow page table entries are refreshed at the same time. In a computer system, to keep TLB entries valid, once the operating system modifies a process page table, the corresponding TLB entry must be flushed, i.e., the old mapping invalidated.
KVM is a module in the Linux kernel and an open-source, efficient virtualization solution. It consists of a loadable kernel module that provides the core virtualization infrastructure and a processor-specific module for architecture emulation and interrupt handling. The invention implements a KVM-based memory virtualization model on the Shenwei 1621 server. Fig. 1 shows the implementation interfaces of the memory virtualization model on the Shenwei architecture, where I-TLB denotes the instruction TLB and D-TLB the data TLB. In the invention, memory virtualization covers two main tasks: efficiently completing the address translation from the guest virtual address to the host physical address, and keeping the shadow page table synchronized with the guest process page table.
1. Address translation flow:
The task of memory virtualization is to translate guest virtual addresses into host physical addresses.
1) TLB query. The CPU accesses the TLB and looks up the mapping by guest virtual address. If the TLB hits, the address translation is complete; otherwise a TLB miss occurs and the page table walk begins.
2) Page table walk. The invention designs a fixed-size buffer for storing shadow page table base addresses. Using the software-managed TLB of the Shenwei architecture, the CPU accesses this buffer before the page table walk to obtain the shadow page table base address of the current process. The Shenwei 1621 has 16 physical cores and a page size of 8 KB; the buffer uses 16 physical pages, each storing 1024 entries, and is indexed by the combination of the 2-bit VPN and the 8-bit UPN. Each entry contains the 64-bit shadow page table base address of one process. The base address is loaded into the memory management unit and the page table walk starts. The invention uses a 4-level shadow page table structure: the memory management unit traverses the 4-level page table to obtain the mapping from the guest virtual address to the host physical address. If the walk succeeds, the mapping is filled into the TLB and the CPU queries the TLB again, completing the address translation. A missing mapping at any level triggers a page fault.
3) Page fault handling. Once a mapping is missing during the page table walk, the CPU switches from the guest context to the host context to handle the page fault. The parameters passed to the page fault handler include the guest virtual address and the fault information. Page fault handling consists of three parts: the guest process page table walk, the host process page table walk, and the construction and filling of the shadow page table.
a) The system first walks the guest process page table to translate the guest virtual address into a guest physical address. If the guest process page table is missing (or not fully mapped), the virtual machine is re-entered so that the guest completes its page table.
b) The guest physical address space and the host virtual address space are contiguous and related by a direct linear mapping, so this conversion is immediate. The system then walks the host process page table to translate the host virtual address into a host physical address.
c) From the two walks, the system obtains the mapping from the guest virtual address to the host physical address. It builds a four-level page table according to the shadow page table organization and fills in the mapping.
4) TLB prefetching. After page fault handling completes, the CPU re-enters the virtual machine to re-execute the instruction that caused the TLB miss. Before that, using the software TLB fill capability of the Shenwei architecture, the invention fills the virtual-to-physical address translation mapping obtained from page fault handling directly into the corresponding TLB; this is called TLB prefetching. Without TLB prefetching, the CPU would re-execute the original instruction, take another TLB miss and page table walk to fetch the mapping from the shadow page table and fill the TLB, and only then re-execute the instruction to complete the address translation. TLB prefetch optimization therefore saves one TLB miss and one page table walk.
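As a sketch only, with hypothetical helpers (tlb_fill and vm_resume are illustrative names, not real HMcode or KVM entry points), the prefetch amounts to filling the TLB before resuming the guest:

```c
#include <stdint.h>

/* Hypothetical helpers: fill one software-managed TLB entry and resume
 * guest execution at the faulting instruction. */
void tlb_fill(uint8_t vpn, uint8_t upn, uint64_t gva, uint64_t hpa);
void vm_resume(void);

/* Return to the guest after shadow page fault handling.  Pre-filling the
 * TLB with the freshly built gva -> hpa mapping means the retried
 * instruction hits the TLB immediately, saving one TLB miss and one
 * shadow page table walk. */
void return_from_shadow_fault(uint8_t vpn, uint8_t upn,
                              uint64_t gva, uint64_t hpa)
{
    tlb_fill(vpn, upn, gva, hpa);   /* TLB prefetch                      */
    vm_resume();                    /* re-execute the faulting access    */
}
```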
2. Page table synchronization:
the guest operating system can flush (invalidate) the TLB entry of the current process directly through the system call without requiring a virtual machine exit (i.e., context switch). In the Shenwei operating system, there are two main interfaces that can issue "TLB flush" requests. One is the operating system process page table page fault handling function. When the operating system updates the process page table, it needs to invalidate the old TLB entries with the same guest virtual address. The other interface is a process context switch handling function. When switching process contexts, all TLB entries under the entire virtual processor should be flushed if a rotation UPN is required. The TLB entry of the Shenwei 1621 contains an 8-bit UPN that identifies the active process. Each process gets a UPN when it is first scheduled on the CPU. If the number of processes exceeds 256, the UPN will rotate, meaning that all TLB entries under the current virtual processor will be flushed, and the system will reassign the UPN to the active process. The present invention implements a shadow page table flusher in the HMcode interface to monitor the TLB interface for software management. The shadow page table flusher decodes the captured TLB flush instruction and invalidates the corresponding shadow page table entry. Therefore, the present invention realizes the simultaneous refreshing of the TLB and the shadow page table.
3. Shadow page table reclamation:
Shadow page table pages belong to the host process memory and are managed by the virtual machine monitor. In a multitasking virtual machine, each process uses its own shadow page table. Under the Shenwei architecture, we maintain up to 256 process shadow page tables per virtual processor and at most 1024 process shadow page tables on one physical core. Because processes are frequently created and destroyed, the memory virtualization model must reclaim obsolete shadow page tables in time. TLB flushes have different granularities, such as a single TLB entry or all TLB entries of a process. When the TLB flush instruction captured by the shadow page table flusher flushes the TLB entries of a whole process, the corresponding process entry stored in the shadow page table base address buffer is invalidated immediately. During shadow page fault handling, before the virtual machine monitor builds a shadow page table mapping, it first checks the valid bit of the current process's base address in the buffer; if the bit is invalid, all shadow page tables of the current process are reclaimed directly.
4. Experimental evaluation:
To validate the efficiency of the invention, we evaluated it with the SPEC CPU test suites and the STREAM bandwidth benchmark. Since the working sets in SPEC CPU2006 are generally small (less than 3 GB), we also selected some large-working-set programs from SPEC CPU2017. FIGS. 2 and 3 show the SPEC CPU2006 results for the new shadow page table model under the Shenwei architecture and for the traditional shadow page table and extended page table models under x86, respectively. The results show that the average execution time overhead of memory virtualization with the new shadow page table under the Shenwei architecture is only 1.36%, significantly lower than the overhead of the traditional shadow page table (5.97%) and the extended page table (5.36%) under x86. FIG. 4 shows the SPEC CPU2017 results; the new model performs well even on large-working-set programs. In this test, the virtualization overhead of the new shadow page table model under the Shenwei architecture is only 3.22%, while the overheads of the traditional shadow page table and extended page table models under x86 reach 9.27% and 11.06%, respectively. STREAM is a classic benchmark for measuring system bandwidth; the results show that the memory virtualization bandwidth loss under the Shenwei architecture is only 0.5%.
Based on the same inventive concept, another embodiment of the present invention provides a memory virtualization system under the Shenwei architecture using the above method, comprising:
a TLB query module, used for the CPU to query the TLB and obtain a virtual-to-physical address translation mapping;
a page table query module, used for the CPU, on a TLB miss, to access the pre-established buffer storing shadow page table base addresses via the software-managed TLB of the Shenwei architecture, obtain the shadow page table base address of the current process, load it into the memory management unit, and start the page table walk;
a page fault handling module, used for the CPU to switch from the guest context to the host context and handle the page fault when a mapping is missing during the page table walk;
a TLB prefetch module, used for filling the virtual-to-physical address translation mapping obtained from page fault handling directly into the corresponding TLB via the software TLB fill capability of the Shenwei architecture, thereby realizing TLB prefetching, so that the translation from the guest virtual address to the host physical address completes when the CPU queries the TLB again.
Based on the same inventive concept, another embodiment of the present invention provides a virtual machine based on the Shenwei architecture, which performs memory virtualization by the method of the invention.
The foregoing disclosure of the specific embodiments and drawings of the present invention is intended to aid understanding of the contents of the invention and to enable its practice. It will be understood by those skilled in the art that various alternatives, modifications and variations are possible without departing from the spirit and scope of the invention. The invention should not be limited to the embodiments and drawings disclosed in the specification, and its scope is defined by the claims.

Claims (10)

1. A memory virtualization method under the Shenwei architecture, characterized by comprising the following steps:
establishing a buffer for storing shadow page table base addresses;
when the CPU queries the TLB and a TLB miss occurs, the CPU, using the software-managed TLB of the Shenwei architecture, accesses the buffer to obtain the shadow page table base address of the current process, loads it into a memory management unit and starts a page table walk;
when a mapping is missing during the page table walk, the CPU switches from the guest context to the host context to handle the page fault;
using the software TLB fill capability of the Shenwei architecture, filling the virtual-to-physical address translation mapping obtained from page fault handling into the corresponding TLB, thereby realizing TLB prefetching;
the CPU queries the TLB again to complete the address translation from the guest virtual address to the host physical address.
2. The method of claim 1, wherein the buffer uses 16 physical pages, each page stores 1024 entries, the buffer is indexed by a combination of a 2-bit VPN and an 8-bit UPN, and each entry in the buffer contains the 64-bit shadow page table base address of one process.
3. The method of claim 1, wherein the page table walk uses a 4-level shadow page table structure; the memory management unit traverses the 4-level shadow page table to obtain the mapping from the guest virtual address to the host physical address; if the walk succeeds, the mapping is filled into the TLB, and the CPU queries the TLB again to complete the address translation; a missing mapping at any level of the page table triggers a page fault.
4. The method of claim 1, wherein the page fault handling comprises: a guest process page table walk, a host process page table walk, and shadow page table construction and filling.
5. The method of claim 4, wherein the page fault handling comprises:
walking the guest process page table to translate the guest virtual address into a guest physical address, and, if the guest process page table is missing or incompletely mapped, re-entering the virtual machine so that the guest completes its page table;
converting the guest physical address into a host virtual address, and walking the host process page table to translate the host virtual address into a host physical address;
using the guest-virtual-to-host-physical mapping obtained from the two walks, building a four-level page table according to the shadow page table organization and filling in the mapping.
6. The method of claim 1, wherein synchronization between the shadow page table and the guest process page table is maintained by the following steps:
the guest operating system flushes the TLB entries of the current process directly through a system call without exiting the virtual machine;
a shadow page table flusher is implemented in the HMcode interface to monitor the software-managed TLB interface;
the shadow page table flusher decodes the captured TLB flush instruction and invalidates the corresponding shadow page table entries, thereby flushing the TLB and the shadow page table simultaneously.
7. The method of claim 6, wherein TLB flush requests are issued through two interfaces in the Shenwei operating system: one is the operating system's process page fault handling function, and the other is the process context switch handling function.
8. The method of claim 1, wherein obsolete shadow page tables are reclaimed in time by the following steps:
when the TLB flush instruction captured by the shadow page table flusher flushes the TLB entries of a whole process, the corresponding process entry stored in the shadow page table base address buffer is immediately invalidated;
during shadow page fault handling, before the virtual machine monitor builds a shadow page table mapping, it first checks the valid bit of the current process's base address in the buffer, and directly reclaims all shadow page tables of the current process if the bit is invalid.
9. A memory virtualization system under the Shenwei architecture using the method of any one of claims 1-8, comprising:
a TLB query module, used for the CPU to query the TLB and obtain a virtual-to-physical address translation mapping;
a page table query module, used for the CPU, on a TLB miss, to access the pre-established buffer storing shadow page table base addresses via the software-managed TLB of the Shenwei architecture, obtain the shadow page table base address of the current process, load it into the memory management unit, and start the page table walk;
a page fault handling module, used for the CPU to switch from the guest context to the host context and handle the page fault when a mapping is missing during the page table walk;
a TLB prefetch module, used for filling the virtual-to-physical address translation mapping obtained from page fault handling directly into the corresponding TLB via the software TLB fill capability of the Shenwei architecture, thereby realizing TLB prefetching, so that the translation from the guest virtual address to the host physical address completes when the CPU queries the TLB again.
10. A virtual machine based on the Shenwei architecture, characterized in that the virtual machine performs memory virtualization by the method of any one of claims 1 to 8.
CN202011084199.2A 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture Active CN112363824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011084199.2A CN112363824B (en) 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011084199.2A CN112363824B (en) 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture

Publications (2)

Publication Number Publication Date
CN112363824A CN112363824A (en) 2021-02-12
CN112363824B (en) 2022-07-22

Family

ID=74506664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011084199.2A Active CN112363824B (en) 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture

Country Status (1)

Country Link
CN (1) CN112363824B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860381B (en) * 2021-03-09 2022-04-26 上海交通大学 Virtual machine memory capacity expansion method and system based on Shenwei processor
CN113297104B (en) * 2021-06-16 2022-11-15 无锡江南计算技术研究所 Address translation device and method facing message transmission mechanism
CN113986775B (en) * 2021-11-03 2023-08-18 苏州睿芯集成电路科技有限公司 Page table item generation method, system and device in RISC-V CPU verification
CN114201269B (en) * 2022-02-18 2022-08-26 阿里云计算有限公司 Memory page changing method, system and storage medium
CN114595164B (en) * 2022-05-09 2022-08-16 支付宝(杭州)信息技术有限公司 Method and apparatus for managing TLB cache in virtualized platform
CN114610655B (en) * 2022-05-10 2022-08-05 沐曦集成电路(上海)有限公司 Continuous data access processing device and chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567217A (en) * 2012-01-04 2012-07-11 北京航空航天大学 MIPS platform-oriented memory virtualization method
CN107193759A (en) * 2017-04-18 2017-09-22 上海交通大学 The virtual method of device memory administrative unit
CN110196757A (en) * 2019-05-31 2019-09-03 龙芯中科技术有限公司 TLB filling method, device and the storage medium of virtual machine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567217A (en) * 2012-01-04 2012-07-11 北京航空航天大学 MIPS platform-oriented memory virtualization method
CN107193759A (en) * 2017-04-18 2017-09-22 上海交通大学 The virtual method of device memory administrative unit
WO2018192160A1 (en) * 2017-04-18 2018-10-25 上海交通大学 Virtualization method for device memory management unit
CN110196757A (en) * 2019-05-31 2019-09-03 龙芯中科技术有限公司 TLB filling method, device and the storage medium of virtual machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Memory Virtualization Research Based on the MIPS Architecture; Cai Wanwei et al.; Journal of Computer Research and Development; 2013-10-15 (No. 10); pp. 221-226 *

Also Published As

Publication number Publication date
CN112363824A (en) 2021-02-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant