WO2021027069A1 - Distributed virtual machine adaptive memory consistency protocol, design method thereof, and terminal - Google Patents
- Publication number
- WO2021027069A1 (PCT/CN2019/113236)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mode
- state
- page
- shared
- protocol
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45583—Memory management, e.g. access or allocation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to the technical field of computer virtualization and distributed system architecture, in particular to a distributed virtual machine self-adaptive memory consistency protocol and a design method and terminal thereof.
- Distributed virtual machines abstract the hardware resources on multiple machines to provide massive computing and I/O resources for a single or even multiple virtual machines to meet application scenarios with extremely high resource and performance requirements.
- On the basis of QEMU-KVM, distributed virtual machines add several functional modules, including IPI forwarding, interrupt forwarding, I/O forwarding, clock synchronization, and distributed shared memory. The machines are connected through an RDMA network.
- The protocol used for distributed shared memory is the sequential consistency protocol.
- This type of protocol requires that any read operation see the value of the most recent write to the data in question; it is a strong consistency protocol.
- This memory protocol sacrifices considerable performance in order to guarantee strong consistency.
- The x86 architecture of Intel and AMD does not use sequential consistency; it uses x86-TSO as its memory consistency protocol.
- x86-TSO allows write operations to be held in the store buffer and thus delayed, thereby improving the performance of subsequent read operations.
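The effect of the store buffer can be illustrated with the classic store-buffering litmus test. The following is a minimal sketch of a toy model, not of real hardware or of the patented protocol; the `Core` class, its methods, and `store_buffering_litmus` are hypothetical names introduced only for illustration.

```python
from collections import deque


class Core:
    """Toy core with a FIFO store buffer (illustrative model only)."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # pending (address, value) writes, oldest first

    def store(self, addr, value):
        # Under TSO the write is buffered instead of going straight to memory.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # A core sees its own newest buffered write first (store forwarding).
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def fence(self):
        # Comparable to mfence: drain buffered writes to memory in program order.
        while self.store_buffer:
            a, v = self.store_buffer.popleft()
            self.memory[a] = v


def store_buffering_litmus(use_fence):
    """Core 0: x = 1; r0 = y.  Core 1: y = 1; r1 = x."""
    memory = {"x": 0, "y": 0}
    c0, c1 = Core(memory), Core(memory)
    c0.store("x", 1)
    c1.store("y", 1)
    if use_fence:
        c0.fence()
        c1.fence()
    r0 = c0.load("y")
    r1 = c1.load("x")
    return r0, r1
```

In this model `store_buffering_litmus(False)` returns `(0, 0)`, an outcome that sequential consistency forbids and x86-TSO permits, while `store_buffering_litmus(True)` returns `(1, 1)` because each fence drains the buffer before the load.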
- However, a sequential consistency protocol alone does not allow the memory synchronization protocol of a distributed virtual machine to switch flexibly between sequential consistency and x86-TSO, which greatly reduces the performance of distributed shared memory.
- the present invention provides a distributed virtual machine adaptive memory consistency protocol, a design method thereof, and a terminal.
- a new memory consistency protocol is designed to enable distributed shared memory to achieve better performance.
- the present invention is achieved through the following technical solutions.
- a design method for a distributed virtual machine adaptive memory consistency protocol including:
- the client vCPU can switch flexibly between sequential consistency mode and TSO mode;
- the interception of the synchronization operation includes:
- the vCPU is divided into two working modes, namely sequential consistency mode and TSO mode, so that the client can switch between the two modes through hypercall:
- the vCPU will only run in sequential consistency mode
- KVM will Invalidate the copies on other nodes when it writes the memory page, allowing the program to monopolize the memory page.
- the state description of the synchronization protocol includes:
- a distributed virtual machine adaptive memory consistency protocol, which is designed by any one of the above-mentioned design methods of the distributed virtual machine adaptive memory consistency protocol.
- a terminal includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, it can be used to execute any of the foregoing design methods.
- the present invention has the following beneficial effects:
- the design method of the distributed virtual machine adaptive memory consistency protocol provides an adaptive memory consistency protocol that lets the memory synchronization protocol of the distributed virtual machine switch flexibly between sequential consistency and x86-TSO.
- the adaptive consistency protocol can appropriately relax the original sequential consistency to x86-TSO, greatly improving the performance of distributed shared memory.
- Figure 1 is a schematic diagram of the interception of synchronization operations in the present invention.
- Figure 2 is a schematic diagram of the state transitions of the adaptive consistency protocol in the present invention.
- the embodiment of the present invention provides a design method for a distributed virtual machine adaptive memory consistency protocol, so that the distributed shared memory in the distributed virtual machine obtains better performance.
- the client vCPU can be flexibly switched between sequential consistency mode and TSO mode.
- The vCPU is divided into two working modes, sequential consistency (SC) mode and TSO mode, and the client can switch between the two through a hypercall. If the client supports para-virtualization, the vCPU works in TSO mode while in kernel mode; otherwise it runs only in sequential consistency mode. In addition, the synchronization operations in the kernel, namely memory barriers and atomic instructions, are rewritten as hypercalls that actively notify KVM to process distributed shared memory.
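The mode-selection rule can be sketched as follows. This is a simplified illustration under stated assumptions, not KVM code: `Mode`, `select_mode`, `VCpu`, `hypercall_switch`, and `hypercall_sync` are hypothetical names, and the real hypercall interface is not described in the text.

```python
from enum import Enum


class Mode(Enum):
    SC = "sequential consistency"
    TSO = "x86-TSO"


def select_mode(guest_is_paravirtualized: bool, in_kernel_mode: bool) -> Mode:
    """Pick the vCPU working mode following the rule in the text."""
    # TSO is used only by a para-virtualized guest while it runs kernel code.
    if guest_is_paravirtualized and in_kernel_mode:
        return Mode.TSO
    # Every other case falls back to sequential consistency.
    return Mode.SC


class VCpu:
    """Hypothetical vCPU record; not a KVM data structure."""

    def __init__(self, guest_is_paravirtualized: bool):
        self.guest_is_paravirtualized = guest_is_paravirtualized
        self.mode = Mode.SC          # start in the stricter mode (assumption)
        self.notified_ops = []       # sync operations reported to the hypervisor

    def hypercall_switch(self, in_kernel_mode: bool) -> Mode:
        # Stand-in for the mode-switch hypercall.
        self.mode = select_mode(self.guest_is_paravirtualized, in_kernel_mode)
        return self.mode

    def hypercall_sync(self, op: str) -> None:
        # Stand-in for a rewritten memory barrier or atomic instruction:
        # the guest reports it so the hypervisor can process shared memory.
        self.notified_ops.append(op)
```

A para-virtualized guest would call `hypercall_switch(True)` on kernel entry and `hypercall_switch(False)` on return to user mode; a guest without para-virtualization stays in `Mode.SC` regardless of the argument.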
- the states of the adaptive consistency protocol include Invalid, Shared, Modified, and Dirty.
- To support atomic instructions, three additional states are provided: Invalid*, Shared*, and Dirty*. If a vCPU in sequential consistency mode exists on the current node, writing a page in the Shared or Shared* state, or in the Invalid or Invalid* state, transfers the page to the Modified state. If the current node has no vCPU in sequential consistency mode, writing a page in the Shared or Shared* state, or in the Invalid or Invalid* state, transfers the page to the Dirty or Dirty* state respectively. When an Invalid or Invalid* page is read, a copy is first obtained from the page owner, and the page then transfers to the Shared or Shared* state respectively.
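The write and read transitions described above can be summarized in a small state machine. This is a minimal sketch, assuming that transitions the text does not mention leave the state unchanged; `PageState`, `on_write`, `on_read`, and the `fetch_copy` callback are hypothetical names introduced for illustration.

```python
from enum import Enum


class PageState(Enum):
    INVALID = "Invalid"
    SHARED = "Shared"
    MODIFIED = "Modified"
    DIRTY = "Dirty"
    INVALID_A = "Invalid*"   # starred states support atomic instructions
    SHARED_A = "Shared*"
    DIRTY_A = "Dirty*"


STARRED = {PageState.INVALID_A, PageState.SHARED_A, PageState.DIRTY_A}
WRITE_SOURCES = {
    PageState.INVALID, PageState.INVALID_A,
    PageState.SHARED, PageState.SHARED_A,
}


def on_write(state: PageState, node_has_sc_vcpu: bool) -> PageState:
    """Next state after a write to a page on the current node."""
    if state not in WRITE_SOURCES:
        # Assumption: writes to Modified/Dirty/Dirty* pages keep the state;
        # the text does not describe these cases.
        return state
    if node_has_sc_vcpu:
        # Some vCPU on the node is in SC mode: the page becomes Modified.
        return PageState.MODIFIED
    # No SC-mode vCPU on the node: the page becomes Dirty or Dirty*.
    return PageState.DIRTY_A if state in STARRED else PageState.DIRTY


def on_read(state: PageState, fetch_copy=lambda: None) -> PageState:
    """Next state after a read; fetch_copy stands in for asking the page owner."""
    if state is PageState.INVALID:
        fetch_copy()
        return PageState.SHARED
    if state is PageState.INVALID_A:
        fetch_copy()
        return PageState.SHARED_A
    return state  # valid copies are read in place
```

For example, `on_write(PageState.SHARED_A, node_has_sc_vcpu=False)` yields `PageState.DIRTY_A`, while the same write with an SC-mode vCPU present yields `PageState.MODIFIED`.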
- the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
- This embodiment is implemented under the premise of the technical solution and architecture of the present invention, and detailed implementation and specific operation procedures are given, but the applicable platform is not limited to the following examples.
- the specific deployment example is a cluster composed of three ordinary servers, and each server is equipped with a network card that supports InfiniBand.
- the server is connected to the central InfiniBand switch via optical fiber.
- the invention is not limited by the type and number of servers, and can be extended to more than three heterogeneous servers to form a cluster.
- Each server is equipped with Ubuntu Server 16.04.5 LTS 64bit, and is equipped with two CPUs with a total of 56 cores and 128GB of memory.
- the specific development of the present invention is based on the source code version of QEMU 2.8.1.1 and Linux kernel 4.9.76 as an explanation, and it is also applicable to other QEMU and other Linux kernels.
- a slightly modified operating system can be run on QEMU-KVM, namely Ubuntu Server 16.04.5 LTS 64bit.
- IDE devices, SCSI devices, network devices, GPU devices, FPGA devices, etc. can be run on the client operating system.
- the client vCPU can flexibly switch between sequential consistency mode and TSO mode, and read and write operations under the different consistency protocols are processed by distributed shared memory, achieving higher performance while ensuring consistency.
- devices that use device pass-through to perform I/O access to the client such as GPU, will be accessed through the gMMU in the client, and thus can be managed by distributed shared memory.
- a design method for distributed virtual machine adaptive memory consistency protocol including the interception of synchronization operations and the state description of the synchronization protocol.
- the interception method of synchronization operation is as follows:
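- if the client supports para-virtualization, the vCPU works in sequential consistency mode in user mode and in TSO mode in kernel mode;
- if the client does not support para-virtualization, the vCPU runs only in sequential consistency mode;
- for user-mode programs that cannot be para-virtualized, KVM invalidates the copies on other nodes when the program writes to a memory page, giving the program exclusive use of that page.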
- the states of the adaptive protocol include Invalid, Shared, Dirty and Modified; in addition, in order to support atomic instructions, three states Invalid*, Shared*, and Dirty* are added;
- if the current node has a vCPU in sequential consistency mode: when writing a page in the Shared or Shared* state, the page state transitions to the Modified state; when writing a page in the Invalid or Invalid* state, it also transitions to the Modified state;
- the design method of the distributed virtual machine adaptive memory consistency protocol provided by the above embodiment of the present invention designs, on the basis of the distributed virtual machine, a new memory consistency protocol, the adaptive consistency protocol, which lets distributed shared memory obtain better performance.
- synchronization operations such as mfence in the client program and the kernel are rewritten into hypercalls, and are handed over to KVM (Kernel Virtual Machine) for processing.
- the adaptive consistency protocol adds the Dirty (dirty page) state and the corresponding atomic operation states (marked *) to the original states, and distinguishes page states for sequential consistency (SC) mode and x86-TSO (TSO) mode. The state transitions are shown in Figure 2.
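Claim 3 additionally states that when any vCPU on a node is the first to switch from sequential consistency mode to TSO mode, modifications on the node that have not yet been synchronized are broadcast to the other nodes, and pages in the Dirty or Dirty* state move to Shared or Shared* respectively. The sketch below follows that wording literally. It assumes that the unsynchronized modifications are exactly the Dirty and Dirty* pages; `on_first_switch_to_tso` and the `broadcast` callback are hypothetical names.

```python
def on_first_switch_to_tso(pages, broadcast):
    """Flush dirty pages when the first vCPU on the node switches SC -> TSO.

    pages:     dict mapping page id -> state name ("Dirty", "Shared*", ...)
    broadcast: callable taking a page id; stands in for sending the
               modification to the other nodes (assumption)
    """
    for page_id, state in pages.items():
        if state == "Dirty":
            broadcast(page_id)
            pages[page_id] = "Shared"
        elif state == "Dirty*":
            broadcast(page_id)
            pages[page_id] = "Shared*"
        # All other states are left untouched.
    return pages
```

Only values are reassigned during iteration, so the dictionary can be updated in place. Pages in any other state, such as Modified, are neither broadcast nor changed by this step.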
- an embodiment of the present invention also provides a distributed virtual machine adaptive memory consistency protocol, which is designed by any one of the above design methods for the distributed virtual machine adaptive memory consistency protocol.
- an embodiment of the present invention also provides a terminal, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, it can be used to execute any of the design methods of the distributed virtual machine adaptive memory consistency protocol described above.
- the present invention provides a distributed virtual machine adaptive memory consistency protocol and a design method and terminal thereof. Based on the distributed virtual machine, an adaptive consistency protocol is designed so that the distributed shared memory can achieve better performance.
- the present invention enables the memory synchronization protocol of the distributed virtual machine to switch flexibly between sequential consistency and x86-TSO; for different application scenarios and restrictions, the adaptive consistency protocol relaxes the original sequential consistency to x86-TSO, improving the performance of distributed shared memory.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Storage Device Security (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Multi Processors (AREA)
Claims (5)
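- A design method for a distributed virtual machine adaptive memory consistency protocol, characterized by comprising: interception of synchronization operations: for different applications, the client vCPU switches flexibly between sequential consistency mode and TSO mode; state description of the synchronization protocol: on the basis of the original memory consistency protocol, the Dirty state and the corresponding atomic operation states are added, and state transitions are implemented.
- The design method for a distributed virtual machine adaptive memory consistency protocol according to claim 1, characterized in that the interception of synchronization operations comprises: dividing the vCPU into two working modes, namely sequential consistency mode and TSO mode, so that the client can switch between the two modes through a hypercall: if the client supports para-virtualization, the vCPU works in sequential consistency mode in user mode and in TSO mode in kernel mode; if the client does not support para-virtualization, the vCPU runs only in sequential consistency mode; rewriting the synchronization operations in the kernel, namely memory barriers and atomic instructions, as hypercalls that actively notify KVM to process distributed shared memory; for user-mode programs that cannot be para-virtualized, KVM invalidates the copies on other nodes when the program writes to a memory page, giving the program exclusive use of that memory page.
- The design method for a distributed virtual machine adaptive memory consistency protocol according to claim 1, characterized in that the state description of the synchronization protocol comprises: the states of the adaptive protocol include: Invalid, denoting an invalid page; Shared, denoting a shared page; Dirty, denoting a dirty page; Modified, denoting a modified page; Invalid*, denoting an invalid page in atomic operation mode; Shared*, denoting a shared page in atomic operation mode; Dirty*, denoting a dirty page in atomic operation mode; state transitions of the adaptive consistency protocol are performed as follows: when any vCPU on a node is the first to switch from sequential consistency mode to TSO mode, the modifications on the node that have not yet been synchronized are broadcast to the other nodes, and pages in the Dirty or Dirty* state transfer to the Shared and Shared* states respectively; if a vCPU in sequential consistency mode exists on the current node: when a page in the Shared or Shared* state is written, the page state transfers to the Modified state; when a page in the Invalid or Invalid* state is written, it transfers to the Modified state; if no vCPU in sequential consistency mode exists on the current node: when a page in the Shared or Shared* state is written, the page state transfers to the Dirty and Dirty* states respectively; when a page in the Invalid or Invalid* state is written, it transfers to the Dirty and Dirty* states respectively; when an Invalid or Invalid* page is read, a copy is first obtained from the page owner, and the page then transfers to the Shared and Shared* states respectively.
- A distributed virtual machine adaptive memory consistency protocol, characterized in that it is designed by the design method for a distributed virtual machine adaptive memory consistency protocol according to any one of claims 1 to 3.
- A terminal, comprising a memory, a processor, and a computer program stored in the memory and capable of running on the processor, characterized in that the processor, when executing the computer program, can be used to execute the design method for a distributed virtual machine adaptive memory consistency protocol according to any one of claims 1 to 3.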
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910746605.8A CN110569105B (zh) | 2019-08-14 | 2019-08-14 | Distributed virtual machine adaptive memory consistency protocol, design method thereof, and terminal |
CN201910746605.8 | 2019-08-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021027069A1 (zh) | 2021-02-18 |
Family
ID=68775308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/113236 WO2021027069A1 (zh) | Distributed virtual machine adaptive memory consistency protocol, design method thereof, and terminal | 2019-08-14 | 2019-10-25 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110569105B (zh) |
WO (1) | WO2021027069A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
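CN111414134B (zh) * | 2020-02-20 | 2021-05-25 | Shanghai Jiao Tong University | Method and system for a transactional write optimization framework for persistent memory file systems |
CN115705194A (zh) * | 2021-08-13 | 2023-02-17 | Huawei Technologies Co., Ltd. | Code processing method under a hardware memory order architecture and corresponding apparatus |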
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6591355B2 (en) * | 1998-09-28 | 2003-07-08 | Technion Research And Development Foundation Ltd. | Distributed shared memory system with variable granularity |
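CN103744725A (zh) * | 2013-12-24 | 2014-04-23 | Hangzhou Huawei Digital Technologies Co., Ltd. | Virtual machine management method and apparatus |
CN105653347A (zh) * | 2014-11-28 | 2016-06-08 | Hangzhou Huawei Digital Technologies Co., Ltd. | Server, resource management method, and virtual machine manager |
CN108932154A (zh) * | 2018-07-23 | 2018-12-04 | Shanghai Jiao Tong University | Distributed virtual machine manager |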
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104021063B (zh) * | 2014-05-14 | 2015-03-11 | Nanjing University | Modular computer forensics system based on hardware virtualization and method thereof |
US10157133B2 (en) * | 2015-12-10 | 2018-12-18 | Arm Limited | Snoop filter for cache coherency in a data processing system |
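KR101863578B1 (ko) * | 2016-12-01 | 2018-07-02 | Kyungpook National University Industry-Academic Cooperation Foundation | Adaptive cache memory access apparatus for a computing system and method thereof |
CN108021429B (zh) * | 2017-12-12 | 2019-08-06 | Shanghai Jiao Tong University | Method for calculating the affinity of virtual machine memory and network card resources based on NUMA architecture |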
-
2019
- 2019-08-14 CN CN201910746605.8A patent/CN110569105B/zh active Active
- 2019-10-25 WO PCT/CN2019/113236 patent/WO2021027069A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110569105A (zh) | 2019-12-13 |
CN110569105B (zh) | 2023-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10846145B2 (en) | Enabling live migration of virtual machines with passthrough PCI devices | |
US7533198B2 (en) | Memory controller and method for handling DMA operations during a page copy | |
US6075938A (en) | Virtual machine monitors for scalable multiprocessors | |
US10176007B2 (en) | Guest code emulation by virtual machine function | |
US20120239904A1 (en) | Seamless interface for multi-threaded core accelerators | |
Chapman et al. | vNUMA: A Virtual Shared-Memory Multiprocessor. | |
CN111712796B (zh) | System for live migration of virtual machines with assigned peripheral devices | |
Macdonell | Shared-memory optimizations for virtual machines | |
US8032716B2 (en) | System, method and computer program product for providing a new quiesce state | |
CN108932154B (zh) | Distributed virtual machine manager | |
US20150212956A1 (en) | Updating virtual machine memory by interrupt handler | |
US8458438B2 (en) | System, method and computer program product for providing quiesce filtering for shared memory | |
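WO2021027069A1 (zh) | Distributed virtual machine adaptive memory consistency protocol, design method thereof, and terminal | |
JP2022123079A (ja) | Systems, methods, and apparatuses for heterogeneous computing | |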
US10817456B2 (en) | Separation of control and data plane functions in SoC virtualized I/O device | |
US10120709B2 (en) | Guest initiated atomic instructions for shared memory page host copy on write | |
JP2523653B2 (ja) | Virtual computer system | |
US11263122B2 (en) | Implementing fine grain data coherency of a shared memory region | |
Ivanovic et al. | Performance analysis of ivshmem for high-performance computing in virtual machines | |
US20230112225A1 (en) | Virtual machine remote host memory accesses | |
US20180217939A1 (en) | Deferring registration for dma operations | |
JP6266767B2 (ja) | Consistent and efficient mirroring of non-volatile memory state in virtualized environments | |
US11755512B2 (en) | Managing inter-processor interrupts in virtualized computer systems | |
Klimiankou | Towards practical multikernel OSes with MySyS | |
WO2022022708A1 (zh) | Inter-process communication method, apparatus, and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19941721 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19941721 Country of ref document: EP Kind code of ref document: A1 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19941721 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.05.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19941721 Country of ref document: EP Kind code of ref document: A1 |