WO2020135504A1 - Virtualization method and apparatus - Google Patents

Virtualization method and apparatus

Info

Publication number
WO2020135504A1
Authority
WO
WIPO (PCT)
Prior art keywords
nvme
memory
virtual machine
blk
driver
Prior art date
Application number
PCT/CN2019/128310
Other languages
French (fr)
Chinese (zh)
Inventor
李翌
彭浩
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2020135504A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing

Definitions

  • The embodiments of the present application relate to, but are not limited to, the computer field, for example to a virtualization method and device.
  • SSD: Solid State Drive
  • SSDs are hard drives made with solid-state electronic storage chip arrays, and most of them are currently based on Flash chips as storage media.
  • Compared with conventional mechanical hard drives, SSDs are fast to read and write, light in weight, low in energy consumption, shock-proof and drop-resistant, and small in size. They are increasingly used in servers and personal computing.
  • Non-Volatile Memory Express (NVMe) is a storage device interface specification that takes full advantage of the low latency and parallelism of the Peripheral Component Interconnect Express (PCI-E) channel and simplifies the input/output (IO) access path of the solid-state disk.
  • PCI-E: Peripheral Component Interconnect Express
  • IO: input and output
  • SCSI: Small Computer System Interface
  • SATA: Serial Advanced Technology Attachment
  • Compared with the Serial Attached SCSI (SAS) or SATA interface specifications, NVMe can provide lower latency, higher transmission performance, and lower power consumption.
  • NVMe is adapted to the high-speed IO characteristics of SSDs, which greatly improves the read and write performance of solid-state disks, and it is widely used in high-end SSD devices.
  • Virtualization scenarios also call for virtual machine storage support based on NVMe SSDs.
  • A common virtualization method for NVMe SSDs is shown in FIG. 1.
  • The virtual machine uses a front-end driver to exchange read and write requests with a back-end driver in the Quick Emulator (Qemu).
  • The back-end driver runs in the context of the host and interfaces with the Virtual File System (VFS).
  • VFS: Virtual File System
  • NVMe devices provide two types of queues: management queues and IO queues. There is only one management queue, and there are multiple groups of IO queues. Each group consists of two circular lists, a submission queue and a response queue, which the NVMe device uses to receive commands and to place execution results, respectively.
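  • The paired circular lists described above can be sketched as follows. This is a minimal illustration, not the NVMe data layout: the class names `Ring` and `IOQueuePair`, the dictionary entries, and the convention of leaving one slot unused to tell a full ring from an empty one are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ring:
    """Fixed-size circular list with head (consumer) and tail (producer) indices."""
    depth: int
    slots: list = field(default_factory=list)
    head: int = 0
    tail: int = 0

    def __post_init__(self):
        self.slots = [None] * self.depth

    def is_full(self) -> bool:
        # One slot stays unused so that "full" and "empty" are distinguishable.
        return (self.tail + 1) % self.depth == self.head

    def is_empty(self) -> bool:
        return self.head == self.tail

    def push(self, entry) -> int:
        if self.is_full():
            raise BufferError("ring is full")
        index = self.tail
        self.slots[index] = entry
        self.tail = (self.tail + 1) % self.depth
        return index

    def pop(self):
        if self.is_empty():
            return None
        entry = self.slots[self.head]
        self.head = (self.head + 1) % self.depth
        return entry


@dataclass
class IOQueuePair:
    """One group of IO queues: a submission ring plus a response ring."""
    queue_id: int
    depth: int = 8

    def __post_init__(self):
        self.submission = Ring(self.depth)
        self.completion = Ring(self.depth)


pair = IOQueuePair(queue_id=1, depth=4)
slot = pair.submission.push({"opcode": "write", "sector": 128})
command = pair.submission.pop()                      # the device consumes the command
pair.completion.push({"slot": slot, "status": 0})    # and posts the execution result
```

  • The producer only advances `tail` and the consumer only advances `head`, which is what lets a driver and a device share each ring without further coordination in this simplified model.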
  • The front-end driver sends the read/write request to the back-end driver. After receiving it, the back-end driver converts the request into a VFS read/write operation on the host (Host, also referred to as the physical machine). The host operating system (Operating System, OS) kernel then calls the read/write commands of the NVMe SSD driver according to the VFS operation, so that the NVMe driver places IO read/write instructions into the submission queue of an IO queue of the NVMe device. After the NVMe device executes the instructions, it places the result in the response queue of the IO queue and generates a message signaled interrupt (Message Signaled Interrupts, MSI) to notify the host. The host processes the result through the NVMe driver, VFS encapsulates it and returns it to the back-end driver, and the virtualization mechanism returns it to the front end, completing the IO read/write operation.
  • Host: also referred to as the physical machine
  • In this scheme, one IO operation involves two interactions between the front end and the back end, multiple copies of the data between them, and a long IO stack from VFS to NVMe in the Host. The resulting low efficiency prevents guests from taking full advantage of the high IO performance of NVMe SSD devices.
  • The embodiments of the present application provide a virtualization method and device that can improve IO efficiency.
  • An embodiment of the present application provides a virtualization method, including: in the case where an application of a virtual machine issues a predetermined request, the non-volatile memory host controller interface specification block (NVMe-Blk) driver of the virtual machine allocates a first DMA memory from a virtual NVMe direct memory access (DMA) memory management area, wherein the first DMA memory includes a second DMA memory for the submission queue of an input/output (IO) queue and a third DMA memory for the completion queue; the NVMe-Blk driver constructs a submission queue item based on the physical machine physical address (HPA) of the second DMA memory and notifies the NVMe device of the host to process the predetermined request; and the NVMe-Blk driver reads the response information in the completion queue.
  • An embodiment of the present application also provides a virtualization method, including: when the non-volatile memory host controller interface specification (NVMe) device of the host learns that a predetermined request needs to be processed, it performs the operation corresponding to the predetermined request and places the response information into the completion queue.
  • An embodiment of the present application provides a virtualization device, including a non-volatile memory host controller interface specification block (NVMe-Blk) driver, which is configured to: allocate a first DMA memory from a virtual NVMe direct memory access (DMA) memory management area when an application of the virtual machine issues a predetermined request, wherein the first DMA memory includes a second DMA memory for the submission queue of an input/output (IO) queue and a third DMA memory for the completion queue; construct a submission queue item according to the physical machine physical address (HPA) of the second DMA memory and notify the NVMe device to process the predetermined request; and read the response information in the completion queue.
  • An embodiment of the present application provides a virtualization device, including a non-volatile memory host controller interface specification (NVMe) device, which is configured to perform the operation corresponding to a predetermined request when it learns that the predetermined request needs to be processed, and to place the response information into the completion queue.
  • NVMe: non-volatile memory host controller interface specification
  • An embodiment of the present application further provides a virtualization device, including a processor and a computer-readable storage medium, where the computer-readable storage medium stores instructions, and when the instructions are executed by the processor, any one of the foregoing virtualization methods is implemented.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the foregoing virtualization methods is implemented.
  • FIG. 1 is a schematic diagram of a common virtualization method for NVMe SSDs in the related art;
  • FIG. 3 is a schematic diagram of a doorbell (DoorBell) area mapping with an IO queue in an embodiment of this application;
  • FIG. 4 is a schematic diagram of a virtualization method in an embodiment of this application.
  • FIG. 5 is a flowchart of a virtualization method proposed by another embodiment of this application.
  • FIG. 6 is a schematic structural diagram of a virtualization device according to an embodiment of the present application.
  • An embodiment of the present application provides a virtualization method, including steps 200 to 202.
  • Step 200: in the case where an application of the virtual machine issues a predetermined request, the NVMe-block (Block, Blk) driver of the virtual machine allocates a first DMA memory from the virtual NVMe direct memory access (Direct Memory Access, DMA) memory management area, wherein the first DMA memory includes a second DMA memory for the submission queue (Submission Queue, SQ) of the IO queue and a third DMA memory for the completion queue (Completion Queue, CQ).
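  • The allocation in step 200 can be sketched as follows. The text does not specify how the virtual NVMe DMA memory management area is organized, so the first-fit allocator `DmaArea`, the helper `allocate_first_dma`, and the example base address are hypothetical; the sketch only shows one first DMA memory being carved into an SQ-side buffer (second DMA memory) and a CQ-side buffer (third DMA memory).

```python
class DmaArea:
    """Hypothetical first-fit allocator over one contiguous, HPA-addressed region."""

    def __init__(self, base_hpa: int, size: int):
        self.free = [(base_hpa, size)]   # list of (start, length) free extents

    def alloc(self, length: int) -> int:
        for i, (start, size) in enumerate(self.free):
            if size >= length:
                remainder = (start + length, size - length)
                # Keep the unused tail of the extent, or drop it if nothing is left.
                self.free[i:i + 1] = [remainder] if remainder[1] else []
                return start
        raise MemoryError("virtual NVMe DMA area exhausted")

    def release(self, start: int, length: int) -> None:
        self.free.append((start, length))   # no coalescing in this sketch


def allocate_first_dma(area: DmaArea, data_len: int) -> dict:
    """Step 200: the first DMA memory = SQ-side buffer + CQ-side buffer."""
    second = area.alloc(data_len)   # buffer referenced by the submission queue item
    third = area.alloc(data_len)    # buffer referenced by the completion queue
    return {"second": second, "third": third, "length": data_len}


area = DmaArea(base_hpa=0x1000_0000, size=64 * 1024)
first = allocate_first_dma(area, 4096)
```

  • Because the area is expressed in host physical addresses from the start, the addresses returned here could be placed in a submission queue item without further translation, which is the property the method relies on.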
  • Step 201: the NVMe-Blk driver constructs an SQ item based on the physical machine physical address (HPA) of the second DMA memory and notifies the NVMe device of the host to process the predetermined request.
  • HPA: physical machine physical address
  • The SQ item further includes information such as the sector number and priority of the NVMe device.
  • For a write, the second DMA memory is the source address and the sector number is the destination address; that is, the SQ item notifies the NVMe device to write the data in the second DMA memory to the sector with that sector number.
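  • A submission queue item carrying the fields named above (buffer HPA, sector number, priority) might be packed as in the sketch below. The 24-byte layout, the opcode values, and the function names are assumptions chosen for illustration; they are not the NVMe command format and are not taken from the application.

```python
import struct

# Hypothetical layout: opcode (1 byte), priority (1 byte), queue id (2 bytes),
# buffer HPA (8 bytes), sector number (8 bytes), length in bytes (4 bytes).
SQ_ITEM = struct.Struct("<BBHQQI")
OP_WRITE, OP_READ = 1, 2


def build_sq_item(opcode, priority, queue_id, buffer_hpa, sector, length) -> bytes:
    """Pack one submission queue item into its fixed-size binary form."""
    return SQ_ITEM.pack(opcode, priority, queue_id, buffer_hpa, sector, length)


def parse_sq_item(raw: bytes) -> dict:
    """Unpack a submission queue item back into named fields."""
    keys = ("opcode", "priority", "queue_id", "buffer_hpa", "sector", "length")
    return dict(zip(keys, SQ_ITEM.unpack(raw)))


# A write: source is the second DMA memory (by HPA), destination is sector 2048.
item = build_sq_item(OP_WRITE, 0, 1, 0x1000_0000, 2048, 4096)
fields = parse_sq_item(item)
```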
  • the NVMe device may be an SSD.
  • Any one of the following methods may be used to notify the NVMe device of the host to process the predetermined request.
  • Method 1: the NVMe-Blk driver sends a write doorbell (DoorBell) request to the host, wherein the write DoorBell request includes the submission queue item information, that is, the index of the submission queue item; the host writes the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  • The NVMe-Blk driver can send the write DoorBell request to the host hypervisor based on a generalized virtual memory address trapping mechanism or on the HyperCall instruction of the central processing unit (CPU).
  • CPU: Central Processing Unit
  • The DoorBell area is arranged by IO queue ID, so the host can locate the field corresponding to an IO queue from the IO queue ID and the start address of the DoorBell area.
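  • Locating a queue's field from the queue ID and the start address of the DoorBell area reduces to simple address arithmetic, sketched below. The register file is modeled as a `bytearray`, the function names are hypothetical, and the field width of two bytes is an assumption that follows the 16-bit field size mentioned in this document; real devices may use a different width or stride.

```python
DOORBELL_FIELD_BYTES = 2   # assumption: one 16-bit field per IO queue


def doorbell_field_address(doorbell_base: int, queue_id: int) -> int:
    """Fields are laid out in queue-ID order, starting at the DoorBell base."""
    if queue_id < 0:
        raise ValueError("queue id must be non-negative")
    return doorbell_base + queue_id * DOORBELL_FIELD_BYTES


def handle_write_doorbell(registers: bytearray, doorbell_base: int,
                          queue_id: int, sq_index: int) -> None:
    """Host side of Method 1: store the submission queue item index for the queue."""
    offset = doorbell_field_address(doorbell_base, queue_id)
    registers[offset:offset + DOORBELL_FIELD_BYTES] = sq_index.to_bytes(
        DOORBELL_FIELD_BYTES, "little")


registers = bytearray(64)   # stand-in for the device register space
handle_write_doorbell(registers, doorbell_base=16, queue_id=3, sq_index=5)
```

  • In Method 2 the same arithmetic would run inside the virtual machine against a mapped copy of the area, which removes the round trip through the host.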
  • Method 2: the NVMe-Blk driver saves in advance the second correspondence between the IO queue identifier and the field of the DoorBell area in the register of the NVMe device; the NVMe-Blk driver writes the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  • The DoorBell area of the NVMe device is a contiguous area that can be mapped into the virtual machine, and each IO queue corresponds to a 16-bit field in it. When the NVMe-Blk driver is initialized, the mapping of the DoorBell area can be completed and the second correspondence saved.
  • In this way, the NVMe-Blk driver writes the submission queue item information directly into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device, without going through the host, which improves IO efficiency.
  • Step 202: the NVMe-Blk driver reads the response information in the completion queue.
  • The NVMe-Blk driver may use any of the following methods to read the response information in the CQ.
  • Method 1: the NVMe-Blk driver polls the completion queue to obtain the response information.
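  • Polling can be sketched as a bounded spin on the completion queue. The `deque` stand-in for the CQ, the `max_spins` budget, and the shape of the response dictionary are assumptions for illustration; the application does not say how a new entry is detected.

```python
from collections import deque


def poll_completion(completion_queue: deque, max_spins: int = 1000):
    """Spin on the completion queue until a response appears or the budget runs out."""
    for _ in range(max_spins):
        if completion_queue:
            return completion_queue.popleft()
    return None


cq = deque()
nothing_yet = poll_completion(cq, max_spins=3)   # no response has been posted
cq.append({"sq_index": 5, "status": 0})          # the device posts a response
response = poll_completion(cq)
```

  • Polling trades CPU time for latency; the interrupt-driven Method 2 avoids the spin at the cost of interrupt delivery.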
  • Method 2: before the host sends the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver, the host allocates a message signaled interrupt (MSI) interrupt number to the IO queue and sets the MSI interrupt number to be processed directly by the virtual machine; when the host sends the first correspondence to the NVMe-Blk driver, it also sends the MSI interrupt number; after the host performs the operation corresponding to the predetermined request, the host triggers an interrupt according to the MSI interrupt number; and when the interrupt corresponding to the MSI interrupt number is triggered, the NVMe-Blk driver reads the response information in the completion queue.
  • MSI: Message Signaled Interrupts
  • The NVMe-Blk driver can set up the corresponding interrupt handler in the virtual machine.
  • The top handler of the interrupt reads the completion queue information and releases the completion queue entry to trigger the bottom handler of the interrupt; in the case where the predetermined request is a read request, the bottom handler obtains the location of the third DMA memory from the completion queue information, completes the copy of the read data, and releases the DMA memory.
  • The interrupt handler then triggers the IO protocol stack of the virtual machine OS to complete the action and returns the corresponding operation result to the read/write requester.
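  • The split between the top and bottom handlers can be sketched as two functions that hand work over through a pending list. Everything here is a simplified model: the DMA memory is a dictionary keyed by address, the completion entries are dictionaries, and the names `top_half` and `bottom_half` are hypothetical.

```python
from collections import deque


def top_half(completion_queue: deque, pending: deque) -> None:
    """Top handler: drain the completion queue and defer the heavy work."""
    while completion_queue:
        pending.append(completion_queue.popleft())   # frees the CQ slot


def bottom_half(pending: deque, dma: dict, read_buffers: dict) -> list:
    """Bottom handler: for reads, copy data out of the third DMA memory and free it."""
    finished = []
    while pending:
        info = pending.popleft()
        if info["op"] == "read":
            # dict.pop both fetches the data and releases the DMA buffer.
            read_buffers[info["request"]] = bytes(dma.pop(info["third_hpa"]))
        finished.append(info["request"])
    return finished


dma = {0x2000: bytearray(b"sector data")}
cq = deque([{"op": "read", "request": 7, "third_hpa": 0x2000}])
pending, buffers = deque(), {}
top_half(cq, pending)
done = bottom_half(pending, dma, buffers)
```

  • Keeping the top handler limited to draining the queue means the completion queue is freed quickly, while the data copy runs later in the bottom handler.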
  • The MSI interrupt number is set to be processed directly by the virtual machine.
  • On an X86 processor, this can be implemented based on the PostInterrupt mechanism of Virtualization Technology for Directed I/O (VT-d); that is, the MSI interrupt number is set in the VT-d PostInterrupt virtual generic interrupt (Virtual Generic Interrupt, VGI) feature area of the virtual machine's X86 processor. On an ARM processor, it can be done based on the vGSI mechanism. When the virtual machine has exclusive use of a host CPU, it can be done on an X86 processor by configuring the virtual machine control structure (Virtual-Machine Control Structure, VMCS) so that the interrupt does not cause an exit.
  • VGI: Virtual Generic Interrupt
  • the predetermined request includes at least one of the following: a read request and a write request.
  • In the case where the predetermined request is a write request, between step 200 and step 201 the method further includes: the NVMe-Blk driver copies the data to be written to the second DMA memory.
  • In the case where the predetermined request is a read request, the response information includes the HPA of the third DMA memory; the NVMe-Blk driver reads the data from the third DMA memory according to the response information, copies the read data to the application's read buffer, and frees the third DMA memory.
  • The method further includes: the NVMe-Blk driver requests a contiguous first physical virtual machine memory from the virtual machine and transmits the virtual machine physical address (Guest Physical Address, GPA) of the first physical virtual machine memory to the host, wherein the GPA of the starting position of the first physical virtual machine memory can be transmitted to the host, and the GPA of the first physical virtual machine memory can be transmitted to Qemu of the host; the NVMe-Blk driver receives the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory; the NVMe-Blk driver requests a contiguous second physical virtual machine memory from the virtual machine, sets the second physical virtual machine memory to the DMA access mode, and transmits the GPA of the second physical virtual machine memory to the host, wherein the GPA of the starting position of the second physical virtual machine memory can be transmitted to the host, and the GPA of the second physical virtual machine memory can be transmitted to Qemu of the
  • A contiguous first physical virtual machine memory and a contiguous second physical virtual machine memory may be requested from the virtual machine.
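  • Since only the GPA of the starting position needs to be transmitted for a contiguous region, the host-side conversion to an HPA can be pictured as a base-plus-offset lookup. The sketch below assumes that the host backs each guest region with one contiguous host range and keeps a table of such regions; the function name `translate_gpa`, the table format, and the example addresses are hypothetical and do not represent a Qemu or Linux kernel interface.

```python
def translate_gpa(gpa: int, regions: list) -> int:
    """Map a guest physical address to a host physical address.

    regions: list of (gpa_start, hpa_start, length) tuples known to the host.
    """
    for gpa_start, hpa_start, length in regions:
        if gpa_start <= gpa < gpa_start + length:
            return hpa_start + (gpa - gpa_start)
    raise LookupError(f"GPA {gpa:#x} is not backed by host memory")


# One 64 KiB guest region backed by a contiguous host range (example values).
regions = [(0x4000_0000, 0x1_2000_0000, 0x10000)]
queue_hpa = translate_gpa(0x4000_0000, regions)       # start of the region
inner_hpa = translate_gpa(0x4000_0010, regions)       # 16 bytes into the region
```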
  • The NVMe-Blk driver can be created when the virtual machine operating system (Operating System, OS) is initialized, and it is registered as a block (Block, Blk) driver of the virtual machine to perform Blk-based random read and write operations.
  • The GPA can be transferred from the virtual machine to the host in any one of the following ways: under a kernel-based virtual machine (Kernel-based Virtual Machine, KVM), transfer over a virtual input/output (virtIO) channel; transfer through a Qemu shared memory area; or transfer as a HyperCall instruction parameter based on X86 VT technology.
  • KVM: Kernel-based Virtual Machine
  • the virtIO channel may be a virtIO control channel.
  • the virtIO channel may be established by the NVMe-Blk driver and Qemu of the host.
  • In the embodiments of the present application, the NVMe-Blk driver accesses the NVMe device directly, which reduces the participation of the host and thereby achieves efficient IO operations for the virtual machine. The method does not need to occupy the CPU, reduces the overhead of virtualization, and does not depend on specific NVMe device hardware, so it has good versatility.
  • Referring to FIG. 5, another embodiment of the present application provides a virtualization method, including step 500.
  • Step 500: when the NVMe device of the host learns that a predetermined request needs to be processed, it performs the operation corresponding to the predetermined request and puts the response information into the completion queue.
  • The NVMe device learns that a predetermined request needs to be processed when the host or the NVMe-Blk driver writes the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  • In the case where the NVMe-Blk driver writes the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device, the NVMe-Blk driver needs to save the second correspondence between the IO queue identifier and the field of the DoorBell area in the register of the NVMe device.
  • In the case where the host writes the submission queue item information, the NVMe-Blk driver sends a write DoorBell request to the host, wherein the write DoorBell request includes the submission queue item information; the host then writes the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  • The predetermined request is a write request or a read request. In the case where the predetermined request is a write request, performing the operation corresponding to the predetermined request includes: the NVMe device writes the data in the second DMA memory to the corresponding sector of the NVMe device according to the submission queue item information.
  • In the case where the predetermined request is a read request, performing the operation corresponding to the predetermined request includes: the NVMe device reads data from the corresponding sector of the NVMe device according to the submission queue item information and copies the read data to the third DMA memory.
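  • The device-side behavior in step 500 can be sketched as a single dispatch function. The model is deliberately simplified: DMA memory and sectors are dictionaries, the submission queue item is a dictionary, and the function name `process_request` and the field names are assumptions for illustration only.

```python
def process_request(item: dict, dma: dict, sectors: dict,
                    completion_queue: list) -> None:
    """Device side: execute one submission queue item and post the response."""
    if item["op"] == "write":
        # Source is the second DMA memory, destination is the sector.
        sectors[item["sector"]] = bytes(dma[item["second_hpa"]])
    elif item["op"] == "read":
        # Source is the sector, destination is the third DMA memory.
        dma[item["third_hpa"]] = bytearray(sectors[item["sector"]])
    else:
        raise ValueError(f"unsupported operation: {item['op']}")
    completion_queue.append({"op": item["op"], "status": 0,
                             "third_hpa": item.get("third_hpa")})


dma = {0x1000: bytearray(b"hello"), 0x2000: bytearray(5)}
sectors, cq = {}, []
process_request({"op": "write", "sector": 9, "second_hpa": 0x1000}, dma, sectors, cq)
process_request({"op": "read", "sector": 9, "third_hpa": 0x2000}, dma, sectors, cq)
```

  • The response for a read carries the address of the third DMA memory, matching the description that the driver later locates the read data from the response information.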
  • Before the operation corresponding to the predetermined request is performed and the response information is placed in the completion queue in step 500, the method further includes: the host converts the received GPA of the first physical virtual machine memory into an HPA, wherein the device command includes the HPA of the first physical virtual machine memory, and wherein Qemu of the host calls the Linux kernel interface to convert the GPA to the HPA and sets the noCache attribute, Qemu sends the HPA to the host, and the host sends the HPA to the NVMe-Blk driver; the host creates an IO queue based on the HPA of the first physical virtual machine memory and sends the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver, wherein Qemu of the host creates the IO queue by calling NVMe device control commands; the host converts the received GPA of the second physical virtual machine memory into an HPA and sets the second physical virtual machine memory to host DMA
  • Before the host sends the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver, the method further includes: the host allocates a message signaled interrupt (MSI) interrupt number to the IO queue and sets the MSI interrupt number to be processed directly by the virtual machine. When the host sends the first correspondence to the NVMe-Blk driver, it also sends the MSI interrupt number to the NVMe-Blk driver. After the host performs the operation corresponding to the predetermined request, the method further includes: the host triggers an interrupt according to the MSI interrupt number.
  • the host may transfer the HPA or IO queue identification (ID) or MSI interrupt number to the virtual machine based on the shared memory area of Qemu or the virtIO channel.
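  • Whichever channel is used, the host has to serialize the values it hands to the virtual machine. The sketch below packs an IO queue ID, an MSI interrupt number, and an HPA into one fixed-size message; the 12-byte format and the function names are hypothetical and are not a virtIO or Qemu message format.

```python
import struct

# Hypothetical wire format: IO queue id (2 bytes), MSI interrupt number (2 bytes),
# HPA (8 bytes), all little-endian.
QUEUE_INFO = struct.Struct("<HHQ")


def encode_queue_info(queue_id: int, msi_number: int, hpa: int) -> bytes:
    """Host side: build the message placed in the shared area or channel."""
    return QUEUE_INFO.pack(queue_id, msi_number, hpa)


def decode_queue_info(message: bytes) -> dict:
    """Guest side: recover the fields the NVMe-Blk driver needs."""
    queue_id, msi_number, hpa = QUEUE_INFO.unpack(message)
    return {"queue_id": queue_id, "msi_number": msi_number, "hpa": hpa}


message = encode_queue_info(queue_id=1, msi_number=42, hpa=0x1_2000_0000)
info = decode_queue_info(message)
```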
  • ID: IO queue identification
  • the virtIO channel may be a virtIO control channel.
  • the virtIO channel may be established by the NVMe-Blk driver and Qemu of the host.
  • Referring to FIG. 6, another embodiment of the present application provides a virtualization device, including an NVMe-Blk driver 601.
  • The NVMe-Blk driver 601 is configured to: allocate a first DMA memory from the virtual NVMe DMA memory management area when an application of the virtual machine issues a predetermined request, wherein the first DMA memory includes a second DMA memory for the submission queue of the input/output (IO) queue and a third DMA memory for the completion queue; construct a submission queue item according to the HPA of the second DMA memory and notify the NVMe device of the host to process the predetermined request; and read the response information in the completion queue.
  • In the case where the predetermined request is a write request, the NVMe-Blk driver 601 is further configured to copy the data to be written to the second DMA memory.
  • In the case where the predetermined request is a read request, the NVMe-Blk driver 601 is further configured to read the data from the third DMA memory according to the completion queue information, copy the read data to the read buffer of the application, and release the third DMA memory.
  • The NVMe-Blk driver 601 is further configured to: request a contiguous first physical virtual machine memory from the virtual machine and transmit the virtual machine physical address (GPA) of the first physical virtual machine memory to the host; receive the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory; request a contiguous second physical virtual machine memory from the virtual machine, set the second physical virtual machine memory to the DMA access mode, and transmit the GPA of the second physical virtual machine memory to the host; and create the virtual NVMe DMA memory management area according to the received HPA of the second physical virtual machine memory.
  • The NVMe-Blk driver 601 is further configured to: save the second correspondence between the IO queue identifier and the field of the DoorBell area in the register of the NVMe device; and write the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  • The NVMe-Blk driver 601 is configured to notify the NVMe device of the host to process a predetermined request in the following manner: send a write DoorBell request to the host, wherein the write DoorBell request includes the submission queue item information, that is, the index of the submission queue item.
  • the NVMe-Blk driver 601 is further configured to: receive the MSI interrupt number; and read the response information in the completion queue when the interrupt corresponding to the MSI interrupt number is triggered.
  • The NVMe-Blk driver 601 is configured to read the response information in the completion queue in the following manner: poll the completion queue to obtain the response information.
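  • Another embodiment of the present application provides a virtualization apparatus, such as a host, including an NVMe device 602.
  • The NVMe device 602 is configured to perform the operation corresponding to a predetermined request when it learns that the predetermined request needs to be processed, and to put the response information into the completion queue.
  • In the case where the predetermined request is a write request, the NVMe device 602 is configured to perform the corresponding operation by writing the data in the second DMA memory to the corresponding sector of the NVMe device according to the submission queue item information.
  • In the case where the predetermined request is a read request, the NVMe device 602 is configured to perform the corresponding operation by reading data from the corresponding sector of the NVMe device according to the submission queue item information and copying the read data to the third DMA memory.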
  • The apparatus further includes a controller 603 configured to: convert the received GPA of the first physical virtual machine memory into a physical machine physical address (HPA); create an IO queue according to the HPA of the first physical virtual machine memory and send the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver; convert the received GPA of the second physical virtual machine memory into an HPA, set the second physical virtual machine memory to the host DMA access mode, and send the HPA of the second physical virtual machine memory to the NVMe-Blk driver.
  • The controller 603 is further configured to allocate a message signaled interrupt (MSI) interrupt number to the IO queue, set the MSI interrupt number to be processed directly by the virtual machine, and send the MSI interrupt number to the NVMe-Blk driver; after the NVMe device 602 performs the operation corresponding to the predetermined request, the NVMe device 602 triggers an interrupt according to the MSI interrupt number. In the embodiment of the present application, the controller 603 is further configured to write the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  • Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the foregoing virtualization methods are implemented.
  • The term computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
  • A communication medium generally contains computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.

Abstract

Provided are a virtualization method and apparatus. The method comprises: in the case where an application of a virtual machine sends a predetermined request, an NVMe-Blk driver of the virtual machine allocating, from a virtual NVMe direct memory access (DMA) memory management area, a first DMA memory, wherein the first DMA memory comprises a second DMA memory of a submission queue and a third DMA memory of a completion queue of an input/output (IO) queue (200); the NVMe-Blk driver constructing a submission queue item according to an HPA of the second DMA memory, and notifying an NVMe device of a host to process the predetermined request (201); and the NVMe-Blk driver reading response information in the completion queue (202).

Description

Virtualization method and apparatus
This application claims priority to Chinese patent application No. 201811612722.7, filed with the China Patent Office on December 27, 2018, the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present application relate to, but are not limited to, the computer field, for example a virtualization method and apparatus.
Background
A solid-state drive (SSD) is a drive built from an array of solid-state electronic storage chips; most SSDs today use Flash chips as the storage medium. Compared with a conventional mechanical hard drive, an SSD reads and writes faster, is lighter, consumes less power, resists shock and drops, and is smaller. SSDs are increasingly used in servers, personal computing, and other fields.
Non-Volatile Memory Express (NVMe), the non-volatile memory host controller interface specification, is a storage device interface specification that takes full advantage of the low latency and parallelism of Peripheral Component Interconnect Express (PCI-E) lanes and simplifies the input/output (IO) access path of a solid-state drive. Compared with the Serial Attached SCSI (SAS, where SCSI is the Small Computer System Interface) or Serial Advanced Technology Attachment (SATA) interface specifications, NVMe provides lower latency, higher transfer performance, and lower power consumption. NVMe matches the high-speed IO characteristics of SSDs, greatly improves their read/write performance, and is widely used in high-end SSD devices.
Many cloud hosts currently provide virtual machine storage based on NVMe SSDs. The conventional method of virtualizing an NVMe SSD is shown in FIG. 1. Through a front-end driver, the virtual machine issues read/write requests to a back-end driver in the Quick Emulator (Qemu). The back-end driver runs in the host context and issues read/write requests to the NVMe driver in the host through the Virtual File System (VFS) interface. In the NVMe interface specification, an NVMe device provides two types of queues, admin and IO: there is only one group of admin queues and there are multiple groups of IO queues. Each group consists of two circular lists, a submission list and a response list, which the NVMe device uses to receive commands and to place execution information.
As shown in FIG. 1, when the virtual machine performs an IO read or write, the front-end driver sends the read/write request to the back-end driver. After receiving the request, the back-end driver converts it into a VFS read/write operation on the host (also called the physical machine). The kernel of the host operating system (OS) calls the read/write commands of the NVMe SSD driver according to the VFS operation, so that the NVMe driver generates an IO read/write instruction and places it in the submission queue of an IO queue of the NVMe device. After executing the instruction, the NVMe device places the result in the response queue of the IO queue and generates a Message Signaled Interrupt (MSI) to notify the host. The host then processes the result through the NVMe driver and, wrapped by the VFS, returns it to the back-end driver, which returns it to the front-end driver through the virtualization mechanism, completing the IO read/write operation of the virtual machine, or guest.
As can be seen, in the conventional NVMe virtualization method a single IO operation involves two front-end/back-end interactions, multiple copies of the data between the front end and the back end, and the long IO stack from the VFS to NVMe in the host. It is inefficient and prevents the guest from making full use of the high IO performance of the NVMe SSD device.
Summary of the invention
The embodiments of the present application provide a virtualization method and apparatus that can improve IO efficiency.
An embodiment of the present application provides a virtualization method, including: when an application of a virtual machine issues a predetermined request, a Non-Volatile Memory Express block (NVMe-Blk) driver of the virtual machine allocates a first direct memory access (DMA) memory from a virtual NVMe DMA memory management area, where the first DMA memory includes a second DMA memory for the submission queue of an input/output (IO) queue and a third DMA memory for the completion queue; the NVMe-Blk driver constructs a submission queue entry according to the host physical address (HPA) of the second DMA memory and notifies the NVMe device of the host to process the predetermined request; and the NVMe-Blk driver reads the response information in the completion queue.
An embodiment of the present application further provides a virtualization method, including: when an NVMe device of a host learns that a predetermined request needs to be processed, the NVMe device performs the operation corresponding to the predetermined request and places response information in a completion queue.
An embodiment of the present application provides a virtualization apparatus, including: an NVMe-Blk driver configured to, when an application of a virtual machine issues a predetermined request, allocate a first DMA memory from a virtual NVMe DMA memory management area, where the first DMA memory includes a second DMA memory for the submission queue of an IO queue and a third DMA memory for the completion queue; construct a submission queue entry according to the HPA of the second DMA memory and notify an NVMe device to process the predetermined request; and read the response information in the completion queue.
An embodiment of the present application provides a virtualization apparatus, including: an NVMe device configured to, on learning that a predetermined request needs to be processed, perform the operation corresponding to the predetermined request and place response information in a completion queue.
An embodiment of the present application further provides a virtualization apparatus, including a processor and a computer-readable storage medium storing instructions which, when executed by the processor, implement any one of the foregoing virtualization methods.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the foregoing virtualization methods.
Brief description of the drawings
FIG. 1 is a schematic diagram of a conventional virtualization method for NVMe SSDs in the related art;
FIG. 2 is a flowchart of a virtualization method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the mapping between the DoorBell region and the IO queues in an embodiment of the present application;
FIG. 4 is a schematic diagram of a virtualization method in an embodiment of the present application;
FIG. 5 is a flowchart of a virtualization method according to another embodiment of the present application;
FIG. 6 is a schematic structural diagram of a virtualization apparatus according to an embodiment of the present application.
Detailed description
The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Referring to FIG. 2, an embodiment of the present application provides a virtualization method including steps 200 to 202.
In step 200, when an application of the virtual machine issues a predetermined request, the NVMe block (NVMe-Blk) driver of the virtual machine allocates a first DMA memory from the virtual NVMe direct memory access (DMA) memory management area. The first DMA memory includes a second DMA memory for the submission queue (SQ) of an IO queue and a third DMA memory for the completion queue (CQ).
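The allocation in step 200 can be pictured with a minimal sketch. It assumes the virtual NVMe DMA memory management area is a single contiguous region whose starting HPA is already known to the driver and that it is handed out in fixed-size blocks; the names `DmaPool` and `alloc_first_dma` and the 4096-byte block size are hypothetical and do not come from the application.

```python
class DmaPool:
    """Toy model of the virtual NVMe DMA memory management area (assumed layout)."""

    def __init__(self, base_hpa, size, block=4096):
        self.base_hpa = base_hpa                  # HPA of the start of the region
        self.block = block                        # allocation granularity (assumption)
        self.free = list(range(0, size, block))   # offsets of the free blocks

    def alloc(self):
        """Return the HPA of one free block, or raise if the pool is exhausted."""
        if not self.free:
            raise MemoryError("DMA pool exhausted")
        return self.base_hpa + self.free.pop(0)

    def release(self, hpa):
        """Give a block back to the pool."""
        self.free.append(hpa - self.base_hpa)


def alloc_first_dma(pool):
    """Step 200: the first DMA memory is a second (SQ side) and a third (CQ side) buffer."""
    return {"second": pool.alloc(), "third": pool.alloc()}
```

Because the pool hands out HPAs directly, the driver can place them in a submission queue entry without a further address translation at request time.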
In step 201, the NVMe-Blk driver constructs an SQ entry according to the host physical address (HPA) of the second DMA memory and notifies the NVMe device of the host to process the predetermined request.
In the embodiments of the present application, the SQ entry further includes information such as the sector number on the NVMe device and the priority.
The second DMA memory is the source address and the sector number is the destination address; that is, the SQ entry tells the NVMe device to write the data from the second DMA memory to the sector identified by the sector number.
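As an illustration of step 201, the sketch below packs a 64-byte submission queue entry for a read or write. The field layout follows the NVMe base specification for NVM command set read/write commands (PRP1 carries the buffer address, CDW10 and CDW11 the starting logical block, CDW12 a zero-based block count); the function name `build_sq_entry` is hypothetical, and the priority information mentioned above is not modeled because the application does not say where it is carried.

```python
import struct

NVME_CMD_WRITE = 0x01  # NVM command set opcode for Write
NVME_CMD_READ = 0x02   # NVM command set opcode for Read


def build_sq_entry(opcode, cid, nsid, buf_hpa, start_lba, nblocks):
    """Pack one 64-byte SQ entry whose data pointer is the HPA of the DMA buffer."""
    cdw0 = opcode | (cid << 16)          # opcode in bits 7:0, command id in bits 31:16
    return struct.pack(
        "<IIQQQQIIIIII",
        cdw0,                            # CDW0
        nsid,                            # namespace identifier
        0,                               # CDW2-3, reserved
        0,                               # metadata pointer, unused here
        buf_hpa,                         # PRP1: HPA of the second (or third) DMA memory
        0,                               # PRP2: unused for a single-page transfer
        start_lba & 0xFFFFFFFF,          # CDW10: starting LBA, low 32 bits
        start_lba >> 32,                 # CDW11: starting LBA, high 32 bits
        (nblocks - 1) & 0xFFFF,          # CDW12: number of blocks, zero-based
        0, 0, 0,                         # CDW13-15, unused in this sketch
    )
```

For a write, `buf_hpa` is the source and `start_lba` the destination, matching the source/destination roles described above.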
In the embodiments of the present application, the NVMe device may be an SSD.
In the embodiments of the present application, either of the following methods may be used to notify the NVMe device of the host to process the predetermined request.
Method 1: the NVMe-Blk driver sends a write-DoorBell request to the host, where the request includes submission queue entry information, namely the index of the submission queue entry. The host writes the submission queue entry information into the field corresponding to the IO queue identifier in the DoorBell region of the registers of the NVMe device.
In this method, the NVMe-Blk driver may send the write-DoorBell request to the hypervisor of the host through a general virtualization memory-address trapping mechanism or through a HyperCall instruction of the central processing unit (CPU).
In this method, the DoorBell region is laid out in order of IO queue identifier, so the host can locate the field corresponding to an IO queue identifier from the identifier and the start address of the DoorBell region.
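The lookup described for Method 1 reduces to simple offset arithmetic. The sketch below uses the doorbell layout of the NVMe base specification, where the region starts at register offset 0x1000 and the stride is derived from the controller's CAP.DSTRD field. Each doorbell there is a 32-bit register whose low 16 bits carry the queue index, which is one plausible reading of the 16-bit field the application describes. The function names are hypothetical.

```python
DOORBELL_BASE = 0x1000  # start of the doorbell region in the controller registers


def sq_tail_doorbell_offset(qid, dstrd=0):
    """Register offset of the submission queue tail doorbell for queue `qid`."""
    return DOORBELL_BASE + (2 * qid) * (4 << dstrd)


def cq_head_doorbell_offset(qid, dstrd=0):
    """Register offset of the completion queue head doorbell for queue `qid`."""
    return DOORBELL_BASE + (2 * qid + 1) * (4 << dstrd)
```

With the default stride, IO queue 1 has its submission doorbell at 0x1008 and its completion doorbell at 0x100C.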
Method 2: the NVMe-Blk driver stores in advance a second correspondence between IO queue identifiers and the fields of the DoorBell region in the registers of the NVMe device, and writes the submission queue entry information into the field of the DoorBell region that corresponds to the IO queue identifier.
In this method, as shown in FIG. 3, the DoorBell region of the NVMe device is a contiguous region that can be mapped into the virtual machine, and each IO queue corresponds to one 16-bit field in it. The mapping of the DoorBell region, that is, the saving of the second correspondence, can be completed when the NVMe-Blk driver is initialized.
With Method 2, the NVMe-Blk driver writes the submission queue entry information directly into the field corresponding to the IO queue identifier in the DoorBell region of the NVMe device registers, without going through the host, which improves IO efficiency.
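Method 2 can be sketched as a small table lookup followed by a 16-bit store. In the sketch below a `bytearray` stands in for the DoorBell region mapped into the virtual machine, and a dictionary stands in for the second correspondence saved at driver initialization; the class name `MappedDoorbells` and the example offsets are hypothetical.

```python
import struct


class MappedDoorbells:
    """Guest-side view of a DoorBell region mapped into the virtual machine."""

    def __init__(self, region, field_offsets):
        self.region = region                # stand-in for the mapped register window
        self.field_offsets = field_offsets  # second correspondence: queue id -> offset

    def ring(self, qid, sq_index):
        """Write the SQ entry index into the 16-bit field of queue `qid`."""
        struct.pack_into("<H", self.region, self.field_offsets[qid], sq_index)
```

The write never leaves the guest, which is the point of Method 2: no trap or HyperCall into the host is needed to notify the device.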
In step 202, the NVMe-Blk driver reads the response information in the completion queue.
In the embodiments of the present application, the NVMe-Blk driver may read the response information in the CQ using either of the following methods.
Method 1: the NVMe-Blk driver polls the completion queue to obtain the response information.
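Polling can be sketched with the phase-tag convention of the NVMe base specification: each 16-byte completion entry carries a phase bit that the device inverts on every pass through the queue, so the driver consumes entries until the phase no longer matches the one it expects. The application does not spell out this mechanism; the function names and the flat `bytearray` queue are assumptions made for illustration.

```python
import struct


def parse_cqe(raw):
    """Decode the fields of one 16-byte completion queue entry."""
    _dw0, _dw1, dw2, dw3 = struct.unpack("<IIII", raw)
    return {
        "sq_head": dw2 & 0xFFFF,      # how far the device has consumed the SQ
        "sq_id": dw2 >> 16,
        "cid": dw3 & 0xFFFF,          # command id copied from the SQ entry
        "phase": (dw3 >> 16) & 1,
        "status": dw3 >> 17,          # 0 means success
    }


def poll_cq(cq, head, phase, depth):
    """Consume all new entries; return (entries, new_head, new_phase)."""
    done = []
    while True:
        cqe = parse_cqe(cq[head * 16:(head + 1) * 16])
        if cqe["phase"] != phase:     # entry not yet written by the device
            break
        done.append(cqe)
        head += 1
        if head == depth:             # wrap around and flip the expected phase
            head, phase = 0, phase ^ 1
    return done, head, phase
```

A driver would start with `head = 0` and `phase = 1` on a zeroed queue and carry the returned head and phase into the next poll.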
Method 2: before the host sends the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver, the host allocates a Message Signaled Interrupt (MSI) interrupt number for the IO queue and configures that MSI interrupt number to be handled directly by the virtual machine. When the host sends the first correspondence to the NVMe-Blk driver, it also sends the MSI interrupt number. After the host performs the operation corresponding to the predetermined request, the host triggers an interrupt according to the MSI interrupt number, and the NVMe-Blk driver reads the response information in the completion queue when that interrupt fires.
In this method, the NVMe-Blk driver can install a corresponding interrupt handler in the virtual machine. When the interrupt fires, the top-half handler reads the completion queue information, releases the CQ, and triggers the bottom-half handler. When the predetermined request is a read request, the bottom-half handler obtains the location of the third DMA memory from the completion queue information, copies the read data, and releases the DMA memory.
In Linux, the bottom-half handler triggers the completion action of the IO protocol stack of the virtual machine OS and returns the corresponding operation result to the requester of the read or write.
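The split between the top-half and bottom-half handlers can be sketched as a hand-off through a work queue. This is only a user-space model: the entry fields (`op`, `cid`, `hpa`, `length`), the function names, and the `bytearray` standing in for DMA memory are all hypothetical, and a real driver would use the guest kernel's interrupt and deferred-work facilities.

```python
from collections import deque

pending = deque()   # work handed from the top half to the bottom half


def top_half(cq_entries):
    """Interrupt context: read the completion info, free the CQ, defer the rest."""
    while cq_entries:
        pending.append(cq_entries.pop(0))   # taking the entry frees its CQ slot


def bottom_half(dma_mem, read_buffers, free_dma):
    """Deferred work: copy read data out of DMA memory, then release that memory."""
    finished = []
    while pending:
        entry = pending.popleft()
        if entry["op"] == "read":
            hpa, length = entry["hpa"], entry["length"]
            read_buffers[entry["cid"]] = bytes(dma_mem[hpa:hpa + length])
            free_dma(hpa)                   # release the third DMA memory
        finished.append(entry["cid"])       # ids to report back to the IO stack
    return finished
```

Keeping the copy out of the top half keeps interrupt latency short, which is the usual reason for this split.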
In this method, configuring the MSI interrupt number to be handled directly by the virtual machine can be implemented on an X86 processor with the PostInterrupt mechanism of Virtualization Technology for Directed I/O (VT-D), that is, by setting the MSI interrupt number in the VT-D PostInterrupt Virtual Generic Interrupt (VGI) feature area of the virtual machine's X86 processor. On an ARM processor it can be implemented with the vGSI mechanism. When the virtual machine has exclusive use of a host CPU, it can be implemented on an X86 processor by configuring interrupts as non-exiting in the Virtual-Machine Control Structure (VMCS).
In the embodiments of the present application, the predetermined request includes at least one of the following: a read request and a write request.
When the predetermined request is a write request, the method further includes, between step 200 and step 201: the NVMe-Blk driver copies the data to be written into the second DMA memory.
When the predetermined request is a read request, the response information includes the HPA of the third DMA memory; the NVMe-Blk driver reads the data from the third DMA memory according to the response information, copies it into the read buffer of the application, and releases the third DMA memory.
In another embodiment of the present application, the method further includes, before step 200: the NVMe-Blk driver requests a contiguous first physical virtual machine memory from the virtual machine and transmits its guest physical address (GPA) to the host, where the GPA of the start of the first physical virtual machine memory may be transmitted, and it may be transmitted to the Qemu of the host; the NVMe-Blk driver receives a first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory; the NVMe-Blk driver requests a contiguous second physical virtual machine memory from the virtual machine, sets it to the DMA access mode, and transmits its GPA to the host, where the GPA of the start of the second physical virtual machine memory may be transmitted, and it may be transmitted to the Qemu of the host; and the NVMe-Blk driver creates the virtual NVMe DMA memory management area according to the received HPA of the second physical virtual machine memory.
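The address hand-off in this setup phase can be illustrated with a minimal sketch. It assumes, beyond what the application states, that the region is contiguous in host physical memory as well as in guest physical memory, so that a single base pair is enough to translate any address inside it; the class name `ContiguousMapping` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContiguousMapping:
    """One contiguous guest region and the host address it is backed by (assumed)."""
    gpa_base: int   # start GPA sent by the NVMe-Blk driver
    hpa_base: int   # start HPA returned by the host
    size: int       # region length in bytes

    def to_hpa(self, gpa):
        """Translate a GPA inside the region to the matching HPA."""
        offset = gpa - self.gpa_base
        if not 0 <= offset < self.size:
            raise ValueError("GPA outside the registered region")
        return self.hpa_base + offset
```

Once the driver holds such a mapping for the second physical virtual machine memory, every buffer it carves out of the virtual NVMe DMA memory management area has a known HPA.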
In an embodiment of the present application, as shown in FIG. 4, the contiguous first and second physical virtual machine memories may be requested from the virtual machine when the NVMe-Blk driver is created.
The NVMe-Blk driver can be created when the virtual machine operating system (OS) is initialized. It registers as a block (Blk) driver of the virtual machine and performs Blk-based random read and write operations.
In an embodiment of the present application, the GPA can be passed from the virtual machine to the host in any of the following ways: over a virtual input/output (virtIO) channel under a Kernel-based Virtual Machine (KVM); through a shared memory area of Qemu; or as a parameter of a HyperCall instruction based on X86 VT technology.
The virtIO channel may be a virtIO control channel, which may be established between the NVMe-Blk driver and the Qemu of the host when the NVMe-Blk driver is created.
In the embodiments of the present application, the NVMe-Blk driver accesses the NVMe device directly, which reduces host involvement and gives the virtual machine efficient IO without occupying the CPU. This lowers virtualization overhead, and because no specific NVMe device hardware support is required, the approach is broadly applicable.
Referring to FIG. 5, another embodiment of the present application provides a virtualization method including step 500.
In step 500, when the NVMe device of the host learns that a predetermined request needs to be processed, it performs the operation corresponding to the predetermined request and places the response information in the completion queue.
In the embodiments of the present application, the NVMe device learns that a predetermined request needs to be processed when submission queue entry information is written into the field corresponding to the IO queue identifier in the DoorBell region of its registers.
The submission queue entry information may be written into that field by the host or by the NVMe-Blk driver.
When the NVMe-Blk driver writes the submission queue entry information into the field, the NVMe-Blk driver needs to store in advance the second correspondence between IO queue identifiers and the fields of the DoorBell region in the registers of the NVMe device.
When the host writes the submission queue entry information into the field, the NVMe-Blk driver sends a write-DoorBell request containing the submission queue entry information to the host, and the host writes that information into the field corresponding to the IO queue identifier in the DoorBell region of the registers of the NVMe device.
In an embodiment of the present application, the predetermined request is a write request or a read request. When the predetermined request is a write request, performing the operation corresponding to the predetermined request includes: the NVMe device writes the data in the second DMA memory into the corresponding sector of the NVMe device according to the submission queue entry information.
When the predetermined request is a read request, performing the operation corresponding to the predetermined request includes: the NVMe device reads data from the corresponding sector of the NVMe device according to the submission queue entry information and copies the read data into the third DMA memory.
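The device-side behaviour for the two request types can be modeled with plain buffers. In the sketch below one `bytearray` stands in for host DMA memory, with the buffer address used as an offset into it, and another stands in for the drive's sectors; the function name `device_process` and the 512-byte sector size are assumptions, and a real NVMe device performs these transfers in hardware.

```python
SECTOR = 512  # assumed sector size in bytes


def device_process(op, dma_mem, disk, buf_hpa, lba, nblocks):
    """Carry out one read or write the way step 500 describes it."""
    nbytes = nblocks * SECTOR
    start = lba * SECTOR
    if op == "write":
        # Write: data moves from the second DMA memory to the target sectors.
        disk[start:start + nbytes] = dma_mem[buf_hpa:buf_hpa + nbytes]
    elif op == "read":
        # Read: data moves from the sectors into the third DMA memory.
        dma_mem[buf_hpa:buf_hpa + nbytes] = disk[start:start + nbytes]
    else:
        raise ValueError(f"unsupported operation: {op}")
    return {"status": 0}  # response information destined for the completion queue
```

The returned dictionary plays the role of the response information that the device places in the completion queue after the transfer.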
In another embodiment of the present application, before the operation corresponding to the predetermined request is performed and the response information is placed in the completion queue in step 500, the method further includes: the host converts the received GPA of the first physical virtual machine memory into an HPA, where the device command includes the HPA of the first physical virtual machine memory, and where the Qemu of the host calls a Linux kernel interface to convert the GPA into the HPA and set the noCache attribute, the Qemu sends the HPA to the host, and the host sends the HPA to the NVMe-Blk driver; the host creates an IO queue according to the HPA of the first physical virtual machine memory and sends the first correspondence between the IO queue identifier and that HPA to the NVMe-Blk driver, where the Qemu of the host creates the IO queue by invoking an NVMe device control command; and the host converts the received GPA of the second physical virtual machine memory into an HPA, sets the second physical virtual machine memory to the host DMA access mode, and sends its HPA to the NVMe-Blk driver, where the Qemu of the host calls a Linux kernel interface to convert the GPA into the HPA and set the noCache attribute, the Qemu sends the HPA to the host, and the host sends the HPA to the NVMe-Blk driver.
In another embodiment of the present application, before the host sends the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver, the method further includes: the host allocates an MSI interrupt number for the IO queue and configures the MSI interrupt number to be handled directly by the virtual machine. When the host sends the first correspondence to the NVMe-Blk driver, it also sends the MSI interrupt number to the NVMe-Blk driver. After the host performs the operation corresponding to the predetermined request, the method further includes: the host triggers an interrupt according to the MSI interrupt number.
In the embodiments of the present application, the host may pass the HPA, the IO queue identifier (ID), or the MSI interrupt number to the virtual machine through a shared memory area of Qemu or through a virtIO channel.
The virtIO channel may be a virtIO control channel, which may be established between the NVMe-Blk driver and the Qemu of the host when the NVMe-Blk driver is created.
参见图6,本申请另一个实施例提出了一种虚拟化装置,包括NVMe-Blk驱动601。Referring to FIG. 6, another embodiment of the present application provides a virtualization device, including an NVMe-Blk driver 601.
NVMe-Blk驱动601,设置为在虚拟机的应用发出预定请求的情况下,从虚拟NVMe DMA内存管理区中分配第一DMA内存;其中,第一DMA内存包括输入输出IO队列的提交队列的第二DMA内存和完成队列的第三DMA内存;根据第二DMA内存的HPA构建提交队列项,通知主机的NVMe设备处理所述预定请求;读取完成队列中的回应信息。The NVMe-Blk driver 601 is configured to allocate the first DMA memory from the virtual NVMe DMA memory management area when the virtual machine application issues a predetermined request; the first DMA memory includes the input queue of the input and output IO queue. The second DMA memory and the third DMA memory of the completion queue; construct the submission queue item according to the HPA of the second DMA memory, notify the NVMe device of the host to process the predetermined request; read the response information in the completion queue.
在本申请实施例中,所述预定请求为写请求;所述NVMe-Blk驱动601还设置为:将所写的数据拷贝到所述第二DMA内存中。In the embodiment of the present application, the predetermined request is a write request; the NVMe-Blk driver 601 is further configured to copy the written data to the second DMA memory.
在本申请实施例中,所述预定请求为读请求;所述NVMe-Blk驱动601还设置为:根据所述完成队列信息从第三DMA内存中读取所读取的数据,将读取的数据拷贝到应用的读缓存中,释放第三DMA内存。In the embodiment of the present application, the predetermined request is a read request; the NVMe-Blk driver 601 is further configured to read the read data from the third DMA memory according to the completion queue information, and read the read data The data is copied to the read buffer of the application, and the third DMA memory is released.
在本申请实施例中,所述NVMe-Blk驱动601还设置为:向所述虚拟机申请连续的第一物理虚拟机内存,将所述第一物理虚拟机内存的虚拟机物理地址GPA传输给主机;接收到IO队列标识和第一物理虚拟机内存的HPA之间的第一对应关系;向所述虚拟机申请连续的第二物理虚拟机内存,将所述第二物理虚拟机内存设置成DMA访问方法,将所述第二物理虚拟机内存的虚拟机物理地址GPA传输给主机;根据接收到的第二物理虚拟机内存的HPA创建所述虚拟NVMe DMA内存管理区。In the embodiment of the present application, the NVMe-Blk driver 601 is further configured to: apply to the virtual machine for a continuous first physical virtual machine memory, and transmit the virtual machine physical address GPA of the first physical virtual machine memory to The host; receiving the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory; applying to the virtual machine for a continuous second physical virtual machine memory, and setting the second physical virtual machine memory to The DMA access method transmits the virtual machine physical address GPA of the second physical virtual machine memory to the host; the virtual NVMe DMA memory management area is created according to the received HPA of the second physical virtual machine memory.
In an embodiment of the present application, the NVMe-Blk driver 601 is further configured to: save a second correspondence between the IO queue identifier and a field of the DoorBell area in the register of the NVMe device; and write the submission queue item information into the field of the DoorBell area in the register of the NVMe device that corresponds to the IO queue identifier.
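The second correspondence can be illustrated with the doorbell layout defined by the NVMe specification for PCIe controllers, where the submission queue tail doorbell of queue y sits at offset 0x1000 + (2y) * (4 << CAP.DSTRD) and the completion queue head doorbell at 0x1000 + (2y + 1) * (4 << CAP.DSTRD). The application itself does not state these offsets, so using them here is an assumption; the `DoorbellTable` class and the `bytearray` standing in for the register region are hypothetical.

```python
import struct

DOORBELL_BASE = 0x1000  # start of the doorbell registers in the NVMe register space


def sq_tail_doorbell_offset(qid, dstrd=0):
    """Offset of the submission queue tail doorbell for queue `qid`."""
    return DOORBELL_BASE + (2 * qid) * (4 << dstrd)


def cq_head_doorbell_offset(qid, dstrd=0):
    """Offset of the completion queue head doorbell for queue `qid`."""
    return DOORBELL_BASE + (2 * qid + 1) * (4 << dstrd)


class DoorbellTable:
    """Hypothetical holder of the queue-identifier-to-doorbell-field mapping."""

    def __init__(self, regs, dstrd=0):
        self.regs = regs          # bytearray standing in for the mapped registers
        self.dstrd = dstrd
        self.sq_offsets = {}      # IO queue identifier -> doorbell field offset

    def register_queue(self, qid):
        self.sq_offsets[qid] = sq_tail_doorbell_offset(qid, self.dstrd)

    def ring(self, qid, sq_tail):
        # Write the submission queue item index into the field for this queue.
        struct.pack_into("<I", self.regs, self.sq_offsets[qid], sq_tail)
```

With the default stride, queue 1 uses offset 0x1008 for its submission doorbell and 0x100C for its completion doorbell.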
In an embodiment of the present application, the NVMe-Blk driver 601 is configured to notify the NVMe device of the host to process the predetermined request in the following manner: sending a write-DoorBell request to the host, where the write-DoorBell request includes the submission queue item information, that is, the index of the submission queue item.
In an embodiment of the present application, the NVMe-Blk driver 601 is further configured to: receive an MSI interrupt number; and read the response information in the completion queue when the interrupt corresponding to the MSI interrupt number is triggered.
In an embodiment of the present application, the NVMe-Blk driver 601 is configured to read the response information in the completion queue in the following manner: polling the completion queue to obtain the response information.
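Polling the completion queue can be sketched with the phase-tag convention that NVMe uses to mark new completion entries: the device inverts the phase bit it writes each time it wraps around the queue, and the driver treats an entry as new only when its phase bit matches the phase it currently expects. The application does not describe this detail, so the sketch below is an assumption, and the class and method names are hypothetical.

```python
class CompletionQueue:
    """Hypothetical completion queue model with NVMe-style phase tags."""

    def __init__(self, depth):
        self.entries = [{"cid": None, "status": 0, "phase": 0} for _ in range(depth)]
        self.head = 0              # driver-side consume position
        self.expected_phase = 1    # phase the driver expects for a new entry
        self._tail = 0             # device-side produce position
        self._device_phase = 1     # phase the device writes on the current pass

    def device_post(self, cid, status):
        """Device side: write one completion entry and advance the tail."""
        self.entries[self._tail] = {"cid": cid, "status": status,
                                    "phase": self._device_phase}
        self._tail += 1
        if self._tail == len(self.entries):
            self._tail = 0
            self._device_phase ^= 1

    def poll(self):
        """Driver side: return (cid, status) of the next entry, or None."""
        entry = self.entries[self.head]
        if entry["phase"] != self.expected_phase:
            return None            # nothing new at the head yet
        self.head += 1
        if self.head == len(self.entries):
            self.head = 0
            self.expected_phase ^= 1
        return entry["cid"], entry["status"]
```

In the interrupt-driven variant, the same `poll` routine would be called from the handler for the MSI interrupt number instead of from a busy loop.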
The foregoing virtualization apparatus is implemented in the same way as the virtualization method of the foregoing embodiments, and details are not repeated here.
Another embodiment of the present application provides a virtualization apparatus (for example, a host), including an NVMe device 602.
The NVMe device 602 is configured to, upon learning that a predetermined request needs to be processed, perform the operation corresponding to the predetermined request and put the response information into the completion queue.
In an embodiment of the present application, the predetermined request is a write request, and the NVMe device 602 is configured to perform the operation corresponding to the predetermined request in the following manner:
writing the data in the second DMA memory into the corresponding sector of the NVMe device according to the submission queue item information.
In an embodiment of the present application, the predetermined request is a read request, and the NVMe device 602 is configured to perform the operation corresponding to the predetermined request in the following manner:
reading data from the corresponding sector of the NVMe device according to the submission queue item information, and copying the data that was read into the third DMA memory.
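The device-side handling of write and read requests can be sketched as a loop that consumes submission queue items up to the index written to the doorbell and posts one completion per item. This is a simplified model under stated assumptions: `NvmeDeviceModel`, the dictionary layout of a submission queue item, the single `buf_hpa` field, and the 512-byte sector size are hypothetical and chosen only to illustrate the data movement.

```python
SECTOR = 512  # assumed sector size in bytes


class NvmeDeviceModel:
    """Hypothetical model of the NVMe device processing submission queue items."""

    def __init__(self, sq, dma):
        self.sq = sq          # list of submission queue items shared with the driver
        self.dma = dma        # HPA -> bytearray, models DMA-visible memory
        self.sectors = {}     # LBA -> bytes
        self.sq_head = 0      # next submission queue item to process
        self.cq = []          # completion queue (response information)

    def on_doorbell(self, new_tail):
        """Process every item between the current head and the new tail."""
        while self.sq_head != new_tail:
            sqe = self.sq[self.sq_head]
            buf = self.dma[sqe["buf_hpa"]]
            status = 0
            if sqe["opcode"] == "write":
                # Write: DMA memory -> sector.
                self.sectors[sqe["lba"]] = bytes(buf)
            elif sqe["opcode"] == "read":
                # Read: sector -> DMA memory.
                buf[:] = self.sectors.get(sqe["lba"], bytes(SECTOR))
            else:
                status = 1    # unsupported opcode in this sketch
            self.cq.append({"cid": sqe["cid"], "status": status})
            self.sq_head = (self.sq_head + 1) % len(self.sq)
```

For a write the buffer would be the second DMA memory, and for a read it would be the third DMA memory, in the terminology used above.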
In an embodiment of the present application, the apparatus further includes a controller 603, configured to: convert the received GPA of the first physical virtual machine memory into a physical machine physical address (HPA); create an IO queue according to the HPA of the first physical virtual machine memory, and send the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver; and convert the received GPA of the second physical virtual machine memory into an HPA, set the second physical virtual machine memory to the host DMA access mode, and send the HPA of the second physical virtual machine memory to the NVMe-Blk driver.
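The GPA-to-HPA conversion performed by the controller can be sketched as a lookup in a table of contiguous guest memory regions. The application does not say how the conversion is carried out, so this is only one plausible approach under assumption: `GuestMemoryMap` and its region tuples are hypothetical, and a real hypervisor would consult its own memory-slot or page-table structures.

```python
from bisect import bisect_right


class GuestMemoryMap:
    """Hypothetical table of (gpa_start, size, hpa_start) regions for one guest."""

    def __init__(self, regions):
        self.regions = sorted(regions)
        self.starts = [region[0] for region in self.regions]

    def gpa_to_hpa(self, gpa, length=1):
        """Translate a GPA range that must lie inside a single region."""
        index = bisect_right(self.starts, gpa) - 1
        if index < 0:
            raise ValueError("GPA is below every mapped region")
        start, size, hpa_start = self.regions[index]
        if gpa + length > start + size:
            raise ValueError("GPA range is not contiguous within one mapped region")
        return hpa_start + (gpa - start)
```

Requiring the whole range to fall inside one region mirrors the requirement above that the memory requested by the driver be contiguous, so that a single HPA describes the entire buffer.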
In an embodiment of the present application, the apparatus further includes the controller 603, configured to: allocate a message signaled interrupt (MSI) interrupt number to the IO queue, and set the MSI interrupt number to be handled directly by the virtual machine; and send the MSI interrupt number to the NVMe-Blk driver. After the NVMe device 602 performs the operation corresponding to the predetermined request, the NVMe device 602 triggers an interrupt according to the MSI interrupt number. In an embodiment of the present application, the controller 603 is further configured to write the submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
The foregoing virtualization apparatus is implemented in the same way as the virtualization method of the foregoing embodiments, and details are not repeated here.
Another embodiment of the present application provides a virtualization apparatus, including a processor and a computer-readable storage medium, where the computer-readable storage medium stores instructions that, when executed by the processor, implement any one of the foregoing virtualization methods.
Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the foregoing virtualization methods.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or any suitable combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to a division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (18)

  1. A virtualization method, comprising:
    when an application of a virtual machine issues a predetermined request, allocating, by a non-volatile memory host controller interface specification-block (NVMe-Blk) driver of the virtual machine, a first direct memory access (DMA) memory from a virtual NVMe DMA memory management area, wherein the first DMA memory comprises a second DMA memory for a submission queue of an input/output (IO) queue and a third DMA memory for a completion queue;
    constructing, by the NVMe-Blk driver, a submission queue item according to a physical machine physical address (HPA) of the second DMA memory, and notifying an NVMe device of a host to process the predetermined request; and
    reading, by the NVMe-Blk driver, response information in the completion queue.
  2. The virtualization method according to claim 1, wherein the predetermined request is a write request, and before the NVMe-Blk driver constructs the submission queue item according to the HPA of the second DMA memory, the method further comprises:
    copying, by the NVMe-Blk driver, the data to be written into the second DMA memory.
  3. The virtualization method according to claim 1, wherein the predetermined request is a read request, and the virtualization method further comprises:
    reading, by the NVMe-Blk driver, data from the third DMA memory according to the response information, copying the data that was read into a read buffer of the application, and releasing the third DMA memory.
  4. The virtualization method according to any one of claims 1 to 3, further comprising, before the NVMe-Blk driver of the virtual machine allocates the first DMA memory from the virtual NVMe DMA memory management area when the application of the virtual machine issues the predetermined request:
    requesting, by the NVMe-Blk driver, a contiguous first physical virtual machine memory from the virtual machine, and transmitting a virtual machine physical address (GPA) of the first physical virtual machine memory to the host;
    receiving, by the NVMe-Blk driver, a first correspondence between an IO queue identifier and an HPA of the first physical virtual machine memory;
    requesting, by the NVMe-Blk driver, a contiguous second physical virtual machine memory from the virtual machine, setting the second physical virtual machine memory to a DMA access mode, and transmitting a GPA of the second physical virtual machine memory to the host; and
    creating, by the NVMe-Blk driver, the virtual NVMe DMA memory management area according to a received HPA of the second physical virtual machine memory.
  5. The virtualization method according to claim 4, further comprising:
    saving, by the NVMe-Blk driver, a second correspondence between the IO queue identifier and a field of a DoorBell area in a register of the NVMe device;
    wherein notifying the NVMe device of the host to process the predetermined request comprises:
    writing, by the NVMe-Blk driver, submission queue item information into the field corresponding to the IO queue identifier in the DoorBell area of the register of the NVMe device.
  6. The virtualization method according to claim 4, further comprising:
    receiving, by the NVMe-Blk driver, a message signaled interrupt (MSI) interrupt number;
    wherein reading, by the NVMe-Blk driver, the response information in the completion queue comprises:
    reading, by the NVMe-Blk driver, the response information in the completion queue when an interrupt corresponding to the MSI interrupt number is triggered.
  7. The virtualization method according to any one of claims 1 to 3, wherein notifying the NVMe device of the host to process the predetermined request comprises:
    sending, by the NVMe-Blk driver, a write-DoorBell request to the host, wherein the write-DoorBell request comprises submission queue item information.
  8. The virtualization method according to any one of claims 1 to 3, wherein reading, by the NVMe-Blk driver, the response information in the completion queue comprises:
    polling, by the NVMe-Blk driver, the completion queue to obtain the response information.
  9. A virtualization method, comprising:
    when a non-volatile memory host controller interface specification (NVMe) device of a host learns that a predetermined request needs to be processed, performing an operation corresponding to the predetermined request, and putting response information into a completion queue.
  10. The virtualization method according to claim 9, wherein the predetermined request is a write request, and performing the operation corresponding to the predetermined request comprises:
    writing, by the NVMe device, data in a second direct memory access (DMA) memory into a corresponding sector of the NVMe device according to submission queue item information.
  11. The virtualization method according to claim 9, wherein the predetermined request is a read request, and performing the operation corresponding to the predetermined request comprises:
    reading, by the NVMe device, data from a corresponding sector of the NVMe device according to submission queue item information, and copying the data that was read into a third DMA memory.
  12. The virtualization method according to any one of claims 9 to 11, further comprising, before performing the operation corresponding to the predetermined request and putting the response information into the completion queue:
    converting, by the host, a received virtual machine physical address (GPA) of a first physical virtual machine memory into a physical machine physical address (HPA);
    creating, by the host, an input/output (IO) queue according to the HPA of the first physical virtual machine memory, and sending a first correspondence between an identifier of the IO queue and the HPA of the first physical virtual machine memory to an NVMe-block (NVMe-Blk) driver; and
    converting, by the host, a GPA of a second physical virtual machine memory into an HPA, setting the second physical virtual machine memory to a host DMA access mode, and sending the HPA of the second physical virtual machine memory to the NVMe-Blk driver.
  13. The virtualization method according to claim 12, further comprising, before the host sends the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver: allocating, by the host, a message signaled interrupt (MSI) interrupt number to the IO queue, and setting the MSI interrupt number to be handled directly by the virtual machine;
    wherein, when sending the first correspondence between the IO queue identifier and the HPA of the first physical virtual machine memory to the NVMe-Blk driver, the host also sends the MSI interrupt number to the NVMe-Blk driver; and
    after performing the operation corresponding to the predetermined request, the method further comprises:
    triggering, by the host, an interrupt according to the MSI interrupt number.
  14. The virtualization method according to claim 12, further comprising:
    receiving, by the host, a write-DoorBell request, wherein the write-DoorBell request comprises submission queue item information; and
    writing, by the host, the submission queue item information into the field corresponding to the IO queue identifier in a DoorBell area of a register of the NVMe device.
  15. A virtualization apparatus, comprising:
    a non-volatile memory host controller interface specification-block (NVMe-Blk) driver, configured to: when an application of a virtual machine issues a predetermined request, allocate a first direct memory access (DMA) memory from a virtual NVMe DMA memory management area, wherein the first DMA memory comprises a second DMA memory for a submission queue of an input/output (IO) queue and a third DMA memory for a completion queue; construct a submission queue item according to a physical machine physical address (HPA) of the second DMA memory, and notify an NVMe device to process the predetermined request; and read response information in the completion queue.
  16. A virtualization apparatus, comprising:
    a non-volatile memory host controller interface specification (NVMe) device, configured to, upon learning that a predetermined request needs to be processed, perform an operation corresponding to the predetermined request and put response information into a completion queue.
  17. A virtualization apparatus, comprising a processor and a computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when executed by the processor, implement the virtualization method according to any one of claims 1 to 14.
  18. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the virtualization method according to any one of claims 1 to 14.
PCT/CN2019/128310 2018-12-27 2019-12-25 Virtualization method and apparatus WO2020135504A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811612722.7A CN111381926A (en) 2018-12-27 2018-12-27 Virtualization method and device
CN201811612722.7 2018-12-27

Publications (1)

Publication Number Publication Date
WO2020135504A1 true WO2020135504A1 (en) 2020-07-02

Family

ID=71129199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/128310 WO2020135504A1 (en) 2018-12-27 2019-12-25 Virtualization method and apparatus

Country Status (2)

Country Link
CN (1) CN111381926A (en)
WO (1) WO2020135504A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416526B (en) * 2020-11-27 2023-02-17 海光信息技术股份有限公司 Direct storage access method, device and related equipment
CN114281252A (en) * 2021-12-10 2022-04-05 阿里巴巴(中国)有限公司 Virtualization method and device for NVMe (network video recorder) device of nonvolatile high-speed transmission bus
WO2023141811A1 (en) * 2022-01-26 2023-08-03 Intel Corporation Host to guest notification
CN115904259B (en) * 2023-02-28 2023-05-16 珠海星云智联科技有限公司 Processing method and related device of nonvolatile memory standard NVMe instruction
CN117251118B (en) * 2023-11-16 2024-02-13 上海创景信息科技有限公司 Virtual NVMe simulation and integration supporting method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105556930A (en) * 2013-06-26 2016-05-04 科内克斯实验室公司 NVM EXPRESS controller for remote memory access
CN105677597A (en) * 2014-11-20 2016-06-15 中兴通讯股份有限公司 Data writing method and device
CN105700826A (en) * 2015-12-31 2016-06-22 华为技术有限公司 Virtualization method and device
CN106484549A (en) * 2015-08-31 2017-03-08 华为技术有限公司 A kind of exchange method, NVMe equipment, HOST and physical machine system
CN108363670A (en) * 2017-01-26 2018-08-03 华为技术有限公司 A kind of method, apparatus of data transmission, equipment and system
CN109062671A (en) * 2018-08-15 2018-12-21 无锡江南计算技术研究所 A kind of high-performance interconnection network software virtual method of lightweight

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170228173A9 (en) * 2014-05-02 2017-08-10 Cavium, Inc. Systems and methods for enabling local caching for remote storage devices over a network via nvme controller
CN107992436B (en) * 2016-10-26 2021-04-09 华为技术有限公司 NVMe data read-write method and NVMe equipment

Also Published As

Publication number Publication date
CN111381926A (en) 2020-07-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19905011

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19905011

Country of ref document: EP

Kind code of ref document: A1