CN117573041B - Method for improving virtualized storage performance by improving vhost-scsi
- Publication number: CN117573041B (application CN202410056443.6A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F3/0611 — Improving I/O performance in relation to response time
- G06F3/0631 — Configuration or reconfiguration of storage systems by allocating resources to storage systems
- G06F3/0674 — Disk device
- G06F9/45558 — Hypervisor-specific management and integration aspects
- G06F2009/45583 — Memory management, e.g. access or allocation
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a method for improving virtualized storage performance by improving vhost-scsi, which replaces the passive IOeventfd event notification mechanism with an active Polling event query, and comprises the following steps. S100: initiating an application to create contiguous shared memory in the address space of the Guest Kernel, and allocating the shared memory to the SQ and CQ. S200: the shared memory created in the Guest Kernel is passed, after memory-address conversion by the Qemu module, to the vHost module of the Host Kernel. S300: Polling threads are allocated and started at the Guest end and the Host end, continuously polling and setting the key identifiers of the SQ and CQ to complete the loop logic. By changing the previous passive event notification mechanism into an active polling query mechanism, the invention makes the storage process more efficient when storing 4K-level data, reduces the service request response time, and also reduces the energy consumption of the server.
Description
Technical Field
The invention relates to electronic digital data processing, in particular to a method for improving virtualized storage performance by improving vhost-scsi.
Background
The importance of enhancing IO performance of storage is self-evident for modern data centers and cloud computing environments. With the rapid development of digital transformation and cloud computing, data has become an important asset for enterprises, and the storage and processing capabilities of the data directly affect the business operations and performances of the enterprises.
Storage IO performance refers to the capability of a storage device to perform read-write operations per unit time, and is one of the important indexes for measuring storage device performance. In data centers and cloud computing environments, storage IO performance directly affects the following aspects:
Application performance: the IO performance of the storage device limits the operating speed and response time of the application system. If the IO performance of the storage device is insufficient, the application system responds slowly or crashes, affecting the business operations of the enterprise.
Data center efficiency: the IO performance of a storage device directly affects the efficiency and operating costs of a data center. If the IO performance of the storage device is insufficient, more server devices need to be added to meet the requirements by parallel processing, which increases the operation cost and complexity of the data center.
Competitiveness of cloud computing service providers: providing storage devices with high IO performance is one of the important means for cloud computing service providers to attract customers. The storage device with high IO performance can improve the use experience and reliability of cloud service.
Thus, improving storage IO performance is critical to data centers and cloud computing environments. Traditional storage architectures cannot meet the requirements of modern application systems, and more advanced storage technologies, such as NVMe and distributed storage, need to be adopted to improve the IO performance and scalability of storage devices. Meanwhile, factors such as the energy consumption and reliability of the storage device also need to be considered, so that the storage device can meet enterprise requirements while reducing long-term operating costs.
By moving the back-end implementation of virtio from the Qemu application layer down to the Kernel layer of the Host, vhost-scsi sends and receives data directly in the Kernel layer, avoiding context switches between user state and kernel state and the data copying of the intermediate process, and is currently the best scheme for quickly persisting virtualized storage data to a local disk. However, although hardware devices are increasingly powerful, in reading and writing IO with data volumes of 4K and below, the consumption of the software stack still accounts for a large proportion, so optimization of the storage software layer is all the more important for high-performance storage.
The Chinese invention patent "A method and system for improving virtualized storage performance of the Shenwei platform" (patent number: CN 112148224A) discloses a method and system comprising the following steps: forming a disk array on a host; constructing a new storage volume on the disk array; constructing a file system based on the new storage volume; creating a virtual machine on the constructed file system; the host machine realizes read-write operations with the client machine through the virtual machine. That invention uses LVM cache combined with RAID0 technology to build the file system, and uses KVM virtualization technology to build the virtual machine, in order to improve the read-write performance of virtualized storage IO. That patent mainly relies on LVM cache and RAID0 for its performance improvement; it does not optimize the IO path stack, which still needs optimization at the software stack layer under random storage tests of small data below 4K.
The Chinese invention patent "A method, device and system for accelerating virtual machine I/O by a cloud platform" (patent number: CN 114020406A) discloses a method, device, system and computer-readable storage medium, comprising: creating in advance a virtual machine sharing huge-page memory, and acquiring the protocol and connection information of the target volume; sending a request to create a controller to the SPDK vhost-user service on the host machine corresponding to the virtual machine, so as to create a vhost-user-scsi controller; according to the protocol and connection information of the target volume, having the SPDK Bdev take over the target volume, and adding the SPDK Bdev to the vhost-user-scsi controller; associating the virtual machine with the vhost-user-scsi controller so that the QEMU of the virtual machine accelerates the target volume by invoking the vhost-user-scsi controller; that invention is beneficial to improving acceleration efficiency and system performance. That patent mainly accelerates virtual machine I/O by means of a pre-created virtual machine sharing huge-page memory, and is used in the vhost-user-scsi module; its method and scenario differ from those of the present patent.
The Chinese invention patent "Method and system for optimizing the virtualization performance of Shenwei platform storage input/output devices" (patent number: CN 111796912A) discloses a method and system in which: the emulation processor QEMU of the client provides a shared memory for the client and the host; the emulation processor QEMU of the client communicates with the host to inform the host of the address information of the shared memory; after the host receives the address information of the shared memory, the address of the shared memory in the host user process is calculated, and read-write operations are then performed. That patent mainly uses shared memory for acceleration, chiefly adapting the shared-memory technique within the vhost-scsi scheme to the Shenwei platform, and provides no additional general acceleration method for other platforms.
Chinese patent invention "a method for safely storing and quickly calling data and mobile terminal" (patent number: CN 109829324A). The invention discloses a method for safely storing and quickly calling data and a mobile terminal, comprising the following steps: encrypting data which needs to be stored in an open public path by a system; storing the encrypted data under the open public path; decrypting the data in the open public path, storing the decrypted data in a virtual memory, and forming a mapped path according to the storage address; modifying a system call interface of which the access path defaults to the open public path, and modifying the access path of the system call interface to the mapped path, so that the system invokes decrypted data from the virtual memory for use. The invention not only can solve the problem of safe storage of data under the default path of the system, but also can improve the calling speed of the data, avoid the phenomena of system blocking, no response and the like, and well solve the contradiction between the problem of data storage safety and the problem of data calling rapidity. This patent only encrypts data stored under an open public path, helping little to improve I/O efficiency.
Chinese patent invention "a data access method and device oF NVMe-oF user client" (patent number: CN 114417373A). The embodiment oF the application provides a data access method and device oF an NVMe-oF user client, wherein the method comprises the following steps: receiving a data access request message sent by a virtual host vhost device; analyzing the data access request message to obtain a first service end identifier and an access operation instruction; selecting a first annular queue based on the first service end identifier, and writing an access operation instruction into a control instruction area oF the first NVMe-oF service end through the first annular queue; the first annular queue is a queue between the vhost device and the first NVMe-oF server. The patent is mainly aimed at a method for improving efficiency by reducing IO paths by using DMA for NVME-oF equipment.
Disclosure of Invention
The invention mainly aims to provide a method for improving virtualized storage performance by improving vhost-scsi, which changes the passive event notification mechanism into an active polling query mechanism, so that the storage process is more efficient when storing 4K-level data, the service request response time is reduced, and the energy consumption of the server is reduced at the same time.
In order to accomplish the above object, the present invention provides a method for improving vhost-scsi to promote virtualized storage performance, modifying the IOeventfd event mechanism into a Polling active event query. The modification method comprises the following steps:
S100: initiating an application to create contiguous shared memory in the address space of the Guest Kernel, and allocating the shared memory to the SQ and CQ;
S200: passing the shared memory created in the Guest Kernel, after memory-address conversion by the Qemu module, to the vHost module of the Host Kernel;
S300: allocating and starting Polling threads at the Guest end and the Host end, which continuously poll and set the key identifiers of the SQ and CQ to complete the Polling logic.
Preferably, the step S100 further includes the steps of:
s110: dividing a shared memory in an address space of a Guest Kernel according to a continuous whole page memory allocation principle, and allocating the division areas of the whole page memory to SQ and CQ;
s120: dividing the space in the SQ area space and distributing the space to the SQ- > head, the SQ- > tail and the SQ- > flag;
s130: the space is divided in the CQ area space and allocated to CQ- > head, CQ- > tail and CQ- > flag.
Further preferably, the step S200 further includes the steps of:
s210: pre-configuring a PCI configuration space in Qemu, and correspondingly writing information of the PCI configuration space in Qemu into the PCI configuration space in Guest Kernel to enable Qemu to obtain Guest Physical Address;
s220: qemu converts Guest Physical Address to Host Virtual Address;
s230: host Virtual Address is passed in Qemu to the vhest module in Host Kernel over the Ioctl interface.
Still more preferably, in step S300, the Polling logic is carried out by setting the head, tail and flag of the SQ and CQ respectively.
Still further preferably, the modification method further comprises the following: in step S300, the specific Polling logic is as follows:
s310: starting an Sqthread thread in a Host Kernel, and starting a Polling state;
s320: setting the Sq.head mark in the Guest OS as n, wherein n is a natural number;
s330: actively inquiring whether the value of the Sq.head mark is larger than the value of the Sq.tail or not in an Sqthread thread in the Host OS, and if the value of the Sq.head mark is larger than the value of the Sq.tail, indicating that new data is filled;
s340: acquiring new transmitted data through the Sq.head mark in the Host OS;
s350: updating the identification of the Sq.tail in the Host OS, wherein the identification of the data which is newly sent by the Host OS terminal is obtained;
s360: processing the newly transmitted data in a Host OS;
s370: after processing new data in the Host OS, updating cq.head=n, which means that the data corresponding to the index value of n is processed, polling is performed to determine whether the next data is sent, and continuing to step by step n;
s380: judging whether the Cq.head reaches n in the get OS, and informing an upper application layer that the request of the sq.head= n is processed;
s390: after the application layer is notified in the get OS, the cq.tail is updated to n, and the resources occupied by the n index can be released after the n number request result is also processed.
Still further preferably, in step S370, if no new data is received within a predetermined time, the Polling thread is set to the sleep state.
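The S310-S390 loop above can be sketched as one polling pass on each side. This is a single-threaded model for illustration only: the real host side would spin in a kernel thread, and the names `host_poll_once` and `guest_reap_once` are assumptions, not identifiers from the patent.

```c
#include <stdint.h>

struct pq { uint32_t head, tail, flag; };  /* head/tail/flag identifiers */

/* Host side, one pass of the Sqthread (S330-S370): detect a new entry by
 * comparing Sq.head against Sq.tail, claim it, process it, and publish
 * the completion by advancing Cq.head. Returns 1 if an entry was handled. */
int host_poll_once(struct pq *sq, struct pq *cq)
{
    if (sq->head <= sq->tail)   /* S330: no new data filled in */
        return 0;
    uint32_t n = ++sq->tail;    /* S340/S350: fetch entry n, mark it taken */
    /* S360: process request n here (actual I/O submission omitted) */
    cq->head = n;               /* S370: index n has been processed */
    return 1;
}

/* Guest side (S380/S390): observe the completion, notify the upper layer,
 * then advance Cq.tail so the resources of index n can be released. */
int guest_reap_once(struct pq *cq)
{
    if (cq->head <= cq->tail)
        return 0;
    cq->tail = cq->head;        /* S390 */
    return 1;
}
```

A request thus flows Guest -> SQ -> Host -> CQ -> Guest purely through shared-memory index comparisons, with no event notification on the hot path.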
The beneficial effects of the invention are as follows:
the advantages of the Vhost-scsi technique in terms of performance are mainly due to its optimization of the data transmission path, reduced virtualization overhead and improved data transmission mechanisms. The data transfer of the Vhost-scsi technique uses IOeventfd and irqfd for notification between the virtual machine and the host. IOeventfd is used to inform HOST that data is ready, while irqfd is used to inform virtual machines of interrupt injection. After analyzing the whole flow of the vhost-scsi, the event notification mechanism is found to have longer real paths in the KVM module and relatively more performance cost, so that the IOeventfd event mechanism is changed into a Polling active event query, and the mechanism reduces the delay of request response, thereby improving the performance.
In a comparison test using fio with 4K random read-write on a virtual disk, the invention's Polling scheme, replacing the event notification mechanism, improves IOPS efficiency by about 6%.
Drawings
The invention will be described in further detail with reference to the drawings and the detailed description.
FIG. 1 is a diagram of the original IO flow of a read-write request of a Vhost-scsi prior to modification;
FIG. 2 is a flow chart of the present invention after modification of the Vhost-scsi Polling scheme;
fig. 3 is a flowchart of step S300 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Abbreviations and key term definitions:
- KVM: short for Kernel-based Virtual Machine, a virtualization module added to the Linux operating system, through which the operating system can directly run multiple virtual machines, i.e. virtual clients (VM, Virtual Machine).
Guest OS: in virtualization, a computer may run multiple operating systems simultaneously, each of which is referred to as a Guest OS, i.e., an operating system on a virtual machine.
- Host OS: the counterpart of the Guest OS in virtualization; the former refers to the operating system installed on the physical machine, the latter to the operating system installed in a virtual machine. For example, with an Ubuntu system installed on a physical computer, on which we then create one virtual machine running openEuler, Ubuntu is the Host OS.
- Qemu: short for Quick Emulator, a powerful open-source machine emulator and virtualizer that can support hardware virtualization in combination with the KVM module in the Linux kernel. Currently, Qemu+KVM is commonly adopted on Linux platforms to provide virtualization services.
- vhost-scsi: a virtual disk technology that emulates a SCSI device on the Host and is provided to the virtual machine as a virtual disk for storage. The greatest characteristic of Vhost-scsi is high performance: by moving the back-end implementation of Virtio from the Qemu application layer down to the Kernel layer of the Host, data is sent and received directly in the Kernel layer, avoiding context switches between user state and kernel state and the data copying of the intermediate process, shortening the IO stack, and significantly helping to improve IOPS under 4K random read-write test conditions.
- Event notification mechanism in Vhost-scsi: the IO flow of Vhost-scsi is initiated in the Guest Kernel. Data to be transmitted is stored in a Descriptor entry of the Vring; a Virtqueue index value is then written into the PCI space, triggering the KVM module, which informs the back end through the bound Ioeventfd of which Vring queue has new data to be processed for storage.
- Vhost-scsi changed to the Polling mechanism: shared memory is created in the Guest Kernel and Host Kernel, and through a series of agreed protocols, new data to be processed is discovered by actively querying the flag in the shared memory; this reduces waiting time and further improves IO efficiency.
- SQ (Submit Queue): the SQ is a queue through which user space submits IO requests. When a user process needs to submit IO requests to the kernel through a system call, these requests are put into the SQ. The requests in the SQ queue are arranged in first-in first-out (FIFO) order for the kernel thread to process. When submitting an IO request, the user process needs to provide the relevant I/O operating parameters, such as start address, operation type (read/write) and data length. These parameters are packaged into a request structure and then submitted to the SQ via a system call. When the kernel thread reads the request from the SQ, the corresponding I/O operation is performed according to the type and parameters of the request. After the operation is completed, the kernel thread writes the result back to the shared memory and notifies the user process of the completion using the CQ (Completion Queue).
- CQ (Completion Queue): the CQ is a queue for notifying the user that an IO operation has completed. When the kernel thread completes an IO operation, a completion notification is placed into the CQ. The user process can continuously read completion notifications from the CQ through system calls in order to handle completed IO operations. The SQ and CQ use the shared memory as the communication medium, so the IO initiating end and the IO receiving/processing end can interact efficiently. By actively querying IO requests and reducing the number of system calls, Polling can significantly improve disk I/O performance.
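The FIFO submission path described for the SQ can be sketched as follows. The entry fields mirror the parameters named above (start address, operation type, data length), while the queue depth, struct names and `sq_submit` helper are assumptions for illustration.

```c
#include <stdint.h>

#define SQ_DEPTH 64  /* assumed ring depth */

/* A submission entry carrying the I/O operating parameters named above. */
struct sq_entry {
    uint64_t start;  /* start address */
    uint32_t len;    /* data length */
    uint8_t  op;     /* operation type: 0 = read, 1 = write */
};

struct sq_ring {
    uint32_t head, tail;  /* head: producer index, tail: consumer index */
    struct sq_entry ring[SQ_DEPTH];
};

/* FIFO submission: fill the slot first, then advance head to publish it. */
int sq_submit(struct sq_ring *q, const struct sq_entry *e)
{
    if (q->head - q->tail >= SQ_DEPTH)
        return -1;                     /* queue full */
    q->ring[q->head % SQ_DEPTH] = *e;  /* package the request structure */
    q->head++;                         /* publish after the entry is complete */
    return 0;
}
```

Writing the slot before advancing `head` is what lets the polling consumer treat a head/tail mismatch as "a complete new request is present".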
The performance advantage of the Vhost-scsi technique is mainly due to its optimization of the data transmission path, reduced virtualization overhead and improved data transmission mechanism. Data transfer in Vhost-scsi uses IOeventfd and Irqfd for notification between the virtual machine and the host: IOeventfd is used to inform the Host that data is ready, while Irqfd is used to inject interrupts into the virtual machine. After analyzing the whole flow of Vhost-scsi, it was found that the event notification mechanism has a long execution path in the KVM module and relatively high performance cost, so we focus on the further optimization of changing the IOeventfd event mechanism into a Polling active event query. This mechanism reduces the latency of the request response, thereby improving performance.
As shown in FIG. 1, the original IO flow of a read-write request in the Vhost-scsi before improvement is depicted. The following procedure is shown in FIG. 1:
In the Guest: upper layer -> generic block device layer -> virtscsi layer -> virtio layer.
In the Host: vhost (virtio) layer -> vhost_scsi layer -> specific physical IO write-down to disk.
As can be seen from FIG. 1, the event notification from Guest to Host in the IO flow of the vhost-scsi scheme is forwarded through kvm and then passes through a relatively long software stack.
The flow chart of the modified Vhost-scsi Polling scheme is shown in FIG. 2. As can be seen from FIG. 2, replacing event notification with polling of the shared-memory identifiers eliminates the time-consuming program stack calls of event notification.
As can be seen from a comparison of FIG. 1 and FIG. 2, the vp_notify_iowrite(vq_index) path in FIG. 2 is the data notification path before modification. The present application still keeps this path, where it plays the role of waking a dormant polling thread in vhost-scsi. This path existed before the modification and is likewise prepared in the existing Vhost mechanism; its final implementation is also by writing PCI space data. In Qemu this PCI space is bound to Irqfd; writing the value of this PCI space in the Guest triggers the bound interrupt event in the Host Kernel, which then processes the IO request through the Virtio protocol.
In FIG. 2, vhost_scsi_handle_kick: the processing function with which, under the original vhost-scsi logic, the Host receives the Guest's IO event notification (through the kvm module).
virtqueue_add_split: the Guest-side virtio function that adds data in split mode.
To modify the Vhost-scsi event notification mechanism into a Polling mechanism, the following problems need to be solved:
1. How to create contiguous shared memory in the address space.
If only the Guest knows the memory address, the Host cannot recognize the identifiers the Guest writes into memory and does not know where to fetch them, so the IO event cannot be transferred; therefore shared memory from the Guest Kernel to the Host Kernel must be opened up. It is known that virtual memory addresses are contiguous while physical memory addresses are not necessarily contiguous, so we need a piece of memory that appears contiguous to both Host and Guest: GPA (guest physical address) -> HVA (host virtual address) -> HPA (host physical address). Ensuring that the memory is contiguous in both the virtual and physical address spaces is the first problem to be solved. To solve the shared memory problem, a section of whole-page memory can be applied for in the Guest Kernel, with certain areas defined for the SQ and CQ to record into.
Therefore, the first step of this embodiment in modifying the Vhost-scsi event notification mechanism into the Polling mechanism is:
s100: and initiating an application for creating continuous shared memory in the address space of the Guest Kernel, and distributing the shared memory to the SQ and the CQ.
In this step, the method specifically further includes the following steps:
s110: and dividing the shared memory in the address space of the Guest Kernel according to the continuous whole page memory allocation principle, and allocating the division areas of the whole page memory to the SQ and the CQ. In this step, it is possible to analyze if we allocate in the Guest in whole page so that the section of memory is also contiguous in the Host view. And dividing the region into SQ and CQ in the whole page of memory of the application.
Specifically, the required memory size is estimated first, and the shared memory is then allocated in units of whole pages (the actual PAGE_SIZE of 4096 bytes). For example, a request for 4088 bytes is smaller than 4096, so one full page suffices. A request for 4097 bytes exceeds one page, so two pages must be allocated, i.e. 4096 × 2 = 8192 bytes of shared memory.
When memory is requested in whole-page units in the Guest, Qemu in turn requests it from the Host's memory management module in whole pages as well. The memory seen in the Guest is therefore not assembled from scattered fragments on the Host but is one contiguous block, so the Guest OS and Host OS can conveniently operate on the same shared memory.
As described above, only whole-page-aligned allocations guarantee that the GPAs are contiguous. The virtual address space merely appears contiguous while the underlying physical pages are scattered, and Vhost would otherwise need multiple translations to correctly locate the data written by the Guest; with a contiguous region, the Vhost kernel side can poll it efficiently.
S120: the space is divided in the SQ region space and allocated to the SQ- > head, the SQ- > tail, the SQ- > flag.
The function of sq->flag is to indicate whether a wakeup is required. If the SQ polling on the Host side finds that the upper layer has issued no data for some time, sq->flag is set to the mark 'flag_vring_sq_need_wakeup' (meaning the vhost Sqpoll thread must be woken before the next data is issued), and the Sqpoll thread is then put to sleep to save CPU. Before the Guest sends data it checks whether sq->flag carries this wakeup mark; if so, it notifies vhost through the original kick flow, the vhost Sqpoll thread first resumes the polling state, and the sq->head mark is then set.
S130: The CQ region is subdivided and space is assigned to cq->head, cq->tail and cq->flag.
cq->flag is a reserved field, intended for adding polling on the Guest side for receiving completion events: if the Guest polls for a while without finding a completion event, cq->flag is set to the need-wakeup state before the corresponding Guest thread goes to sleep, so that the next time a completion event is posted, the sleeping side must first be woken before the cq->head value is changed.
2. How to transfer the address of the shared memory created in the Guest Kernel, after translation, to the vHost module of the Host Kernel, i.e. step S200 of the present application: the shared memory created in the Guest Kernel is passed to the vHost module of the Host Kernel after memory address translation by the Qemu module.
That is, if only the Guest Kernel knows the shared memory address, the Host Kernel cannot take part and the flow certainly cannot circulate; both ends must operate on the same block of memory.
In this application, this is achieved by the following specific steps:
s210: pre-configuring a PCI configuration space in Qemu, and correspondingly writing information of the PCI configuration space in Qemu into the PCI configuration space in Guest Kernel to enable Qemu to obtain Guest Physical Address;
s220: qemu converts Guest Physical Address to Host Virtual Address;
s230: host Virtual Address is passed in Qemu to the vhest module in Host Kernel over the Ioctl interface.
3. The Polling logic of vhost-scsi in the Host Kernel, step S300 in this embodiment: the Polling logic is completed by setting the SQ and CQ.
A Polling logic flow needs to be designed so that as little locking as possible occurs inside the Loop and the system still runs efficiently under a large number of requests. The specific Polling logic flow is shown in fig. 3.
In step S300, the Polling thread is started through cooperation between the Guest end and the Host end, and the Polling logic is completed by continuously polling and setting the key marks of the SQ and CQ.
Specifically, the modification method further comprises the following steps:
in step S300, the concrete Polling logic is as follows:
s310: the vhost_iovq_thread [ start ] is called in the Host Kernel, namely, the Sqthread thread is started in the Host Kernel, and the Polling state is started;
s320: setting the identifier of the sq.head in the Guest OS to n, n being a natural number, as shown in fig. 3, in this embodiment, n=3;
s330: actively inquiring whether the value of the Sq.head mark is larger than the value of the Sq.tail or not in the Sqthread thread in the Host OS, and if the value of the Sq.head mark is larger than the value of the Sq.tail, indicating that new data is filled;
s340: obtaining new transmitted data (get_request) through the Sq.head identification in the Host OS;
s350: updating the identification of the Sq.tail in the Host OS, wherein the identification of the data which is newly sent by the Host OS terminal is obtained;
s360: processing the newly transmitted data (handle_request) in the Host OS;
s370: after processing new data in the Host OS, updating cq.head=3 to indicate that the data corresponding to index value No. 3 has been processed, polling then to determine whether there is next data to send, and continuing to step 3;
s380: judging whether the Cq.head reaches 3 in the get OS, and informing the upper application layer that the request of the sq.head= 3 is processed;
s390: after the application layer is notified in the get OS, the cq.tail is updated to 3, and the resources occupied by the index No. 3 can be released after the request No. 3 result is also processed.
4. How to stop the loop and rest when no requests arrive (so the CPU is not occupied when idle), while letting polling resume efficient operation as soon as a new request arrives.
To address this problem, the present embodiment adds a predetermined time in step S370: if no new data is received within the predetermined time, the polling thread is set to the sleep state.
It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Claims (4)
1. A method for improving vhost-scsi to promote virtualized storage performance, wherein the IOeventfd event mechanism is modified into Polling-based active event querying, the modification method comprising the steps of:
s100: initiating an application for creating continuous shared memory in an address space of a Guest Kernel, and distributing the shared memory to the SQ and the CQ;
s200: the shared memory created in the Guest Kernel is transferred to a vHost module of the Host Kernel after being converted by a memory address of the Qemu module;
s300: the Polling thread is started through allocation of a Guest end and a Host end, and Polling is continuously conducted, and key identifiers of the SQ and the CQ are set to complete the Polling logic; in step S300, the Polling logic is completed by setting SQ, head, tail, flag of CQ, respectively;
the modification method further comprises the following steps:
in step S300, the concrete Polling logic is as follows:
s310: starting an Sqthread thread in a Host Kernel, and starting a Polling state;
s320: setting the Sq.head mark in the Guest OS as n, wherein n is a natural number;
s330: actively inquiring whether the value of the Sq.head mark is larger than the value of the Sq.tail or not in an Sqthread thread in the Host OS, and if the value of the Sq.head mark is larger than the value of the Sq.tail, indicating that new data is filled;
s340: acquiring new transmitted data through the Sq.head mark in the Host OS;
s350: updating the identification of the Sq.tail in the Host OS, wherein the identification of the data which is newly sent by the Host OS terminal is obtained;
s360: processing the newly transmitted data in a Host OS;
s370: after processing new data in the Host OS, updating cq.head=n, which means that the data corresponding to the index value of n is processed, polling is performed to determine whether the next data is sent, and continuing to step by step n;
s380: judging whether the Cq.head reaches n in the get OS, and informing an upper application layer that the request of the sq.head=n is processed;
s390: after the application layer is notified in the get OS, the cq.tail is updated to n, and the resources occupied by the n index can be released after the n number request result is also processed.
2. The method for improving the performance of virtualized storage according to claim 1, further comprising the step of, in step S100:
s110: dividing a shared memory in an address space of a Guest Kernel according to a continuous whole page memory allocation principle, and allocating the division areas of the whole page memory to SQ and CQ;
s120: dividing the space in the SQ area space and distributing the space to the SQ- > head, the SQ- > tail and the SQ- > flag;
s130: the space is divided in the CQ area space and allocated to CQ- > head, CQ- > tail and CQ- > flag.
3. The method for improving the performance of virtualized storage according to claim 2, further comprising the step of, in step S200:
s210: pre-configuring a PCI configuration space in Qemu, and correspondingly writing information of the PCI configuration space in Qemu into the PCI configuration space in Guest Kernel to enable Qemu to obtain Guest Physical Address;
s220: qemu converts Guest Physical Address to Host Virtual Address;
s230: host Virtual Address is passed in Qemu to the vhest module in Host Kernel over the Ioctl interface.
4. A method of improving the performance of virtualized storage according to claim 3, wherein in step S370, if no new data is received within a predetermined time, the poll is set to a dormant state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410056443.6A CN117573041B (en) | 2024-01-16 | 2024-01-16 | Method for improving virtualized storage performance by improving vhost-scsi |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117573041A (en) | 2024-02-20
CN117573041B (en) | 2024-04-09
Family
ID=89886540
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410056443.6A Active CN117573041B (en) | 2024-01-16 | 2024-01-16 | Method for improving virtualized storage performance by improving vhost-scsi |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117573041B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250166A (en) * | 2015-05-21 | 2016-12-21 | 阿里巴巴集团控股有限公司 | A kind of half virtualization network interface card kernel accelerating module upgrade method and device |
CN111796912A (en) * | 2020-07-09 | 2020-10-20 | 山东省计算中心(国家超级计算济南中心) | Virtualization performance optimization method and system for storage input/output device of Shenwei platform |
CN114020406A (en) * | 2021-10-28 | 2022-02-08 | 郑州云海信息技术有限公司 | Method, device and system for accelerating I/O of virtual machine by cloud platform |
CN115904628A (en) * | 2022-12-14 | 2023-04-04 | 安超云软件有限公司 | IO virtualization data processing method and application based on vhost protocol |
CN117389694A (en) * | 2023-12-13 | 2024-01-12 | 麒麟软件有限公司 | Virtual storage IO performance improving method based on virtio-blk technology |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7251648B2 (en) * | 2019-10-08 | 2023-04-04 | 日本電信電話株式会社 | In-server delay control system, in-server delay control device, in-server delay control method and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||