WO2023174146A1 - Offload card namespace management, input/output request processing system and method - Google Patents

Offload card namespace management, input/output request processing system and method

Info

Publication number
WO2023174146A1
Authority
WO
WIPO (PCT)
Prior art keywords
nvme
namespace
hardware
input
card
Prior art date
Application number
PCT/CN2023/080410
Other languages
English (en)
French (fr)
Inventor
朴君
Original Assignee
阿里巴巴(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司 (Alibaba (China) Co., Ltd.)
Publication of WO2023174146A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062Securing storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0644Management of space entities, e.g. partitions, extents, pools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0658Controller construction arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0679Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45583Memory management, e.g. access or allocation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45587Isolation or security of virtual machine instances

Definitions

  • the present disclosure relates to the field of computer technology, and specifically to offload card namespace management, input and output request processing systems and methods.
  • namespace technology has been used to divide and manage the space of NVMe storage, mainly to divide disk logical space for different applications and establish a security isolation mechanism. Different namespaces are isolated using software technology and share some hardware resources, such as hardware queues.
  • embodiments of the present disclosure provide offload card namespace management, input and output request processing systems and methods.
  • the offload card allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces respectively.
  • a virtual machine is running on the host, the multiple applications run in the virtual machine, and the virtual machine includes:
  • a driver used to manage the plurality of hardware queues, and the virtual machine sends the create namespace request to the offload card through the driver.
  • the offload card includes a controller and a hardware accelerator, wherein,
  • the host includes a multi-core central processor
  • the virtual machine includes user space and kernel space
  • the host is connected to the NVMe device through the offload card,
  • the NVMe device is mounted inside the virtual machine
  • the application running in the user space sends an input and output request to the general block layer
  • embodiments of the present disclosure provide an offload card namespace management method.
  • the method runs on an offload card namespace management system.
  • the offload card namespace management system includes a host and an offload card connected to the host.
  • the method includes:
  • the offload card allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces respectively.
  • a virtual machine is run on the host, the plurality of applications are run in the virtual machine, and the virtual machine includes: a driver for managing the multiple hardware queues, wherein the host sending a create namespace request to the offload card includes:
  • the virtual machine sends the create namespace request to the offload card through the driver.
  • the controller creates respective namespaces for the multiple applications according to the create namespace request
  • the offload card allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces respectively, including:
  • the plurality of applications are running in the user space, and the applications are executed by corresponding cores,
  • the driver is an NVMe driver
  • the controller is an NVMe controller
  • the method also includes:
  • a virtual machine runs on the host, the host includes a multi-core central processor, the virtual machine includes user space and kernel space, wherein multiple applications that issue input and output requests run in the user space, the applications are executed by corresponding cores, wherein the kernel space includes a common block layer and an NVMe driver, and the common block layer includes multiple software queues corresponding to the multiple cores, wherein the virtual machine sends a create namespace request to the offload card through the NVMe driver
  • the application running in the user space sends an input and output request to the general block layer
  • the NVMe controller transfers the input and output requests that comply with the NVMe protocol to the offload card through a hardware queue corresponding to the created namespace
  • the offload card sends input and output requests that comply with the NVMe protocol to the NVMe device.
  • a virtual machine runs on the host, the host includes a multi-core central processor, the virtual machine includes user space and kernel space, wherein multiple applications that issue input and output requests run in the user space, the applications are executed by corresponding cores, wherein the kernel space includes a common block layer and an NVMe driver, and the common block layer includes multiple software queues corresponding to the multiple cores, wherein the virtual machine sends a create namespace request to the offload card through the NVMe driver
  • the offload card includes an NVMe controller and a hardware accelerator, wherein the NVMe controller creates respective namespaces for the multiple applications according to the create namespace request, allocates from the hardware accelerator multiple hardware queues corresponding to the created namespaces, and binds the allocated hardware queues to the corresponding namespaces.
  • the NVMe driver is used to manage the multiple hardware queues, and the common block layer establishes a one-to-one correspondence from the multiple software queues to the multiple hardware queues,
  • the method includes:
  • the application running in the user space sends input and output requests to the general block layer
  • the offload card sends input and output requests that comply with the NVMe protocol to the NVMe device.
  • the offload card namespace management system includes a host and an offload card connected to the host, wherein multiple applications that issue input and output requests run on the host, and the host sends a create namespace request to the offload card
  • the offload card creates multiple namespaces corresponding to the multiple applications according to the create namespace request
  • the offload card allocates, according to the create namespace request, multiple hardware queues corresponding to the created namespaces and binds the allocated hardware queues to the corresponding namespaces.
  • a virtual machine is run on the host, and the multiple applications are run in the virtual machine.
  • the virtual machine includes: a driver used to manage the multiple hardware queues, and the virtual machine sends the create namespace request to the offload card through the driver; multiple namespaces can be created, and a dedicated hardware queue is dynamically assigned to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the offload card includes a controller and a hardware accelerator, wherein the controller creates respective namespaces for the multiple applications according to the create namespace request, allocates from the hardware accelerator multiple hardware queues corresponding to the created namespaces, and binds the allocated hardware queues to the corresponding namespaces; multiple namespaces can be created, an exclusive hardware queue is dynamically assigned to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues, which significantly improves the performance isolation capability and reliability of the namespaces.
  • the host includes a multi-core central processor
  • the virtual machine includes a user space and a kernel space
  • the multiple applications are run in the user space
  • the kernel space includes a common block layer and the driver
  • the common block layer includes a plurality of software queues corresponding to the plurality of cores and establishes a one-to-one correspondence from the plurality of software queues to the plurality of hardware queues; multiple namespaces can be created, a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues.
  • the driver is an NVMe driver
  • the controller is an NVMe controller
  • the host is connected to the NVMe device through the offload card
  • the NVMe device is mounted inside the virtual machine
  • the application running in the user space issues an input/output request to the general block layer
  • the input/output request is converted, through processing by the general block layer and the NVMe driver, into an input and output request that complies with the NVMe protocol and is sent to the NVMe controller
  • the NVMe controller transfers the input and output request that complies with the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace
  • the offload card sends the input and output request that complies with the NVMe protocol to the NVMe device; multiple namespaces can be created, and a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources.
  • the method runs on the offload card namespace management system, and the offload card namespace management system includes a host and an offload card connected to the host, wherein,
  • the method includes: based on multiple applications that issue input and output requests running on the host, the host sends a create namespace request to the offload card; the offload card creates multiple namespaces corresponding to the multiple applications according to the create namespace request; and the offload card allocates, according to the create namespace request, multiple hardware queues corresponding to the created namespaces and binds the allocated hardware queues to the corresponding namespaces.
  • In this way, multiple namespaces can be created and a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources; applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespaces.
  • the solution of the embodiments of the present disclosure does not affect applications on other hardware queues when a single hardware queue fails; both the performance isolation capability and the fault isolation capability are greatly improved.
  • a virtual machine is run on the host, the multiple applications are run in the virtual machine, and the virtual machine includes a driver for managing the multiple hardware queues,
  • the host sending a create namespace request to the offload card includes: the virtual machine sends the create namespace request to the offload card through the driver.
  • Multiple namespaces can be created, and each namespace is dynamically assigned a dedicated hardware queue, which implements hardware-level namespace resources; applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the offload card includes a controller and a hardware accelerator, wherein the offload card creating multiple corresponding namespaces for the multiple applications according to the create namespace request includes: the controller creates respective namespaces for the multiple applications according to the create namespace request.
  • the offload card allocating, according to the create namespace request, multiple hardware queues corresponding to the created namespaces and binding the allocated hardware queues to the corresponding namespaces includes: the controller allocates, from the hardware accelerator, multiple hardware queues corresponding to the created namespaces and binds the allocated hardware queues to the corresponding namespaces.
  • Multiple namespaces can be created, and an exclusive hardware queue is dynamically allocated for each namespace, which realizes hardware-level namespace resources; applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the host includes a multi-core central processor
  • the virtual machine includes a user space and a kernel space
  • the multiple applications are run in the user space
  • the kernel space includes a common block layer and the driver
  • the common block layer includes a plurality of software queues corresponding to the plurality of cores and establishes a one-to-one correspondence from the plurality of software queues to the plurality of hardware queues; multiple namespaces can be created, a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues.
  • the driver is an NVMe driver
  • the controller is an NVMe controller
  • the host is connected to the NVMe device through the offload card
  • the method further includes:
  • the NVMe device is mounted inside the virtual machine; the application running in the user space sends an input and output request to the general block layer; the input and output request is converted, through processing by the general block layer and the NVMe driver, into an input and output request that conforms to the NVMe protocol and is sent to the NVMe controller; the NVMe controller transfers the input and output request that conforms to the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace.
  • the offload card sends the input and output requests that comply with the NVMe protocol to the NVMe device; multiple namespaces can be created, and a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the existing NVMe namespace capability is strengthened and extended, and, combined with an integrated software-hardware architecture, achieves a significant improvement in performance isolation and fault isolation while remaining fully compatible with the software ecosystem of upper-layer applications.
  • Figure 1 shows an exemplary schematic diagram of an implementation scenario of an offload card namespace management system according to related technologies
  • Figure 2 shows an exemplary schematic diagram of an implementation scenario of an offload card namespace management system according to an embodiment of the present disclosure
  • Figure 3 shows a structural block diagram of an offload card namespace management system according to an embodiment of the present disclosure
  • Figure 4 shows a flow chart of an offload card namespace management method according to an embodiment of the present disclosure
  • Figure 5 shows a structural block diagram of an input and output request processing system according to an embodiment of the present disclosure
  • Figure 6 shows a flow chart of an input and output request processing method according to an embodiment of the present disclosure
  • Figure 7 shows a structural block diagram of an electronic device according to an embodiment of the present disclosure
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing methods according to various embodiments of the present disclosure.
  • the acquisition of user information or user data is an operation authorized, confirmed by the user, or actively selected by the user.
  • DMA: Direct Memory Access
  • CPU: central processing unit
  • Offload card: implements, in hardware, functions originally implemented by software, so that some data processing originally performed by the operating system can be done on the hardware, reducing system CPU consumption and improving processing performance.
  • PCIe: Peripheral Component Interconnect Express
  • FPGA: Field-Programmable Gate Array
  • ASIC: Application Specific Integrated Circuit
  • ioctl (input/output control): a system call dedicated to device input and output operations. It passes in a request code related to the device, and the function performed by the system call depends entirely on the request code. In the device driver, ioctl is the function that manages the device's I/O channel.
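  • As a concrete illustration of the ioctl mechanism described above, the following minimal C sketch (assuming a Linux host with the in-kernel NVMe block driver; the device path is only an example) queries the namespace ID behind an NVMe block device using the NVME_IOCTL_ID request code from <linux/nvme_ioctl.h>.

```c
/* Minimal sketch: the request code passed to ioctl() selects the operation.
 * NVME_IOCTL_ID asks the NVMe block driver for the namespace ID (NSID)
 * behind a block device such as /dev/nvme0n1 (example path). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDONLY);   /* example device node */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    int nsid = ioctl(fd, NVME_IOCTL_ID);       /* the request code decides the action */
    if (nsid < 0)
        perror("ioctl(NVME_IOCTL_ID)");
    else
        printf("namespace id: %d\n", nsid);
    close(fd);
    return 0;
}
```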
  • namespace technology is currently used to divide and manage the space of NVMe storage, mainly to divide disk logical space for different applications and establish a security isolation mechanism.
  • Different namespaces are isolated using software technology and share some hardware resources, such as hardware queues, which will cause performance interference problems and have poor fault isolation capabilities.
  • each namespace can have independent formatting and encryption capabilities, which is equivalent to having independent configuration functions.
  • FIG. 1 shows an exemplary schematic diagram of an implementation scenario of an offload card namespace management system according to the related art.
  • the related art offload card namespace management system can adopt a hardware offloading (offloading) architecture, and the system can include a host 1100 and an offload card 1200.
  • Host 1100 can access NVMe device 1300 through offload card 1200.
  • Offload card 1200 and NVMe device 1300 together can be used as a storage system.
  • the offload card 1200 and the NVMe device 1300 can be connected through the PCIe interface.
  • Specific PCIe interface specifications can be obtained from related technologies, and will not be described in detail in this disclosure.
  • a virtual machine 1110 is running on the host 1100.
  • the virtual machine 1110 may include a user space 1111 and a kernel space 1115.
  • virtual machine 1110 may be a Linux virtual machine.
  • an application 1112 (or a thread of an application) is running in the user space 1111, and the application 1112 delivers data through a central processing unit (CPU) core 1113.
  • the virtual machine 1110 has a common block layer (Blk-mq in the Linux kernel) 1116 in the kernel space 1115 .
  • the general block layer 1116 provides software queues for upper-layer applications 1112, and each CPU core 1113 corresponds to a software queue 11161.
  • the kernel space 1115 of the virtual machine 1110 also has a driver (for example, an NVMe driver) 1117 for maintaining and managing the hardware queues 11171, which can cause the input and output (IO, Input Output) requests issued by the virtual machine 1110 to be handled on specific physical hardware (e.g., the offload card) by, for example, setting hardware registers.
  • the general block layer 1116 may maintain a mapping relationship from the software queue 11161 to the hardware queue 11171.
  • the software queue 11161 can be used to forward data sent by the CPU core 1113.
  • Hardware queue 11171 is used to forward data that needs to be delivered to NVMe devices.
  • since the general block layer 1116 needs to interact with the hardware through the NVMe driver 1117, the hardware also needs to support multiple queues.
  • ideally, the hardware supports enough queues that each software queue 11161 of the general block layer 1116 is associated with its own hardware queue 11171.
  • however, the number of hardware queues supported by the hardware in the related art is limited, resulting in the association shown in Figure 1.
  • the general block layer 1116 forms a total of 8 software queues 11161, while only 4 hardware queues 11171 are available. Therefore, the general block layer 1116 associates two software queues 11161 with one hardware queue 11171, that is, it establishes a mapping relationship from two software queues 11161 to one hardware queue 11171.
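  • For illustration only, the short C sketch below shows one simple way such a two-to-one association could be computed when 8 software queues must share 4 hardware queues; it is a toy model of the situation in Figure 1, not the actual blk-mq mapping code.

```c
/* Illustrative sketch (not kernel code): with 8 per-CPU software queues and
 * only 4 hardware queues, each hardware queue ends up shared by two software
 * queues, e.g. via a simple modulo mapping. */
#include <stdio.h>

#define NR_SW_QUEUES 8   /* one per CPU core, as in Figure 1 */
#define NR_HW_QUEUES 4   /* limited by the related-art NVMe controller */

int main(void)
{
    for (int sw = 0; sw < NR_SW_QUEUES; sw++) {
        int hw = sw % NR_HW_QUEUES;   /* two software queues map to one hardware queue */
        printf("software queue %d -> hardware queue %d\n", sw, hw);
    }
    return 0;
}
```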
  • the offload card 1200 includes an NVMe controller 1210 and an accelerator 1220 .
  • the offload card 1200 can simulate the standard NVMe controller 1210 through software and hardware technology, and is responsible for processing the configuration request of the NVMe driver 1117 in the kernel space 1115 in the virtual machine 1110, and completing management tasks such as initialization and mounting/unmounting of the NVMe device 1300.
  • the accelerator 1220 is, for example, an accelerator based on FPGA hardware or ASIC hardware, and is responsible for transferring IO requests in the virtual machine 1110 to the offload card 1200 in a DMA manner and sending them to the NVMe device 1300 through a connection such as a network card or PCI.
  • the 4 hardware queues 11171 included in the NVMe controller 1210 are pre-allocated from the corresponding hardware part 1221 in the accelerator 1220 and are uniformly managed by one namespace 1211.
  • the NVMe controller 1210 manages the disk space of the device 1300 and block device-related attributes, such as sector size, capacity, etc., through a namespace.
  • an NVMe controller 1210 can contain four hardware queues, which are uniformly managed by a namespace 1211.
  • the NVMe controller 1210 of the related art may actually contain more hardware queues, but the core problem is that even if the number of hardware queues matches the number of applications, it is impossible to allocate an independent hardware queue to each application by dividing namespaces. All application processes share the hardware queues of the NVMe controller 1210, and different applications on the namespace 1211 of the NVMe controller 1210 share the hardware resources of the NVMe device 1300. Therefore, there is a certain competition for performance resources and the fault isolation capability is weak.
  • the application 1112 in the virtual machine 1110 can issue an IO request to the general block layer 1116 in the kernel space 1115 through the system call interface.
  • the general block layer 1116 in the kernel space 1115 receives the IO request and encapsulates it into a bio request (a request of the general block layer).
  • the NVMe driver 1117 is responsible for converting bio requests into IO requests that comply with the NVMe protocol, and can issue IO requests that comply with the NVMe protocol through the device hardware register.
  • the NVMe driver 1117 can notify the NVMe controller 1210 on the offload card 1200 to process IO requests that comply with the NVMe protocol by setting a hardware register.
  • the accelerator 1220 transfers the IO request of the virtual machine 1110 to the offload card 1200 through DMA and sends it to the NVMe device 1300 through an interface such as a network card or PCI.
  • the hardware queue of the NVMe device 1300 is uniformly managed by a namespace.
  • Different applications share hardware queue resources, which causes performance interference problems and poor fault isolation capabilities.
  • application A uses a low-latency IO model and application B uses a high-throughput IO model. If they share a hardware queue, application A's IO requests will be slowed down by application B. Under certain circumstances, a single hardware queue failure will cause all the above application IOs to hang, with a large impact and poor fault isolation capabilities.
  • the present disclosure proposes an offload card namespace management, input and output request processing system and method.
  • the offload card namespace management system includes a host and an offload card connected to the host, wherein multiple applications that issue input and output requests run on the host, and the host sends a create namespace request to the offload card
  • the offload card creates multiple namespaces corresponding to the multiple applications according to the create namespace request
  • the offload card allocates, according to the create namespace request, multiple hardware queues corresponding to the created namespaces and binds the allocated hardware queues to the corresponding namespaces.
  • FIG. 2 shows an exemplary schematic diagram of an implementation scenario of an offload card namespace management system according to an embodiment of the present disclosure.
  • the offload card namespace management system is similar to the offload card namespace management system in related technologies. It can adopt a hardware offloading architecture.
  • the system can include a host 2100 and an offload card 2200.
  • the host 2100 can access the NVMe device 2300 through the offload card 2200.
  • Offload card 2200 and NVMe device 2300 together can be used as a storage system. The offload card 2200 and the NVMe device 2300 can be connected through a PCIe interface. Specific PCIe interface specifications can be obtained from the related art and will not be described in detail in this disclosure.
  • a virtual machine 2110 is running on the host 2100.
  • the virtual machine 2110 may include a user space 2111 and a kernel space 2115.
  • virtual machine 2110 may be a Linux virtual machine.
  • an application 2112 (or a thread of an application) is running in the user space 2111, and the application 2112 delivers data through a central processing unit (CPU) core 2113.
  • the virtual machine 2110 has a common block layer (Blk-mq in the Linux kernel) 2116 in the kernel space 2115 .
  • the general block layer 2116 provides software queues for upper-layer applications 2112, and each CPU core 2113 corresponds to a software queue 21161.
  • the kernel space 2115 of the virtual machine 2110 also has a driver (for example, an NVMe driver) 2117 for maintaining and managing the hardware queues 21171, which can cause the input and output (IO, Input Output) requests issued by the virtual machine 2110 to be handled on specific physical hardware (e.g., the offload card) by, for example, setting hardware registers.
  • the general block layer 2116 can maintain the mapping relationship from the software queue 21161 to the hardware queue 21171.
  • the software queue 21161 can be used to forward data sent by the CPU core 2113.
  • Hardware queue 21171 is used to forward data that needs to be delivered to NVMe devices.
  • since the general block layer 2116 needs to interact with the hardware through the NVMe driver 2117, the hardware also needs to support multiple queues.
  • ideally, the hardware supports enough queues that each software queue 21161 of the general block layer 2116 is associated with its own hardware queue 21171.
  • the general block layer 2116 forms a total of 8 software queues 21161. Therefore, the general block layer 2116 associates each software queue 21161 with its own hardware queue 21171, that is, it establishes a one-to-one mapping relationship between the software queues 21161 and the hardware queues 21171.
  • the specific manner of using the offload card to create multiple namespaces to allocate hardware queues is described in the embodiments of the present disclosure.
  • the offload card 2200 includes an NVMe controller 2210 and an accelerator 2220.
  • the offload card 2200 can simulate the standard NVMe controller 2210 through software and hardware technology, and is responsible for processing the configuration request of the NVMe driver 2117 in the kernel space 2115 in the virtual machine 2110, and completing management tasks such as initialization and mounting/unmounting of the NVMe device 2300.
  • the accelerator 2220 is, for example, an accelerator based on FPGA hardware or ASIC hardware, and is responsible for transferring IO requests in the virtual machine 2110 to the offload card 2200 in a DMA manner and sending them to the NVMe device 2300 through a connection such as a network card or PCI.
  • the NVMe device 2300 can be mounted inside the virtual machine 2110. Inside the virtual machine 2110, an independent namespace can be created for each application 2112 through tools such as nvme-cli (a Linux command-line tool for monitoring, configuring, and managing NVMe devices). Specifically, the virtual machine 2110 can issue an NVMe Admin Command to the NVMe driver 2117 through an ioctl system call, specifically a create namespace request (which may also be called a namespace creation request). After the NVMe driver 2117 receives the create namespace request, it may send the create namespace request to the offload card 2200.
  • the NVMe driver 2117 may send a create namespace request to the offload card 2200 by setting a hardware register and sending a configuration information request to the offload card 2200.
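  • The create namespace request described here corresponds to the standard NVMe Namespace Management admin command. The C sketch below is a hedged illustration (assuming a Linux guest and the generic NVME_IOCTL_ADMIN_CMD passthrough interface from <linux/nvme_ioctl.h>) of roughly what nvme-cli does for "nvme create-ns"; the device path, sizes, and abbreviated buffer layout are placeholders rather than a complete implementation.

```c
/* Hedged sketch: issue an NVMe Namespace Management (create) admin command
 * through the Linux admin-command passthrough ioctl, roughly what
 * "nvme create-ns /dev/nvme0 --nsze=<blocks> --ncap=<blocks> --flbas=0" does.
 * The data buffer layout is abbreviated; see the NVMe spec for the full structure. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
    int fd = open("/dev/nvme0", O_RDWR);        /* admin (character) device, example path */
    if (fd < 0) { perror("open"); return 1; }

    uint8_t data[4096] = {0};                   /* namespace management data structure */
    uint64_t blocks = 1 << 20;                  /* placeholder size in logical blocks */
    memcpy(&data[0], &blocks, 8);               /* NSZE: namespace size */
    memcpy(&data[8], &blocks, 8);               /* NCAP: namespace capacity */
    /* FLBAS at byte 26 left as 0: LBA format 0 */

    struct nvme_admin_cmd cmd = {
        .opcode   = 0x0d,                       /* Namespace Management */
        .addr     = (uint64_t)(uintptr_t)data,
        .data_len = sizeof(data),
        .cdw10    = 0,                          /* SEL = 0: create */
    };
    if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
        perror("ioctl(NVME_IOCTL_ADMIN_CMD)");
    else
        printf("created namespace, NSID = %u\n", cmd.result);
    close(fd);
    return 0;
}
```

  • In practice, the newly created namespace must also be attached to a controller (the NVMe Namespace Attachment admin command, opcode 0x15, issued by "nvme attach-ns" in nvme-cli) before it becomes visible as a block device inside the virtual machine.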
  • the NVMe controller 2210 on the offload card 2200 creates corresponding namespaces NS1, NS2, NS3, NS4, NS5, NS6, NS7, and NS8 for each application 2112, allocates hardware queues 21171 from the corresponding hardware part 2221 of the accelerator 2220, and binds them to the created namespaces NS1, NS2, NS3, NS4, NS5, NS6, NS7, and NS8, respectively.
  • the NVMe controller 2210 allocates 8 hardware queues 21171 from the corresponding hardware part 2221 of the accelerator 2220 according to the create namespace request, and the 8 hardware queues are managed separately by the eight namespaces NS1, NS2, NS3, NS4, NS5, NS6, NS7, and NS8.
  • the hardware part 2222 of the accelerator 2220 other than the hardware part 2221 is not allocated for use or management by the namespaces NS1, NS2, NS3, NS4, NS5, NS6, NS7, and NS8.
  • an application 2112 corresponds to a software queue 21161, which corresponds to a hardware queue 21171.
  • a hardware queue 21171 is managed by the namespace bound to it. Therefore, in the offload card namespace management system according to the embodiment of the present disclosure, hardware-level namespace resources can be realized by dynamically allocating a dedicated hardware queue to each NVMe namespace, which greatly improves the performance isolation capability and reliability of the namespace. Compared with the offload card namespace management system of the related art, both the performance isolation capability and the fault isolation capability are greatly improved. Moreover, with the development of hardware offload technology, existing offload cards can already support a thousand or more hardware queues.
  • the offload card namespace management system adopts a technical architecture that combines software and hardware, supports a device form with multiple namespaces, and allocates an independent hardware queue to each namespace, which avoids the problem of mutual interference between IO requests of different applications and improves the fault isolation capability.
  • the application 2112 in the virtual machine 2110 can issue an IO request to the general block layer 2116 in the kernel space 2115 through the system call interface.
  • the general block layer 2116 in the kernel space 2115 receives the IO request and encapsulates it into a bio request (a general block layer request).
  • the NVMe driver 2117 is responsible for converting bio requests into IO requests that comply with the NVMe protocol, and can issue IO requests that comply with the NVMe protocol through the device hardware registers.
  • the NVMe driver 2117 can notify the NVMe controller 2210 on the offload card 2200 to process IO requests that comply with the NVMe protocol by setting a hardware register.
  • the accelerator 2220 transfers the IO request of the virtual machine 2110 to the offload card 2200 through DMA through an independent hardware queue (that is, a hardware queue corresponding to each namespace) 21171, and sends it to the NVMe device 2300 through an interface such as a network card or PCI.
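  • Purely as an illustrative sketch (not the actual offload card firmware or kernel code), the C fragment below shows the generic pattern implied by the flow above: each namespace owns a dedicated hardware submission queue, and submitting a command means copying it into that queue and writing the new tail index to the queue's doorbell register. All structures, names, and the doorbell layout here are simplified assumptions.

```c
/* Simplified illustration of per-namespace submission: each namespace owns a
 * dedicated hardware submission queue; submitting a command means copying it
 * into that queue and writing the new tail to the queue's doorbell register.
 * Types and the doorbell layout are simplified assumptions, not the real
 * controller interface. */
#include <stdint.h>
#include <string.h>

struct nvme_cmd_64b { uint8_t raw[64]; };       /* NVMe commands are 64 bytes */

struct hw_queue {
    struct nvme_cmd_64b *entries;               /* submission queue memory */
    uint32_t depth;
    uint32_t tail;
    volatile uint32_t *sq_doorbell;             /* MMIO doorbell for this queue */
};

struct ns_binding {
    uint32_t nsid;                              /* namespace bound to this queue */
    struct hw_queue *queue;                     /* dedicated hardware queue */
};

/* Submit one command on the hardware queue owned by a namespace. */
void submit_on_namespace_queue(struct ns_binding *b, const struct nvme_cmd_64b *cmd)
{
    struct hw_queue *q = b->queue;
    memcpy(&q->entries[q->tail], cmd, sizeof(*cmd));
    q->tail = (q->tail + 1) % q->depth;
    *q->sq_doorbell = q->tail;                  /* ring the doorbell: new tail index */
}
```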
  • each CPU core 2113 corresponds to a software queue 21161 in the general block layer 2116; however, unlike the offload card namespace management system of the related art, what the driver 2117 manages and maintains are not preset hardware queues but hardware queues allocated from the accelerator 2220 after the NVMe controller 2210 of the offload card 2200, based on the create namespace request, creates multiple namespaces equal in number to the software queues.
  • the number of hardware queues 21171 can be the same as the number of software queues 21161, and the two can correspond one to one. In this way, the hardware queues of the NVMe device 2300 are managed by independent namespaces; different applications use their own hardware queue resources, so there is no performance interference and the fault isolation capability is very good.
  • FIG. 3 shows a structural block diagram of an offload card namespace management system 300 according to an embodiment of the present disclosure.
  • the offload card namespace management system 300 shown in FIG. 3 includes a host 301 and an offload card 302.
  • the host 301 runs a plurality of applications that issue input and output requests, and the host 301 sends a create namespace request to the offload card 302.
  • the offload card 302 creates multiple corresponding namespaces for the multiple applications according to the create namespace request.
  • the offload card 302 allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces, respectively.
  • the offload card namespace management system includes a host and an offload card connected to the host, wherein multiple applications that issue input and output requests run on the host, and the host sends a create namespace request to the offload card
  • the offload card creates multiple namespaces corresponding to the multiple applications according to the create namespace request
  • the offload card allocates, according to the create namespace request, multiple hardware queues corresponding to the created namespaces and binds the allocated hardware queues to the corresponding namespaces; multiple namespaces can thus be created, and a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources.
  • a virtual machine runs on the host 301, and the multiple applications run in the virtual machine.
  • the virtual machine includes: a driver for managing the multiple hardware queues.
  • the virtual machine sends the create namespace request to the offload card 302 through the driver.
  • a virtual machine is run on the host, and the multiple applications are run in the virtual machine.
  • the virtual machine includes: a driver used to manage the multiple hardware queues, and the virtual machine sends the create namespace request to the offload card through the driver; multiple namespaces can be created, and a dedicated hardware queue is dynamically assigned to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the offload card 302 includes a controller and a hardware accelerator, wherein the controller creates respective namespaces for the multiple applications according to the create namespace request, allocates from the hardware accelerator multiple hardware queues corresponding to the created namespaces, and binds the allocated hardware queues to the corresponding namespaces.
  • Multiple namespaces can be created, and each namespace is dynamically assigned a dedicated hardware queue, which implements hardware-level namespace resources.
  • Applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the host 301 includes a multi-core central processor
  • the virtual machine includes a user space and a kernel space
  • the multiple applications are run in the user space, and the applications are executed by the corresponding cores
  • the kernel space includes a common block layer and the driver
  • the common block layer includes a plurality of software queues corresponding to the plurality of cores and establishes a one-to-one correspondence from the plurality of software queues to the plurality of hardware queues.
  • the host includes a multi-core central processor
  • the virtual machine includes a user space and a kernel space
  • the multiple applications are run in the user space
  • the kernel space includes a common block layer and the driver
  • the common block layer includes a plurality of software queues corresponding to the plurality of cores and establishes a one-to-one correspondence from the plurality of software queues to the plurality of hardware queues; multiple namespaces can be created, a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources, and applications can use their own corresponding hardware queues.
  • the driver is an NVMe driver
  • the controller is an NVMe controller
  • the host 301 is connected to the NVMe device 303 through the offload card 302, wherein the NVMe device 303 is mounted inside the virtual machine; the application running in the user space issues an input and output request to the common block layer; the input and output request is converted, through processing by the common block layer and the NVMe driver, into an input and output request that complies with the NVMe protocol and is sent to the NVMe controller; the NVMe controller transfers the input and output request that complies with the NVMe protocol to the offload card 302 through the hardware queue corresponding to the created namespace; and the offload card 302 sends the input and output request that complies with the NVMe protocol to the NVMe device 303.
  • the input and output requests are converted into input and output requests that comply with the NVMe protocol through the processing of the common block layer and the NVMe driver and sent to the NVMe controller in the following specific manner: the general block layer converts the input and output requests issued by the application into input and output requests that conform to the general block device protocol (for example, the bio structure), and the NVMe driver converts the input and output requests that conform to the general block device protocol into input and output requests that comply with the NVMe protocol and issues them through the device hardware registers.
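  • The two-step conversion described above can be pictured with the illustrative C sketch below: a simplified stand-in for a general block layer request is translated into a simplified NVMe read/write command (opcode 0x01 for write, 0x02 for read, with a zero-based NLB field, per the NVMe specification). The structures are assumptions for illustration, not the kernel's struct bio or struct nvme_command.

```c
/* Illustrative sketch of the two-step conversion: a general block layer
 * request (simplified bio-like structure) is translated into an NVMe I/O
 * command. Both structures are simplified stand-ins. */
#include <stdbool.h>
#include <stdint.h>

struct blk_request {                 /* simplified general block layer request */
    bool     is_write;
    uint64_t start_sector;           /* in 512-byte sectors */
    uint32_t nr_sectors;
    void    *buffer;
};

struct nvme_io_cmd {                 /* simplified NVMe read/write command */
    uint8_t  opcode;                 /* 0x01 = write, 0x02 = read */
    uint32_t nsid;                   /* target namespace */
    uint64_t slba;                   /* starting logical block address */
    uint16_t nlb;                    /* number of logical blocks, zero-based */
    uint64_t data_ptr;               /* DMA address of the data buffer */
};

/* Translate a block request into an NVMe command, assuming a 512-byte LBA format. */
struct nvme_io_cmd blk_to_nvme(const struct blk_request *req, uint32_t nsid)
{
    struct nvme_io_cmd cmd = {
        .opcode   = req->is_write ? 0x01 : 0x02,
        .nsid     = nsid,
        .slba     = req->start_sector,
        .nlb      = (uint16_t)(req->nr_sectors - 1),  /* NLB is zero-based */
        .data_ptr = (uint64_t)(uintptr_t)req->buffer,
    };
    return cmd;
}
```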
  • the driver is an NVMe driver
  • the controller is an NVMe controller
  • the host is connected to the NVMe device through the offload card
  • the NVMe device is mounted inside the virtual machine
  • the application running in the user space issues an input/output request to the general block layer
  • the input/output request is converted, through processing by the general block layer and the NVMe driver, into an input and output request that complies with the NVMe protocol and is sent to the NVMe controller
  • the NVMe controller transfers the input and output request that complies with the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace
  • the offload card sends the input and output request that complies with the NVMe protocol to the NVMe device; multiple namespaces can be created, and a dedicated hardware queue is dynamically allocated to each namespace, realizing hardware-level namespace resources.
  • Figure 4 shows a flow chart of an offload card namespace management method according to an embodiment of the present disclosure.
  • the offload card namespace management method shown in FIG. 4 runs on an offload card namespace management system.
  • the offload card namespace management system includes a host and an offload card connected to the host.
  • the offload card namespace management system is the offload card namespace management system described with reference to FIG. 3.
  • the offload card namespace management method includes steps S401, S402, and S403.
  • in step S401, based on multiple applications that issue input and output requests running on the host, the host sends a create namespace request to the offload card.
  • in step S402, the offload card creates multiple corresponding namespaces for the multiple applications according to the create namespace request.
  • in step S403, the offload card allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces, respectively.
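  • As an illustrative sketch of steps S402 and S403 only (all names, structures, and the queue pool are assumptions, not the actual controller implementation), the C fragment below shows a controller that, for each create namespace request, takes a free hardware queue from the accelerator's pool and binds it exclusively to the newly created namespace.

```c
/* Illustrative controller-side sketch of steps S402/S403: create a namespace
 * for an application and bind a dedicated hardware queue taken from the
 * accelerator's free pool. Names and structures are illustrative assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HW_QUEUES 1024           /* illustrative pool size */

struct hw_queue { uint32_t qid; bool in_use; };

struct accelerator {
    struct hw_queue queues[MAX_HW_QUEUES];
};

struct namespace_binding {
    uint32_t nsid;
    struct hw_queue *queue;          /* exclusive hardware queue for this namespace */
};

/* Allocate one unused hardware queue from the accelerator. */
struct hw_queue *alloc_hw_queue(struct accelerator *acc)
{
    for (size_t i = 0; i < MAX_HW_QUEUES; i++) {
        if (!acc->queues[i].in_use) {
            acc->queues[i].in_use = true;
            acc->queues[i].qid = (uint32_t)i;
            return &acc->queues[i];
        }
    }
    return NULL;                     /* no free hardware queue */
}

/* Handle one create namespace request: create the namespace, then bind it
 * to a dedicated hardware queue (steps S402 and S403). */
bool handle_create_namespace(struct accelerator *acc, uint32_t new_nsid,
                             struct namespace_binding *out)
{
    struct hw_queue *q = alloc_hw_queue(acc);
    if (q == NULL)
        return false;
    out->nsid = new_nsid;
    out->queue = q;
    return true;
}
```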
  • a virtual machine runs on the host, the multiple applications run in the virtual machine, and the virtual machine includes a driver for managing the multiple hardware queues, wherein the host sending a create namespace request to the offload card includes: the virtual machine sends the create namespace request to the offload card through the driver.
  • the offload card includes a controller and a hardware accelerator, wherein the offload card creating multiple corresponding namespaces for the multiple applications according to the create namespace request includes: the controller creates respective namespaces for the multiple applications according to the create namespace request.
  • the offload card allocating, according to the create namespace request, multiple hardware queues corresponding to the created namespaces and binding the allocated hardware queues to the corresponding namespaces includes: the controller allocates, from the hardware accelerator, multiple hardware queues corresponding to the created namespaces and binds the allocated hardware queues to the corresponding namespaces.
  • Multiple namespaces can be created, and an exclusive hardware queue is dynamically allocated for each namespace, which realizes hardware-level namespace resources; applications can use their own corresponding hardware queues, which greatly improves the performance isolation capability and reliability of the namespace.
  • the driver is an NVMe driver
  • the controller is an NVMe controller
  • the host is connected to the NVMe device through the offload card, and the method further includes: mounting the NVMe device inside the virtual machine;
  • the application running in the user space sends an input and output request to the general block layer; the input and output request is converted into an input and output request that complies with the NVMe protocol through the processing of the general block layer and the NVMe driver and is sent to the NVMe controller; the NVMe controller transfers the input and output request that complies with the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace; and the offload card sends the input and output request that complies with the NVMe protocol to the NVMe device.
  • a virtual machine runs on the host 501.
  • the host 501 includes a multi-core central processor.
  • the virtual machine includes a user space and a kernel space. Multiple applications that issue input and output requests run in the user space, and each application is executed by a corresponding core. The kernel space includes a common block layer and an NVMe driver, and the common block layer includes a plurality of software queues corresponding to the multiple cores. The virtual machine sends a create namespace request to the offload card 502 through the NVMe driver.
  • the input and output requests are converted into input and output requests that comply with the NVMe protocol through the processing of the general block layer and the NVMe driver, and are sent to the NVMe controller,
  • the NVMe controller transfers the input and output requests that comply with the NVMe protocol to the offload card 502 through a hardware queue corresponding to the created namespace,
  • a virtual machine runs on the host, the host includes a multi-core central processor, the virtual machine includes user space and kernel space, wherein multiple applications that issue input and output requests run in the user space, the applications are executed by corresponding cores, wherein the kernel space includes a common block layer and an NVMe driver, and the common block layer includes multiple software queues corresponding to the multiple cores, wherein the virtual machine sends a create namespace request to the offload card through the NVMe driver
  • the offload card includes an NVMe controller and a hardware accelerator, wherein the NVMe controller creates respective namespaces for the multiple applications according to the create namespace request, allocates from the hardware accelerator multiple hardware queues corresponding to the created namespaces, and binds the allocated hardware queues to the corresponding namespaces.
  • the NVMe driver is used to manage the multiple hardware queues, and the common block layer establishes a one-to-one correspondence from the multiple software queues to the multiple hardware queues,
  • in step S601, the application running in the user space sends an input and output request to the general block layer;
  • in step S602, the input and output request is converted into an input and output request that complies with the NVMe protocol through the processing of the common block layer and the NVMe driver, and is sent to the NVMe controller;
  • in step S603, the NVMe controller transfers the input and output request that complies with the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace;
  • in step S604, the offload card sends the input and output request that complies with the NVMe protocol to the NVMe device.
  • FIG. 7 shows a structural block diagram of an electronic device according to an embodiment of the present disclosure.
  • the embodiment of the present disclosure also provides an electronic device, as shown in Figure 7, including at least one processor 701, and a memory 702 communicatively connected with the at least one processor 701, wherein the memory 702 stores instructions executable by the at least one processor 701.
  • the instructions are executed by the at least one processor 701 in the offload card namespace management system.
  • the offload card namespace management system includes a host and an offload card connected to the host.
  • the instructions are executed by the at least one processor 701 to implement the following steps:
  • based on multiple applications that issue input and output requests running on the host, the host sends a create namespace request to the offload card;
  • the offload card creates multiple corresponding namespaces for the multiple applications according to the create namespace request;
  • the offload card allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces respectively.
  • the controller creates respective namespaces for the multiple applications according to the create namespace request
  • the offload card allocates multiple hardware queues corresponding to the multiple created namespaces according to the create namespace request and binds the allocated multiple hardware queues to the corresponding namespaces respectively, including:
  • the controller allocates multiple hardware queues corresponding to the created multiple namespaces from the hardware accelerator and binds the allocated multiple hardware queues to corresponding namespaces respectively.
  • the host includes a multi-core central processor
  • the virtual machine includes user space and kernel space
  • the plurality of applications are running in the user space, and the applications are executed by corresponding cores,
  • the kernel space includes a general block layer and the driver
  • the general block layer includes a plurality of software queues corresponding to the plurality of cores, and establishes a transition from the plurality of software queues to the plurality of hardware queues. one-to-one correspondence.
  • the NVMe controller transfers the input and output requests that comply with the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace,
  • the embodiment of the present disclosure also provides an electronic device, as shown in Figure 7, including at least one processor 701, and a memory 702 communicatively connected with the at least one processor 701, wherein the memory 702 stores instructions executable by the at least one processor 701, and the instructions run on the input and output request processing system.
  • the input and output request processing system includes a host, an offload card, and an NVMe device connected to the host through the offload card,
  • the offload card includes an NVMe controller and a hardware accelerator, wherein the NVMe controller creates respective namespaces for the multiple applications according to the create namespace request, allocates, from the hardware accelerator, multiple hardware queues corresponding to the multiple created namespaces, and binds the allocated multiple hardware queues to the corresponding namespaces respectively.
  • the method includes:
  • the application running in the user space issues input and output requests to the general block layer;
  • the input and output requests are converted into input and output requests that comply with the NVMe protocol through the processing of the general block layer and the NVMe driver, and are sent to the NVMe controller;
  • the NVMe controller transfers the input and output requests that comply with the NVMe protocol to the offload card through the hardware queue corresponding to the created namespace;
  • the offload card sends the input and output requests that comply with the NVMe protocol to the NVMe device.
  • the computer system 800 includes a processing unit 801, which can execute the various processes in the embodiments shown in the above-described figures according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803.
  • in the RAM 803, various programs and data required for the operation of the system 800 are also stored.
  • the CPU 801, the ROM 802 and the RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to bus 804.
  • the following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem.
  • the communication section 809 performs communication processing via a network such as the Internet.
  • a drive 810 is also connected to the I/O interface 805 as needed; a removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
  • each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or operations, or can be implemented using a combination of special purpose hardware and computer instructions.
  • the units or modules described in the embodiments of the present disclosure may be implemented in software or hardware.
  • the described units or modules may also be provided in the processor, and the names of these units or modules do not constitute a limitation on the units or modules themselves under certain circumstances.
  • the present disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium may be the computer-readable storage medium included in the node described in the above embodiments; it may also be a computer-readable storage medium that exists separately and is not assembled into a device.
  • the computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the methods described in the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Stored Programmes (AREA)

Abstract

An offload card namespace management system and method, and an input/output request processing system and method. The offload card namespace management system (300) includes a host (301) and an offload card (302) connected to the host (301). Multiple applications that issue input and output requests run on the host (301), and the host (301) sends a create namespace request to the offload card (302); the offload card (302) creates multiple corresponding namespaces for the multiple applications according to the create namespace request; and the offload card (302) allocates, according to the create namespace request, multiple hardware queues corresponding to the multiple created namespaces and binds the allocated multiple hardware queues to the corresponding namespaces respectively. Multiple namespaces can thus be created, and a dedicated hardware queue can be dynamically allocated to each namespace, realizing hardware-level namespace resources; each application can use its own corresponding hardware queue, which greatly improves the performance isolation capability and reliability of the namespaces.

Description

卸载卡命名空间管理、输入输出请求处理系统和方法
本申请要求于2022年03月18日提交中国专利局、申请号为202210273112.9、申请名称为“卸载卡命名空间管理、输入输出请求处理系统和方法”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及计算机技术领域,具体涉及卸载卡命名空间管理、输入输出请求处理系统和方法。
背景技术
NVMe(Non-Volatile Memory express,非易失性内存主机控制器接口规范)存储器是服务器技术领域常见的固态存储器,例如,NVMe SSD(Solid State Drive,固态硬盘)。随着产品应用情景的复杂性,直连NVMe设备需要占用中央处理器(CPU)大量计算空间,从而影响到CPU的性能。
当前已经采用命名空间(namespace)技术来对NVME存储器进行空间划分/管理,主要用于为不同的应用划分磁盘逻辑空间和建立安全隔离机制。不同的命名空间采用软件技术隔离,共用一部分硬件资源,例如,硬件队列。
相关技术中已知的命名空间技术的优势:通过软件实现,提供灵活的空间管理能力,满足软件定义存储诉求。
相关技术中已知的命名空间技术的劣势:在命名空间上不同应用会共用NVMe设备的例如硬件队列的硬件资源,会产生性能干扰问题,而且故障隔离能力比较差。
发明内容
为了解决相关技术中的问题,本公开实施例提供卸载卡命名空间管理、输入输出请求处理系统和方法。
第一方面,本公开实施例中提供了一种卸载卡命名空间管理系统,包括主机和与主机连接的卸载卡,其中,
所述主机上运行有发出输入输出请求的多个应用,并且所述主机向所述卸载卡发送创建命名空间请求,
所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,
所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
结合第一方面,本公开在第一方面的第一种实现方式中,所述主机上运行有虚拟机, 所述多个应用运行于所述虚拟机中,所述虚拟机包括:
驱动器,其用于管理所述多个硬件队列,所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求。
结合第一方面的第一种实现方式,本公开在第一方面的第二种实现方式中,所述卸载卡包括控制器和硬件加速器,其中,
所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
结合第一方面的第二种实现方式,本公开在第一方面的第三种实现方式中,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,
其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,
其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
结合第一方面的第三种实现方式,本公开在第一方面的第四种实现方式中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,
其中,所述主机通过所述卸载卡连接到NVMe设备,
其中,所述NVMe设备挂载到所述虚拟机内部,
其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,
其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,
其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
其中,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
第二方面,本公开实施例中提供了一种卸载卡命名空间管理方法,所述方法运行于卸载卡命名空间管理系统,所述卸载卡命名空间管理系统包括主机和与主机连接的卸载卡,其中,所述方法包括:
基于所述主机上运行的发出输入输出请求的多个应用,所述主机向所述卸载卡发送创建命名空间请求;
所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间;
所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
结合第二方面,本公开在第二方面的第一种实现方式中,所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括用于管理所述多个硬件队列的驱动器,其中,所述主机向所述卸载卡发送创建命名空间请求,包括:
所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求。
结合第二方面的第一种实现方式,本公开在第二方面的第二种实现方式中,所述卸载卡包括控制器和硬件加速器,其中,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,包括:
所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,
其中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,包括:
所述控制器从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
结合第二方面的第二种实现方式,本公开在第二方面的第三种实现方式中,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,
其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,
其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
结合第二方面的第三种实现方式,本公开在第二方面的第四种实现方式中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,
其中,所述主机通过所述卸载卡连接到NVMe设备,
其中,所述方法还包括:
将所述NVMe设备挂载到所述虚拟机内部;
所述用户空间中运行的应用向所述通用块层发出输入输出请求;
通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
第三方面,本公开实施例中提供了一种输入输出请求处理系统,其包括主机、卸载卡和通过所述卸载卡与所述主机连接的NVMe设备,
其中,所述主机上运行有虚拟机,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡发送创建命名空间请求,
其中,所述卸载卡包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,
其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,
其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
其中,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
第四方面,本公开实施例中提供了一种输入输出请求处理方法,所述方法运行于输入输出请求处理系统,所述输入输出请求处理系统包括主机、卸载卡和通过所述卸载卡与所述主机连接的NVMe设备,
其中,所述主机上运行有虚拟机,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡发送创建命名空间请求,
其中,所述卸载卡包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
其中,所述方法包括:
所述用户空间中运行的应用向所述通用块层发出输入输出请求;
通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡;
所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
根据本公开实施例提供的技术方案,通过卸载卡命名空间管理系统,包括主机和与主机连接的卸载卡,其中,所述主机上运行有发出输入输出请求的多个应用,并且所述主机向所述卸载卡发送创建命名空间请求,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,相较于所有应用进程共用统一由一个命名空间管理的多个硬件队列的方案,本公开实施例的方案在单一硬件队列出现故障的情况下不影响其他硬件队列的应用,在性能隔离能力以及故障隔离能力方面有很大提升。
根据本公开实施例提供的技术方案,通过所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括:驱动器,其用于管理所述多个硬件队列,所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
根据本公开实施例提供的技术方案,通过所述卸载卡包括控制器和硬件加速器,其中,所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬 件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
根据本公开实施例提供的技术方案,通过所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
根据本公开实施例提供的技术方案,通过所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,其中,所述主机通过所述卸载卡连接到NVMe设备,其中,所述NVMe设备挂载到所述虚拟机内部,其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,其中,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
根据本公开实施例提供的技术方案,通过卸载卡命名空间管理方法,所述方法运行于卸载卡命名空间管理系统,所述卸载卡命名空间管理系统包括主机和与主机连接的卸载卡,其中,所述方法包括:基于所述主机上运行的发出输入输出请求的多个应用,所述主机向所述卸载卡发送创建命名空间请求;所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间;所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,相较于所有应用进程共用统一由一个命名空间管理的多个硬件队列的方案,本公开实施例的方案在单一硬件队列出现故障的情况下不影响其他硬件队列的应用,在性能隔离能力以及故障隔离能力方面有很大提升。
根据本公开实施例提供的技术方案,通过所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括用于管理所述多个硬件队列的驱动器,其中,所述主机向所述卸载卡发送创建命名空间请求,包括:所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
根据本公开实施例提供的技术方案,通过所述卸载卡包括控制器和硬件加速器,其中,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,包括:所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,其中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,包括:所述控制器从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
根据本公开实施例提供的技术方案,通过所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
根据本公开实施例提供的技术方案,通过所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,其中,所述主机通过所述卸载卡连接到NVMe设备,其中,所述方法还包括:将所述NVMe设备挂载到所述虚拟机内部;所述用户空间中运行的应用向所述通用块层发出输入输出请求;通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
根据本公开实施例提供的技术方案,通过利用前述卸载卡命名空间管理方案进行输入输出请求处理,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,避免不同应用输入输出请求相互干扰问题,并提升了故障隔离能力大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本公开。
附图说明
结合附图,通过以下非限制性实施方式的详细描述,本公开的其它特征、目的和优点将变得更加明显。在附图中:
图1示出根据相关技术的卸载卡命名空间管理系统的实施场景的示例性示意图;
图2示出根据本公开实施例的卸载卡命名空间管理系统的实施场景的示例性示意图;
图3示出根据本公开实施例的卸载卡命名空间管理系统的结构框图;
图4示出根据本公开实施例的卸载卡命名空间管理方法的流程图;
图5示出根据本公开实施例的输入输出请求处理系统的结构框图;
图6示出根据本公开实施例的输入输出请求处理方法的流程图;
图7示出根据本公开一实施方式的电子设备的结构框图;
图8是适于用来实现根据本公开各实施方式的方法的计算机系统的结构示意图。
具体实施方式
下文中,将参考附图详细描述本公开的示例性实施例,以使本领域技术人员可容易地实现它们。此外,为了清楚起见,在附图中省略了与描述示例性实施例无关的部分。
在本公开中,应理解,诸如“包括”或“具有”等的术语旨在指示本说明书中所公开的特征、数字、步骤、行为、部件、部分或其组合的存在,并且不欲排除一个或多个其他特征、数字、步骤、行为、部件、部分或其组合存在或被添加的可能性。
在本公开中,对用户信息或用户数据的获取均为经用户授权、确认,或由用户主动选择的操作。
另外还需要说明的是,在不冲突的情况下,本公开中的实施例及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本公开。
以下对本公开实施例可能涉及术语进行简要描述。
DMA(Direct Memory Access):直接内存访问,是计算机科学中的一种内存访问技术。它允许某些电脑内部的硬件子系统(电脑外设)可以独立地直接读写系统内存,而不需中央处理器(CPU)介入处理。在同等程度的处理器负担下,DMA是一种快速的数据传送方式。很多硬件的系统会使用DMA,包含硬盘控制器、绘图显卡、网卡和声卡等。
卸载卡(offloading card):它通过硬件实现本来由软件实现的功能,这样就可以将本来在操作系统上进行的一些数据处理放到硬件上去做,降低系统CPU消耗的同时,提高处理性能。
PCIe(Peripheral Component Interconnect express):是一种高速串行计算机扩展总线标准,它沿用现有的PCI编程概念及信号标准,并且构建了更加高速的串行通信系统标准。
FPGA(Field-Programmable Gate Array):现场可编程逻辑门阵列,它以PAL、GAL、CPLD等可编程逻辑器件为技术基础发展而成。作为专用集成电路中的一种半客制电路,它既弥补专用集成电路电路不足,又克服原有可编程逻辑器件门电路数有限的缺点。
ASIC(Application Specific Integrated Circuit):是指依产品需求不同而全定制的特殊规格集成电路,是一种有别于标准工业IC的集成电路产品。
ioctl(input/output control):输入/输出控制,是一个专用于设备输入输出操作的系统调用的函数,其传入一个跟设备有关的请求码,系统调用的功能完全取决于请求码。ioctl 是设备驱动程序中对设备的I/O通道进行管理的函数。
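Illustration (not part of the original disclosure): the paragraph above introduces ioctl as the mechanism by which the virtual machine hands NVMe admin commands to the NVMe driver. The C sketch below shows how a user-space program on Linux could submit a namespace-management admin command through the standard NVMe passthrough ioctl; it assumes the <linux/nvme_ioctl.h> interface and a controller device node such as /dev/nvme0, fills in only a minimal, simplified payload, and is not the exact command sequence used by nvme-cli or by the system described in this document.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>   /* NVME_IOCTL_ADMIN_CMD, struct nvme_admin_cmd */

int main(void)
{
    int fd = open("/dev/nvme0", O_RDWR);   /* controller character device (assumed path) */
    if (fd < 0) {
        perror("open /dev/nvme0");
        return 1;
    }

    /* Payload for a namespace-management command; only the size/capacity
     * fields are filled in here and the layout is deliberately simplified. */
    uint8_t payload[4096] = {0};
    uint64_t blocks = 1 << 20;                     /* example: 1M logical blocks */
    memcpy(&payload[0], &blocks, sizeof(blocks));  /* NSZE */
    memcpy(&payload[8], &blocks, sizeof(blocks));  /* NCAP */

    struct nvme_admin_cmd cmd = {0};
    cmd.opcode   = 0x0d;                 /* Namespace Management (NVMe admin opcode) */
    cmd.addr     = (uint64_t)(uintptr_t)payload;
    cmd.data_len = sizeof(payload);
    cmd.cdw10    = 0;                    /* SEL = 0: create */

    /* The NVMe driver forwards this admin command to the NVMe controller. */
    if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
        perror("NVME_IOCTL_ADMIN_CMD");
    else
        printf("created namespace, controller result = 0x%x\n", cmd.result);

    close(fd);
    return 0;
}
```

In the architecture described in this document, such an admin command would be received by the NVMe controller emulated on the offload card, which then allocates a hardware queue from the accelerator and binds it to the newly created namespace.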
如上所述,当前已经采用命名空间(namespace)技术来对NVME存储器进行空间划分/管理,主要用于为不同的应用划分磁盘逻辑空间和建立安全隔离机制。不同的命名空间采用软件技术隔离,共用一部分硬件资源,例如,硬件队列,会产生性能干扰问题,而且,故障隔离能力比较差。命名空间技术除了提供空间划分功能之外,每个命名空间可以具备独立的格式化、加密能力,也就相当于独立的配置功能。
以下参照图1描述相关技术中的卸载卡命名空间管理系统。图1示出根据相关技术的卸载卡命名空间管理系统的实施场景的示例性示意图。
如图1所示,相关技术的卸载卡命名空间管理系统可以采用硬件卸载(offloading)架构,该系统可以包括主机1100和卸载卡1200。主机1100可以通过卸载卡1200访问NVMe设备1300。卸载卡1200和NVMe设备1300一起可以被用作存储系统。卸载卡1200和NVMe设备1300之间可以通过PCIe接口连接。具体的PCIe接口规范可以从相关技术中获取,本公开对此不做赘述。
如图1所示,主机1100上运行有虚拟机1110,虚拟机1110可以包括用户空间(user space)1111和内核空间(kernel space)1115。在本公开的一个实施例中,虚拟机1110可以是Linux虚拟机。在本公开的一个实施例中,用户空间1111中运行有应用1112(或者应用的线程),并且应用1112通过中央处理器(CPU)核心1113下发数据。虚拟机1110的内核空间1115中具有通用块层(Linux内核中的Blk-mq)1116。通用块层1116为上层应用1112提供软件队列,每个CPU核心1113对应一个软件队列11161。虚拟机1110的内核空间1115中还具有驱动器(例如,NVMe驱动器)1117,用于维护和管理硬件队列11171,并且可以将输入输出(IO,Input Output)请求通过例如设置硬件寄存器的方式下发到具体的物理硬件(例如,卸载卡)上处理。通用块层1116可以维护从软件队列11161到硬件队列11171的映射关系。软件队列11161可以用于转发CPU核心1113下发的数据。硬件队列11171用于转发需要下发到NVMe设备的数据。
在相关技术的卸载卡命名空间管理系统中,因为通用块层1116需要通过NVMe驱动器1117与硬件交互,因此需要硬件也支持多队列。最理想的情况是,硬件支持的队列足够多,通用块层1116的每个软件队列11161都有硬件队列11171和其关联。但是,相关技术中硬件支持的硬件队列有限,就形成如图1所示的关联关系。图1中有4个硬件队列11171,但是通用块层1116总共形成了8个软件队列11161。因此,通用块层1116将2个软件队列11161和1个硬件队列11171关联起来,即,建立从2软件队列11161到1个硬件队列11171的映射关系。
在相关技术的卸载卡命名空间管理系统中,卸载卡1200包括NVMe控制器1210和加速器1220。卸载卡1200可以通过软硬件技术模拟标准NVMe控制器1210,负责处理虚拟机1110中的内核空间1115中的NVMe驱动器1117的配置请求,完成NVMe设备1300的初始化、挂载/卸载等管理任务。加速器1220例如是基于FPGA硬件或ASIC硬件的加速器,负责将虚拟机1110中的IO请求以DMA方式搬运到卸载卡1200并通过例如网卡、PCI之类的连接方式发送到NVMe设备1300。例如,在卸载卡1200中,NVMe控制器1210包含的4硬件队列11171是在加速器1220中对应于硬件部分1221预先分配的,由一个命名空间1211管理。
在相关技术的卸载卡命名空间管理系统中,NVMe控制器1210通过一个命名空间管理设备1300的磁盘空间以及块设备相关的属性,例如,扇区大小、容量等等。在卸载卡1200中,一个NVMe控制器1210可以包含4个硬件队列,统一由一个命名空间1211管理。相关技术的NVMe控制器1210实际上可能做到包含更多的硬件队列,但是核心的问题在于即使硬件队列数量可以匹配应用的数量,也无法通过划分命名空间做到为每个应用分配独立的硬件队列。所有应用进程共用NVMe控制器1210的硬件队列,NVMe控制器1210的命名空间1211上不同应用会共用NVMe设备1300的硬件资源,因此存在一定的性能资源争抢问题以及故障隔离能力较弱的问题。
在利用相关技术的卸载卡命名空间管理系统处理IO请求的情况中,虚拟机1110中的应用1112可以通过系统调用接口向内核空间1115中的通用块层1116下发IO请求。内核空间1115中的通用块层1116接收IO请求,并将其封装成bio请求(通用块层的请求)。NVMe驱动器1117负责将bio请求转成符合NVMe协议的IO请求,并可通过设备硬件寄存器下发符合NVMe协议的IO请求。NVMe驱动器1117可以通过设置硬件寄存器,通知卸载卡1200上的NVMe控制器1210处理符合NVMe协议的IO请求。加速器1220将虚拟机1110的IO请求通过DMA方式搬运到卸载卡1200并通过例如网卡、PCI之类接口发送到NVMe设备1300。
在利用相关技术的卸载卡命名空间管理系统处理IO请求的过程中,NVMe设备1300的硬件队列统一由一个命名空间管理,不同的应用共用硬件队列资源,存在性能干扰问题,而且故障隔离能力比较差。例如,应用A为低延时IO模型,应用B为高吞吐IO模型,如果共用一个硬件队列,会造成应用A的IO请求被应用B拖慢的问题。在一定情况下,单一硬件队列故障会导致上面所有的应用IO挂起,影响范围大,故障隔离能力差。
为了解决上述问题,本公开提出了卸载卡命名空间管理、输入输出请求处理系统和方法。
根据本公开实施例提供的技术方案,通过卸载卡命名空间管理系统,包括主机和与主机连接的卸载卡,其中,所述主机上运行有发出输入输出请求的多个应用,并且所述主机向所述卸载卡发送创建命名空间请求,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,相较于不能为每个应用分配独立的硬件队列的相关技术的方案,本公开实施例的方案在单一硬件队列出现故障的情况下不影响其他硬件队列的应用,在性能隔离能力以及故障隔离能力方面有很大提升。
以下参照图2描述根据本公开实施例的卸载卡命名空间管理系统的实施场景。图2示出根据本公开实施例的卸载卡命名空间管理系统的实施场景的示例性示意图。
如图2所示,根据本公开实施例的卸载卡命名空间管理系统与相关技术中的卸载卡命名空间管理系统类似,可以采用硬件卸载(offloading)架构,该系统可以包括主机2100和卸载卡2200。主机2100可以通过卸载卡2200访问NVMe设备2300。卸载卡2200和NVMe设备2300一起可以被用作存储系统。卸载卡2200和NVMe设备2300之间可以通 过PCIe接口连接。具体的PCIe接口规范可以从相关技术中获取,本公开对此不做赘述。
如图2所示,主机2100上运行有虚拟机2110,虚拟机2110可以包括用户空间(user space)2111和内核空间(kernel space)2115。在本公开的一个实施例中,虚拟机2110可以是Linux虚拟机。在本公开的一个实施例中,用户空间2111中运行有应用2112(或者应用的线程),并且应用2112通过中央处理器(CPU)核心2113下发数据。虚拟机2110的内核空间2115中具有通用块层(Linux内核中的Blk-mq)2116。通用块层2116为上层应用2112提供软件队列,每个CPU核心2113对应一个软件队列21161。虚拟机2110的内核空间2115中还具有驱动器(例如,NVMe驱动器)2117,用于维护和管理硬件队列21171,并且可以将输入输出(IO,Input Output)请求通过例如设置硬件寄存器的方式下发到具体的物理硬件(例如,卸载卡)上处理。通用块层2116可以维护从软件队列21161到硬件队列21171的映射关系。软件队列21161可以用于转发CPU核心2113下发的数据。硬件队列21171用于转发需要下发到NVMe设备的数据。
在根据本公开实施例的卸载卡命名空间管理系统中,因为通用块层2116需要通过NVMe驱动器2117与硬件交互,因此需要硬件也支持多队列。最理想的情况是,硬件支持的队列足够多,通用块层2116的每个软件队列11161都有硬件队列21171和其关联。图2中有8个硬件队列21171,通用块层2116总共形成了8个软件队列11161。因此,通用块层2116将1个软件队列21161和1个硬件队列21171关联起来,即,建立软件队列11161到硬件队列11171的一一对应的映射关系。以下描述中本公开实施例中利用卸载卡创建多个命名空间以分配硬件队列的具体方式。
在根据本公开实施例的卸载卡命名空间管理系统中,卸载卡2200包括NVMe控制器2210和加速器2220。卸载卡2200可以通过软硬件技术模拟标准NVMe控制器2210,负责处理虚拟机2110中的内核空间2115中的NVMe驱动器2117的配置请求,完成NVMe设备2300的初始化、挂载/卸载等管理任务。加速器2220例如是基于FPGA硬件或ASIC硬件的加速器,负责将虚拟机2110中的IO请求以DMA方式搬运到卸载卡2200并通过例如网卡、PCI之类的连接方式发送到NVMe设备2300。
在根据本公开实施例的卸载卡命名空间管理系统中,可以将NVMe设备2300挂载到虚拟机2110内部。虚拟机2110内部可以通过例如nvme-cli(Linux的用于监控和配置管理NVMe设备的命令行工具)之类的工具为每个应用2112创建独立的命名空间。具体而言,虚拟机2110可以通过ioctl系统调用向NVMe驱动器2117下发NVMe Admin Command命令,具体为,创建命名空间请求(也可以称作命名空间创建请求)。在NVMe驱动器2117收到创建命名空间请求后,可以向卸载卡2200发送创建命名空间请求。例如,NVMe驱动器2117可以通过设置硬件寄存器向卸载卡2200发送配置信息请求的方式向卸载卡2200发送创建命名空间请求。卸载卡2200上的NVMe控制器2210收到创建命名空间请求后,为各个应用2112创建对应的命名空间NS1、NS2、NS3、NS4、NS5、NS6、NS7、NS8,并且从加速器2220中对应的硬件部分2221分配硬件队列21171,分别并绑定在创建的命名空间NS1、NS2、NS3、NS4、NS5、NS6、NS7、NS8上。例如,在卸载卡2200中,NVMe控制器2210根据创建命名空间请求从加速器2220的对应硬件部分2221分配8硬件队列21171,由八个命名空间NS1、NS2、NS3、NS4、NS5、NS6、NS7、NS8分别管理。加速器2220中的硬件部分2221以外的其他硬件部分2222并不被分配用于由命名空间NS1、 NS2、NS3、NS4、NS5、NS6、NS7、NS8管理。
从图2可见,一个应用2112对应于一个软件队列21161,对应于一个硬件队列21171,一个硬件队列21171由与其绑定的一个命名空间管理。因此,在根据本公开实施例的卸载卡命名空间管理系统中,可以通过动态地为每个NVMe命名空间分配专属硬件队列,实现硬件级别的命名空间资源,大幅提升了命名空间的性能隔离能力以及可靠性。相比于相关技术的卸载卡命名空间管理系统,在性能隔离能力以及故障隔离能力方面有很大提升。而且,随着硬件卸载技术的发展,现有的卸载卡已经可以支持一千或数千硬件队列。根据本公开实施例的卸载卡命名空间管理系统采用软硬件结合的技术架构,支持多命名空间的设备形态,并且为每个命名空间分配独立的硬件队列,避免不同应用的IO请求相互干扰问题,并提升了故障隔离能力。
在利用根据本公开实施例的卸载卡命名空间管理系统处理IO请求的情况中,虚拟机2110中的应用2112可以通过系统调用接口向内核空间2115中的通用块层2116下发IO请求。内核空间2115中的通用块层1116接收IO请求,并将其封装成bio请求(通用块层的请求)。NVMe驱动器2117负责将bio请求转成符合NVMe协议的IO请求,并可通过设备硬件寄存器下发符合NVMe协议的IO请求。NVMe驱动器2117可以通过设置硬件寄存器,通知卸载卡2200上的NVMe控制器2210处理符合NVMe协议的IO请求。加速器2220通过独立的硬件队列(即,与各个命名空间一一对应的硬件队列)21171将虚拟机2110的IO请求通过DMA方式搬运到卸载卡2200并通过例如网卡、PCI之类接口发送到NVMe设备2300。
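Illustration (not part of the original disclosure): the paragraph above describes the accelerator moving each application's I/O requests to the offload card by DMA over the hardware queue dedicated to that application's namespace. The C sketch below imitates that idea with a per-queue descriptor ring; the structure names are invented, and a memcpy into a local buffer stands in for the real DMA transfer.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_DEPTH 16

/* One DMA descriptor: where the request lives in host memory and how big it is. */
struct dma_desc {
    const void *host_addr;  /* source buffer in host memory */
    uint32_t    len;        /* bytes to move */
};

/* Each namespace-bound hardware queue owns its own descriptor ring. */
struct hwq_dma_ring {
    uint32_t        nsid;
    struct dma_desc ring[RING_DEPTH];
    uint32_t        head, tail;
};

/* Post one request for transfer to the offload card. */
static int dma_post(struct hwq_dma_ring *r, const void *buf, uint32_t len)
{
    if (r->tail - r->head == RING_DEPTH)
        return -1;                                   /* ring full */
    r->ring[r->tail % RING_DEPTH] = (struct dma_desc){ buf, len };
    r->tail++;
    return 0;
}

/* Stand-in for the accelerator draining the ring and copying data to the card. */
static void dma_drain(struct hwq_dma_ring *r, uint8_t *card_mem)
{
    while (r->head != r->tail) {
        struct dma_desc *d = &r->ring[r->head % RING_DEPTH];
        memcpy(card_mem, d->host_addr, d->len);      /* real hardware would DMA this */
        r->head++;
    }
}

int main(void)
{
    static uint8_t card_mem[4096];
    struct hwq_dma_ring q = { .nsid = 1 };
    const char req[] = "io-request-for-namespace-1";

    dma_post(&q, req, sizeof(req));
    dma_drain(&q, card_mem);
    printf("moved request for nsid=%u to card: %s\n", q.nsid, (const char *)card_mem);
    return 0;
}
```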
在根据本公开实施例的卸载卡命名空间管理系统中,与相关技术的卸载卡命名空间管理系统一样,每个CPU核心2113对应一个软件队列21161,但是在通用块层2116中,与相关技术的卸载卡命名空间管理系统不同,驱动器2117管理和维护的不是预设好的硬件队列,而是由卸载卡2200的NVMe控制器2210根据命名空间创建请求创建多个与软件队列数量相同的命名空间后,从加速器2220分配的硬件队列。硬件队列21171的数量可以与软件队列21161一样,二者可以一一对应,这样NVMe设备2300的硬件队列分别由独立的命名空间管理,不同的应用使用各自的硬件队列资源,不再存在性能干扰,而且故障隔离能力很好。
以下参照图3描述根据本公开实施例的卸载卡命名空间管理系统。图3示出根据本公开实施例的卸载卡命名空间管理系统300的结构框图。
图3所示的卸载卡命名空间管理系统300包括主机301和卸载卡302。其中,所述主机301上运行有发出输入输出请求的多个应用,并且所述主机301向所述卸载卡302发送创建命名空间请求,所述卸载卡302根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,所述卸载卡302根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
根据本公开实施例提供的技术方案,通过卸载卡命名空间管理系统,包括主机和与主机连接的卸载卡,其中,所述主机上运行有发出输入输出请求的多个应用,并且所述主机向所述卸载卡发送创建命名空间请求,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名 空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,相较于所有应用进程共用统一由一个命名空间管理的多个硬件队列的方案,本公开实施例的方案在单一硬件队列出现故障的情况下不影响其他硬件队列的应用,在性能隔离能力以及故障隔离能力方面有很大提升。
在本公开的一个实施例中,所述主机301上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括:驱动器,其用于管理所述多个硬件队列,所述虚拟机通过所述驱动器向所述卸载卡302发送所述创建命名空间请求。
根据本公开实施例提供的技术方案,通过所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括:驱动器,其用于管理所述多个硬件队列,所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
在本公开的一个实施例中,所述卸载卡302包括控制器和硬件加速器,其中,所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
根据本公开实施例提供的技术方案,通过所述卸载卡包括控制器和硬件加速器,其中,所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
在本公开的一个实施例中,所述主机301包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
根据本公开实施例提供的技术方案,通过所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
在本公开的一个实施例中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,其中,所述主机301通过所述卸载卡302连接到NVMe设备303,其中,所述NVMe设备303挂载到所述虚拟机内部,其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求 转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡302,其中,所述卸载卡302将符合NVMe协议的输入输出请求发送到所述NVMe设备303。
在本公开的一个实施例中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器的具体方式可以包括:通用块层将应用下发的输入输出请求请求转换成符合通用块设备协议(例如,bio结构体)的输入输出请求请求,而NVMe驱动器将符合通用块设备协议的输入输出请求请求转成符合NVMe协议的输入输出请求,并通过设备硬件寄存器下发输入输出请求。
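Illustration (not part of the original disclosure): the paragraph above describes the general block layer turning an application's I/O request into a generic block-layer request and the NVMe driver turning that into an NVMe-protocol command. The C sketch below performs that conversion on deliberately simplified stand-in structures; neither structure reproduces the kernel's bio layout or the NVMe specification's command format, and 512-byte logical blocks are assumed so that sectors map one-to-one onto LBAs.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for a block-layer (bio-like) request. */
struct blk_request {
    int      write;        /* 0 = read, 1 = write */
    uint64_t sector;       /* starting 512-byte sector */
    uint32_t nr_sectors;   /* request length in sectors */
    void    *buffer;
};

/* Simplified stand-in for an NVMe I/O command. */
struct nvme_cmd {
    uint8_t  opcode;       /* 0x02 read, 0x01 write (per the NVMe spec) */
    uint32_t nsid;
    uint64_t slba;         /* starting logical block address */
    uint16_t nlb;          /* number of logical blocks, 0-based per the spec */
    void    *data;
};

/* Convert a block request into an NVMe command for the namespace's queue,
 * assuming 512-byte logical blocks so sectors map 1:1 onto LBAs. */
static struct nvme_cmd blk_to_nvme(const struct blk_request *rq, uint32_t nsid)
{
    struct nvme_cmd cmd = {
        .opcode = rq->write ? 0x01 : 0x02,
        .nsid   = nsid,
        .slba   = rq->sector,
        .nlb    = (uint16_t)(rq->nr_sectors - 1),
        .data   = rq->buffer,
    };
    return cmd;
}

int main(void)
{
    uint8_t buf[4096];
    struct blk_request rq = { .write = 0, .sector = 1024, .nr_sectors = 8,
                              .buffer = buf };
    struct nvme_cmd cmd = blk_to_nvme(&rq, 2);

    printf("opcode=0x%02x nsid=%u slba=%llu nlb=%u\n",
           (unsigned)cmd.opcode, cmd.nsid,
           (unsigned long long)cmd.slba, (unsigned)cmd.nlb);
    return 0;
}
```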
根据本公开实施例提供的技术方案,通过所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,其中,所述主机通过所述卸载卡连接到NVMe设备,其中,所述NVMe设备挂载到所述虚拟机内部,其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,其中,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
本领域技术人员可以理解,参照图3描述的技术方案的可以与参照图1和图2描述的实施场景结合,从而具备参照图1和图2描述的实施场景所实现的技术效果。具体内容可以参照以上根据图1和图2进行的描述,其具体内容在此不再赘述。
以下参照图4描述根据本公开实施例的卸载卡命名空间管理方法。
图4示出根据本公开实施例的卸载卡命名空间管理方法的流程图。
图4示出的根据本公开实施例的卸载卡命名空间管理方法运行于卸载卡命名空间管理系统,所述卸载卡命名空间管理系统包括主机和与主机连接的卸载卡。该卸载卡命名空间管理系统是参照图3所述的卸载卡命名空间管理系统。
如图4所示,根据本公开实施例的卸载卡命名空间管理方法包含步骤S401、S402、S403。
在步骤S401中,基于所述主机上运行的发出输入输出请求的多个应用,所述主机向所述卸载卡发送创建命名空间请求。
在步骤S402中,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间。
在步骤S403中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
在本公开的一个实施例中,所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括用于管理所述多个硬件队列的驱动器,其中,所述主机向所述卸载 卡发送创建命名空间请求,包括:所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求。
根据本公开实施例提供的技术方案,通过所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括用于管理所述多个硬件队列的驱动器,其中,所述主机向所述卸载卡发送创建命名空间请求,包括:所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
在本公开的一个实施例中,所述卸载卡包括控制器和硬件加速器,其中,步骤S402包括:所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,其中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,包括:所述控制器从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
根据本公开实施例提供的技术方案,通过所述卸载卡包括控制器和硬件加速器,其中,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,包括:所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,其中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,包括:所述控制器从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
在本公开的一个实施例中,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
根据本公开实施例提供的技术方案,通过所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。
在本公开的一个实施例中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,其中,所述主机通过所述卸载卡连接到NVMe设备,其中,所述方法还包括:将所述NVMe设备挂载到所述虚拟机内部;
所述用户空间中运行的应用向所述通用块层发出输入输出请求;通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求, 并且发送到所述NVMe控制器;所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
根据本公开实施例提供的技术方案,通过所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,其中,所述主机通过所述卸载卡连接到NVMe设备,其中,所述方法还包括:将所述NVMe设备挂载到所述虚拟机内部;所述用户空间中运行的应用向所述通用块层发出输入输出请求;通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
本领域技术人员可以理解,参照图4描述的技术方案的可以与参照图1至图3描述的实施场景结合,从而具备参照图1至图3描述的实施场景所实现的技术效果。具体内容可以参照以上根据图1至图3进行的描述,其具体内容在此不再赘述。
以下参照图5描述根据本公开实施例的输入输出请求处理系统。
图5示出根据本公开实施例的输入输出请求处理系统500的结构框图。输入输出请求处理系统500是利用图3所示的卸载卡命名空间管理系统300实现的。
如图5所示,输入输出请求处理系统500包括主机501、卸载卡502和通过所述卸载卡502与所述主机501连接的NVMe设备503。
其中,所述主机501上运行有虚拟机,所述主机501包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡502发送创建命名空间请求。
其中,所述卸载卡502包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,
其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,
其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡502,
其中,所述卸载卡502将符合NVMe协议的输入输出请求发送到所述NVMe设备503。
根据本公开实施例提供的技术方案,通过利用前述卸载卡命名空间管理方案进行输入输出请求处理,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,避免不同应用输入输出请求相互干扰问题,并提升了故障隔离能力大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
以下参照图6描述根据本公开实施例的输入输出请求处理方法。
图6示出根据本公开实施例的输入输出请求处理方法的流程图。图6示出的输入输出请求处理方法是利用图5所示的输入输出请求处理系统500实现的。图5所示的输入输出请求处理系统500是利用图3所示的卸载卡命名空间管理系统300实现的。
运行有输入输出请求处理方法的输入输出请求处理系统包括主机、卸载卡和通过所述卸载卡与所述主机连接的NVMe设备,
其中,所述主机上运行有虚拟机,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡发送创建命名空间请求,
其中,所述卸载卡包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
其中,所述方法包括步骤S601、S602、S603和S604。
在步骤S601,所述用户空间中运行的应用向所述通用块层发出输入输出请求;
在步骤S602,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
在步骤S603,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡;
在步骤S604,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
根据本公开实施例提供的技术方案,通过利用前述卸载卡命名空间管理方案进行输入输出请求处理,可以创建多个命名空间,并且动态地为每个命名空间分配专属硬件队列,实现了硬件级别的命名空间资源,应用可以使用自身对应的硬件队列,避免不同应用输入输出请求相互干扰问题,并提升了故障隔离能力大幅提升了命名空间的性能隔离能力以及可靠性。而且,对现有的NVMe命名空间能力进行了加强扩展,并结合软硬件一体化的技术架构,实现了性能隔离能力和故障隔离能力的大幅提升,并且完全兼容上层应用的软件生态。
图7示出根据本公开一实施方式的电子设备的结构框图。
本公开实施方式还提供了一种电子设备,如图7所示,包括至少一个处理器701;以及与至少一个处理器701通信连接的存储器702;其中,存储器702存储有可被至少一个处理器701执行的指令,指令运行于卸载卡命名空间管理系统,所述卸载卡命名空间管理系统包括主机和与主机连接的卸载卡,指令被至少一个处理器701执行以实现以下步骤:
基于所述主机上运行的发出输入输出请求的多个应用,所述主机向所述卸载卡发送创建命名空间请求;
所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间;
所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
在本公开的一个实施例中,所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括用于管理所述多个硬件队列的驱动器,其中,所述主机向所述卸载卡发送创建命名空间请求,包括:
所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求。
在本公开的一个实施例中,所述卸载卡包括控制器和硬件加速器,其中,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,包括:
所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,
其中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,包括:
所述控制器从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
在本公开的一个实施例中,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,
其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,
其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
在本公开的一个实施例中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,
其中,所述主机通过所述卸载卡连接到NVMe设备,
其中,所述方法还包括:
将所述NVMe设备挂载到所述虚拟机内部;
所述用户空间中运行的应用向所述通用块层发出输入输出请求;
通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
本公开实施方式还提供了一种电子设备,如图7所示,包括至少一个处理器701;以及与至少一个处理器701通信连接的存储器702;其中,存储器702存储有可被至少一个 处理器701执行的指令,指令运行于输入输出请求处理系统,所述输入输出请求处理系统包括主机、卸载卡和通过所述卸载卡与所述主机连接的NVMe设备,
其中,所述主机上运行有虚拟机,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡发送创建命名空间请求,
其中,所述卸载卡包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
其中,所述方法包括:
所述用户空间中运行的应用向所述通用块层发出输入输出请求;
通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡;
所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
图8是适于用来实现根据本公开各实施方式的方法的计算机系统的结构示意图。如图8所示,计算机系统800包括处理单元801,其可以根据存储在只读存储器(ROM)802中的程序或者从存储部分808加载到随机访问存储器(RAM)803中的程序而执行上述附图所示的实施方式中的各种处理。在RAM803中,还存储有系统800操作所需的各种程序和数据。CPU801、ROM802以及RAM803通过总线804彼此相连。输入/输出(I/O)接口805也连接至总线804。
以下部件连接至I/O接口805:包括键盘、鼠标等的输入部分806;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分807;包括硬盘等的存储部分808;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分809。通信部分809经由诸如因特网的网络执行通信处理。驱动器810也根据需要连接至I/O接口805。可拆卸介质811,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器810上,以便于从其上读出的计算机程序根据需要被安装入存储部分808。其中,所述处理单元801可实现为CPU、GPU、TPU、FPGA、NPU等处理单元。
特别地,根据本公开的实施方式,上文参考附图描述的方法可以被实现为计算机软件程序。例如,本公开的实施方式包括一种计算机程序产品,其包括有形地包含在及其可读介质上的计算机程序,所述计算机程序包含用于执行附图中的方法的程序代码。在这样的实施方式中,该计算机程序可以通过通信部分809从网络上被下载和安装,和/或从可拆卸介质811被安装。例如,本公开的实施方式包括一种可读存储介质,其上存储有计算机指令,该计算机指令被处理器执行时实现用于执行附图中的方法的程序代码。
附图中的流程图和框图,图示了按照本公开各种实施方式的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,路程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施方式中所涉及到的单元或模块可以通过软件的方式实现,也可以通过硬件的方式来实现。所描述的单元或模块也可以设置在处理器中,这些单元或模块的名称在某种情况下并不构成对该单元或模块本身的限定。
作为另一方面,本公开还提供了一种计算机可读存储介质,该计算机可读存储介质可以是上述实施方式中所述节点中所包含的计算机可读存储介质;也可以是单独存在,未装配入设备中的计算机可读存储介质。计算机可读存储介质存储有一个或者一个以上程序,所述程序被一个或者一个以上的处理器用来执行描述于本公开的方法。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的发明范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离所述发明构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。

Claims (14)

  1. 一种卸载卡命名空间管理系统,包括主机和与主机连接的卸载卡,其中,
    所述主机上运行有发出输入输出请求的多个应用,并且所述主机向所述卸载卡发送创建命名空间请求,
    所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,
    所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
  2. 根据权利要求1所述的系统,其中,所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括:
    驱动器,其用于管理所述多个硬件队列,所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求。
  3. 根据权利要求2所述的系统,其中,所述卸载卡包括控制器和硬件加速器,其中,
    所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
  4. 根据权利要求3所述的系统,其中,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,
    其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,
    其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
  5. 根据权利要求4所述的系统,其中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,
    其中,所述主机通过所述卸载卡连接到NVMe设备,
    其中,所述NVMe设备挂载到所述虚拟机内部,
    其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,
    其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,
    其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
    其中,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
  6. 一种卸载卡命名空间管理方法,所述方法运行于卸载卡命名空间管理系统,所述卸载卡命名空间管理系统包括主机和与主机连接的卸载卡,其中,所述方法包括:
    基于所述主机上运行的发出输入输出请求的多个应用,所述主机向所述卸载卡发送创建命名空间请求;
    所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间;
    所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
  7. 根据权利要求6所述的方法,其中,所述主机上运行有虚拟机,所述多个应用运行于所述虚拟机中,所述虚拟机包括用于管理所述多个硬件队列的驱动器,其中,所述主机 向所述卸载卡发送创建命名空间请求,包括:
    所述虚拟机通过所述驱动器向所述卸载卡发送所述创建命名空间请求。
  8. 根据权利要求7所述的方法,其中,所述卸载卡包括控制器和硬件加速器,其中,所述卸载卡根据所述创建命名空间请求为所述多个应用创建对应的多个命名空间,包括:
    所述控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,
    其中,所述卸载卡根据所述创建命名空间请求分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,包括:
    所述控制器从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间。
  9. 根据权利要求8所述的方法,其中,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,
    其中,所述用户空间中运行有所述多个应用,所述应用由对应的核心执行,
    其中,所述内核空间包括通用块层和所述驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,并且建立从所述多个软件队列到所述多个硬件队列的一一对应关系。
  10. 根据权利要求9所述的方法,其中,所述驱动器是NVMe驱动器,所述控制器是NVMe控制器,
    其中,所述主机通过所述卸载卡连接到NVMe设备,
    其中,所述方法还包括:
    将所述NVMe设备挂载到所述虚拟机内部;
    所述用户空间中运行的应用向所述通用块层发出输入输出请求;
    通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
    所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
    所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
  11. 一种输入输出请求处理系统,其包括主机、卸载卡和通过所述卸载卡与所述主机连接的NVMe设备,
    其中,所述主机上运行有虚拟机,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡发送创建命名空间请求,
    其中,所述卸载卡包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
    其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
    其中,所述用户空间中运行的应用向所述通用块层发出输入输出请求,
    其中,通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器,
    其中,所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡,
    其中,所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
  12. 一种输入输出请求处理方法,所述方法运行于输入输出请求处理系统,所述输入输出请求处理系统包括主机、卸载卡和通过所述卸载卡与所述主机连接的NVMe设备,
    其中,所述主机上运行有虚拟机,所述主机包括多核心中央处理器,所述虚拟机包括用户空间和内核空间,其中,所述用户空间中运行有发出输入输出请求的多个应用,所述应用由对应的核心执行,其中,所述内核空间包括通用块层和NVMe驱动器,所述通用块层包括与所述多个核心对应的多个软件队列,其中,所述虚拟机通过所述NVMe驱动器向所述卸载卡发送创建命名空间请求,
    其中,所述卸载卡包括NVMe控制器和硬件加速器,其中,所述NVMe控制器根据所述创建命名空间请求为所述多个应用创建各自的命名空间,并且从所述硬件加速器中分配与所创建的多个命名空间对应的多个硬件队列并且将所分配的多个硬件队列分别绑定到所对应的命名空间,
    其中,所述NVMe驱动器用于管理所述多个硬件队列,所述通用块层建立从所述多个软件队列到所述多个硬件队列的一一对应关系,
    其中,所述方法包括:
    所述用户空间中运行的应用向所述通用块层发出输入输出请求;
    通过所述通用块层和所述NVMe驱动器的处理将所述输入输出请求转换成符合NVMe协议的输入输出请求,并且发送到所述NVMe控制器;
    所述NVMe控制器通过与所创建的命名空间对应的硬件队列将所述符合NVMe协议的输入输出请求搬运到所述卸载卡;
    所述卸载卡将符合NVMe协议的输入输出请求发送到所述NVMe设备。
  13. 一种电子设备,包括存储器和处理器;其中,所述存储器用于存储一条或多条计算机指令,其中,所述一条或多条计算机指令被所述处理器执行以实现权利要求6-10、12任一项所述的方法步骤。
  14. 一种可读存储介质,其上存储有计算机指令,该计算机指令被处理器执行时实现权利要求6-10、12任一项所述的方法步骤。
PCT/CN2023/080410 2022-03-18 2023-03-09 卸载卡命名空间管理、输入输出请求处理系统和方法 WO2023174146A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210273112.9 2022-03-18
CN202210273112.9A CN114691037A (zh) 2022-03-18 2022-03-18 卸载卡命名空间管理、输入输出请求处理系统和方法

Publications (1)

Publication Number Publication Date
WO2023174146A1 true WO2023174146A1 (zh) 2023-09-21

Family

ID=82140049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/080410 WO2023174146A1 (zh) 2022-03-18 2023-03-09 卸载卡命名空间管理、输入输出请求处理系统和方法

Country Status (2)

Country Link
CN (1) CN114691037A (zh)
WO (1) WO2023174146A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114691037A (zh) * 2022-03-18 2022-07-01 阿里巴巴(中国)有限公司 卸载卡命名空间管理、输入输出请求处理系统和方法
CN115686836A (zh) * 2022-10-17 2023-02-03 阿里巴巴(中国)有限公司 一种安装有加速器的卸载卡

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020133487A1 (en) * 2001-03-15 2002-09-19 Microsoft Corporation System and method for unloading namespace devices
CN109799951A (zh) * 2017-11-16 2019-05-24 三星电子株式会社 使用分布式的和虚拟的命名空间管理的按需存储供应
CN111198663A (zh) * 2020-01-03 2020-05-26 苏州浪潮智能科技有限公司 控制数据存取操作的方法、系统、装置以及存储介质
CN111459406A (zh) * 2020-03-08 2020-07-28 苏州浪潮智能科技有限公司 一种存储卸载卡下识别nvme硬盘的方法及系统
CN114691037A (zh) * 2022-03-18 2022-07-01 阿里巴巴(中国)有限公司 卸载卡命名空间管理、输入输出请求处理系统和方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808167B (zh) * 2016-03-10 2018-12-21 深圳市杉岩数据技术有限公司 一种基于sr-iov的链接克隆的方法、存储设备及系统
CN107526645B (zh) * 2017-09-06 2019-01-29 武汉斗鱼网络科技有限公司 一种通信优化方法及系统
CN113312155B (zh) * 2021-07-29 2022-02-01 阿里云计算有限公司 虚拟机创建方法、装置、设备、系统及计算机程序产品
CN113849198A (zh) * 2021-09-28 2021-12-28 歌尔科技有限公司 一种卸载应用程序的方法、装置、设备和介质

Also Published As

Publication number Publication date
CN114691037A (zh) 2022-07-01

Similar Documents

Publication Publication Date Title
US20200278880A1 (en) Method, apparatus, and system for accessing storage device
US10365830B2 (en) Method, device, and system for implementing hardware acceleration processing
US10324873B2 (en) Hardware accelerated communications over a chip-to-chip interface
WO2023174146A1 (zh) 卸载卡命名空间管理、输入输出请求处理系统和方法
JP5608243B2 (ja) 仮想化環境においてi/o処理を行う方法および装置
US11829309B2 (en) Data forwarding chip and server
US10846254B2 (en) Management controller including virtual USB host controller
US8930568B1 (en) Method and apparatus for enabling access to storage
US11741039B2 (en) Peripheral component interconnect express device and method of operating the same
US11940933B2 (en) Cross address-space bridging
CN104731635A (zh) 一种虚拟机访问控制方法,及虚拟机访问控制系统
CN112352221A (zh) 用以支持虚拟化环境中的ssd设备驱动器与物理ssd之间的sq/cq对通信的快速传输的共享存储器机制
US11036649B2 (en) Network interface card resource partitioning
CN114925012A (zh) 一种以太网帧的下发方法、上传方法及相关装置
US20240061802A1 (en) Data Transmission Method, Data Processing Method, and Related Product
US11281602B1 (en) System and method to pipeline, compound, and chain multiple data transfer and offload operations in a smart data accelerator interface device
WO2023186143A1 (zh) 一种数据处理方法、主机及相关设备
US11601515B2 (en) System and method to offload point to multipoint transmissions
US20230350824A1 (en) Peripheral component interconnect express device and operating method thereof
US11422963B2 (en) System and method to handle uncompressible data with a compression accelerator
JPWO2018173300A1 (ja) I/o制御方法およびi/o制御システム
CN115396250A (zh) 具有一致事务排序的多插槽网络接口控制器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23769652

Country of ref document: EP

Kind code of ref document: A1