WO2020029619A1 - Method, device, and server for data processing - Google Patents
Method, device, and server for data processing
- Publication number
- WO2020029619A1 (PCT/CN2019/085476)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- server
- request
- descriptor
- memory
- queue
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4282—Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
Definitions
- This application relates to the field of servers, and more particularly, to methods, devices, and servers for data processing.
- In a virtualization scenario, a virtual input/output device (Virtio) on a virtual machine can be implemented through full virtualization: a virtual machine monitor (VMM) simulates the Virtio device, intercepting the I/O requests initiated by the virtual machine and simulating the real hardware in software.
- The VMM must process the I/O requests initiated by all virtual machines and serialize them into a single I/O stream that the underlying hardware can process.
- When the VMM simulates multiple Virtio devices, the overhead of the central processing unit (CPU) in the server where the virtual machine is located increases and performance decreases.
- This application provides a data processing method that can optimize the I/O path and improve the performance of virtualized I/O.
- In a first aspect, a data processing method is provided. The method includes: a device in a server obtains a first input/output (I/O) request sent by a virtual machine, where the device is connected to the server through a Peripheral Component Interconnect Express (PCIe) bus, the virtual machine runs on the server, and the device provides multiple virtual functions (VFs) to the server. The first I/O request is initiated by the virtual machine for any one of the multiple VFs and includes a read operation or a write operation, where the read operation is used to read data from an I/O device of the server, the write operation is used to write data to the I/O device of the server, and the VF is used to manage the storage space of the virtual machine. The device reads or writes data from the I/O device of the server according to the first I/O request.
- In this application, the I/O services processed by the processor in the server are offloaded to the device, which completes the I/O processing. The device presents itself to the server as a virtualized input/output controller and provides I/O resources for use by the virtual machine; the virtual machine sends I/O requests directly to the device, and the device processes them. The data processing method of this application can therefore optimize the I/O processing path and reduce the load on the server's processor, and having the device directly process the virtual machine's I/O requests further improves virtualized I/O performance.
- In one possible implementation, obtaining the first I/O request sent by the virtual machine includes: the device obtains a first descriptor from a first queue, where the first descriptor is generated by the front-end driver in the virtual machine after processing the first I/O request and indicates the storage location of the first I/O request in the server's memory; the first queue is stored in the server's memory and is used to store the descriptors of multiple I/O requests, including the first I/O request; the device then obtains the first I/O request from the server's memory according to the first descriptor.
- In another possible implementation, the device includes a descriptor prefetch engine, a direct memory access (DMA) engine, and a memory. Obtaining the first descriptor from the storage space allocated by the server for the first queue includes: the descriptor prefetch engine generates a second descriptor and sends it to the DMA engine, where the second descriptor indicates the storage location of the first descriptor within the storage space allocated by the server for the first queue; according to the second descriptor, the DMA engine obtains the first descriptor from that storage space through DMA and stores it into the memory of the device.
- The descriptor prefetch engine automatically obtains descriptors from the server: when a virtual machine in the server initiates an I/O request and the front-end driver notifies the device that new descriptors are available in the server, the descriptor prefetch engine automatically moves the descriptors from the server into the device's local memory, thereby speeding up I/O request processing.
- In another possible implementation, the device includes an interrupt generation module and a back-end driver. Obtaining the first I/O request from the server's memory according to the first descriptor includes: the back-end driver handles an interrupt request initiated by the interrupt generation module, obtains the first descriptor from the device's memory, and sends it to the DMA engine; according to the first descriptor, the DMA engine obtains the first I/O request from the server's memory through DMA.
- In another possible implementation, the device further includes an I/O device engine. Reading or writing data from the I/O device of the server according to the first I/O request includes: when the first I/O request is the read operation, the back-end driver generates a read data message according to the first I/O request, where the message indicates the storage location in the I/O device of the target data to be read and the storage location in the device's memory where the read data should be placed; the back-end driver sends the read data message to the I/O device through the I/O device engine; the back-end driver then instructs the DMA engine to store the target data held in the device's memory into the server's memory.
- In another possible implementation, reading or writing data from the I/O device of the server according to the first I/O request includes: when the first I/O request is the write operation, the DMA engine obtains the target data to be written to the I/O device from the server's memory and writes it into the device's memory; the back-end driver generates a write data message according to the first I/O request, where the message indicates the storage location of the target data in the device's memory and the location in the I/O device to which the target data should be stored; the back-end driver then sends the write data message to the I/O device.
- In another possible implementation, the method further includes: after the processing of the first I/O request is completed, the back-end driver sends an interrupt request to the server through the DMA engine, and the server uses the interrupt request to determine the device's processing result for the first I/O request.
- In a second aspect, a data processing device is provided. The device is configured in a server and connected to the server through a PCIe bus, and the data processing device is configured to execute the method in the first aspect or any possible implementation of the first aspect.
- In a third aspect, a server is provided in which the data processing device of the second aspect is configured; the device is connected to the server through a PCIe bus.
- In a fourth aspect, a computer-readable storage medium storing instructions is provided. When the instructions are run on a computer, the computer is caused to execute the method in the first aspect or any possible implementation of the first aspect.
- In a fifth aspect, a computer program product containing instructions is provided. When the instructions are executed on a computer, the computer is caused to execute the method in the first aspect or any possible implementation of the first aspect.
- FIG. 1 is a schematic block diagram of a virtual machine system according to an embodiment of the present invention.
- FIG. 2 is a schematic block diagram of a data processing device according to an embodiment of the present invention.
- FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
- FIG. 4 is another schematic block diagram of a data processing device according to an embodiment of the present invention.
- FIG. 1 shows a schematic block diagram of a server 100 according to an embodiment of the present invention.
- At least one virtual machine 101 runs on the server 100, and each virtual machine 101 is configured with a front-end driver 1011.
- The server 100 is also configured with a device 102 for data processing, and the device 102 communicates with the server 100 through a Peripheral Component Interconnect Express (PCIe) bus.
- The data processing device 102 is configured to process I/O requests from the virtual machines, where an I/O request includes a read operation or a write operation.
- The device 102 in this embodiment can communicate with a front-end driver developed based on the virtual input/output (Virtio) protocol; that is, the front-end driver 1011 in this embodiment may be developed based on the Virtio protocol.
- A front-end driver developed based on the Virtio protocol is highly portable: it does not need to change as the kernel version of the server 100 changes, so its maintenance cost is low.
- In a specific implementation, the device 102 may be a system on chip (SoC). The SoC may be a separate PCIe card, which may be deployed on the server 100 or integrated directly with the motherboard of the server 100.
- The device 102 includes a controller 1021, a register 1022, a descriptor prefetch engine 1023, a back-end driver 1024, a direct memory access (DMA) engine 1025, an interrupt generation module 1026, an I/O device engine 1027, and a memory 1028.
- The device 102 can take a physical function (PF) of an I/O device connected to it through the I/O device engine 1027 and, through hardware virtualization (for example, single root I/O virtualization, SR-IOV), present multiple virtual functions (VFs) to the server 100. Any two VFs are isolated from each other, and the multiple VFs are assigned to multiple virtual machines for use; each virtual machine can be assigned one or more VFs.
- For example, if the I/O device connected to the device 102 through the I/O device engine 1027 is a storage device and the device 102 presents 256 VFs to the server 100, each VF can be responsible for managing a piece of storage space in the storage device. To a virtual machine, the 256 VFs are then equivalent to 256 storage controllers, and the 256 VFs can use 1024 queues in total.
- For example, the storage device may be a solid state drive (SSD). See the description of queues below.
- The device 102 in this embodiment may also be connected to a network device through the I/O device engine 1027 (for example, through a PCIe switch). For example, if the device 102 presents 100 VFs to the server 100, the 100 VFs are equivalent to 100 network controllers from the virtual machine's perspective.
- The server 100 further includes a device driver 103, and a processor (not shown in FIG. 1) in the server 100 configures the working mode of the device 102 by executing the instructions in the device driver 103.
- Configuring the working mode of the device 102 means configuring the number of usable VFs among the multiple VFs that the device 102 presents to the server 100 and the number of queues allocated to each VF.
- To use the device 102 in the server 100, the processor in the server 100 needs to execute the instructions in the device driver 103 to complete the configuration of the working mode of the device 102. For example, the processor in the server 100 executes the instructions in the device driver 103 to configure 50 of the 256 VFs as available to virtual machines and to configure each VF to use 4 queues.
- Each VF corresponds to multiple data structures. The device 102 divides multiple storage spaces out of its memory space, and each storage space stores one data structure. These storage spaces may belong to one base address register (BAR) space, or each storage space may be used as a separate BAR space, where each BAR space is assigned an index number. For example, the device 102 stores both the common configuration capability data structure and the notification capability data structure in the BAR space with index number 1.
- The register 1022 is the configuration space of the device 102. The configuration space is divided into multiple configuration subspaces, and each configuration subspace is allocated to a VF as that VF's configuration space. Each VF's configuration space stores a device ID, a vendor ID, multiple base address fields, a BAR space index number field, multiple offset fields, and multiple storage length fields.
- When the device 102 stores a data structure into the BAR space allocated for it, it writes the offset of the start address of the storage space occupied by the data structure, relative to the start address of the BAR space, into the corresponding offset field, and writes the length of the storage space occupied by the data structure into the corresponding storage length field. For example, the device 102 stores the common configuration capability data structure in the BAR space with index number 1 at an offset of 10 bytes from the start address of that BAR space, with a storage length of 100 bytes, and stores the notification capability data structure in the same BAR space at an offset of 110 bytes, with a storage length of 50 bytes. The device 102 then writes the offsets and storage lengths of the common configuration capability data structure and the notification capability data structure in BAR space 1 into the corresponding VF configuration space.
- The common configuration capability structure includes a queue number (num_queues) field, a queue depth (queue_size) field, and a queue enable (queue_enable) bit allocated for the VF.
- The data structures of a VF are specified by the virtual input/output (Virtio) protocol; that is, which fields in each data structure of the VF are used to store which content is defined by the Virtio protocol.
- The controller 1021 may be a CPU, or another general-purpose processor, an SoC, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
- The memory 1028 may include read-only memory and random access memory, and provides instructions and data to the processor. The memory may also include non-volatile random access memory; for example, the memory may also store information about the device type. The memory may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
- Before the server 100 uses the device 102 to process I/O requests from a virtual machine, in addition to having the processor in the server 100 execute the instructions in the device driver 103, the front-end driver 1011 must also complete the initialization of the device 102. The process by which the front-end driver 1011 initializes the device 102 is introduced below.
- (1) The front-end driver 1011 maps the BAR spaces in which the device 102 stores its data structures into the memory space of the server 100.
- When the server 100 is powered on, it divides multiple storage spaces out of its memory space, establishes a mapping between these storage spaces and the BAR spaces in the device 102, and writes the first address of the storage space allocated for each BAR space into the corresponding base address field. For example, the first address of the storage space allocated for the BAR space with index number 0 is written into the base address field with index number 0.
- After the front-end driver 1011 is loaded, it can detect the device 102 based on the device ID and vendor ID. When the front-end driver 1011 needs to access a data structure stored in the BAR space of a VF in the device 102, it uses the BAR space index number stored in the VF's configuration space to find, through the base address field with the same index number, the first address of the storage space allocated for that BAR space in the memory space of the server 100. It then uses the offset field and storage length field stored in the VF's configuration space to determine, within that storage space, the storage region of the data structure to be accessed. In this way, the BAR space in the device 102 is accessed by accessing the memory of the server 100.
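- The address resolution just described is simple enough to sketch in C. The following is a minimal illustration only, not code from the patent; the struct layout and field names (base_addr, bar_index, offset, length) are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical view of a VF configuration space (layout assumed). */
struct vf_config_space {
    uint16_t device_id;
    uint16_t vendor_id;
    uint64_t base_addr[6]; /* host address mapped for each BAR index */
    uint8_t  bar_index;    /* which BAR holds the data structure */
    uint32_t offset;       /* offset of the structure inside the BAR */
    uint32_t length;       /* storage length of the structure */
};

/* Resolve the host address of the data structure described by
 * (bar_index, offset); callers must stay within 'length' bytes. */
static inline void *locate_structure(const struct vf_config_space *cfg)
{
    return (void *)(uintptr_t)(cfg->base_addr[cfg->bar_index] + cfg->offset);
}
```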
- (2) The front-end driver 1011 initiates a reset operation to the device 102.
- The front-end driver 1011 first resets the device 102 to ensure that historical information is cleared. The hardware logic in the controller 1021 perceives this step and resets the fields in the common configuration capability data structure and the notification capability data structure.
- (3) The front-end driver 1011 obtains the features of the I/O device.
- The Virtio protocol defines a series of feature bits to identify the capabilities of the device and the driver. For example, if the I/O device is a network interface card, the features can indicate whether the card supports checksum offload.
- The front-end driver 1011 reads the device features of the I/O device from the device feature (device_feature) field in the common configuration capability data structure; the field length is 4 bytes. After successfully obtaining the device features, the front-end driver 1011, which itself has a set of supported driver features, intersects the device features obtained from the common configuration capability data structure with its own features and writes the intersection into the driver feature (driver_feature) field of the common configuration capability data structure as the features supported by the front-end driver 1011. The driver features are thereby reported to the device 102 as the features finally negotiated between the driver and the device.
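- The negotiation amounts to a bitwise intersection of the two feature sets, which a short sketch can make concrete. The 4-byte field width is stated above; the surrounding struct layout is an assumption for illustration.

```c
#include <stdint.h>

/* Partial, assumed view of the common configuration capability. */
struct common_cfg_features {
    uint32_t device_feature; /* written by the device, 4 bytes */
    uint32_t driver_feature; /* written by the front-end driver */
};

static void negotiate_features(volatile struct common_cfg_features *cfg,
                               uint32_t driver_supported)
{
    /* Keep only the features both the device and the driver understand. */
    cfg->driver_feature = cfg->device_feature & driver_supported;
}
```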
- (4) The front-end driver 1011 enables the queues one by one.
- The front-end driver 1011 allocates a storage space in the memory space of the server 100 for each queue of each VF. Each storage space is used to store the descriptors corresponding to multiple pending I/O requests initiated by the virtual machine for the corresponding VF, the number of pending I/O requests Avail_idx, the index Avail_entry[i].index of the first descriptor among the descriptors of each pending I/O request, the number of processed I/O requests Used_idx, and the index Used_entry[i].index of the first descriptor among the descriptors of each processed I/O request, where i is the index number of the I/O request. Descriptors are described below.
- The common configuration capability data structure stores the first address queue_desc of the descriptor area in the storage space allocated for the queue, the first address queue_avail of the area that stores the number of pending I/O requests, and the first address queue_used of the area that stores the number of processed I/O requests. The common configuration capability data structure also stores a queue selection (queue_select) field: by setting queue_select to the index number of a queue, the per-queue fields in the common configuration capability data structure refer to the queue indicated by queue_select. The front-end driver 1011 therefore sets queue_select to the index number of each queue in turn to configure the queue_desc, queue_avail, and queue_used belonging to each queue. After the configuration is completed, the front-end driver 1011 sets the queue_enable field in the common configuration capability data structure to 1 for a queue, which enables the queue, meaning that the storage space allocated by the server 100 for the queue can be used.
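- Collecting the per-VF and per-queue fields named so far, the common configuration capability data structure might be laid out as in the sketch below. The field names follow the text, but the exact widths and ordering are assumptions; the Virtio protocol defines the authoritative layout.

```c
#include <stdint.h>

struct common_config_capability {
    uint32_t device_feature; /* device features (4 bytes, see above) */
    uint32_t driver_feature; /* negotiated driver features */
    uint16_t num_queues;     /* number of queues allocated to the VF */
    uint16_t queue_select;   /* selects the queue the fields below refer to */
    uint16_t queue_size;     /* queue depth */
    uint16_t queue_enable;   /* 1 = storage space for the queue may be used */
    uint64_t queue_desc;     /* first address of the descriptor area */
    uint64_t queue_avail;    /* first address of the pending-request area */
    uint64_t queue_used;     /* first address of the processed-request area */
};
```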
- The front-end driver 1011 also registers an MSI-X interrupt for each queue with the server 100 according to the number of MSI-X interrupts supported by the I/O device.
- An I/O request issued by the virtual machine for a VF is first obtained by the front-end driver 1011, which encapsulates the I/O request into descriptors and stores them in the storage space allocated for the corresponding queue of that VF.
- Descriptors are used to describe I/O requests. The structure of a descriptor can be: address + length + flag + next, where next describes the index number of the next descriptor logically adjacent to this one. Each descriptor is 16 bytes. For example, if an I/O request is divided into 8 fragments of 100 bytes each, the 8 fragments correspond to 8 descriptors, and each descriptor indicates the address at which its fragment is stored in the memory of the server 100, the length of the stored fragment, the read/write flag of the fragment, and the index number of the next descriptor logically adjacent to it.
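- The 16-byte "address + length + flag + next" layout maps directly onto a C struct, as sketched below (this matches the 8 + 4 + 2 + 2 byte split of a Virtio ring descriptor); the exact flag encoding is not enumerated in the text and is left opaque here.

```c
#include <stdint.h>

struct queue_descriptor {
    uint64_t addr;  /* fragment address in the server's memory (8 bytes) */
    uint32_t len;   /* fragment length in bytes                (4 bytes) */
    uint16_t flags; /* read/write flag of the fragment         (2 bytes) */
    uint16_t next;  /* index of the next descriptor in chain   (2 bytes) */
};                  /* total: 16 bytes per descriptor */
```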
- The storage space allocated for the queue also stores the number Avail_idx of pending I/O requests initiated by the virtual machine and, for each pending I/O request, the index Avail_entry[i].index of the first descriptor among that request's descriptors, where i is the index number of the request. Avail_idx and the Avail_entry[i].index entries are adjacent to each other in the storage space, and Avail_idx increases as the number of I/O requests initiated by the virtual machine increases. For example, the device 102 can locate this area for the queue with index number 2 through the queue_avail address stored in that queue's common configuration capability data structure.
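- The pending-request area described above, with Avail_idx adjacent to the Avail_entry[i].index entries, can be pictured as follows; the queue-depth constant is an assumption for the example.

```c
#include <stdint.h>

#define QUEUE_SIZE 256 /* assumed queue depth for illustration */

struct avail_area {
    uint16_t avail_idx;               /* Avail_idx: pending request count */
    uint16_t entry_index[QUEUE_SIZE]; /* Avail_entry[i].index: index of the
                                         first descriptor of request i */
};
```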
- FIG. 3 shows a schematic flowchart of a data processing method 200, which includes at least the following steps.
- A device in the server 100 obtains a first input/output (I/O) request sent by a virtual machine. The device is connected to the server 100 through a PCIe bus, the virtual machine runs on the server 100, and the device provides multiple virtual functions (VFs) to the server 100. The first I/O request is initiated by the virtual machine for any one of the multiple VFs and includes a read operation or a write operation, where the read operation is used to read data from the I/O device of the server 100 and the write operation is used to write data to the I/O device of the server 100.
- Specifically, a virtual machine running on the server 100 may initiate an I/O request (for example, the first I/O request) for a certain VF provided by the device 102. The I/O request may be a read operation, used to read data from the I/O device of the server 100, or a write operation, used to write data to the I/O device of the server 100.
- The device reads or writes data from the I/O device of the server 100 according to the first I/O request. The device 102 first obtains the virtual machine's I/O request; if the I/O request is a read operation, the device 102 performs a read data operation on the I/O device of the server 100 according to the request, and if it is a write operation, the device 102 performs a write data operation on the I/O device of the server 100 according to the request.
- Obtaining the first I/O request sent by the virtual machine includes: the device obtains a first descriptor from a first queue, where the first descriptor is generated after the front-end driver processes the first I/O request and indicates the storage location of the first I/O request in the memory of the server 100; the first queue is stored in the memory of the server 100 and stores the descriptors of multiple I/O requests, including the first I/O request; the device then obtains the first I/O request from the memory of the server 100 according to the first descriptor.
- Specifically, the I/O request is first obtained by the front-end driver 1011, which encapsulates it into multiple descriptors (for example, the first descriptor). According to the destination address for the read or write carried in the I/O request, the front-end driver 1011 determines which storage space of the I/O device the destination address belongs to, and thus which VF is responsible for managing that storage space; it then stores the generated descriptors in the storage space allocated by the server 100 to a certain queue (for example, the first queue) of that VF. The device 102 obtains the descriptors of the I/O request from the storage space allocated by the server 100 for the queue and obtains the I/O request from the memory of the server 100 according to those descriptors. The storage space allocated by the server 100 for each queue is part of the memory of the server 100.
- The method by which the device 102 obtains the first descriptor of the first I/O request from the memory of the server 100 and the method by which the device 102 obtains the first I/O request from the memory of the server 100 according to the first descriptor are explained separately below.
- Obtaining the first descriptor from the storage space allocated by the server 100 for the first queue includes: the descriptor prefetch engine generates a second descriptor and sends it to the DMA engine, where the second descriptor indicates the storage location of the first descriptor in the storage space allocated by the server 100 for the first queue; according to the second descriptor, the DMA engine obtains the first descriptor from the first queue's storage space through DMA and stores it into the memory 1028 of the device 102.
- Specifically, after the front-end driver 1011 stores the descriptors of the I/O request into the storage space allocated by the server 100 for the queue, it operates on the notification capability data structure belonging to the queue; for example, the front-end driver 1011 writes the queue's index number (for example, index number 2) into the Notify field of the notification capability data structure. The hardware logic in the descriptor prefetch engine 1023 perceives this operation, which notifies the descriptor prefetch engine 1023 that the VF's queue with index number 2 has a new descriptor update event.
- The descriptor prefetch engine 1023 then checks the common configuration capability data structure of the queue with index number 2 to obtain the first address queue_desc of the descriptor area in the storage space allocated for the queue in the server 100 and the first address queue_avail of the area storing the number of pending I/O requests Avail_idx, thereby locating the descriptor area and the pending-request area within the storage space allocated by the server 100 to the queue. Suppose the number of pending I/O requests stored in the queue's storage space is 10 and the device 102 is currently processing the I/O request with index number 6. Using the head-descriptor index Avail_entry[6].index stored in the storage space and the first address queue_desc of the descriptor area, the descriptor prefetch engine 1023 finds the first descriptor of the I/O request with index number 6 in the storage space, and then, following the next field in each descriptor, finds the other descriptors belonging to that I/O request.
- The descriptor prefetch engine 1023 may generate multiple DMA transfer descriptors (for example, second descriptors) for the descriptors of the I/O request with index number 6 located in the storage space. The structure of a DMA transfer descriptor can be: address + length; that is, the descriptor prefetch engine 1023 generates one DMA transfer descriptor for each of the descriptors, and each DMA transfer descriptor includes the storage location (for example, start and end addresses) of the corresponding descriptor in the storage space and the length of the stored data. The descriptor prefetch engine 1023 provides the generated DMA transfer descriptors to the DMA engine 1025, which obtains the descriptors of the I/O request with index number 6 from the storage space in the server 100 and stores them into the memory 1028 of the device 102.
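- The "address + length" DMA transfer descriptor, and the step of emitting one per queue descriptor of a request, might look like the sketch below; the helper name and signature are illustrative, not the patent's.

```c
#include <stddef.h>
#include <stdint.h>

struct dma_transfer_desc {
    uint64_t addr; /* location of one queue descriptor in server memory */
    uint32_t len;  /* length to move (16 bytes per descriptor) */
};

/* Emit one DMA transfer descriptor per queue descriptor of a request. */
static size_t build_dma_descs(uint64_t queue_desc_base,
                              const uint16_t *desc_indexes, size_t n,
                              struct dma_transfer_desc *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].addr = queue_desc_base + (uint64_t)desc_indexes[i] * 16;
        out[i].len  = 16;
    }
    return n;
}
```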
- In this way, the descriptor prefetch engine automatically obtains descriptors from the server: when a virtual machine in the server initiates an I/O request and the front-end driver notifies the device that new descriptors are available, the descriptor prefetch engine automatically moves the descriptors from the server into the device's local memory, speeding up I/O request processing.
- Obtaining the first I/O request from the memory of the server 100 according to the first descriptor includes: the back-end driver handles an interrupt request initiated by the interrupt generation module, obtains the first descriptor from the memory 1028, and sends the first descriptor to the DMA engine; according to the first descriptor, the DMA engine obtains the first I/O request from the memory of the server 100 through DMA.
- Specifically, the descriptor prefetch engine 1023 sends an interrupt request to the processor in the device 102 (not shown in FIG. 2) through the interrupt generation module 1026. Because the back-end driver 1024 has registered its interrupt handling callback function with the processor in the device 102 in advance, the processor enters the processing logic of the back-end driver 1024 when it handles the interrupt request. The back-end driver 1024 obtains the descriptors of the I/O request with index number 6 from the memory 1028 and sends them to the DMA engine, and the DMA engine obtains the I/O request with index number 6 initiated by the virtual machine from the memory of the server 100.
- In this way, the I/O services processed by the processor in the server are offloaded to the device, and the device completes the I/O processing. The device presents itself to the server as a virtualized input/output controller and provides I/O resources for use by the virtual machine; the virtual machine issues I/O requests directly to the device, and the device processes them. This method optimizes the I/O processing path, reduces the load on the server's processor, and, by having the device process the virtual machine's I/O requests directly, further improves virtualized I/O performance.
- The device 102 may process the first I/O request according to its type: when the first I/O request is a read operation, the device 102 performs a read data operation on the I/O device of the server 100; when it is a write operation, the device 102 performs a write data operation on the I/O device of the server 100. The following describes the device 102's processing of the I/O request by scenario (a read operation scenario and a write operation scenario).
- Scenario 1: a read operation. The device reads data from the I/O device of the server 100 according to the first I/O request as follows: when the first I/O request is a read operation, the back-end driver generates a read data message according to the first I/O request, where the message indicates the storage location in the I/O device of the target data to be read and the storage location in the memory 1028 of the device 102 where the read data should be placed; the back-end driver sends the read data message to the I/O device through the I/O device engine; the back-end driver then notifies the DMA engine to store the target data held in the memory 1028 of the device 102 into the memory of the server 100.
- Specifically, the back-end driver 1024 converts the I/O request into a read data message that conforms to the transmission format between the device 102 and the I/O device; for example, if the I/O device is a network interface card, the format of the read data message conforms to the message format required for network transmission between the device 102 and that card. The read data message includes the storage location in the I/O device of the target data to be read and the storage location in the memory 1028 for the read data. Based on the read data message, the I/O device reads the target data from its own storage and stores it into the memory 1028 through the I/O device engine 1027.
- After the I/O device stores the target data read from its own storage into the memory 1028, it notifies the back-end driver 1024 that the data is in the memory 1028; the back-end driver 1024 then notifies the DMA engine 1025, and the DMA engine 1025 obtains the read target data from the memory 1028 and writes it into the storage space in the memory of the server 100 indicated by the I/O request.
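- A minimal sketch of the fields the read data message is said to carry follows. The actual message format between the device 102 and the I/O device is not specified here (it depends on the transport, as noted above), so this layout is an assumption.

```c
#include <stdint.h>

struct read_data_msg {
    uint64_t io_dev_addr;  /* location of the target data in the I/O device */
    uint32_t len;          /* amount of data to read */
    uint64_t dev_mem_addr; /* destination in the device's memory 1028 */
};
```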
- Scenario 2: a write operation. The device writes data to the I/O device of the server 100 according to the first I/O request as follows: when the first I/O request is a write operation, the DMA engine obtains the target data to be written to the I/O device from the memory of the server 100 and writes it into the memory 1028 of the device 102; the back-end driver generates a write data message according to the first I/O request, where the message indicates the storage location of the target data in the memory 1028 and the location in the I/O device to which the target data should be stored; the back-end driver then sends the write data message to the I/O device.
- Specifically, according to the I/O request, the back-end driver 1024 first obtains the target data to be written to the I/O device from the server 100 through the DMA engine 1025, and the DMA engine 1025 stores the obtained target data into the memory 1028. The back-end driver 1024 then converts the I/O request into a write data message that conforms to the transmission format between the device 102 and the I/O device; the write data message includes the storage location of the target data in the memory 1028 and the storage location in the I/O device to which the target data is to be written. The I/O device first obtains the target data from the memory 1028 through the I/O device engine 1027 and then writes it into the storage location in the I/O device indicated by the write data message.
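- A companion sketch for the write path, again with an assumed layout: the message carries the source location of the target data in the memory 1028 and the destination location in the I/O device.

```c
#include <stdint.h>

struct write_data_msg {
    uint64_t dev_mem_addr; /* source of the target data in the memory 1028 */
    uint32_t len;          /* amount of data to write */
    uint64_t io_dev_addr;  /* destination location in the I/O device */
};
```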
- After the processing is completed, the back-end driver 1024 reports the processing result of the I/O request to the processor (not shown in FIG. 1) in the server 100.
- Optionally, the method further includes: after the processing of the first I/O request is completed, the back-end driver sends an interrupt request to the server 100 through the DMA engine, and the server 100 uses the interrupt request to determine the device's processing result for the first I/O request.
- As described above, the storage space allocated for the queue stores the descriptors of pending I/O requests, the number of pending I/O requests Avail_idx, the head-descriptor index Avail_entry[i].index of each pending I/O request, the number of processed I/O requests Used_idx, and the head-descriptor index Used_entry[i].index of each processed I/O request. After the back-end driver 1024 determines that the I/O request with index number 6 has been processed, it increments the Used_idx value through the DMA engine 1025 and writes the index number of the first descriptor of the I/O request with index number 6 into Used_entry[6].index.
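- The completion bookkeeping can be sketched as below, mirroring the avail-area sketch earlier; incrementing Used_idx on completion follows standard Virtio semantics, and the layout is likewise an assumption.

```c
#include <stdint.h>

#define QUEUE_SIZE 256 /* assumed queue depth for illustration */

struct used_area {
    uint16_t used_idx;                /* Used_idx: processed request count */
    uint16_t entry_index[QUEUE_SIZE]; /* Used_entry[i].index */
};

/* Record the head descriptor of a finished request, then advance Used_idx
 * so that the front-end driver can observe the completion. */
static void complete_request(struct used_area *used, uint16_t req_index,
                             uint16_t head_desc_index)
{
    used->entry_index[req_index] = head_desc_index; /* e.g. Used_entry[6] */
    used->used_idx++;
}
```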
- When the back-end driver 1024 completes the processing of the I/O request with index number 6, it sends an interrupt request to the processor in the server 100 through the DMA engine 1025. Because the front-end driver 1011 has registered its interrupt handling callback function with the processor in the server 100 in advance, the processor enters the processing logic of the front-end driver 1011 when it handles the interrupt request initiated by the DMA engine 1025.
- The common configuration capability data structure stores the first address queue_used of the area in the queue's storage space that holds the number of processed I/O requests Used_idx. Based on the first address queue_used, the front-end driver 1011 locates this area in the storage space allocated by the server 100 for the queue and thereby learns the device's processing result.
- Optionally, when the front-end driver 1011 performs a write operation on the Notify field in the notification capability data structure of a queue and there is, at the same time, sufficient space in the local memory to store the descriptors of multiple I/O requests in the queue, a valid round-robin (RR) scheduling request can be formed for that queue. When the front-end driver 1011 writes the index numbers of multiple queues into the Notify fields of those queues, multiple RR scheduling requests are formed.
- The controller 1021 may group the multiple queues in advance, for example dividing them into 32 groups (that is, 32 scheduling groups), each scheduling group containing the scheduling requests of 32 queues. First-level RR scheduling selects, from the 32 scheduling groups, a scheduling group that has a pending scheduling request; second-level RR scheduling is then performed within that group to select one queue among its 32 queues, and the RR scheduling request formed by that queue is processed first.
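- The two-level selection can be sketched as nested round-robin scans, as below. The 32 x 32 sizing follows the example in the text, while representing pending requests as bitmaps and keeping per-level scan positions are implementation assumptions.

```c
#include <stdint.h>

#define GROUPS 32
#define QUEUES_PER_GROUP 32

struct rr_state {
    uint32_t group_pending;         /* bit g set: group g has a request */
    uint32_t queue_pending[GROUPS]; /* bit q set: queue q has a request */
    unsigned next_group;            /* first-level RR position */
    unsigned next_queue[GROUPS];    /* second-level RR positions */
};

/* Return the global index of the next queue to serve, or -1 if none. */
static int rr_pick(struct rr_state *s)
{
    for (unsigned i = 0; i < GROUPS; i++) {
        unsigned g = (s->next_group + i) % GROUPS;
        if (!(s->group_pending & (1u << g)))
            continue; /* first level: skip groups with no requests */
        for (unsigned j = 0; j < QUEUES_PER_GROUP; j++) {
            unsigned q = (s->next_queue[g] + j) % QUEUES_PER_GROUP;
            if (s->queue_pending[g] & (1u << q)) {
                /* second level: advance both positions past the pick */
                s->next_queue[g] = (q + 1) % QUEUES_PER_GROUP;
                s->next_group = (g + 1) % GROUPS;
                return (int)(g * QUEUES_PER_GROUP + q);
            }
        }
    }
    return -1;
}
```

- Two fixed-size levels keep each scheduling decision bounded while still giving every queue a fair turn.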
- It should be noted that the descriptor prefetch engine 1023 must determine whether a queue has pending descriptors in the memory space of the server 100 before it can start a useful prefetch operation. Only when the queue has pending descriptors in the memory space of the server 100 can the descriptor prefetch engine 1023 fetch them from the server 100; otherwise it may initiate useless read operations toward the server 100 and waste the bandwidth of the PCIe interface. Therefore, before prefetching the descriptors of I/O requests from the server 100, the descriptor prefetch engine 1023 must determine whether there are descriptors to prefetch.
- The following describes how, in an embodiment of the present invention, it is determined whether the descriptors of I/O requests in a queue should be prefetched from the server 100.
- A queue may contain multiple pending I/O requests, and the number of pending I/O requests in the queue, Avail_idx, is stored in the common configuration capability data structure. Two registers are configured in the descriptor prefetch engine 1023: one records, for each queue, the number of pending I/O requests Avail_idx stored in the queue's storage space in the server 100, and the other records the number of I/O requests Avail_ring_index_engine that the device 102 has finished processing for that queue. The descriptor prefetch engine 1023 reads the Avail_idx and Avail_ring_index_engine belonging to a queue from these two registers according to the queue's index number. If Avail_idx and Avail_ring_index_engine are equal, all I/O requests in the queue have been processed, and the descriptor prefetch engine 1023 does not need to obtain descriptors from the queue's storage space; if they are not equal, pending I/O requests remain, and the descriptor prefetch engine 1023 must continue to obtain descriptors from the queue's storage space so that the I/O requests can be processed.
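- The decision therefore reduces to comparing the two counters, as in this sketch; modeling the two registers as plain struct fields is a simplification for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct queue_counters {
    uint16_t avail_idx;               /* pending requests posted by driver */
    uint16_t avail_ring_index_engine; /* requests already consumed by device */
};

/* Prefetch only when descriptors are actually outstanding, avoiding
 * useless PCIe reads toward the server. */
static bool should_prefetch(const struct queue_counters *c)
{
    return c->avail_idx != c->avail_ring_index_engine;
}
```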
- FIG. 4 is a schematic block diagram of a data processing device 300 according to an embodiment of the present invention.
- The device 300 is configured in the server 100 and is connected to the server 100 through a PCIe bus. The device 300 includes an acquisition module 301, a processing module 302, a storage module 303, an interrupt generation module 304, and an I/O device engine module 305.
- The acquisition module 301 is configured to obtain a first input/output (I/O) request sent by a virtual machine, where the first I/O request is initiated by the virtual machine for any one of the multiple VFs and includes a read operation or a write operation; the read operation is used to read data from the I/O device of the server, the write operation is used to write data to the I/O device of the server, and the VF is used to manage the storage space of the virtual machine.
- The processing module 302 is configured to read or write data from the I/O device of the server according to the first I/O request.
- Optionally, the acquisition module 301 is further configured to obtain a first descriptor from the first queue, where the first descriptor is generated after the front-end driver in the virtual machine processes the first I/O request and indicates the storage location of the first I/O request in the server's memory; the first queue is stored in the server's memory and stores the descriptors of multiple I/O requests, including the first I/O request. The acquisition module 301 is further configured to obtain the first I/O request from the server's memory according to the first descriptor.
- Optionally, the device 300 further includes a storage module 303. The processing module 302 is further configured to generate a second descriptor and send it to the acquisition module 301, where the second descriptor indicates the storage location of the first descriptor in the storage space allocated by the server for the first queue; the acquisition module 301 is further configured to obtain the first descriptor from the first queue's storage space by DMA according to the second descriptor and to store the first descriptor into the storage module 303 of the device.
- Optionally, the device 300 further includes an interrupt generation module 304. The processing module 302 is configured to handle the interrupt request initiated by the interrupt generation module 304, obtain the first descriptor from the storage module 303, and send the first descriptor to the acquisition module 301; the acquisition module 301 is further configured to obtain the first I/O request from the server's memory by DMA according to the first descriptor.
- Optionally, the device 300 further includes an I/O device engine module 305. The processing module 302 is further configured to, when the first I/O request is the read operation, generate a read data message according to the first I/O request, where the message indicates the storage location in the I/O device of the target data to be read and the storage location in the storage module 303 where the read data should be placed; the processing module 302 is further configured to send the read data message to the I/O device through the I/O device engine module 305, and to notify the acquisition module 301 to store the target data held in the storage module 303 into the server's memory.
- Optionally, the acquisition module 301 is further configured to, when the first I/O request is the write operation, obtain the target data to be written to the I/O device from the server's memory and write it into the storage module 303; the processing module 302 is further configured to generate a write data message according to the first I/O request, where the message indicates the storage location of the target data in the storage module 303 and the location in the I/O device to which the target data should be stored, and to send the write data message to the I/O device.
- Optionally, the processing module 302 is further configured to send an interrupt request to the server after the processing of the first I/O request is completed, and the server uses the interrupt request to determine the device's processing result for the first I/O request.
- The device provided in this embodiment of the present invention can offload the I/O services processed by the processor in the server to the device, which completes the I/O processing. The device presents itself to the server as a virtualized input/output controller and provides I/O resources for use by the virtual machine; the virtual machine sends I/O requests directly to the device, and the device processes them. This optimizes the processing of I/O requests, reduces the load on the server's processor, and, by having the device handle the virtual machine's I/O requests directly, further improves virtualized I/O performance.
- It should be understood that the data processing device 300 may correspond to the device 102 in the embodiments of the present invention and to the corresponding subject in the method 200 according to the embodiments of the present invention, and that the above and other operations and/or functions of its modules implement the corresponding flows of the method 200 in FIG. 3; for brevity, they are not repeated here.
- An embodiment of the present invention provides a server. The server includes the data processing device 102 or the data processing device 300, and the server is configured to implement the operation steps of the data processing method described in FIG. 3.
- The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- the above embodiments may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
- The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium.
- the semiconductor medium may be a solid state drive.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bus Control (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
This application provides a data processing method, including: a device in a server obtains a first input/output (I/O) request sent by a virtual machine, where the device is connected to the server through a Peripheral Component Interconnect Express (PCIe) bus, the virtual machine runs on the server, and the device provides multiple virtual functions (VFs) to the server; the first I/O request is initiated by the virtual machine for any one of the multiple VFs and includes a read operation or a write operation, where the read operation is used to read data from an I/O device of the server and the write operation is used to write data to the I/O device of the server; and the device reads or writes data from the I/O device of the server according to the first I/O request. The method can optimize the I/O processing path and reduce the load on the server's processor, and having the device directly process the virtual machine's I/O requests further improves virtualized I/O performance.
Description
This application relates to the field of servers, and more particularly, to a data processing method, device, and server.
In a virtualization scenario, a virtual input/output device (Virtio) on a virtual machine can be implemented in the following way:
Full virtualization: in this approach, a virtual machine monitor (VMM) simulates the Virtio device. The VMM intercepts the I/O requests initiated by virtual machines and simulates the real hardware in software. The VMM must process the I/O requests initiated by all virtual machines and serialize them into a single I/O stream that can be handled by the underlying hardware.
When a Virtio device in a virtual machine initiates an I/O request, operations involving context switches, such as switches between kernel mode and user mode, occur frequently between the virtual machine and the server on which it runs, which significantly affects I/O performance.
In addition, when the VMM simulates multiple Virtio devices, the overhead of the central processing unit (CPU) in the server where the virtual machines are located increases and performance decreases.
SUMMARY
This application provides a data processing method that can optimize the I/O path and improve virtualized I/O performance.
In a first aspect, a data processing method is provided. The method includes: a device in a server obtains a first input/output (I/O) request sent by a virtual machine, where the device is connected to the server through a Peripheral Component Interconnect Express (PCIe) bus, the virtual machine runs on the server, the device provides multiple virtual functions (VFs) to the server, the first I/O request is initiated by the virtual machine for any one of the multiple VFs and includes a read operation or a write operation, the read operation is used to read data from an I/O device of the server, the write operation is used to write data to the I/O device of the server, and the VF is used to manage the storage space of the virtual machine; and the device reads or writes data from the I/O device of the server according to the first I/O request.
In this application, the I/O services processed by the processor in the server are offloaded to the device, which completes the I/O processing. The device presents itself to the server as a virtualized input/output controller and provides I/O resources for use by the virtual machine; the virtual machine initiates I/O requests directly to the device, and the device processes them. The data processing method of this application can optimize the I/O processing path and reduce the load on the server's processor, and having the device directly process the virtual machine's I/O requests further improves virtualized I/O performance.
In one possible implementation, the device obtaining the first I/O request sent by the virtual machine includes: the device obtains a first descriptor from the first queue, where the first descriptor is generated by the front-end driver in the virtual machine after processing the first I/O request and is used to indicate the storage location of the first I/O request in the server's memory, the first queue is stored in the server's memory, and the first queue is used to store the descriptors of multiple I/O requests including the first I/O request; and the device obtains the first I/O request from the server's memory according to the first descriptor.
In another possible implementation, the device includes a descriptor prefetch engine, a direct memory access (DMA) engine, and a memory, and the device obtaining the first descriptor from the storage space allocated by the server for the first queue includes: the descriptor prefetch engine generates a second descriptor and sends it to the DMA engine, where the second descriptor is used to indicate the storage location of the first descriptor in the storage space allocated by the server for the first queue; and the DMA engine obtains the first descriptor from the storage space allocated for the first queue by DMA according to the second descriptor and stores the first descriptor into the device's memory.
The descriptor prefetch engine automatically obtains descriptors from the server: when a virtual machine in the server initiates an I/O request and the front-end driver notifies the device that new descriptors are available in the server, the descriptor prefetch engine automatically moves the descriptors from the server into the device's local memory, thereby speeding up I/O request processing.
In another possible implementation, the device includes an interrupt generation module and a back-end driver, and the device obtaining the first I/O request from the server's memory according to the first descriptor includes: the back-end driver handles the interrupt request initiated by the interrupt generation module, obtains the first descriptor from the memory, and sends the first descriptor to the DMA engine; and the DMA engine obtains the first I/O request from the server's memory by DMA according to the first descriptor.
In another possible implementation, the device further includes an I/O device engine, and the device reading or writing data from the server's I/O device according to the first I/O request includes: when the first I/O request is the read operation, the back-end driver generates a read data message according to the first I/O request, where the read data message is used to indicate the storage location in the I/O device of the target data to be read and the storage location in the device's memory of the read target data; the back-end driver sends the read data message to the I/O device through the I/O device engine; and the back-end driver notifies the DMA engine to store the target data held in the device's memory into the server's memory.
In another possible implementation, the device reading or writing data from the server's I/O device according to the first I/O request includes: when the first I/O request is the write operation, the DMA engine obtains the target data to be written to the I/O device from the server's memory and writes the target data into the device's memory; the back-end driver generates a write data message according to the first I/O request, where the write data message is used to indicate the storage location of the target data in the memory and the storage location in the I/O device to which the target data is to be stored; and the back-end driver sends the write data message to the I/O device.
In another possible implementation, the method further includes: after the processing of the first I/O request is completed, the back-end driver sends an interrupt request to the server through the DMA engine, and the interrupt request is used by the server to determine the device's processing result for the first I/O request.
In a second aspect, a data processing device is provided. The data processing device is configured in a server and connected to the server through a PCIe bus, and the data processing device is configured to execute the method in the first aspect or any possible implementation of the first aspect.
In a third aspect, a server is provided. The server is configured with the data processing device provided in the second aspect, and the device is connected to the server through a PCIe bus.
In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the method in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided. When the instructions are run on a computer, the computer is caused to execute the method in the first aspect or any possible implementation of the first aspect.
FIG. 1 is a schematic block diagram of a virtual machine system according to an embodiment of the present invention.
FIG. 2 is a schematic block diagram of a data processing device according to an embodiment of the present invention.
FIG. 3 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
FIG. 4 is another schematic block diagram of a data processing device according to an embodiment of the present invention.
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 shows a schematic block diagram of a server 100 according to an embodiment of the present invention. At least one virtual machine 101 runs on the server 100, and each virtual machine 101 is configured with a front-end driver 1011. In addition, the server 100 is configured with a device 102 for data processing, and the device 102 communicates with the server 100 through a Peripheral Component Interconnect Express (PCIe) bus. The data processing device 102 is configured to process I/O requests from the virtual machines, where an I/O request includes a read operation or a write operation.
It should be noted that the device 102 in the embodiments of the present invention can communicate with a front-end driver developed based on the virtual input/output (Virtio) protocol; that is, the front-end driver 1011 in the embodiments of the present invention may be developed based on the Virtio protocol. A front-end driver developed based on the Virtio protocol is highly portable, does not need to change as the kernel version of the server 100 changes, and has a low maintenance cost.
In a specific implementation, the device 102 may be a system on chip (SoC). The SoC may be a separate PCIe card, and the PCIe card may be deployed on the server 100 or integrated directly with the motherboard of the server 100.
The data processing device 102 provided in the embodiments of the present invention is briefly introduced below with reference to FIG. 2.
As shown in FIG. 2, the device 102 includes a controller 1021, a register 1022, a descriptor prefetch engine 1023, a back-end driver 1024, a direct memory access (DMA) engine 1025, an interrupt generation module 1026, an I/O device engine 1027, and a memory 1028.
In the embodiments of the present invention, the device 102 can take a physical function (PF) of an I/O device connected to the device 102 through the I/O device engine 1027 and present multiple virtual functions (VFs) to the server 100 through hardware virtualization (for example, single root I/O virtualization, SR-IOV). Any two VFs are isolated from each other, and the multiple VFs are assigned to multiple virtual machines for use; each virtual machine can be assigned one or more VFs.
For example, if the I/O device connected to the device 102 through the I/O device engine 1027 is a storage device and the device 102 presents 256 VFs to the server 100, each VF can be responsible for managing a piece of storage space in the storage device. In this case, the 256 VFs are equivalent to 256 storage controllers from the virtual machine's perspective, and the 256 VFs can use 1024 queues in total.
For example, the storage device may be a solid state drive (SSD). See the description of queues below.
It should be noted that the device 102 in the embodiments of the present invention may also be connected to a network device through the I/O device engine 1027 (for example, through a PCIe switch). For example, if the device 102 presents 100 VFs to the server 100, the 100 VFs are equivalent to 100 network controllers from the virtual machine's perspective.
此外,服务器100中还包括设备驱动103,服务器100中的处理器(图1中未示出)通过执行设备驱动103中的指令,从而对设备102的工作模式进行配置。所谓对设备102的工作模式进行配置,是指对设备102向服务器100呈现的多个VF中可用的VF的数量以及为每个VF分配的队列的数量进行配置。
如果需要在服务器100中使用设备102,则需要服务器100中的处理器执行设备驱动103中的指令,从而完成对设备103的工作模式的配置。例如,服务器100中的处理器执行设备驱动103中的指令,配置256个VF中存在50个VF对虚拟机可用,并且配置每个VF可以使用4个队列。
Each VF corresponds to multiple data structures. The device 102 partitions multiple storage areas out of its memory space, each of which stores one type of data structure. The multiple storage areas may belong to a single base address register (BAR) space, or each storage area may be used as a BAR space of its own; each BAR space is assigned an index number. For example, the device 102 stores both the common configuration capability data structure and the notification capability data structure in the BAR space with index 1.
The register 1022 is the configuration space of the device 102. The configuration space is divided into multiple configuration subspaces, and each configuration subspace is allocated to a VF as that VF's configuration space. The configuration space of each VF stores a device ID, a vendor ID, multiple base address fields, a BAR space index field, multiple offset fields, and multiple length fields.
When the device 102 stores a data structure into the BAR space allocated for that data structure, the device writes into the corresponding offset field the offset of the start address of the storage area occupied by the data structure relative to the start address of the BAR space, and writes into the corresponding length field the length of the storage area occupied by the data structure. For example, suppose the device 102 stores the common configuration capability data structure in the BAR space with index 1 at an offset of 10 bytes from the start of that BAR space, with a length of 100 bytes, and stores the notification capability data structure in the same BAR space at an offset of 110 bytes, with a length of 50 bytes; the device 102 then writes the offsets and lengths of the common configuration capability data structure and the notification capability data structure in the BAR space with index 1 into the configuration space of the corresponding VF.
The common configuration capability structure includes a queue count (num_queues) field, a queue depth (queue_size) field, and a queue enable (queue_enable) bit for the VF.
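Taken together with the queue address fields and the feature fields described later in this document, the per-VF common configuration capability can be pictured as a C structure. The following is a minimal sketch only: the field names follow the text, but the widths and ordering are assumptions for illustration, and the authoritative layout is the one defined by the Virtio protocol.

```c
#include <stdint.h>

/* Illustrative sketch of the per-VF common configuration capability;
 * field widths and ordering are assumptions, not the Virtio layout. */
struct common_cfg_cap {
    uint32_t device_feature; /* features offered by the I/O device (4 bytes) */
    uint32_t driver_feature; /* features negotiated by the front-end driver */
    uint16_t num_queues;     /* number of queues allocated to this VF */
    uint16_t queue_select;   /* queue the per-queue fields below refer to */
    uint16_t queue_size;     /* queue depth */
    uint16_t queue_enable;   /* 1: memory allocated for the queue may be used */
    uint64_t queue_desc;     /* start address of the descriptor region */
    uint64_t queue_avail;    /* start address of the region holding Avail_idx */
    uint64_t queue_used;     /* start address of the region holding Used_idx */
};
```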
It should be noted that the data structures of a VF are specified by the virtual I/O (Virtio) protocol; that is, which fields of each VF data structure store which content is defined by the Virtio protocol.
It should be understood that in this embodiment of the present invention, the controller 1021 may be a CPU, or another general-purpose processor, an SoC, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 1028 may include a read-only memory and a random access memory, and provides instructions and data to the processor. The memory may also include a nonvolatile random access memory; for example, the memory may also store information about the device type. The memory may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static random access memory (static RAM, SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and a direct rambus random access memory (direct rambus RAM, DR RAM).
Before the server 100 uses the device 102 to process I/O requests from virtual machines, in addition to the processor in the server 100 executing the instructions in the device driver 103, the front-end driver 1011 must initialize the device 102. The process by which the front-end driver 1011 initializes the device 102 is described below.
(1) The front-end driver 1011 maps the BAR spaces of the device 102 that store the data structures into the memory space of the server 100.
When the server 100 is powered on, it partitions multiple storage areas out of its memory space, establishes a mapping between these storage areas and the multiple BAR spaces of the device 102, and writes the start address of the storage area allocated in the server 100's memory space for each BAR space into the corresponding base address field; for example, the start address of the storage area allocated for the BAR space with index 0 is written into the base address field with index 0.
After loading, the front-end driver 1011 can detect the device 102 by its device ID and vendor ID. When the front-end driver 1011 needs to access a data structure stored in the BAR space of a VF of the device 102, it uses the BAR space index number stored in that VF's configuration space to find, through the base address field with the same index number, the start address of the storage area allocated for that BAR space in the server 100's memory space, and then uses the offset field and length field stored in the VF's configuration space to determine the storage region of the target data structure within the storage area allocated for the BAR space. In this way, the BAR space in the device 102 is accessed in the same manner as the server 100's own memory.
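The address computation the front-end driver performs can be sketched in a few lines of C. This is an illustration only, under the assumption of a simple mapping table indexed by BAR number; all identifiers are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of the fields the front-end driver reads from a VF's
 * configuration space to locate one data structure. */
struct vf_cfg_entry {
    uint8_t  bar_index; /* which BAR space holds the data structure */
    uint32_t offset;    /* offset of the structure within that BAR space */
    uint32_t length;    /* length of the structure within that BAR space */
};

/* Start addresses, in server memory, of the areas mapped to each BAR space
 * (filled in at power-on from the base address fields). */
extern uint64_t bar_base[6];

/* Server-memory address of the data structure described by e. */
static inline uint64_t locate_structure(const struct vf_cfg_entry *e)
{
    return bar_base[e->bar_index] + e->offset;
}
```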
(2) The front-end driver 1011 initiates a reset operation on the device 102.
The front-end driver 1011 first resets the device 102 to ensure that any historical state is cleared. The hardware logic in the controller 1021 detects this operation and resets the fields of the common configuration capability data structure and the notification capability data structure.
(3) The front-end driver 1011 obtains the I/O device features.
The Virtio protocol defines a series of feature bits that identify the capabilities of devices and drivers. For example, if the I/O device is a network interface card, a feature bit can indicate whether the card supports checksum offload.
The front-end driver 1011 obtains the features of the I/O device. The device feature (device_feature) field is located in the common configuration capability data structure and is 4 bytes long. The front-end driver 1011 itself also has a set of supported driver features. After successfully obtaining the device features, the front-end driver 1011 intersects the device features obtained from the common configuration capability data structure with its own features, writes the intersection into the driver feature (driver_feature) field of the common configuration capability data structure as the features supported by the front-end driver 1011, and thereby informs the device 102 of the features finally negotiated with it.
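The negotiation itself reduces to a bitwise intersection of the two feature sets. A minimal sketch, assuming hypothetical accessors for the device_feature and driver_feature fields of the common configuration capability data structure:

```c
#include <stdint.h>

extern uint32_t read_device_feature(void);            /* 4-byte field */
extern void write_driver_feature(uint32_t features);

/* Returns the negotiated feature set: only bits both the device offers
 * and the driver supports survive. */
uint32_t negotiate_features(uint32_t driver_supported)
{
    uint32_t device_feature = read_device_feature();
    uint32_t negotiated = device_feature & driver_supported;
    write_driver_feature(negotiated);  /* tell the device what was agreed */
    return negotiated;
}
```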
(4) The front-end driver 1011 enables the queues one by one.
The front-end driver 1011 allocates, in the memory space of the server 100, a block of storage space for each queue of each VF. Each block stores the descriptors corresponding to the multiple I/O requests initiated by the virtual machine for the corresponding VF, the number Avail_idx of pending I/O requests, the index Avail_entry[i].index of the first descriptor among the multiple descriptors of a pending I/O request, the number Used_idx of completed I/O requests, and the index Used_entry[i].index of the first descriptor among the multiple descriptors of a completed I/O request, where i is the index number of an I/O request. Descriptors are described later in this document.
The common configuration capability data structure stores queue_desc, the start address of the region storing descriptors within the storage space allocated for the queue; queue_avail, the start address of the region storing the number Avail_idx of pending I/O requests; and queue_used, the start address of the region storing the number Used_idx of completed I/O requests. The common configuration capability data structure also stores a queue selection (queue_select) field; setting queue_select to a queue's index number indicates that the fields of the common configuration capability data structure belong to the queue indicated by queue_select.
By setting the queue_select field of the common configuration capability data structure to the index numbers of different queues, the front-end driver 1011 configures queue_desc, queue_avail, and queue_used for each queue in turn. After configuration is complete, the front-end driver 1011 sets the queue enable (queue_enable) field in a queue's common configuration capability data structure to 1, which enables the queue, that is, indicates that the memory space allocated by the server 100 for that queue can be used.
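Reusing the hypothetical common_cfg_cap sketch above, the per-queue configuration and enabling sequence might look as follows; the address parameters are the server-memory addresses the front-end driver allocated for the queue, and all names remain illustrative.

```c
/* Configure and enable one queue through the common configuration
 * capability; builds on the struct common_cfg_cap sketch above. */
void enable_queue(volatile struct common_cfg_cap *cfg, uint16_t qidx,
                  uint64_t desc_addr, uint64_t avail_addr, uint64_t used_addr)
{
    cfg->queue_select = qidx;       /* per-queue fields now refer to qidx */
    cfg->queue_desc   = desc_addr;  /* descriptor region */
    cfg->queue_avail  = avail_addr; /* region holding Avail_idx */
    cfg->queue_used   = used_addr;  /* region holding Used_idx */
    cfg->queue_enable = 1;          /* queue memory may now be used */
}
```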
In addition, the front-end driver 1011 registers an MSI-X interrupt in the server 100 for each queue, according to the number of MSI-X interrupts supported by the I/O device.
Descriptors are briefly introduced below.
In this embodiment of the present invention, an I/O request issued by a virtual machine for a VF is first obtained by the front-end driver 1011, which encapsulates the I/O request into descriptors and stores the descriptors in the storage space allocated for a queue of the corresponding VF.
A descriptor describes an I/O request. A descriptor may have the structure address + length + flags + next, where next holds the index number of the logically adjacent next descriptor, and each descriptor is 16 bytes. For example, if an I/O request is divided into 8 fragments of 100 bytes each, the 8 fragments correspond to 8 descriptors, and each descriptor indicates the address at which one fragment is stored in the server 100's memory, the stored length, the fragment's read/write flag, and the index number of the logically adjacent next descriptor.
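The 16-byte layout maps directly onto a C structure. A sketch following the address + length + flags + next shape given above; the flag encoding is an assumption, since the text only states that the flags include a read/write indication.

```c
#include <stdint.h>

/* One 16-byte descriptor: 8 + 4 + 2 + 2 bytes. */
struct io_descriptor {
    uint64_t addr;  /* where the fragment is stored in server memory */
    uint32_t len;   /* stored length of the fragment, in bytes */
    uint16_t flags; /* e.g. read/write flag, "another descriptor follows" */
    uint16_t next;  /* index of the logically adjacent next descriptor */
};
```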
In addition to the descriptors corresponding to I/O requests, the storage space allocated for a queue also stores the number Avail_idx of pending I/O requests initiated by the virtual machine and the index number Avail_entry[i].index of the first descriptor among the multiple descriptors of each such I/O request, where i is the request's index number; the number Avail_idx of pending I/O requests and the index number Avail_entry[i].index of the first descriptor of an I/O request are stored adjacently in this storage space. Avail_idx increases as the number of I/O requests initiated by the virtual machine grows.
For example, suppose the pending I/O request has index number i=1 and resides in the queue with index 2. According to queue_avail stored in the common configuration capability data structure of the queue with index 2, the device 102 first finds, in the storage space allocated by the server 100 for the queue, the index number Avail_entry[1].index of the first descriptor of this I/O request; then, according to queue_desc stored in the common configuration capability data structure and the index number Avail_entry[1].index, it finds the first descriptor of the I/O request in that storage space; then, according to the next field of the first descriptor, it obtains the other descriptors belonging to the I/O request, and processes the I/O request according to the obtained descriptors. The specific process by which the device 102 processes I/O requests is described below.
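The lookup order just described (queue_avail to Avail_entry[i].index, then queue_desc, then the next links) can be sketched as a chain walk over the io_descriptor structure above. The DESC_F_HAS_NEXT flag and the bound on the chain length are assumptions for illustration.

```c
#define DESC_F_HAS_NEXT 0x1  /* assumed: "another descriptor follows" */

struct avail_entry { uint16_t index; };

/* Copies the descriptor chain of request i into out[]; returns the number
 * of descriptors, or -1 if the chain exceeds max entries. */
int collect_request(const struct io_descriptor *desc_table,
                    const struct avail_entry *avail_ring,
                    uint16_t i, struct io_descriptor *out, int max)
{
    uint16_t d = avail_ring[i].index;  /* first descriptor of request i */
    int n = 0;
    for (;;) {
        if (n == max)
            return -1;
        out[n++] = desc_table[d];
        if (!(desc_table[d].flags & DESC_F_HAS_NEXT))
            break;                     /* end of this request's chain */
        d = desc_table[d].next;        /* follow the next field */
    }
    return n;
}
```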
Below, with reference to the server 100 in FIG. 1 and the device 102 in FIG. 2, the data processing method 200 provided by an embodiment of the present invention is described, taking a storage device as the I/O device. FIG. 3 shows a schematic flowchart of the method 200, which includes at least the following steps.
201: A device in the server 100 obtains a first input/output (I/O) request sent by a virtual machine, where the device is connected to the server 100 over a PCIe bus, the virtual machine runs on the server 100, the device provides multiple virtual functions (VFs) to the server 100, the first I/O request is initiated by the virtual machine for any one of the multiple VFs, the first I/O request includes a read operation or a write operation, the read operation is used to read data from an I/O device of the server 100, and the write operation is used to write data to an I/O device of the server 100.
Specifically, a virtual machine running on the server 100 may initiate an I/O request (for example, the first I/O request) for a VF provided by the device 102. The I/O request may be a read operation or a write operation, where the read operation is used to read data from an I/O device of the server 100 and the write operation is used to write data to an I/O device of the server 100.
202: The device reads or writes data in the I/O device of the server 100 according to the first I/O request.
Specifically, the device 102 first obtains the virtual machine's I/O request. If the I/O request is a read operation, the device 102 performs a data read on the I/O device of the server 100 according to the request; or, if the I/O request is a write operation, the device 102 performs a data write on the I/O device of the server 100 according to the request.
Optionally, the device obtaining the first I/O request sent by the virtual machine includes: the device obtains a first descriptor from a first queue, where the first descriptor is generated by the front-end driver after processing the first I/O request, the first descriptor indicates the storage location of the first I/O request in the server 100's memory, the first queue is stored in the server 100's memory, and the first queue is used to store the descriptors of multiple I/O requests including the first I/O request; and the device obtains the first I/O request from the server 100's memory according to the first descriptor.
Specifically, after the virtual machine issues the I/O request, the request is first obtained by the front-end driver 1011, which encapsulates the I/O request into multiple descriptors (for example, the first descriptor).
According to the destination address for reading or writing data carried in the I/O request, the front-end driver 1011 determines which block of storage space of the I/O device the destination address belongs to, and from the determined storage space further determines the corresponding VF (that is, determines the VF responsible for managing that storage space), and stores the generated descriptors in the storage space allocated by the server 100 for one of that VF's queues (for example, the first queue).
The device 102 obtains the multiple descriptors of the I/O request from the storage space allocated by the server 100 for the queue, and obtains the I/O request from the server 100's memory according to the multiple descriptors.
It should be noted that the storage space allocated by the server 100 for each queue is part of the server 100's memory.
Next, the method by which the device 102 obtains the first descriptor of the first I/O request from the server 100's memory, and the method by which the device 102 obtains the first I/O request from the server 100's memory according to the first descriptor, are described in turn.
The device obtaining the first descriptor from the storage space allocated by the server 100 for the first queue includes: the descriptor prefetch engine generates a second descriptor and sends the second descriptor to the DMA engine, where the second descriptor indicates the storage location of the first descriptor in the storage space allocated by the server 100 for the first queue; and the DMA engine obtains the first descriptor from the first queue by DMA according to the second descriptor, and stores the first descriptor in the memory 1028 of the device 102.
Specifically, after the front-end driver 1011 stores the multiple descriptors of the I/O request in the storage space allocated by the server 100 for the queue, the front-end driver 1011 operates on the notification capability data structure belonging to that queue. For example, the front-end driver 1011 writes the queue's index number (for example, index 2) into the Notify field of the notification capability data structure; the hardware logic in the descriptor prefetch engine 1023 detects this operation, which notifies the descriptor prefetch engine 1023 that a new descriptor update event has occurred in the queue with index 2 of that VF.
The descriptor prefetch engine 1023 examines the common configuration capability data structure of the queue with index 2 and obtains from it queue_desc, the start address of the region storing descriptors within the storage space allocated for the queue in the server 100, and queue_avail, the start address of the region storing the number Avail_idx of pending I/O requests, and thereby locates, within the storage space allocated by the server 100 to the queue, the region storing descriptors and the region storing the number of pending I/O requests.
For example, suppose the storage space of the queue records 10 pending I/O requests and the device 102 is currently processing the I/O request with index 6. According to the index number Avail_entry[6].index of the first descriptor of the I/O request with index 6 stored in that storage space and the start address queue_desc of the descriptor region, the descriptor prefetch engine 1023 finds the first descriptor of the I/O request with index 6 in that storage space, and then, according to the next field of that first descriptor, finds the other descriptors belonging to the I/O request with index 6 in that storage space.
For the multiple descriptors of the I/O request with index 6 located in that storage space, the descriptor prefetch engine 1023 may generate multiple DMA move descriptors (for example, second descriptors). A DMA move descriptor may have the structure address + length; that is, for each of the multiple descriptors the descriptor prefetch engine 1023 generates one DMA move descriptor, and each DMA move descriptor contains the storage location of the corresponding descriptor in that storage space (for example, its start and end addresses) and the stored length.
The descriptor prefetch engine 1023 provides the generated DMA move descriptors to the DMA engine 1025, and the DMA engine 1025 obtains the multiple descriptors of the I/O request with index 6 from that storage space in the server 100 and stores them in the memory 1028 of the device 102.
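Since every I/O descriptor is 16 bytes long and sits at a fixed position in the descriptor region, generating the move descriptors is mostly address arithmetic. A sketch under the assumption of a hypothetical dma_submit() interface; none of these names come from the text.

```c
#include <stdint.h>
#include <stddef.h>

/* "address + length" DMA move descriptor, as described above. */
struct dma_move_desc {
    uint64_t src_addr; /* where the I/O descriptor sits in server memory */
    uint32_t len;      /* number of bytes to move */
};

/* Hypothetical: hands one move descriptor to the DMA engine, which copies
 * the bytes into the device-local destination buffer. */
extern void dma_submit(const struct dma_move_desc *d, void *local_dst);

void prefetch_chain(uint64_t queue_desc_base, const uint16_t *chain,
                    size_t n, uint8_t *local_buf)
{
    for (size_t k = 0; k < n; k++) {
        struct dma_move_desc d = {
            .src_addr = queue_desc_base + (uint64_t)chain[k] * 16,
            .len      = 16,  /* one 16-byte I/O descriptor per move */
        };
        dma_submit(&d, local_buf + k * 16);
    }
}
```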
The descriptor prefetch engine thus fetches descriptors from the server automatically: when a virtual machine in the server initiates an I/O request and the front-end driver notifies the device that a new descriptor is available in the server, the descriptor prefetch engine automatically moves the descriptor from the server into the device's local memory, which speeds up the processing of I/O requests.
The device obtaining the first I/O request from the server 100's memory according to the first descriptor includes: the back-end driver, by processing the interrupt request initiated by the interrupt generation module, obtains the first descriptor from the memory and sends the first descriptor to the DMA engine; and the DMA engine obtains the first I/O request from the server 100's memory by DMA according to the first descriptor.
Specifically, after the DMA engine 1025 has stored all descriptors of the I/O request with index 6 into the memory 1028, the descriptor prefetch engine 1023 sends an interrupt request to the processor in the device 102 (not shown in FIG. 2) through the interrupt generation module 1026. Because the back-end driver 1024 has registered its interrupt-handling callback function with the processor in the device 102 in advance, when the processor in the device 102 processes the interrupt request, it enters the processing logic of the back-end driver 1024. The back-end driver 1024 then obtains the multiple descriptors of the I/O request with index 6 from the memory 1028 and sends them to the DMA engine, and the DMA engine obtains, according to the multiple descriptors, the I/O request with index 6 initiated by the virtual machine from the server 100's memory.
In this embodiment of the present invention, the I/O workload handled by the server's processor is offloaded to the device, and the device completes the I/O processing. The device presents itself to the server as a virtualized input/output controller, provides I/O resources for virtual machines to use, and processes the I/O requests that virtual machines initiate directly to it. Compared with having the VMM process the I/O requests initiated by virtual machines, this method optimizes the I/O processing path and reduces the load on the server's processor, and having the device directly process the virtual machine's I/O requests further improves virtualized I/O performance.
After the device 102 obtains the first I/O request from the server 100's memory, it can process the first I/O request according to its type. For example, when the type of the first I/O request is a read operation, the device 102 performs a data read on the I/O device of the server 100; when the type of the first I/O request is a write operation, the device 102 performs a data write on the I/O device of the server 100.
The process by which the device 102 processes an I/O request is described below by scenario (for example, a write-operation scenario or a read-operation scenario).
Scenario 1: the device reading or writing data in the I/O device of the server 100 according to the first I/O request includes: when the first I/O request is the read operation, the back-end driver generates a read-data packet according to the first I/O request, where the read-data packet indicates the storage location in the I/O device of the target data to be read, and the read-data packet further indicates the storage location in the memory 1028 of the device 102 where the read target data is to be placed; the back-end driver sends the read-data packet to the I/O device through the I/O device engine; and the back-end driver notifies the DMA engine to store the target data held in the memory 1028 of the device 102 into the server 100's memory.
Specifically, when the I/O request is a read operation, the back-end driver 1024 converts the I/O request into a read-data packet that conforms to the transfer format between the device 102 and the I/O device. For example, if the I/O device is a network interface card, the format of the read-data packet conforms to the packet format required for network transfer between the device 102 and the card.
The read-data packet contains the storage location in the I/O device of the target data to be read and the storage location in the memory 1028 where the read data is to be stored. According to the read-data packet, the I/O device reads the target data from its own memory and stores the read target data into the memory 1028 through the I/O device engine 1027.
After the I/O device has stored the target data read from its own memory into the memory 1028, it notifies the back-end driver 1024 that the data has been stored in the memory 1028. The back-end driver 1024 then notifies the DMA engine 1025, and the DMA engine 1025 obtains the read target data from the memory 1028 and writes the target data into the storage space in the server 100's memory indicated by the I/O request.
Scenario 2: the device reading or writing data in the I/O device of the server 100 according to the first I/O request includes: when the first I/O request is the write operation, the DMA engine obtains from the server 100's memory the target data to be written to the I/O device and writes the target data into the memory 1028 of the device 102; the back-end driver generates a write-data packet according to the first I/O request, where the write-data packet indicates the storage location of the target data in the memory and further indicates the storage location in the I/O device where the target data is to be stored; and the back-end driver sends the write-data packet to the I/O device.
Specifically, when the I/O request is a write operation, the back-end driver 1024 first obtains, according to the I/O request and through the DMA engine 1025, the target data to be written to the I/O device from the server 100, and the DMA engine 1025 stores the obtained target data into the memory 1028.
The back-end driver 1024 converts the I/O request into a write-data packet that conforms to the transfer format between the device 102 and the I/O device. The write-data packet contains the storage location of the target data in the memory 1028, and also contains the storage location in the I/O device where the target data is to reside once written.
According to the write-data packet, the I/O device first obtains the target data to be written from the memory 1028 through the I/O device engine 1027, and writes the target data into the storage space in the I/O device indicated by the write-data packet.
For both Scenario 1 and Scenario 2, after the I/O request with index 6 has been processed, the back-end driver 1024 reports the processing result of the I/O request to the processor in the server 100 (not shown in FIG. 1).
The method further includes: after processing of the first I/O request is complete, the back-end driver sends an interrupt request to the server 100 through the DMA engine, where the interrupt request is used by the server 100 to determine the device's processing result for the first I/O request.
Specifically, as mentioned above, the storage space allocated for a queue stores the multiple descriptors of pending I/O requests, the number Avail_idx of pending I/O requests, the index number Avail_entry[i].index of the first descriptor among the multiple descriptors of a pending I/O request, the number Used_idx of completed I/O requests, and the index number Used_entry[i].index of the first descriptor of a completed I/O request.
When the back-end driver 1024 determines that the I/O request with index 6 has been processed, it updates the value of Used_idx through the DMA engine 1025 and writes the index number of the first descriptor of the I/O request with index 6 into Used_entry[6].index.
After the back-end driver 1024 has finished processing the I/O request with index 6, the back-end driver 1024 sends an interrupt request to the processor in the server 100 through the DMA engine 1025. Because the front-end driver 1011 has registered its interrupt-handling callback function with the processor in the server 100 in advance, when the processor in the server 100 processes the interrupt request initiated by the DMA engine 1025, it enters the processing logic of the front-end driver 1011.
The common configuration capability data structure stores queue_used, the start address of the region of the queue that stores the number Used_idx of completed I/O requests. Based on this start address queue_used, the front-end driver 1011 examines Used_entry[i].index in the storage space allocated for the queue in the server 100; if an index number identical to the index number of the first descriptor of the I/O request with index 6 is present, the front-end driver 1011 determines that the device 102 has finished processing the I/O request with index 6.
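A minimal sketch of that completion check as the front-end driver might perform it; the used-ring layout and the linear scan are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

struct used_entry { uint16_t index; };

/* True if the first-descriptor index of the request appears among the
 * Used_entry records, i.e. the device has finished processing it. */
bool request_completed(const struct used_entry *used_ring, uint16_t used_idx,
                       uint16_t first_desc_index)
{
    for (uint16_t k = 0; k < used_idx; k++)
        if (used_ring[k].index == first_desc_index)
            return true;
    return false;
}
```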
In this embodiment of the present invention, when the front-end driver 1011 writes to the Notify field in the notification capability data structure of a queue and, at the same time, the local memory has enough space available to store the multiple descriptors of the I/O requests in that queue, a valid RR scheduling request for that queue is formed.
However, multiple queues may all contain pending I/O requests. In that case, the front-end driver 1011 has written each queue's index number into the Notify fields of the multiple queues, and multiple RR scheduling requests are formed.
First, the round-robin (RR) scheduling method provided by this embodiment of the present invention for the case where I/O requests in multiple queues need to be processed is described.
Specifically, when processing the multiple RR scheduling requests, the controller 1021 may group the multiple queues in advance. For example, the queues are divided into 32 groups (that is, 32 scheduling groups), and each scheduling group contains the scheduling requests of 32 queues. A first-level RR scheduling selects, from the 32 scheduling groups, one scheduling group that has a scheduling request; then, for that scheduling group, a second-level RR scheduling selects one queue among the 32 queues in the group, and the RR scheduling request formed by that queue is processed first.
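A minimal sketch of the two-level selection, assuming one pending bit per scheduling group and per queue and rotating start positions; the bitmap representation is an assumption, not something the text specifies.

```c
#include <stdint.h>

#define GROUPS 32
#define QUEUES_PER_GROUP 32

struct rr_state {
    uint32_t group_pending;         /* bit g set: group g has a request */
    uint32_t queue_pending[GROUPS]; /* bit q set: queue q of group g */
    uint8_t  next_group;            /* first-level RR start position */
    uint8_t  next_queue[GROUPS];    /* second-level RR start positions */
};

/* Returns the global queue number to service next, or -1 if idle. */
int rr_pick(struct rr_state *s)
{
    for (int i = 0; i < GROUPS; i++) {
        int g = (s->next_group + i) % GROUPS;                  /* level 1 */
        if (!(s->group_pending & (1u << g)))
            continue;
        for (int j = 0; j < QUEUES_PER_GROUP; j++) {
            int q = (s->next_queue[g] + j) % QUEUES_PER_GROUP; /* level 2 */
            if (s->queue_pending[g] & (1u << q)) {
                s->next_group    = (uint8_t)((g + 1) % GROUPS);
                s->next_queue[g] = (uint8_t)((q + 1) % QUEUES_PER_GROUP);
                return g * QUEUES_PER_GROUP + q;
            }
        }
    }
    return -1;
}
```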
When processing the RR scheduling request formed by the queue, the descriptor prefetch engine 1023 must determine whether the queue has pending descriptors in the server 100's memory space before it can begin an effective prefetch operation. Only when the queue has pending descriptors in the server 100's memory space can the descriptor prefetch engine 1023 fetch the pending descriptors from the server 100; otherwise it might issue useless read operations to the server 100 and waste the bandwidth of the PCIe interface. Therefore, before prefetching the multiple descriptors of an I/O request from the server 100, the descriptor prefetch engine 1023 needs to determine whether to prefetch them.
The method provided by this embodiment of the present invention for determining whether to prefetch the multiple descriptors of the I/O requests in the queue from the server 100 is described below.
In this embodiment of the present invention, the queue may contain multiple pending I/O requests, and the common configuration capability data structure stores Avail_idx, the number of pending I/O requests in the queue. Two registers are configured in the descriptor prefetch engine 1023 to record, for each queue, Avail_idx, the number of pending I/O requests held in the storage space in the server 100, and Avail_ring_index_engine, the number of I/O requests that the device 102 has already processed.
According to the queue's index number, the descriptor prefetch engine 1023 obtains the queue's Avail_idx and Avail_ring_index_engine from the two registers.
When the queue's Avail_idx and Avail_ring_index_engine are equal, the I/O requests in the queue have all been processed, and the descriptor prefetch engine 1023 does not need to fetch further descriptors from the queue's storage space;
when the queue's Avail_idx and Avail_ring_index_engine differ, the queue still contains unprocessed I/O requests, and the descriptor prefetch engine 1023 continues to fetch descriptors from the queue's storage space and processes the obtained I/O requests.
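The gate before a prefetch therefore reduces to comparing the two per-queue counters; a minimal sketch, with hypothetical accessors for the two registers.

```c
#include <stdint.h>
#include <stdbool.h>

extern uint16_t read_avail_idx(uint16_t queue);               /* from server */
extern uint16_t read_avail_ring_index_engine(uint16_t queue); /* processed */

/* Prefetch only when requests remain unprocessed; equal counters mean a
 * prefetch would just waste PCIe bandwidth. */
bool should_prefetch(uint16_t queue)
{
    return read_avail_idx(queue) != read_avail_ring_index_engine(queue);
}
```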
The data processing method provided by the embodiments of the present invention has been described above with reference to FIG. 1 to FIG. 3. The data processing device provided by an embodiment of the present invention is described below with reference to FIG. 4.
FIG. 4 is a schematic block diagram of a data processing device 300 according to an embodiment of the present invention. The device 300 is configured in the server 100 and is connected to the server 100 over a PCIe bus. The device 300 includes an obtaining module 301, a processing module 302, a storage module 303, an interrupt generation module 304, and an I/O device engine module 305.
The obtaining module 301 is configured to obtain a first input/output (I/O) request sent by a virtual machine, where the first I/O request is initiated by the virtual machine for any one of the multiple VFs, the first I/O request includes a read operation or a write operation, the read operation is used to read data from an I/O device of the server, the write operation is used to write data to an I/O device of the server, and the VF is used to manage the storage space of the virtual machine.
The processing module 302 is configured to read or write data in the I/O device of the server according to the first I/O request.
Optionally, the obtaining module 301 is further configured to obtain a first descriptor from the first queue, where the first descriptor is generated by the front-end driver in the virtual machine after processing the first I/O request, the first descriptor indicates the storage location of the first I/O request in the server's memory, the first queue is stored in the server's memory, and the first queue is used to store the descriptors of multiple I/O requests including the first I/O request.
The obtaining module 301 is further configured to obtain the first I/O request from the server's memory according to the first descriptor.
Optionally, the device 300 further includes a storage module 303. The processing module 302 is further configured to generate a second descriptor and send the second descriptor to the obtaining module 301, where the second descriptor indicates the storage location of the first descriptor in the storage space allocated by the server for the first queue.
The obtaining module 301 is further configured to obtain the first descriptor from the storage space allocated for the first queue by DMA according to the second descriptor, and store the first descriptor in the storage module 303 of the device.
Optionally, the device 300 further includes an interrupt generation module 304. The processing module 302 is configured to obtain the first descriptor from the storage module 303 by processing the interrupt request initiated by the interrupt generation module 304, and send the first descriptor to the obtaining module 301.
The obtaining module 301 is further configured to obtain the first I/O request from the server's memory by DMA according to the first descriptor.
Optionally, the device 300 further includes an I/O device engine module 305. The processing module 302 is further configured to: when the first I/O request is the read operation, generate a read-data packet according to the first I/O request, where the read-data packet indicates the storage location in the I/O device of the target data to be read, and further indicates the storage location in the storage module 303 where the read target data is to be placed.
The processing module 302 is further configured to send the read-data packet to the I/O device through the I/O device engine module 305.
The processing module 302 is further configured to notify the obtaining module 301 to store the target data held in the storage module 303 into the server's memory.
Optionally, the obtaining module 301 is further configured to: when the first I/O request is the write operation, obtain from the server's memory the target data to be written to the I/O device, and write the target data into the storage module 303.
The processing module 302 is further configured to generate a write-data packet according to the first I/O request, where the write-data packet indicates the storage location of the target data in the storage module 303, and further indicates the storage location in the I/O device where the target data is to be stored.
The processing module 302 is further configured to send the write-data packet to the I/O device.
Optionally, the processing module 302 is further configured to send an interrupt request to the server after processing of the first I/O request is complete, where the interrupt request is used by the server to determine the device's processing result for the first I/O request.
The device provided by this embodiment of the present invention can offload the I/O workload handled by the server's processor to the device, and the device completes the I/O processing. The device presents itself to the server as a virtualized input/output controller, provides I/O resources for virtual machines to use, and processes the I/O requests that virtual machines initiate directly to it. Compared with having the VMM process the I/O requests initiated by virtual machines, having the device process I/O requests optimizes the I/O request processing path and reduces the load on the server's processor, and having the device directly process the virtual machine's I/O requests further improves virtualized I/O performance.
It should be understood that the data processing device 300 according to this embodiment of the present invention may correspond to the device 102 in the embodiments of the present invention and to the corresponding body that performs the method 200 according to the embodiments of the present invention, and that the foregoing and other operations and/or functions of the modules of the data processing device 300 implement the corresponding procedures of the method 200 in FIG. 3; for brevity, details are not described again here.
An embodiment of the present invention provides a server. The server includes the data processing device 102 or the data processing device 300, and the server is configured to implement the operation steps of the data processing method described in FIG. 3.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the foregoing embodiments may be implemented fully or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present invention are fully or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, over infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by the computer, or a data storage device such as a server or a data center that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive.
The foregoing descriptions are merely specific implementations of the embodiments of the present invention, but the protection scope of the embodiments of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.
Claims (15)
- A data processing method, wherein the method comprises: a device in a server obtains a first input/output (I/O) request sent by a virtual machine, wherein the device is connected to the server over a Peripheral Component Interconnect Express (PCIe) bus, the virtual machine runs on the server, the device provides a plurality of virtual functions (VFs) to the server, the first I/O request is initiated by the virtual machine for any one of the plurality of VFs, the first I/O request comprises a read operation or a write operation, the read operation is used to read data from an I/O device of the server, the write operation is used to write data to an I/O device of the server, and the VF is used to manage the storage space of the virtual machine; and the device reads or writes data in the I/O device of the server according to the first I/O request.
- The method according to claim 1, wherein the device obtaining the first I/O request sent by the virtual machine comprises: the device obtains a first descriptor from a first queue, wherein the first descriptor is generated by a front-end driver in the virtual machine after processing the first I/O request, the first descriptor indicates the storage location of the first I/O request in the server's memory, the first queue is stored in the server's memory, and the first queue is used to store descriptors of a plurality of I/O requests including the first I/O request; and the device obtains the first I/O request from the server's memory according to the first descriptor.
- The method according to claim 2, wherein the device comprises a descriptor prefetch engine, a direct memory access (DMA) engine, and a memory, and the device obtaining the first descriptor from the storage space allocated by the server for the first queue comprises: the descriptor prefetch engine generates a second descriptor and sends the second descriptor to the DMA engine, wherein the second descriptor indicates the storage location of the first descriptor in the storage space allocated by the server for the first queue; and the DMA engine obtains the first descriptor from the first queue by DMA according to the second descriptor, and stores the first descriptor in the memory of the device.
- The method according to claim 3, wherein the device comprises an interrupt generation module and a back-end driver, and the device obtaining the first I/O request from the server's memory according to the first descriptor comprises: the back-end driver, by processing an interrupt request initiated by the interrupt generation module, obtains the first descriptor from the memory and sends the first descriptor to the DMA engine; and the DMA engine obtains the first I/O request from the server's memory by DMA according to the first descriptor.
- The method according to claim 3 or 4, wherein the device further comprises an I/O device engine, and the device reading or writing data in the I/O device of the server according to the first I/O request comprises: when the first I/O request is the read operation, the back-end driver generates a read-data packet according to the first I/O request, wherein the read-data packet indicates the storage location in the I/O device of the target data to be read, and the read-data packet further indicates the storage location in the memory of the device where the read target data is to be placed; the back-end driver sends the read-data packet to the I/O device through the I/O device engine; and the back-end driver notifies the DMA engine to store the target data held in the memory of the device into the server's memory.
- The method according to claim 3 or 4, wherein the device reading or writing data in the I/O device of the server according to the first I/O request comprises: when the first I/O request is the write operation, the DMA engine obtains from the server's memory the target data to be written to the I/O device and writes the target data into the memory of the device; the back-end driver generates a write-data packet according to the first I/O request, wherein the write-data packet indicates the storage location of the target data in the memory, and the write-data packet further indicates the storage location in the I/O device where the target data is to be stored; and the back-end driver sends the write-data packet to the I/O device.
- The method according to any one of claims 1 to 6, wherein the method further comprises: after processing of the first I/O request is complete, the back-end driver sends an interrupt request to the server through the DMA engine, wherein the interrupt request is used by the server to determine the device's processing result for the first I/O request.
- A data processing device, wherein the device is connected to a server over a Peripheral Component Interconnect Express (PCIe) bus, a virtual machine runs on the server, and the device provides a plurality of virtual functions (VFs) to the server, the device comprising: an obtaining module, configured to obtain a first input/output (I/O) request sent by the virtual machine, wherein the first I/O request is initiated by the virtual machine for any one of the plurality of VFs, the first I/O request comprises a read operation or a write operation, the read operation is used to read data from an I/O device of the server, the write operation is used to write data to an I/O device of the server, and the VF is used to manage the storage space of the virtual machine; and a processing module, configured to read or write data in the I/O device of the server according to the first I/O request.
- The device according to claim 8, wherein the obtaining module is further configured to obtain a first descriptor from a first queue, wherein the first descriptor is generated by a front-end driver in the virtual machine after processing the first I/O request, the first descriptor indicates the storage location of the first I/O request in the server's memory, the first queue is stored in the server's memory, and the first queue is used to store descriptors of a plurality of I/O requests including the first I/O request; and the obtaining module is further configured to obtain the first I/O request from the server's memory according to the first descriptor.
- The device according to claim 9, wherein the device further comprises a storage module; the processing module is further configured to generate a second descriptor and send the second descriptor to the obtaining module, wherein the second descriptor indicates the storage location of the first descriptor in the storage space allocated by the server for the first queue; and the obtaining module is further configured to obtain the first descriptor from the first queue by DMA according to the second descriptor, and store the first descriptor in the storage module.
- The device according to claim 10, wherein the device further comprises an interrupt generation module; the processing module is configured to obtain the first descriptor from the storage module by processing an interrupt request initiated by the interrupt generation module, and send the first descriptor to the obtaining module; and the obtaining module is further configured to obtain the first I/O request from the server's memory by DMA according to the first descriptor.
- The device according to claim 10 or 11, wherein the device further comprises an I/O device engine module; the processing module is further configured to: when the first I/O request is the read operation, generate a read-data packet according to the first I/O request, wherein the read-data packet indicates the storage location in the I/O device of the target data to be read, and further indicates the storage location in the storage module where the read target data is to be placed; the processing module is further configured to send the read-data packet to the I/O device through the I/O device engine module; and the processing module is further configured to notify the obtaining module to store the target data held in the storage module into the server's memory.
- The device according to claim 10 or 11, wherein the obtaining module is further configured to: when the first I/O request is the write operation, obtain from the server's memory the target data to be written to the I/O device, and write the target data into the storage module; the processing module is further configured to generate a write-data packet according to the first I/O request, wherein the write-data packet indicates the storage location of the target data in the storage module, and further indicates the storage location in the I/O device where the target data is to be stored; and the processing module is further configured to send the write-data packet to the I/O device.
- The device according to any one of claims 8 to 13, wherein the processing module is further configured to send an interrupt request to the server through the DMA engine after processing of the first I/O request is complete, wherein the interrupt request is used by the server to determine the device's processing result for the first I/O request.
- A server, wherein the server comprises a device, the device is connected to the server over a Peripheral Component Interconnect Express (PCIe) bus, and the device is configured to perform the operation steps of any one of claims 8 to 14.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19847753.1A EP3822779B1 (en) | 2018-08-07 | 2019-05-05 | Data processing method and device, and server |
US17/165,158 US11636062B2 (en) | 2018-08-07 | 2021-02-02 | Data processing method and device, and server |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810893287.3 | 2018-08-07 | ||
CN201810893287.3A CN110825485A (zh) | 2018-08-07 | 2018-08-07 | Data processing method, device, and server |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/165,158 Continuation US11636062B2 (en) | 2018-08-07 | 2021-02-02 | Data processing method and device, and server |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020029619A1 true WO2020029619A1 (zh) | 2020-02-13 |
Family
ID=69414017
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/085476 WO2020029619A1 (zh) | 2018-08-07 | 2019-05-05 | Data processing method, device, and server |
Country Status (4)
Country | Link |
---|---|
US (1) | US11636062B2 (zh) |
EP (1) | EP3822779B1 (zh) |
CN (1) | CN110825485A (zh) |
WO (1) | WO2020029619A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112650558B (zh) * | 2020-12-29 | 2022-07-05 | 优刻得科技股份有限公司 | Data processing method and apparatus, readable medium, and electronic device |
CN115629845B (zh) * | 2022-12-14 | 2023-04-11 | 北京云豹创芯智能科技有限公司 | IO data generation method and apparatus, computer device, and storage medium |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005036367A2 (en) * | 2003-10-08 | 2005-04-21 | Unisys Corporation | Virtual data center that allocates and manages system resources across multiple nodes |
US7496695B2 (en) * | 2005-09-29 | 2009-02-24 | P.A. Semi, Inc. | Unified DMA |
US8230134B2 (en) * | 2009-06-01 | 2012-07-24 | Lsi Corporation | Fast path SCSI IO |
US9087200B2 (en) * | 2009-12-22 | 2015-07-21 | Intel Corporation | Method and apparatus to provide secure application execution |
TWI408557B (zh) * | 2010-03-18 | 2013-09-11 | Faraday Tech Corp | 高速輸入輸出系統及其節能控制方法 |
US9239796B2 (en) * | 2011-05-24 | 2016-01-19 | Ixia | Methods, systems, and computer readable media for caching and using scatter list metadata to control direct memory access (DMA) receiving of network protocol data |
CN102650976B (zh) * | 2012-04-01 | 2014-07-09 | 中国科学院计算技术研究所 | User-level interface control apparatus supporting single-root IO virtualization and method thereof |
US9183163B2 (en) * | 2012-06-27 | 2015-11-10 | Ubiquiti Networks, Inc. | Method and apparatus for distributed control of an interfacing-device network |
US20150281126A1 (en) * | 2014-03-31 | 2015-10-01 | Plx Technology, Inc. | METHODS AND APPARATUS FOR A HIGH PERFORMANCE MESSAGING ENGINE INTEGRATED WITHIN A PCIe SWITCH |
US20160154756A1 (en) * | 2014-03-31 | 2016-06-02 | Avago Technologies General Ip (Singapore) Pte. Ltd | Unordered multi-path routing in a pcie express fabric environment |
WO2015175942A1 (en) * | 2014-05-15 | 2015-11-19 | Carnegie Mellon University | Method and apparatus for on-demand i/o channels for secure applications |
US9842075B1 (en) * | 2014-09-12 | 2017-12-12 | Amazon Technologies, Inc. | Presenting multiple endpoints from an enhanced PCI express endpoint device |
US9459905B2 (en) * | 2014-12-16 | 2016-10-04 | International Business Machines Corporation | Implementing dynamic SRIOV virtual function resizing |
US10809998B2 (en) * | 2016-02-12 | 2020-10-20 | Nutanix, Inc. | Virtualized file server splitting and merging |
US10503684B2 (en) * | 2016-07-01 | 2019-12-10 | Intel Corporation | Multiple uplink port devices |
CN107894913B (zh) * | 2016-09-30 | 2022-05-13 | 超聚变数字技术有限公司 | Computer system and storage access device |
US10528267B2 (en) * | 2016-11-11 | 2020-01-07 | Sandisk Technologies Llc | Command queue for storage operations |
US11055615B2 (en) * | 2016-12-07 | 2021-07-06 | Arilou Information Security Technologies Ltd. | System and method for using signal waveform analysis for detecting a change in a wired network |
PL3812900T3 (pl) * | 2016-12-31 | 2024-04-08 | Intel Corporation | Systemy, sposoby i aparaty do obliczania heterogenicznego |
CN106897106B (zh) * | 2017-01-12 | 2018-01-16 | 北京三未信安科技发展有限公司 | Sequential scheduling method and system for concurrent DMA of multiple virtual machines in an SR-IOV environment |
CN107463829B (zh) * | 2017-09-27 | 2018-08-21 | 山东渔翁信息技术股份有限公司 | Method, system, and related apparatus for processing DMA requests in a cryptographic card |
CN107807843B (zh) * | 2017-10-26 | 2019-05-24 | 北京百度网讯科技有限公司 | I/O request processing method and device in a virtual machine, and computer-readable medium |
US10365826B1 (en) * | 2018-01-24 | 2019-07-30 | Micron Technology, Inc. | Command processing for a storage system |
US11379411B2 (en) * | 2019-01-07 | 2022-07-05 | Vast Data Ltd. | System and method for replicating file systems in remote object storages |
GB2621499B (en) * | 2019-02-25 | 2024-05-08 | Mobileye Vision Technologies Ltd | Systems and methods for vehicle navigation |
US11138116B2 (en) * | 2019-07-29 | 2021-10-05 | Xilinx, Inc. | Network interface device supporting multiple interface instances to a common bus |
CN117053814A (zh) * | 2020-03-30 | 2023-11-14 | 御眼视觉技术有限公司 | Navigating a vehicle using an electronic horizon |
US20220027379A1 (en) * | 2020-07-21 | 2022-01-27 | Observe, Inc. | Data capture and visualization system providing temporal data relationships |
- 2018-08-07: CN application CN201810893287.3A, publication CN110825485A (pending)
- 2019-05-05: PCT application PCT/CN2019/085476, publication WO2020029619A1
- 2019-05-05: EP application EP19847753.1A, publication EP3822779B1 (active)
- 2021-02-02: US application US17/165,158, publication US11636062B2 (active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645832A (zh) * | 2009-05-07 | 2010-02-10 | 曙光信息产业(北京)有限公司 | FPGA-based virtual machine network data packet processing method |
US8239655B2 (en) * | 2010-01-18 | 2012-08-07 | Vmware, Inc. | Virtual target addressing during direct data access via VF of IO storage adapter |
CN107005495A (zh) * | 2017-01-20 | 2017-08-01 | 华为技术有限公司 | Packet forwarding method, network interface card, host device, and computer system |
Non-Patent Citations (1)
Title |
---|
See also references of EP3822779A4 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116209980A (zh) * | 2020-07-23 | 2023-06-02 | 华为技术有限公司 | Hardware capability negotiation mechanism between a virtual function driver and a physical function driver in a virtualized environment |
Also Published As
Publication number | Publication date |
---|---|
EP3822779A1 (en) | 2021-05-19 |
US11636062B2 (en) | 2023-04-25 |
CN110825485A (zh) | 2020-02-21 |
EP3822779B1 (en) | 2023-08-02 |
EP3822779A4 (en) | 2021-09-22 |
US20210157765A1 (en) | 2021-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10713074B2 (en) | Method, apparatus, and system for accessing storage device | |
US9727503B2 (en) | Storage system and server | |
CN107305534B (zh) | 同时进行内核模式访问和用户模式访问的方法 | |
WO2018076793A1 (zh) | 一种NVMe数据读写方法及NVMe设备 | |
US9696942B2 (en) | Accessing remote storage devices using a local bus protocol | |
US9467512B2 (en) | Techniques for remote client access to a storage medium coupled with a server | |
US11635902B2 (en) | Storage device processing stream data, system including the same, and operation method | |
US8806098B1 (en) | Multi root shared peripheral component interconnect express (PCIe) end point | |
US20190243757A1 (en) | Systems and methods for input/output computing resource control | |
US11636062B2 (en) | Data processing method and device, and server | |
EP4220419B1 (en) | Modifying nvme physical region page list pointers and data pointers to facilitate routing of pcie memory requests | |
CN106560791B (zh) | 高效虚拟i/o地址转换 | |
CN107967225B (zh) | 数据传输方法、装置、计算机可读存储介质和终端设备 | |
US11042495B2 (en) | Providing interrupts from an input-output memory management unit to guest operating systems | |
WO2015180598A1 (zh) | 对存储设备的访问信息处理方法和装置、系统 | |
JP7227907B2 (ja) | バイトアドレス可能メモリとして不揮発性メモリにアクセスする方法及び装置 | |
KR20160123986A (ko) | 불휘발성 메모리 장치, 및 그것을 포함하는 메모리 시스템 | |
US12105648B2 (en) | Data processing method, apparatus, and device | |
US20230359392A1 (en) | Non-volatile memory-based storage device, device controller and method thereof | |
US9727521B2 (en) | Efficient CPU mailbox read access to GPU memory | |
US20200379927A1 (en) | Providing Copies of Input-Output Memory Management Unit Registers to Guest Operating Systems | |
WO2020251790A1 (en) | Guest operating system buffer and log access by an input-output memory management unit | |
US11775451B2 (en) | Computing system for reducing latency between serially connected electronic devices | |
US20230385118A1 (en) | Selective execution of workloads using hardware accelerators | |
US20220342837A1 (en) | Peripheral component interconnect express device and operating method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19847753; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 2019847753; Country of ref document: EP; Effective date: 20210215 |