CN117785766A - CPU and peripheral data transmission method based on DPDK+QDMA - Google Patents


Info

Publication number
CN117785766A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311851614.6A
Other languages
Chinese (zh)
Inventor
马国瑞
胡亚平
袁海军
王峰
李晓慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CLP Kesiyi Technology Co Ltd
Original Assignee
CLP Kesiyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CLP Kesiyi Technology Co Ltd
Priority to CN202311851614.6A
Publication of CN117785766A

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a method for transmitting data between a CPU and a peripheral. A DPDK+QDMA-based data transmission method for a CPU and a peripheral includes: packetizing the data to be transmitted; sending the data packets to a pre-allocated fixed kernel buffer, the buffer being located in user space; reading the data packets through the polling mode in DPDK and processing them at their original address; transmitting the data in the QDMA multi-queue transmission mode; and performing address decoding in the peripheral, thereby completing the data transmission. The method uses the features of DPDK to implement polled sending and receiving, which reduces the number of copies and the overhead of system calls and packet copying, and improves transmission efficiency. It moves the TCP/IP protocol stack into user mode, or replaces it with a custom protocol stack, which frees up CPU performance. Because the transmission mode is based on QDMA multi-queue and SR-IOV technology, it can be applied in virtualized environments.

Description

CPU and peripheral data transmission method based on DPDK+QDMA
Technical Field
The invention belongs to the technical field of data transmission, and relates to a data transmission method of a CPU and a peripheral.
Background
The conventional PC architecture is built on the PCIe bus, which provides high-speed data transmission between the CPU and external devices. The earliest designs used the MMU (memory management unit) to transfer data between the peripheral and the CPU; because the CPU had to take part in the entire transfer, CPU utilization was limited. DMA (Direct Memory Access) transmission based on the IOMMU was developed later, as shown in FIG. 1. It effectively reduces the CPU's involvement in data transmission and improves the CPU's data-processing performance, but it still signals the CPU by interrupt.
A PCIe-based PC can be connected to an FPGA or other external device and used as an intelligent network card or a network tester. The data exchanged between the CPU and the FPGA is of many kinds, including flow generation parameters, flow statistics parameters, captured messages, protocol simulation data, and port configuration. A traditional kernel driver performs character reads and writes through the read, write, and memory-map system calls, and each such call causes a switch between kernel mode and user mode. When many types and large volumes of data are transmitted, this frequent mode switching leads to low transmission efficiency and high latency, particularly when large amounts of data with different functions are transferred in scattered pieces.
When an Ethernet frame is received, the data frame is placed in a pre-allocated kernel buffer, the corresponding receive descriptor ring is updated, and an interrupt notification is raised. The Linux system then generates the corresponding information, updates the descriptor ring, and passes the received frame to the in-kernel network stack for processing. After the network stack has processed the frame, the data is copied to the corresponding socket and delivered to user space. When a frame is sent, the data must likewise be brought into the kernel and then moved to the corresponding kernel buffer. In both directions the CPU core has to take part, which reduces CPU performance.
With the slowdown of Moore's law, the annual improvement in single-core CPU performance has fallen from 50% to 10%, and CPU performance is now raised through multiple cores and multiple threads. As application demands keep growing, the number of CPU cores in use keeps increasing, and applications are also trending toward virtualization. Traditional DMA cannot support virtualization and struggles to meet the needs of today's multi-core CPUs running virtual machines. How to achieve efficient transmission between a high-performance multi-core CPU and peripherals such as an FPGA has therefore become a problem that needs to be solved.
Disclosure of Invention
To address the shortcomings of ordinary DMA transmission in the prior art, the invention provides an efficient multi-queue data transmission method for a CPU and external devices. The invention uses the poll mode driver library and the Virtual Function I/O / Userspace I/O (VFIO/UIO) interfaces in DPDK, combined with the multi-queue transmission characteristics of QDMA (Queue DMA). It adopts a user-mode driver to reduce kernel copies and the number of system calls; polling-based scheduling to avoid the context-switch cost of interrupts; multi-core parallel processing, in which specific tasks are placed on designated CPU cores so that threads do not switch between cores; and multi-channel interleaved memory access to raise the effective bandwidth. Together these measures improve performance from several angles and avoid the CPU resources wasted on system calls.
A user-mode process can access the hardware directly through the VFIO driver and, combined with the QDMA multi-queue transmission mode, interact directly from CPU user space with peripherals such as an FPGA. Kernel space takes no part in data transmission or processing. This solves the low transmission efficiency caused by context switching and system-call overhead between the CPU and the peripheral, and achieves efficient data transmission between the CPU and peripherals such as an FPGA.
The DPDK packet I/O mechanism supports multiple queues and can therefore be matched to the multi-queue characteristic of QDMA. The SR-IOV function of QDMA, together with the UIO/VFIO interface, supports 1 Physical Function (PF) and 256 Virtual Functions (VFs), so that up to 256 high-speed virtualized environments can be realized.
The invention provides a DPDK+QDMA-based data transmission method for a CPU and a peripheral, which comprises the following steps:
carrying out packet processing on the transmission data;
transmitting the data packet into a pre-allocated fixed kernel buffer area, wherein the buffer area is positioned in a user space;
reading the data packet through a polling mode in the DPDK, and processing the data packet on a preset fixed memory address;
carrying out data transmission in a QDMA multi-queue transmission mode;
and performing address decoding in the peripheral to realize data transmission between the CPU and the peripheral.
Preferably, each data packet includes a header tag consisting of a type, a length, and a start address, and the length of each data packet does not exceed 4 KB.
Preferably, the user-mode process accesses the hardware through the VFIO driver of DPDK and, in combination with the QDMA multi-queue transmission mode, achieves direct interaction between CPU user space and the peripheral.
Preferably, the QDMA multi-queue transmission mode supports 1024 queue buffers, and each queue can be configured individually by interface type.
Preferably, the Virtual Function I/O / Userspace I/O (VFIO/UIO) interface in DPDK cooperates with the SR-IOV function of QDMA to allocate queues as resources to multiple physical functions and virtual functions, so that multiple functions and virtualized application spaces share the same QDMA core and a high-speed virtualized environment is realized.
Compared with the prior art, the method has the following beneficial effects:
Compared with the traditional DMA mode, the DPDK+QDMA-based data transmission method for a CPU and a peripheral solves the problem of high interrupt overhead. When many kinds of data are sent in small batches, the method uses the features of DPDK to implement polled sending and receiving, reduces the number of copies and the overhead of system calls and packet copying, and moves the TCP/IP protocol stack into user mode to free up CPU performance. In addition, because it is based on the QDMA multi-queue transmission mode and SR-IOV technology, it can support 1 Physical Function (PF) and 256 Virtual Functions (VFs), so the transmission mode can be applied in virtualized environments.
Drawings
FIG. 1 is a diagram of a conventional DMA read process;
FIG. 2 is a schematic block diagram of a DPDK+QDMA-based data transmission method between a CPU and a peripheral;
FIG. 3 is a diagram illustrating the packet structure in the method of the invention.
Detailed Description
In order that the invention may be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated below. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
The principle of the DPDK+QDMA-based data transmission method for a CPU and a peripheral is shown in FIG. 2. Built on the DPDK framework, the method uses the poll mode driver library and the Virtual Function I/O / Userspace I/O (VFIO/UIO) interfaces in DPDK, combined with the multi-queue transmission characteristic of QDMA. It adopts a user-mode driver to reduce kernel copies and the number of system calls; polling-based scheduling to avoid the context-switch cost of interrupts; multi-core parallel processing, in which specific tasks are placed on designated CPU cores so that threads do not switch between cores; and multi-channel interleaved memory access to raise the effective bandwidth. Together these measures improve performance from several angles and avoid the CPU resources wasted on system calls.
A user-mode process can access the hardware directly through the VFIO (Virtual Function I/O) driver in DPDK and, combined with the QDMA multi-queue transmission mode, interact directly from CPU user space with peripherals such as an FPGA. Kernel space takes no part in data transmission or processing. This solves the low transmission efficiency caused by context switching and system-call overhead between the CPU and the peripheral, and achieves efficient data transmission between the CPU and peripherals such as an FPGA.
The invention provides a DPDK+QDMA-based data transmission method for a CPU and a peripheral, which specifically comprises the following steps:
(1) Carrying out packet processing on the transmission data;
To suit DPDK's user-space polling mode and make efficient use of the descriptor ring, the invention proposes a new data structure that packetizes the transmitted data on the basis of the memory map. As shown in FIG. 3, each data packet includes a header tag consisting of a type, a length, and a start address, and the length of each data packet does not exceed 4 KB.
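The packet layout described above can be sketched as follows. This is only an illustration under stated assumptions, not the encoding used by the invention: the field widths (1-byte type, 2-byte length, 8-byte start address), the little-endian byte order, the reading that the 4 KB limit covers header plus payload, and the names `HEADER`, `pack_packet`, `unpack_packet`, and `packetize` are all hypothetical.

```python
import struct

# Hypothetical header tag layout: type (1 byte), payload length (2 bytes),
# start address (8 bytes); little-endian, no padding. Widths are assumed.
HEADER = struct.Struct("<BHQ")
MAX_PACKET = 4096  # assumed: the 4 KB limit covers header + payload
MAX_PAYLOAD = MAX_PACKET - HEADER.size


def pack_packet(ptype: int, start_addr: int, payload: bytes) -> bytes:
    """Prefix a payload with its (type, length, start address) header tag."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload would exceed the 4 KB packet limit")
    return HEADER.pack(ptype, len(payload), start_addr) + payload


def unpack_packet(packet: bytes):
    """Split a packet back into (type, start address, payload)."""
    ptype, length, start_addr = HEADER.unpack_from(packet, 0)
    return ptype, start_addr, packet[HEADER.size:HEADER.size + length]


def packetize(ptype: int, base_addr: int, data: bytes):
    """Cut a large transfer into packets, advancing the start address."""
    return [
        pack_packet(ptype, base_addr + off, data[off:off + MAX_PAYLOAD])
        for off in range(0, len(data), MAX_PAYLOAD)
    ]
```

Carrying the start address in every packet means each packet can be placed by the receiver independently of the others, which is what allows short packets of different types to be interleaved on the same queue.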
(2) The data packets are sent to a pre-allocated fixed kernel buffer, the buffer being located in user space.
(3) Using the polling mode in DPDK, the application reads data packets by continuous polling and processes them directly at their original address, which realizes short-packet transmission with DPDK user-space polling. This reduces the number of data moves between user mode and kernel mode, lowers the overhead of system calls and packet copying, and simplifies the process.
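Steps (2) and (3) can be modeled in a few lines. The sketch below is not DPDK code and does not use the DPDK API; it is a pure-Python stand-in that assumes a single producer and a single consumer, and the class `PollRing` and its methods are hypothetical names. It shows the two properties the text relies on: the buffer is allocated once and reused, and the consumer polls for packets and processes them in place through zero-copy views instead of waiting for an interrupt.

```python
class PollRing:
    """Single-producer/single-consumer ring over one pre-allocated buffer."""

    def __init__(self, slots: int, slot_size: int = 4096):
        self.slots, self.slot_size = slots, slot_size
        self.buf = bytearray(slots * slot_size)  # allocated once, then reused
        self.lengths = [0] * slots               # stand-in for descriptors
        self.head = 0  # next slot the producer fills
        self.tail = 0  # next slot the consumer processes

    def produce(self, data: bytes) -> bool:
        """Stand-in for the device writing a packet into the next free slot."""
        if self.head - self.tail == self.slots or len(data) > self.slot_size:
            return False  # ring full or packet too large
        i = self.head % self.slots
        start = i * self.slot_size
        self.buf[start:start + len(data)] = data
        self.lengths[i] = len(data)
        self.head += 1
        return True

    def poll_burst(self, handler, max_pkts: int = 32) -> int:
        """Process up to max_pkts pending packets in place; never blocks."""
        done = 0
        whole = memoryview(self.buf)
        while self.tail < self.head and done < max_pkts:
            i = self.tail % self.slots
            start = i * self.slot_size
            # The handler receives a view into the ring buffer, not a copy.
            handler(whole[start:start + self.lengths[i]])
            self.tail += 1  # the slot is released only after processing
            done += 1
        return done
```

An application would call `poll_burst` in a tight loop on a dedicated core; a return value of 0 simply means that no packet was pending on that iteration.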
(4) Aiming at the application requirements of multiple threads, data transmission is carried out in a QDMA multi-queue transmission mode.
The QDMA multi-queue transmission mode supports 1024 queue buffers and SR-IOV. Each queue can be configured independently by interface type and can operate in a different mode. Because DMA descriptors are loaded per queue, each queue can offer a very low-overhead setup option and continuous-update functionality. QDMA also supports SR-IOV, allocating queues as resources to multiple Physical Functions (PFs) and Virtual Functions (VFs), so that multiple functions and virtualized application spaces use the same QDMA core. One I/O device supports at most 256 VFs, and combining PFs and VFs reduces hardware cost. In the DPDK+QDMA scheme, the front-end driver virtio in DPDK and the user-mode back-end driver vhost transmit and control messages through a shared virtual queue mechanism. vhost uses shared memory, which reduces the cost of the data transfers that originally passed through kernel mode, and up to 256 high-speed virtualized environments can be realized.
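The idea of treating queues as a resource divided among one PF and a number of VFs can be illustrated as follows. The totals of 1024 queues and 256 VFs come from the text; the allocation policy (a contiguous block for the PF and an even split of the remainder among the VFs) and the name `allocate_queues` are assumptions made for the example, not the policy of the QDMA core.

```python
TOTAL_QUEUES = 1024  # queue count stated in the text
MAX_VFS = 256        # VF limit stated in the text


def allocate_queues(num_vfs: int, pf_queues: int) -> dict:
    """Give the PF a contiguous block and split the rest evenly among VFs."""
    if not 0 <= num_vfs <= MAX_VFS:
        raise ValueError("num_vfs must be between 0 and 256")
    if not 0 < pf_queues <= TOTAL_QUEUES:
        raise ValueError("pf_queues must be between 1 and 1024")
    alloc = {"PF0": range(0, pf_queues)}
    if num_vfs:
        per_vf = (TOTAL_QUEUES - pf_queues) // num_vfs
        if per_vf == 0:
            raise ValueError("not enough queues left for every VF")
        for vf in range(num_vfs):
            base = pf_queues + vf * per_vf
            alloc[f"VF{vf}"] = range(base, base + per_vf)
    return alloc
```

With 256 queues reserved for the PF and 256 VFs, each VF in this sketch receives three queues, and no queue index is shared between functions.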
(5) The peripheral performs address decoding, thereby realizing high-speed data transmission between the CPU and peripherals such as an FPGA.
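Address decoding on the peripheral side can be pictured as a lookup from the start address carried in the header tag to a functional block and an offset within it. The address map below is entirely hypothetical: the block names are borrowed from the data types listed in the Background, and the base addresses and sizes are invented for illustration.

```python
import bisect

# Hypothetical address map: (base, size, block name), sorted by base.
ADDRESS_MAP = [
    (0x0000, 0x1000, "port_config"),
    (0x1000, 0x1000, "flow_generation"),
    (0x2000, 0x2000, "flow_statistics"),
]
_BASES = [base for base, _, _ in ADDRESS_MAP]


def decode(addr: int, length: int = 1):
    """Map a start address to (block name, offset), or raise if unmapped."""
    i = bisect.bisect_right(_BASES, addr) - 1
    if i >= 0:
        base, size, name = ADDRESS_MAP[i]
        # The whole access must fit inside a single block.
        if addr + length <= base + size:
            return name, addr - base
    raise ValueError(f"address range {addr:#x}+{length} is not mapped")
```

In an FPGA this lookup would normally be implemented as combinational logic on the upper address bits; the binary search here only plays the same role in software.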

Claims (5)

1. The data transmission method of the CPU and the peripheral equipment based on DPDK+QDMA is characterized by comprising the following steps:
carrying out packet processing on the transmission data;
transmitting the data packet into a pre-allocated fixed kernel buffer area, wherein the buffer area is positioned in a user space;
reading the data packet through a polling mode in the DPDK, and processing the data packet on a preset fixed memory address;
carrying out data transmission in a QDMA multi-queue transmission mode;
and performing address decoding in the peripheral to realize data transmission between the CPU and the peripheral.
2. The DPDK+QDMA-based data transmission method for a CPU and a peripheral according to claim 1, wherein each data packet comprises a header tag consisting of a type, a length, and a start address, and the length of each data packet does not exceed 4 KB.
3. The DPDK+QDMA-based data transmission method for a CPU and a peripheral according to claim 1, wherein the user-mode process accesses the hardware through the VFIO driver of DPDK and, in combination with the QDMA multi-queue transmission mode, achieves direct interaction between CPU user space and the peripheral.
4. The DPDK+QDMA-based data transmission method for a CPU and a peripheral according to claim 1, wherein the QDMA multi-queue transmission mode supports 1024 queue buffers, and each queue can be configured individually by interface type.
5. The DPDK+QDMA-based data transmission method for a CPU and a peripheral according to claim 1, wherein the Virtual Function I/O / Userspace I/O (VFIO/UIO) interface in DPDK cooperates with the SR-IOV function of QDMA, queues are allocated as resources to multiple physical functions and virtual functions, multiple functions and virtualized application spaces use the same QDMA core, and a high-speed virtualized environment is realized.
CN202311851614.6A 2023-12-29 2023-12-29 CPU and peripheral data transmission method based on DPDK+QDMA Pending CN117785766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311851614.6A CN117785766A (en) 2023-12-29 2023-12-29 CPU and peripheral data transmission method based on DPDK+QDMA

Publications (1)

Publication Number Publication Date
CN117785766A true CN117785766A (en) 2024-03-29

Family

ID=90398063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311851614.6A Pending CN117785766A (en) 2023-12-29 2023-12-29 CPU and peripheral data transmission method based on DPDK+QDMA

Country Status (1)

Country Link
CN (1) CN117785766A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination