CN113419780B - DPDK driving system based on FPGA acceleration card - Google Patents


Info

Publication number
CN113419780B
CN113419780B (application CN202110500249.9A)
Authority
CN
China
Prior art keywords
data
ddr
packet
size
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110500249.9A
Other languages
Chinese (zh)
Other versions
CN113419780A (en)
Inventor
Guo Zhichuan (郭志川)
Wang Ke (王可)
Sha Meng (沙猛)
Huang Xiaoying (黄逍颖)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkehai Suzhou Network Technology Co ltd
Institute of Acoustics CAS
Original Assignee
Zhongkehai Suzhou Network Technology Co ltd
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkehai Suzhou Network Technology Co ltd, Institute of Acoustics CAS filed Critical Zhongkehai Suzhou Network Technology Co ltd
Priority to CN202110500249.9A
Publication of CN113419780A
Application granted
Publication of CN113419780B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 - Handling requests for interconnection or transfer
    • G06F 13/20 - Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/4401 - Bootstrapping
    • G06F 9/4403 - Processor initialisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/4401 - Bootstrapping
    • G06F 9/4411 - Configuring for operating with peripheral devices; Loading of device drivers
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a DPDK driver system based on an FPGA accelerator card, deployed in an X86 server and comprising a DMA module, a data packet receiving module and a data packet sending module. The DMA module dynamically adjusts the size of the DMA transfer data block according to the network traffic, transfers data packets from the DDR of the FPGA accelerator card to the server's receiving memory by DMA, and, using a timeout packet-supplement mechanism, transfers data packets from the server's sending memory to the DDR of the FPGA accelerator card by DMA. The data packet receiving module parses the data packets in the server's receiving memory, extracts the timestamp and packet length information, and encapsulates the packets into the mbuf data structure of DPDK. The data packet sending module packages the data packets to be sent from the mbuf data structure in a preset format, adds packet header information, and copies them to the server's sending memory.

Description

DPDK driving system based on FPGA acceleration card
Technical Field
The invention relates to the technical field of high-speed data packet acquisition using FPGA (field programmable gate array) network accelerator cards, and in particular to a DPDK (Data Plane Development Kit) driver system based on an FPGA accelerator card.
Background
With the rapid development of the Internet worldwide, network traffic and network rates keep increasing, and high-speed acquisition of network data has become a research focus in academia and industry. In fields such as network traffic analysis and network data security, data volumes are very large, so the requirements on the network data acquisition rate keep rising. From 10 Gb/s to 100 Gb/s and beyond, the acquisition rate of network data is continuously improved to keep up with ever higher bandwidth, and various software and hardware methods for high-speed data acquisition keep emerging.
Among the various methods of high-speed network data acquisition, FPGAs have received increasing attention. Using FPGAs for packet processing has many advantages over traditional software processing methods, such as higher processing speed and lower latency, so the FPGA is one of the mainstream platforms for high-speed packet processing. The FPGA network accelerator card is connected to the server through a PCIe interface; the FPGA receives data packets through its optical ports, and the server performs further, more complex operations on the received packets. Performing these operations requires a corresponding driver for the FPGA accelerator card.
DPDK (Data Plane Development Kit) is a set of development platforms and interfaces for fast packet processing, running on Intel X86 and ARM platforms. DPDK processes packets in polling mode. This processing model offers the application layer a simple, feasible and efficient way to handle packets, making network application development more convenient. Consequently, for high-speed packet processing, more and more developers choose DPDK for the acquisition and processing of network data, and DPDK is becoming a de facto standard for network data processing. When a developer programs packet processing with DPDK and the underlying platform is switched, the application typically has to be reprogrammed against a new interface, which makes development cumbersome. It is therefore important that the underlying platform be "transparent" to developers, so that applications can run on different platforms with little or no modification, greatly reducing development effort.
In the field of high-performance network packet processing, most current FPGA accelerator card products do not support DPDK, while traditional DPDK NIC products, such as Intel's 710-series 10-Gigabit NICs, do not support network programming. The DPDK driver of the invention is designed for the FPGA accelerator card; it supports packets of 64B-10000B, dual optical ports at 2x10 Gbps, and nanosecond timestamps, providing a new way for an FPGA accelerator card to support DPDK.
In high-speed network packet processing, developers typically use DPDK to build upper-layer applications for high-speed network data processing. When the underlying platform is switched to an FPGA network accelerator card, the upper-layer application usually needs extensive modification and redevelopment, which is cumbersome.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a DPDK driver system based on an FPGA accelerator card. To this end, the invention designs a DPDK driver suited to the FPGA network accelerator card, which provides the standard DPDK function interfaces so that upper-layer application developers can complete development with little or no change, and which achieves zero-packet-loss transmission and reception of 64B small packets at a 2x10 Gbps line rate.
To achieve the above object, the present invention provides a DPDK driver system based on an FPGA accelerator card, deployed on an X86 server, the system comprising a DMA module, a data packet receiving module and a data packet sending module; wherein:
the DMA module is used for dynamically adjusting the size of a DMA transmission data block according to the network flow, transmitting the data packet in the DDR of the FPGA accelerator card to a receiving memory of the server in a DMA mode, and transmitting the data packet in a transmitting memory of the server to the DDR of the FPGA accelerator card in a DMA mode by adopting a timeout packet supplementing mechanism;
the data packet receiving module is used for analyzing the data packet in the memory received by the server, extracting the time stamp and the packet length information, and encapsulating the data packet into an mbuf data structure of the DPDK;
the data packet sending module is used for packaging the data packet to be sent in the mbuf data structure according to a preset format, adding packet header information and copying the data packet to a sending memory of the server.
As an improvement of the system, the DDR of the FPGA accelerator card adopts a ring buffer structure and comprises a write pointer ddr_wrp and a read pointer ddr_rdp; the receiving memory and the sending memory of the server adopt independent ring buffer structures, each comprising a write pointer and a read pointer.
As an improvement of the system, the size of the DMA transmission data block is dynamically adjusted according to the network flow, and the data packet in the DDR of the FPGA accelerator card is transmitted to the receiving memory of the server in a DMA mode; the method comprises the following steps:
monitoring the network flow of an optical port of the FPGA acceleration card, and dynamically adjusting the size of a DMA transmission data block;
monitoring the write pointer ddr_wrp and read pointer ddr_rdp of the DDR, and calculating the size of the data to be transmitted;
monitoring the write pointer usr_wrp and read pointer usr_rdp of the server's receiving memory, and calculating the size of the writable space;
when the size of data to be transmitted and the size of the writable space meet the condition of one-time transmission, the data packet in the DDR of the FPGA accelerator card is transmitted to a receiving memory of the server in a DMA mode.
As an improvement of the system, when the size of the data to be transmitted and the size of the writable space meet the condition of one-time transmission, transmitting the data packet in the DDR of the FPGA accelerator card to the receiving memory of the server in a DMA mode; the method comprises the following steps:
calculating the size of the data to be transmitted from the read pointer ddr_rdp and the write pointer ddr_wrp: when the read pointer ddr_rdp is smaller than the write pointer ddr_wrp, the data between ddr_rdp and ddr_wrp is read; when the read pointer ddr_rdp is greater than the write pointer ddr_wrp, ddr_size - ddr_rdp + ddr_wrp bytes of data are read, where ddr_size is the size of the DDR ring buffer;
if the readable data size exceeds the data size N of one DMA transfer, then N bytes are read starting from ddr_rdp;
calculating the size of the writable space from the read pointer usr_rdp and the write pointer usr_wrp of the server's receiving ring buffer: when the read pointer usr_rdp is greater than the write pointer usr_wrp, the writable space size is usr_rdp - usr_wrp; when the read pointer usr_rdp is smaller than the write pointer usr_wrp, the writable space size is usr_size - usr_wrp + usr_rdp, where usr_size is the size of the server's receiving ring buffer;
when the writable space is larger than N, copying the read data to the write pointer usr_wrp of the server's receiving ring buffer;
judging whether usr_wrp exceeds usr_size after being increased by N: if so, a remainder operation is performed to obtain the new position of the write pointer usr_wrp; otherwise usr_wrp is increased by N;
judging whether ddr_rdp exceeds ddr_size after being increased by N: if so, a remainder operation is performed to obtain the new position of the read pointer ddr_rdp; otherwise ddr_rdp is increased by N.
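The pointer arithmetic above can be illustrated with a short C sketch. The variable names (ddr_rdp, ddr_wrp, usr_rdp, usr_wrp, ddr_size, usr_size, N) follow the text, while dma_copy_from_ddr() is a hypothetical helper standing in for the actual PCIe DMA read; this is a sketch of the described logic under those assumptions, not the driver's published code.

    #include <stdint.h>

    /* Hypothetical helper standing in for the PCIe DMA read from card DDR to host memory. */
    extern void dma_copy_from_ddr(uint64_t ddr_off, uint64_t usr_off, uint64_t len);

    /* One receive-side DMA decision, following the steps described above. */
    static int rx_dma_once(uint64_t *ddr_rdp, uint64_t ddr_wrp,
                           uint64_t *usr_wrp, uint64_t usr_rdp,
                           uint64_t ddr_size, uint64_t usr_size, uint64_t N)
    {
        /* Data waiting in the card DDR ring. */
        uint64_t readable = (*ddr_rdp <= ddr_wrp) ? ddr_wrp - *ddr_rdp
                                                  : ddr_size - *ddr_rdp + ddr_wrp;
        /* Free space in the host receive ring. */
        uint64_t writable = (usr_rdp > *usr_wrp) ? usr_rdp - *usr_wrp
                                                 : usr_size - *usr_wrp + usr_rdp;

        if (readable < N || writable < N)
            return 0;                                   /* conditions for one transfer not met */

        dma_copy_from_ddr(*ddr_rdp, *usr_wrp, N);       /* one DMA of size N */

        *usr_wrp = (*usr_wrp + N) % usr_size;           /* remainder operation at the ring end */
        *ddr_rdp = (*ddr_rdp + N) % ddr_size;
        return 1;
    }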
As an improvement of the system, the data packet in the sending memory of the server is transmitted to the DDR of the FPGA acceleration card in a DMA mode by adopting a timeout packet supplementing mechanism; the method comprises the following steps:
when the size of the data to be transmitted in the server's sending memory reaches the size of one DMA transfer, the data is transferred to the DDR of the FPGA accelerator card by DMA;
when the retention time of the data to be transmitted in the server's sending memory exceeds a threshold, packet-supplement processing is performed: the data is padded up to a size that can be sent in one DMA transfer and transferred to the DDR of the FPGA accelerator card.
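The timeout packet-supplement idea on the sending side can be sketched as follows in C, assuming a pending-byte counter, a last-activity timestamp based on the DPDK TSC counter, and hypothetical helpers pad_to_dma_unit() and dma_copy_to_ddr(); the threshold and the DMA block size are placeholders, not values taken from the invention.

    #include <stdint.h>
    #include <rte_cycles.h>

    /* Hypothetical helpers: pad the pending data up to one DMA block and issue the DMA write. */
    extern void pad_to_dma_unit(uint64_t pending, uint64_t dma_size);
    extern void dma_copy_to_ddr(uint64_t len);

    static void tx_dma_poll(uint64_t *pending, uint64_t *last_tsc,
                            uint64_t dma_size, uint64_t timeout_cycles)
    {
        if (*pending >= dma_size) {                         /* enough data for one full DMA */
            dma_copy_to_ddr(dma_size);
            *pending -= dma_size;
            *last_tsc = rte_get_tsc_cycles();
        } else if (*pending > 0 &&
                   rte_get_tsc_cycles() - *last_tsc > timeout_cycles) {
            pad_to_dma_unit(*pending, dma_size);            /* timeout: fill up to a sendable size */
            dma_copy_to_ddr(dma_size);
            *pending = 0;
            *last_tsc = rte_get_tsc_cycles();
        }
    }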
As an improvement of the above system, the specific processing procedure of the data packet receiving module is as follows:
extracting a data packet from the read pointer usr_rdp in the server's receiving ring buffer; the data packet includes a packet header and a standard Ethernet frame, the packet header including: a 6-byte preamble, 2-byte valid packet length information, and 8-byte timestamp information;
locating the packet header according to the preamble and extracting the packet length information pkt_len; judging the validity of pkt_len: if pkt_len is larger than the maximum packet length of an Ethernet data frame, the packet length information is wrong, and 8 bytes are skipped from the preamble to relocate the packet header;
comparing the length of the pkt_len with that of the actual data packet, and if the lengths are consistent, extracting the time stamp information of the data packet;
comparing pkt_len with the data area size mbuf_size of the mbuf structure: if pkt_len is smaller than mbuf_size, the data packet of length pkt_len is copied into a designated mbuf; otherwise the data of length pkt_len is filled segment by segment, in units of 2048 bytes, into the data areas of several mbufs until less than one unit remains, the data of the remaining length is placed in the last mbuf, and the several mbufs are linked together in order; the number of mbufs linked for the data packet is filled into the nb_segs field of the mbuf;
the time stamp information is padded into the mbuf designated area.
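The receiving steps above can be sketched in C with the DPDK mbuf API. The header layout (6-byte preamble, 2-byte packet length, 8-byte timestamp) follows the text; the preamble constant, the byte order of the length field and the place where the timestamp is stored are assumptions made only for illustration.

    #include <stdint.h>
    #include <string.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>
    #include <rte_memcpy.h>

    #define MAX_ETH_LEN    9600u   /* maximum Ethernet frame length used for validation */
    #define MBUF_DATA_ROOM 2048u   /* data area size of one mbuf, per the text */

    /* Parse one packaged packet starting at 'p' and copy it into mbufs from 'pool'.
       Returns the first mbuf of the (possibly chained) packet, or NULL so the caller
       can skip 8 bytes and relocate the header. */
    static struct rte_mbuf *parse_one_packet(const uint8_t *p, struct rte_mempool *pool)
    {
        static const uint8_t preamble[6] = {0xFB, 0x55, 0x55, 0x55, 0x55, 0x55};

        if (memcmp(p, preamble, sizeof(preamble)) != 0)
            return NULL;                          /* no preamble at this 8-byte position */

        uint16_t pkt_len;
        memcpy(&pkt_len, p + 6, 2);               /* 2-byte valid packet length (byte order assumed) */
        if (pkt_len == 0 || pkt_len > MAX_ETH_LEN)
            return NULL;                          /* invalid length: relocate the header */

        uint64_t ts;
        memcpy(&ts, p + 8, 8);                    /* 8-byte timestamp */
        const uint8_t *data = p + 16;             /* start of the standard Ethernet frame */

        struct rte_mbuf *head = rte_pktmbuf_alloc(pool);
        if (head == NULL)
            return NULL;

        struct rte_mbuf *seg = head;
        uint16_t left = pkt_len, nsegs = 1;
        for (;;) {
            uint16_t chunk = left > MBUF_DATA_ROOM ? MBUF_DATA_ROOM : left;
            rte_memcpy(rte_pktmbuf_mtod(seg, void *), data, chunk);
            seg->data_len = chunk;
            data += chunk;
            left -= chunk;
            if (left == 0)
                break;
            seg->next = rte_pktmbuf_alloc(pool);  /* jumbo frame: chain a further mbuf */
            if (seg->next == NULL) {
                rte_pktmbuf_free(head);
                return NULL;
            }
            seg = seg->next;
            nsegs++;
        }
        head->pkt_len = pkt_len;
        head->nb_segs = nsegs;
        (void)ts;   /* stored into a driver-designated mbuf area; the exact field is assumed */
        return head;
    }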
As a modification of the above system, the data packet of pkt_len is copied into a designated mbuf; the method comprises the following steps:
judging whether the current read pointer position usr_rdp, after moving back by pkt_len, exceeds the boundary of the server's preset receiving memory; if so, the data packet is split: the data at the end of the memory is copied first, and the read pointer usr_rdp is then positioned at the start address of the memory to copy the remaining data.
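A short C sketch of this boundary-crossing copy, assuming a receiving ring of usr_size bytes starting at ring_base; all names are illustrative.

    #include <stdint.h>
    #include <string.h>

    /* Copy pkt_len bytes starting at read offset usr_rdp out of the receiving ring,
       splitting the copy when the packet wraps past the end of the ring. */
    static uint64_t ring_read(uint8_t *dst, const uint8_t *ring_base, uint64_t usr_size,
                              uint64_t usr_rdp, uint32_t pkt_len)
    {
        if (usr_rdp + pkt_len > usr_size) {                  /* packet crosses the ring boundary */
            uint64_t tail = usr_size - usr_rdp;
            memcpy(dst, ring_base + usr_rdp, tail);          /* copy the data at the end first */
            memcpy(dst + tail, ring_base, pkt_len - tail);   /* then continue from the start address */
        } else {
            memcpy(dst, ring_base + usr_rdp, pkt_len);
        }
        return (usr_rdp + pkt_len) % usr_size;               /* new read pointer position */
    }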
As an improvement of the above system, the specific processing procedure of the data packet sending module is as follows:
extracting a data packet from the mbuf and adding a 6-byte preamble and 2-byte packet length information to the packet header;
reading the nb_segs field of the mbuf structure: if nb_segs is 1, the data of length pkt_len of the data packet is copied to the server's sending memory; if nb_segs is greater than 1, the data in the linked mbufs is extracted in order according to the value of nb_segs and the length of each mbuf, and copied to the server's sending memory.
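The sending steps can be sketched in C as follows, walking a possibly chained mbuf and prepending the 6-byte preamble and 2-byte length; the preamble constant, the byte order of the length field and the tx_put() helper that appends bytes to the sending memory are assumptions for illustration.

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* Hypothetical helper that appends bytes to the server's sending memory. */
    extern void tx_put(const void *src, uint32_t len);

    static void tx_one_packet(struct rte_mbuf *m)
    {
        static const uint8_t preamble[6] = {0xFB, 0x55, 0x55, 0x55, 0x55, 0x55};
        uint16_t pkt_len = (uint16_t)m->pkt_len;

        tx_put(preamble, sizeof(preamble));            /* 6-byte preamble */
        tx_put(&pkt_len, sizeof(pkt_len));             /* 2-byte packet length (byte order assumed) */

        if (m->nb_segs == 1) {
            tx_put(rte_pktmbuf_mtod(m, void *), m->data_len);         /* single mbuf: pkt_len bytes */
        } else {
            for (struct rte_mbuf *seg = m; seg != NULL; seg = seg->next)
                tx_put(rte_pktmbuf_mtod(seg, void *), seg->data_len); /* chained mbufs in order */
        }
        /* Eight-byte alignment padding of the packaged packet would follow here. */
    }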
As an improvement of the above system, the system further comprises: an accelerator card driver binding module and an accelerator card configuration initialization module; wherein:
the acceleration card driving and binding module is used for realizing the binding of the FPGA acceleration card and the DPDK;
the acceleration card configuration initialization module is used for initializing the FPGA acceleration card, acquiring information of the FPGA acceleration card and configuring the FPGA acceleration card.
As an improvement of the above system, the specific processing procedure of the accelerator card driving binding module is as follows:
registering the driver system into the DPDK driver linked list;
registering the device type number of the FPGA accelerator card into the DPDK device linked list.
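Both registrations can be expressed with the standard DPDK PCI PMD macros mentioned later in the description (RTE_PMD_REGISTER_PCI and RTE_PMD_REGISTER_PCI_TABLE), as sketched below in C; the driver name, the vendor/device IDs and the probe/remove bodies are placeholders rather than the actual values of the invention.

    #include <rte_bus_pci.h>
    #include <rte_pci.h>

    /* Placeholder PCI IDs for the FPGA accelerator card; illustrative only. */
    #define FPGA_VENDOR_ID 0x10ee
    #define FPGA_DEVICE_ID 0x7024

    static const struct rte_pci_id fpga_pci_id_map[] = {
        { RTE_PCI_DEVICE(FPGA_VENDOR_ID, FPGA_DEVICE_ID) },
        { .vendor_id = 0 },                 /* sentinel */
    };

    static int fpga_pci_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
    {
        (void)drv; (void)dev;
        /* Allocate the ethdev, map BARs, install dev_ops and the rx/tx burst callbacks. */
        return 0;
    }

    static int fpga_pci_remove(struct rte_pci_device *dev)
    {
        (void)dev;
        return 0;
    }

    static struct rte_pci_driver fpga_pmd = {
        .id_table  = fpga_pci_id_map,
        .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
        .probe     = fpga_pci_probe,
        .remove    = fpga_pci_remove,
    };

    RTE_PMD_REGISTER_PCI(net_fpga_dsp, fpga_pmd);              /* driver into the DPDK driver list */
    RTE_PMD_REGISTER_PCI_TABLE(net_fpga_dsp, fpga_pci_id_map); /* supported device IDs into the device table */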
Compared with the prior art, the invention has the advantages that:
1. The invention performs dynamic DMA transfers by monitoring the actual network data traffic, adopts a timeout packet-supplement mechanism, and, through custom packet encapsulation and packet parsing flows, supports zero-packet-loss, efficient processing of data packets and transmission of jumbo frames;
2. With dual optical ports at a 2x10 Gbps line rate, the FPGA accelerator card can send and receive 64-10000 byte packets at line rate; on the same server configuration, its performance is better than that of a DPDK-capable Intel 10-Gigabit NIC; in addition, the driver supports nanosecond timestamps, whose resolution is better than that of common DPDK 10-Gigabit NICs.
Drawings
FIG. 1 is a block diagram of a DPDK driving system based on an FPGA accelerator card of the present invention;
FIG. 2 is a diagram of a function call framework of the present invention;
FIG. 3 (a) is a DMA module reception flow chart of the present invention;
FIG. 3 (b) is a flow chart of the packet collection module of the present invention;
FIG. 4 is a dual port transceiver schematic diagram of the FPGA accelerator card of the present invention;
FIG. 5 (a) is a diagram illustrating an initial state of a ring buffer read/write pointer received by a server according to the present invention;
FIG. 5 (b) is a diagram of a server receiving ring cache write status of the present invention;
FIG. 5 (c) is a diagram illustrating a read status of a ring buffer received by a server according to the present invention;
FIG. 5 (d) is a schematic diagram of a server receiving a ring cache out-of-range state according to the present invention;
fig. 6 is a schematic diagram of the mbuf structure of DPDK of the present invention.
Detailed Description
The invention provides a method for implementing a DPDK (Data Plane Development Kit) driver for an FPGA accelerator card, which combines the FPGA accelerator card resources with DPDK's optimized processing functions, thereby achieving high-performance packet processing. With this driver, developers do not need to change the original DPDK interface functions when writing upper-layer applications, and the low-level processing is completed by the FPGA network accelerator card. Using the DPDK driver, a developer can configure and manage the FPGA network accelerator card, collect and send data packets, and perform other operations on them. When a developer calls the driver function to collect packets, the packets are transferred from the DDR of the FPGA accelerator card to the host memory through the PCIe interface by DMA; the driver then parses the packets in the host memory, encapsulates them into the mbuf data structure of DPDK, and fills in the related information. Meanwhile, the size of each DMA transfer can be adjusted according to the traffic volume. When a packet is sent, the driver parses the mbuf data structure, fills in the necessary information, places the packet into the host memory, and then transfers it to the DDR memory of the FPGA accelerator card by DMA through the PCIe interface.
The system comprises an accelerator card driver binding module, an accelerator card configuration initialization module, a DMA module, a data packet receiving module and a data packet sending module. The data packet copied by DMA carries packet length information, the packet length field occupying the last 2 bytes of the preamble. The FPGA board is bound with DPDK; both receiving and sending use ring buffers; at high data rates the transfer is handled in units of large data blocks, at low rates a timeout packet-supplement mechanism is adopted, and the data block size can be dynamically adjusted according to the actual network traffic, ensuring high throughput and low latency. Compared with traditional methods, this approach has advantages in small-packet (64B) line-rate processing and occupies fewer CPU resources.
The FPGA network acceleration card DPDK driver provided by the invention can realize the functions of binding the FPGA network acceleration card, configuring and initializing the FPGA network acceleration card, transmitting and receiving data packets and the like by calling a DPDK standard function interface. When receiving the data packet, the data packet cached in the FPGA accelerator card DDR needs to be transmitted to a memory of a server through DMA, and then is copied from the memory to an mbuf structure of the DPDK for subsequent processing.
The invention can realize the following functions:
1. and controlling and managing the FPGA acceleration card. The FPGA acceleration card DPDK driver can register the driver, so that the binding of the FPGA acceleration card and the DPDK driver is realized, the initialization of the FPGA acceleration card is realized, the information of the acceleration card is obtained, and the acceleration card is configured.
2. Receiving data packets. After the FPGA accelerator card is configured and initialized, the packet-receiving function interface of the driver can be called to receive packets. First, the receive queue is configured by calling the DPDK-driven function interface; then the FPGA accelerator card is started by calling the function interface rte_eth_dev_start; finally, the packet-receiving function interface is called to receive and process packets, and the driver places the data stored in the server memory into the mbuf data structure of DPDK in packet form and fills in the related information (a minimal application-side sketch follows item 3 below). The data stored in the server memory has been transferred from the FPGA accelerator card to the server memory by DMA.
3. Sending data packets. In the reverse of the receiving process, the data packets to be forwarded in the mbufs are copied to the designated forwarding memory through the packet-sending interface function, assembled into large data blocks, and then transferred to the board DDR by DMA.
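From the application side, items 2 and 3 use the unchanged DPDK ethdev flow. A minimal sketch follows; the queue depth, pool size and burst size are arbitrary example values, and forwarding in item 3 would use rte_eth_tx_burst in the same way.

    #include <rte_eal.h>
    #include <rte_lcore.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void rx_loop(uint16_t port_id)
    {
        struct rte_mempool *pool = rte_pktmbuf_pool_create("rx_pool", 8192, 256, 0,
                                                           RTE_MBUF_DEFAULT_BUF_SIZE,
                                                           rte_socket_id());
        struct rte_eth_conf conf = {0};

        rte_eth_dev_configure(port_id, 1, 1, &conf);                 /* one RX and one TX queue */
        rte_eth_rx_queue_setup(port_id, 0, 4096,
                               rte_eth_dev_socket_id(port_id), NULL, pool);
        rte_eth_tx_queue_setup(port_id, 0, 4096,
                               rte_eth_dev_socket_id(port_id), NULL);
        rte_eth_dev_start(port_id);                                  /* starts the FPGA card */

        for (;;) {
            struct rte_mbuf *bufs[BURST_SIZE];
            uint16_t nb = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < nb; i++)
                rte_pktmbuf_free(bufs[i]);       /* application-specific processing would go here */
        }
    }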
The specific process of a receive DMA is as follows: judge whether one DMA can be performed; if so, complete one DMA transfer by reading the device file, the size of the transfer being determined by the current traffic. The specific process of a send DMA is as follows: judge whether one DMA can be performed; if so, complete one DMA transfer by writing the device file.
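Viewed from the device-file angle, one receive DMA is completed by a read() and one send DMA by a write(), as in the following sketch; the character-device node name and the buffer handling are assumptions, since the text does not give them.

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Illustrative only: the real character-device node name is not given in the text. */
    static int dma_fd = -1;

    static int dma_open(void)
    {
        dma_fd = open("/dev/fpga_dma0", O_RDWR);   /* hypothetical device node */
        return dma_fd;
    }

    /* One receive DMA: card DDR -> host receive ring; n is chosen from the current traffic. */
    static ssize_t dma_rx(void *usr_wr_addr, size_t n)
    {
        return read(dma_fd, usr_wr_addr, n);
    }

    /* One send DMA: host send ring -> card DDR. */
    static ssize_t dma_tx(const void *usr_rd_addr, size_t n)
    {
        return write(dma_fd, usr_rd_addr, n);
    }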
Simultaneous receiving and sending on both network ports is supported. On top of the transceiving process, the port link state is monitored through a register. When both network ports are up, the packet-receiving DMA module polls the two board packet-receiving channels and moves their data to the designated memory spaces, and the packet-sending DMA module likewise polls the two forwarding memories and copies their data to the corresponding two groups of board DDR. By adding a dual-thread mode in the test program, the interaction of data between the two groups of memories and the mbufs is bound to two CPU processor cores respectively, ensuring the independence of the two groups of transceiving channels.
Compared with other FPGA-based high-speed network processing systems, in which replacing the underlying device requires substantial changes to the original application code, the invention enables the FPGA to support the DPDK driver, so that the application code needs no or only slight changes, reducing the complexity of application development. Compared with the transmission performance of a general-purpose 10-Gigabit NIC in the same server environment, an Intel 10-Gigabit NIC in a 64B small-packet line-rate dual-optical-port test can only guarantee zero packet loss up to 1 Gbps, whereas the invention, through several optimization mechanisms such as large-block dynamic DMA transfers on the transmit and receive channels, achieves zero-packet-loss transmission at the 20 Gbps line rate of the dual optical ports.
The technical scheme of the invention is described in detail below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, embodiment 1 of the present invention proposes a DPDK driver system based on an FPGA accelerator card, deployed on an X86 server, the system comprising a DMA module, a data packet receiving module and a data packet sending module; wherein:
the DMA module is used for dynamically adjusting the size of a DMA transmission data block according to the network flow, transmitting the data packet in the DDR of the FPGA acceleration card to a receiving memory of the server in a DMA mode, and transmitting the data packet in a sending memory of the server to the DDR of the FPGA acceleration card in a DMA mode by adopting a timeout packet supplementing mechanism;
the data packet receiving module is used for analyzing the data packet in the memory received by the server, extracting the time stamp and the packet length information, and encapsulating the data packet into an mbuf data structure of the DPDK;
the data packet sending module is used for packaging the data packet to be sent in the mbuf data structure according to a preset format, adding packet header information and copying the data packet to a sending memory of the server.
The specific principle is as follows:
According to the application-program driver probe-and-match process, an application running in the DPDK environment registers the driver linked list before the program starts and scans the registered device linked list during initialization. Because the FPGA accelerator card is directly recognized by the server as a PCIe (peripheral component interconnect express) device, DPDK obtains the card device information by scanning the Linux system bus device directory and hangs it on the device linked list. In the DPDK PMD driver of the FPGA accelerator card, the written driver functions are registered into the DPDK driver linked list using the RTE_PMD_REGISTER_PCI interface function, and the device type numbers supported by the driver are registered using the RTE_PMD_REGISTER_PCI_TABLE function. After the program starts, the rte_bus_probe function under rte_eal_init is called to match the corresponding driver between the device linked list and the driver linked list, completing the binding of the FPGA accelerator card with the DPDK PMD driver.
FIG. 2 is a diagram of the basic functions encapsulated in the driver and the connections between them. After the driver registration and binding succeed, the data path between the driver and the FPGA board can be constructed; it can be divided into configuration and initialization of the FPGA accelerator card, starting and stopping of the accelerator card, the DMA module, the data packet receiving module (RX) and the data packet sending module (TX). Next, the specific procedure of receiving data packets under the card's DPDK driver is described in detail.
The system adopts a double-ring buffer structure; the server memory size is 128M, and the board DDR ring memory size is 8G. FIG. 3 (a) is a flow chart of the DMA operations performed in the slave thread. Before entering the slave-thread loop, the DMA starts by opening the device file. In the loop, several checks are made before data is copied in order to ensure safe and effective data transfer: judging whether the DDR read pointer ddr_rdp has wrapped back to the initial position to reuse the DDR memory, further judging whether there is enough data in the DDR, and finally determining whether the host memory space is sufficient. If the conditions are not met, the loop is re-entered according to the callback order of the flow chart; when the conditions are met, one DMA transfer of size N is performed from the DDR corresponding to the device file, then ddr_rdp and usr_wrp are changed according to the data change, and the next loop iteration is entered.
The specific procedure of judging whether there is enough readable data in the DDR is as follows. When the read pointer ddr_rdp is less than ddr_wrp, the value of ddr_wrp - ddr_rdp is the readable data size; when ddr_rdp is greater than ddr_wrp, the value of 8G - ddr_rdp + ddr_wrp is the readable data size. If the readable data size is larger than the size of one DMA, the readable cached data in the DDR exceeds the data of one DMA transfer, and the program can proceed to the next judgment.
The specific procedure of determining whether the host memory has sufficient writable space is as follows. Because the host memory uses a ring structure, when the size of one DMA is greater than the remaining writable size of the host memory, newly written data would overwrite unread data. When the host memory read pointer usr_rdp is greater than the host memory write pointer usr_wrp, the value of usr_rdp - usr_wrp is the writable space size; when the host memory read pointer usr_rdp is smaller than the host memory write pointer, 128M - usr_wrp + usr_rdp is the writable space size. When the writable space is larger than the size of one DMA, unread data will not be overwritten and one DMA can be performed.
After one DMA, the write pointer of the host ring memory and the read pointer of the DDR ring memory are updated as follows. After one DMA of size N, the host memory write pointer usr_wrp is increased by N, and when the increased value is larger than the host memory size of 128M, a remainder operation is performed to obtain the new position of the write pointer usr_wrp; the DDR read pointer ddr_rdp is likewise increased by N, and when the increased value is larger than the DDR memory size of 8G, a remainder operation is performed to obtain the new position of the read pointer ddr_rdp.
The data packet receiving module implements the data copy between the server memory and the DPDK mbuf structure. On the packet-receiving path, in eth_dsp_rx, one data packet is extracted from the contiguous bulk data in the host memory, placed into an mbuf structure allocated from the memory pool, and the basic information of the packet is extracted to the designated positions of the mbuf. Fig. 3 (b) shows the copy flow for a single packet. After processing by the FPGA accelerator card, the first 512 bits of the data packet contain a 6-byte preamble (FB5555555555) and a two-byte valid packet length field, followed by 8 bytes of timestamp information and then a standard Ethernet frame. The packets are eight-byte aligned after hardware processing, so the design adopts an 8-byte copy detection mode and checks the packet fields in units of 8 bytes. Since the preamble (FB5555555555) field appears uniquely at each packet header, the packet header is first located according to the preamble, then the following packet length information is extracted and its validity is judged, i.e. whether it exceeds the maximum packet length specification of an Ethernet data frame (9600 bytes); if it does, the packet length is wrong, and 8 bytes are skipped to relocate the packet header. Second, DMA uses data-block transfer, so the transfer boundary is not necessarily a complete data packet; when the data traffic is small and the memory extraction speed is high, the last data packet in the memory may be incomplete and not satisfy a full packet-length transfer, so a check is set, and pointer overrun is prevented by comparing the packet length with the actual remaining memory distance. After the packet length is verified, the timestamp information of the data packet is extracted, and the data is copied from the memory into the mbuf according to the pkt_len length. Before copying, it is judged from the data area size of the mbuf structure (2048 bytes) whether the packet is a jumbo frame; if not, the pkt_len bytes of the packet are copied directly into the designated mbuf; if so, the data of length pkt_len is filled, in units of 2048 bytes, into the data_room of several mbufs until less than one unit remains, the data of the remaining length is placed in the last mbuf, and finally the mbufs are linked together in order. Before copying, the read pointer must also be checked to prevent crossing the boundary: because the host memory is set to 128M with cyclic overwrite, it is judged whether the current read pointer position usr_rdp, after moving back by pkt_len, exceeds the 128M boundary; if so, the data packet is split, the data at the end of the memory is copied first, and the read pointer is then positioned at the memory start address to copy the remaining data. The upper-layer application can pass the number of packets it wants to receive in one burst to the eth_dsp_rx function through the rte_eth_rx_burst function, and the specified number of packets is copied in turn into the mbuf structures of DPDK using a for loop.
The data forwarding path is the reverse of the receiving path. The eth_dsp_tx function is called to hand the mbuf structure data to be forwarded to the rte_eth_tx_burst function; to make it easy for the board forwarding logic to identify and further process the packet, this function adds a preamble and packet length information to the packet and performs eight-byte alignment on it. For a jumbo frame, the mbuf->nb_segs field is used for judgment: this field identifies how many mbufs are linked together. If it equals 1, there are no linked mbufs after mbuf->next (not a jumbo frame), and data of length pkt_len can be transferred directly; if it is not equal to 1, the data in the linked mbufs is extracted in order according to the identified number and data_len. The memory data is then dynamically transferred to the board DDR, again using the ring buffer structure. Because the AXI4 bus width of the FPGA board is set to 512 bits, to avoid packet loss caused by packets lingering in the board, a timeout packet-supplement mechanism is added to the DMA module of the forwarding part: when no data has been transmitted within a certain time and the data remaining in the memory is less than 512 bits, a packet-supplement operation is performed on the data.
Fig. 4 shows the process of building dual-port transceiving with two threads on the basis of the single-port transceiving test. As shown in the figure, the board DDR is divided into four parts in units of 4G, serving the receive and transmit paths of the two optical ports respectively, which correspond through the DMA transfer mechanism to four memory areas of the host. In the DMA module, the optical-port link state is detected through the register; if both ports are up, the packet-receiving DMA polls the two groups of packet-receiving DDR and moves their data to the corresponding two groups of memory areas according to the link flag bits, and the packet-sending DMA likewise polls the two forwarding memories and transfers their data to the corresponding board DDR spaces. In single-port transceiving operation, one group of transmit/receive queues corresponds to the data receiving memory and the data forwarding memory respectively. To ensure the independence of the data transceiving of the two optical ports without affecting transmission performance, the test program sets up two queues for the receive and transmit functions, corresponding to different memory spaces, and uses the rte_eal_mp_remote_launch function to run the pair of receive/transmit queues of each optical port on one processor core, reducing the shared resources and dependencies between the two groups of transceiving paths and ensuring high-performance processing of packets for each port's transceiving.
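A minimal sketch of pinning each optical port's receive/transmit pair to its own lcore with the DPDK launch API is shown below. It uses rte_eal_remote_launch to pin each worker explicitly (the rte_eal_mp_remote_launch call mentioned above launches the same worker on every available core); the worker body and the port count are placeholders.

    #include <stdint.h>
    #include <rte_eal.h>
    #include <rte_lcore.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    /* Worker bound to one lcore: services the RX/TX queue pair of a single optical port. */
    static int port_worker(void *arg)
    {
        uint16_t port_id = *(uint16_t *)arg;
        struct rte_mbuf *bufs[BURST];

        for (;;) {
            uint16_t nb = rte_eth_rx_burst(port_id, 0, bufs, BURST);
            if (nb > 0)
                rte_eth_tx_burst(port_id, 0, bufs, nb);   /* e.g. send back out of the same port */
        }
        return 0;
    }

    static void launch_two_ports(void)
    {
        static uint16_t ports[2] = {0, 1};
        unsigned int lcore = rte_get_next_lcore(-1, 1, 0);    /* first worker lcore */

        rte_eal_remote_launch(port_worker, &ports[0], lcore);
        lcore = rte_get_next_lcore(lcore, 1, 0);
        rte_eal_remote_launch(port_worker, &ports[1], lcore);
    }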
Figs. 5 (a), (b), (c) and (d), in conjunction with the specific operation process above, show that the ring buffer lets the read and write modules operate independently while remaining interdependent: the DMA module is responsible for changing the write pointer and monitors the read pointer to prevent write overrun; the packet receiving module is responsible for changing the read pointer and monitors the write pointer to prevent read overrun; and both modules must take the copy across the 128M memory boundary into account. Fig. 6 is a schematic diagram of the mbuf structure.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the appended claims.

Claims (7)

1. A DPDK driver system based on an FPGA accelerator card, deployed on an X86 server, the system comprising: a DMA module, a data packet receiving module and a data packet sending module; wherein:
the DMA module is used for dynamically adjusting the size of a DMA transmission data block according to the network flow, transmitting the data packet in the DDR of the FPGA accelerator card to a receiving memory of the server in a DMA mode, and transmitting the data packet in a transmitting memory of the server to the DDR of the FPGA accelerator card in a DMA mode by adopting a timeout packet supplementing mechanism;
the data packet receiving module is used for analyzing the data packet in the memory received by the server, extracting the time stamp and the packet length information, and encapsulating the data packet into an mbuf data structure of the DPDK;
the data packet sending module is used for packaging the data packet to be sent in the mbuf data structure according to a preset format, adding packet header information and copying the data packet to the sending memory of the server;
the DDR of the FPGA accelerator card adopts a ring buffer structure and comprises a write pointer ddr_wrp and a read pointer ddr_rdp; the receiving memory and the sending memory of the server adopt independent ring buffer structures, each comprising a write pointer and a read pointer;
the size of a DMA transmission data block is dynamically adjusted according to the network flow, and data packets in the DDR of the FPGA acceleration card are transmitted to a receiving memory of a server in a DMA mode; the method comprises the following steps:
monitoring the network flow of an optical port of the FPGA acceleration card, and dynamically adjusting the size of a DMA transmission data block;
monitoring the write pointer ddr_wrp and read pointer ddr_rdp of the DDR, and calculating the size of the data to be transmitted;
monitoring the write pointer usr_wrp and read pointer usr_rdp of the server's receiving memory, and calculating the size of the writable space;
when the size of data to be transmitted and the size of the writable space meet the condition of one-time transmission, transmitting the data packet in the DDR of the FPGA accelerator card to a receiving memory of a server in a DMA mode;
the method comprises the steps that a time-out packet supplementing mechanism is adopted to transmit data packets in a transmitting memory of a server to a DDR of an FPGA acceleration card in a DMA mode; the method comprises the following steps:
when the size of the data to be transmitted in the server's sending memory reaches the size of one DMA transfer, the data is transferred to the DDR of the FPGA accelerator card by DMA;
when the retention time of the data to be transmitted in the server's sending memory exceeds a threshold, packet-supplement processing is performed: the data is padded up to a size that can be sent in one DMA transfer and transferred to the DDR of the FPGA accelerator card.
2. The FPGA accelerator card-based DPDK driver system according to claim 1, wherein when the size of the data to be transmitted and the size of the writable space satisfy the condition of one transmission, the data packet in the DDR of the FPGA accelerator card is transmitted to the receiving memory of the server in a DMA manner; the method comprises the following steps:
calculating the size of the data to be transmitted from the read pointer ddr_rdp and the write pointer ddr_wrp: when the read pointer ddr_rdp is smaller than the write pointer ddr_wrp, the data between ddr_rdp and ddr_wrp is read; when the read pointer ddr_rdp is greater than the write pointer ddr_wrp, ddr_size - ddr_rdp + ddr_wrp bytes of data are read, where ddr_size is the size of the DDR ring buffer;
if the readable data size exceeds the data size N of one DMA transfer, then N bytes are read starting from ddr_rdp;
calculating the size of the writable space from the read pointer usr_rdp and the write pointer usr_wrp of the server's receiving ring buffer: when the read pointer usr_rdp is greater than the write pointer usr_wrp, the writable space size is usr_rdp - usr_wrp; when the read pointer usr_rdp is smaller than the write pointer usr_wrp, the writable space size is usr_size - usr_wrp + usr_rdp, where usr_size is the size of the server's receiving ring buffer;
when the writable space is larger than N, copying the read data to the write pointer usr_wrp of the server's receiving ring buffer;
judging whether usr_wrp exceeds usr_size after being increased by N: if so, a remainder operation is performed to obtain the new position of the write pointer usr_wrp; otherwise usr_wrp is increased by N;
judging whether ddr_rdp exceeds ddr_size after being increased by N: if so, a remainder operation is performed to obtain the new position of the read pointer ddr_rdp; otherwise ddr_rdp is increased by N.
3. The DPDK driving system based on FPGA accelerator card according to claim 2, wherein the specific processing procedure of the data packet receiving module is as follows:
extracting a data packet from the read pointer usr_rdp in the server's receiving ring buffer; the data packet includes a packet header and a standard Ethernet frame, the packet header including: a 6-byte preamble, 2-byte valid packet length information, and 8-byte timestamp information;
locating the packet header according to the preamble and extracting the packet length information pkt_len; judging the validity of pkt_len: if pkt_len is larger than the maximum packet length of an Ethernet data frame, the packet length information is wrong, and 8 bytes are skipped from the preamble to relocate the packet header;
comparing the length of the pkt_len with that of the actual data packet, and if the lengths are consistent, extracting the time stamp information of the data packet;
comparing pkt_len with the data area size mbuf_size of the mbuf structure: if pkt_len is smaller than mbuf_size, the data packet of length pkt_len is copied into a designated mbuf; otherwise the data of length pkt_len is filled segment by segment, in units of 2048 bytes, into the data areas of several mbufs until less than one unit remains, the data of the remaining length is placed in the last mbuf, and the several mbufs are linked together in order; the number of mbufs linked for the data packet is filled into the nb_segs field of the mbuf;
the time stamp information is padded into the mbuf designated area.
4. The FPGA accelerator card based DPDK driver system according to claim 3, wherein said data packet of pkt_len is copied to a designated mbuf; the method comprises the following steps:
judging whether the current read pointer position usr_rdp, after moving back by pkt_len, exceeds the boundary of the server's preset receiving memory; if so, the data packet is split: the data at the end of the memory is copied first, and the read pointer usr_rdp is then positioned at the start address of the memory to copy the remaining data.
5. The DPDK driving system based on FPGA accelerator card according to claim 4, wherein the specific processing procedure of the data packet sending module is as follows:
extracting a data packet from the mbuf and adding a 6-byte preamble and 2-byte packet length information to the packet header;
reading the nb_segs field of the mbuf structure: if nb_segs is 1, the data of length pkt_len of the data packet is copied to the server's sending memory; if nb_segs is greater than 1, the data in the linked mbufs is extracted in order according to the value of nb_segs and the length of each mbuf, and copied to the server's sending memory.
6. The FPGA accelerator card based DPDK driver system according to claim 1, further comprising: an accelerator card driver binding module and an accelerator card configuration initialization module; wherein:
the acceleration card driving and binding module is used for realizing the binding of the FPGA acceleration card and the DPDK;
the acceleration card configuration initialization module is used for initializing the FPGA acceleration card, acquiring information of the FPGA acceleration card and configuring the FPGA acceleration card.
7. The FPGA accelerator card based DPDK driver system according to claim 6, wherein the accelerator card driver binding module has the following specific processing procedures:
registering the driver system into the DPDK driver linked list;
and registering the device type number of the FPGA accelerator card into the DPDK device linked list.
CN202110500249.9A 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card Active CN113419780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110500249.9A CN113419780B (en) 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110500249.9A CN113419780B (en) 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card

Publications (2)

Publication Number Publication Date
CN113419780A CN113419780A (en) 2021-09-21
CN113419780B true CN113419780B (en) 2023-05-12

Family

ID=77712124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110500249.9A Active CN113419780B (en) 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card

Country Status (1)

Country Link
CN (1) CN113419780B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114095251B (en) * 2021-11-19 2024-02-13 南瑞集团有限公司 SSLVPN implementation method based on DPDK and VPP
CN115412502B (en) * 2022-11-02 2023-03-24 之江实验室 Network port expansion and message rapid equalization processing method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385379A (en) * 2016-09-14 2017-02-08 杭州迪普科技有限公司 Message caching method and device
CN107436855A (en) * 2016-05-25 2017-12-05 三星电子株式会社 QOS cognition IO management for the PCIE storage systems with reconfigurable multiport
CN112131164A (en) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 Data scheduling method and device applied to acceleration board card, acceleration board card and medium
CN112422448A (en) * 2020-08-21 2021-02-26 苏州浪潮智能科技有限公司 FPGA accelerator card network data transmission method and related components
CN112637080A (en) * 2020-12-14 2021-04-09 中国科学院声学研究所 Load balancing processing system based on FPGA
CN112765054A (en) * 2019-11-01 2021-05-07 中国科学院声学研究所 High-speed data acquisition system and method based on FPGA


Also Published As

Publication number Publication date
CN113419780A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
US5299313A (en) Network interface with host independent buffer management
CN113419780B (en) DPDK driving system based on FPGA acceleration card
US7669000B2 (en) Host bus adapter with multiple hosts
US6223305B1 (en) Method and apparatus for resetting, enabling and freezing a communication device in a diagnostic process
JP3863912B2 (en) Automatic start device for data transmission
EP0607412B1 (en) Network adapter with host indication optimization
US5530874A (en) Network adapter with an indication signal mask and an interrupt signal mask
US8392632B2 (en) Method and apparatus for data processing in mobile communication system
CN102185833B (en) Fiber channel (FC) input/output (I/O) parallel processing method based on field programmable gate array (FPGA)
US20200081850A1 (en) Unified address space for multiple hardware accelerators using dedicated low latency links
JPS6352260A (en) Multiplex cpu interlocking system
CN108243185A (en) Scientific grade CCD gigabit Ethernet communication system and method based on AX88180
US8200877B2 (en) Device for processing a stream of data words
CN112637080B (en) Load balancing processing system based on FPGA
US7191262B2 (en) High-throughput UART interfaces
US6856619B1 (en) Computer network controller
CN114356829A (en) Protocol self-adaptive identification, cross-platform and standardization software system based on serial port transceiving
US20080195793A1 (en) Microcontroller with memory trace module
CN108984324B (en) FPGA hardware abstraction layer
CN101867510A (en) Plate-level double system interconnecting method
US8898716B2 (en) Method and apparatus for designing a communication mechanism between embedded cable modem and embedded set-top box
CN112637027B (en) Frame boundary defining device based on UART (universal asynchronous receiver/transmitter), transmitting method and receiving method
US20100011140A1 (en) Ethernet Controller Using Same Host Bus Timing for All Data Object Access
CN117176832A (en) Exchange chip and control method
CN115086192A (en) Data processing method, device and system and monitoring card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant