CN113419780A - DPDK driving system based on FPGA accelerator card - Google Patents

DPDK driving system based on FPGA accelerator card

Info

Publication number
CN113419780A
Authority
CN
China
Prior art keywords
data
ddr
data packet
size
accelerator card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110500249.9A
Other languages
Chinese (zh)
Other versions
CN113419780B (en)
Inventor
郭志川
王可
沙猛
黄逍颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkehai Suzhou Network Technology Co ltd
Institute of Acoustics CAS
Original Assignee
Zhongkehai Suzhou Network Technology Co ltd
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkehai Suzhou Network Technology Co ltd, Institute of Acoustics CAS filed Critical Zhongkehai Suzhou Network Technology Co ltd
Priority to CN202110500249.9A priority Critical patent/CN113419780B/en
Publication of CN113419780A publication Critical patent/CN113419780A/en
Application granted granted Critical
Publication of CN113419780B publication Critical patent/CN113419780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 - Handling requests for interconnection or transfer
    • G06F 13/20 - Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 - Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/4401 - Bootstrapping
    • G06F 9/4403 - Processor initialisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/4401 - Bootstrapping
    • G06F 9/4411 - Configuring for operating with peripheral devices; Loading of device drivers
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a DPDK driving system based on an FPGA accelerator card, deployed in an X86 server, the system comprising a DMA module, a packet receiving module, and a packet sending module. The DMA module dynamically adjusts the size of the DMA transfer data block according to network traffic, transfers packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA, and also transfers packets from the send memory of the server to the DDR of the FPGA accelerator card by DMA using a timeout packet-padding mechanism. The packet receiving module parses the packets in the receive memory of the server, extracts the timestamp and packet-length information, and encapsulates them into the mbuf data structures of DPDK. The packet sending module encapsulates the packets to be sent from the mbuf data structures in a preset format, adds header information, and copies them to the send memory of the server.

Description

DPDK driving system based on FPGA accelerator card
Technical Field
The invention relates to the technical field of high-speed packet capture using FPGA (field programmable gate array) network accelerator cards, and in particular to a DPDK (Data Plane Development Kit) driving system based on an FPGA network accelerator card.
Background
The Internet is developing rapidly worldwide, and network traffic and link speeds grow by the day, making high-speed capture of network data a research focus in both academia and industry. Fields such as network traffic analysis and network data security handle enormous data volumes, so their requirements on the network data capture rate keep rising. From 10 Gb/s to 100 Gb/s and beyond, capture rates have continuously improved to keep up with ever-higher bandwidths, and a variety of software and hardware methods for high-speed data capture have emerged.
Among the various high-speed network data capture methods, FPGAs are receiving more and more attention. Using an FPGA for packet processing has many advantages over traditional software processing, such as higher processing speed and lower latency, which makes FPGAs one of the mainstream platforms for high-speed packet processing. An FPGA network accelerator card is connected to the server through a PCIe interface; the FPGA receives packets through its optical ports, and the server performs the remaining, more complex operations on the received packets. Realizing these operations requires a corresponding driver for the FPGA accelerator card.
DPDK (Data Plane Development Kit) is a development platform and set of interfaces for fast packet processing that runs on Intel X86 and ARM platforms. DPDK processes packets in a polling mode, which gives the application layer a simple and efficient packet processing model and makes network application development more convenient. As a result, more and more developers choose DPDK for network data capture and processing in high-speed packet processing, and DPDK is becoming a de facto standard for network data processing. However, when a developer programs packet processing with DPDK and the underlying platform is switched, the application usually has to be reprogrammed against a new interface, which makes development cumbersome. It is therefore important that the underlying platform be "transparent" to the developer, so that an application can run on different platforms with no or only minor changes, which greatly reduces the development workload.
Currently, in the field of high-performance network packet processing, most FPGA accelerator card products do not support DPDK, while traditional DPDK NIC products, such as Intel's 710-series 10-gigabit NICs, are not network-programmable. This invention designs a DPDK driver for an FPGA accelerator card that supports packets of 64 B to 10000 B and dual optical ports at 2x10 Gbps line rate, with nanosecond-level timestamps, providing a new way for FPGA accelerator cards to support DPDK.
In high-speed network packet processing, developers generally use DPDK to build upper-layer applications. When the underlying platform is switched to an FPGA network accelerator card, the upper-layer application usually has to be modified extensively and redeveloped, which is very tedious.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a DPDK driving system based on an FPGA accelerator card. To address the above situation, the invention designs a DPDK driver suitable for an FPGA network accelerator card that exposes the standard DPDK function interface, so that upper-layer application developers can complete development with no or only minor changes, and that achieves zero-packet-loss transmission and reception of 64 B packets at 2x10 Gbps line rate.
In order to achieve the above object, the present invention provides a DPDK driving system based on an FPGA accelerator card, deployed in an X86 server, the system comprising a DMA module, a packet receiving module, and a packet sending module; wherein:
the DMA module dynamically adjusts the size of the DMA transfer data block according to network traffic, transfers packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA, and also transfers packets from the send memory of the server to the DDR of the FPGA accelerator card by DMA, using a timeout packet-padding mechanism;
the packet receiving module parses the packets in the receive memory of the server, extracts the timestamp and packet-length information, and encapsulates them into the mbuf data structures of DPDK;
the packet sending module encapsulates the packets to be sent from the mbuf data structures in a preset format, adds header information, and copies them to the send memory of the server.
As an improvement of the system, the DDR of the FPGA accelerator card is organized as a ring buffer with a write pointer ddr_wrp and a read pointer ddr_rdp; the receive memory and the send memory of the server each use an independent ring buffer, each with its own write pointer and read pointer.
As an improvement of the above system, dynamically adjusting the size of the DMA transfer data block according to network traffic and transferring the packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA specifically comprises:
monitoring the network traffic at the optical ports of the FPGA accelerator card and dynamically adjusting the size of the DMA transfer data block;
monitoring the write pointer ddr_wrp and read pointer ddr_rdp of the DDR and calculating the amount of data to be transferred;
monitoring the write pointer usr_wrp and read pointer usr_rdp of the receive memory of the server and calculating the size of the writable space;
when both the amount of data to be transferred and the writable space satisfy the conditions for one transfer, transferring the packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA.
As an improvement of the above system, transferring the packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA when both the amount of data to be transferred and the writable space satisfy the conditions for one transfer specifically comprises (a sketch of this pointer arithmetic follows this list):
calculating the amount of data to be transferred from the DDR read pointer ddr_rdp and write pointer ddr_wrp: when ddr_rdp is smaller than ddr_wrp, the data between ddr_rdp and ddr_wrp is readable; when ddr_rdp is greater than ddr_wrp, ddr_size - ddr_rdp + ddr_wrp bytes are readable, where ddr_size is the size of the DDR ring buffer;
if the readable data exceeds the size N of one DMA transfer, reading N bytes starting at ddr_rdp;
calculating the writable space from the read pointer usr_rdp and write pointer usr_wrp of the receive ring buffer of the server: when usr_rdp is greater than usr_wrp, the writable space is usr_rdp - usr_wrp; when usr_rdp is smaller than usr_wrp, the writable space is usr_size - usr_wrp + usr_rdp, where usr_size is the size of the receive ring buffer of the server;
when the writable space is larger than N, copying the read data to the receive ring buffer of the server at the write pointer usr_wrp;
judging whether usr_wrp increased by N exceeds usr_size: if so, taking the remainder to obtain the new usr_wrp position; otherwise, increasing usr_wrp by N;
judging whether ddr_rdp increased by N exceeds ddr_size: if so, taking the remainder to obtain the new ddr_rdp position; otherwise, increasing ddr_rdp by N.
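The following is a minimal C sketch of the ring-buffer bookkeeping described above. The structure and helper names (ring_desc, readable_bytes, writable_bytes, advance, try_rx_dma) are illustrative assumptions rather than identifiers from the actual driver, and the PCIe DMA copy itself is left as a placeholder comment.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical descriptor for one ring buffer (DDR side or host side). */
struct ring_desc {
    uint64_t rdp;   /* read pointer (offset into the ring)  */
    uint64_t wrp;   /* write pointer (offset into the ring) */
    uint64_t size;  /* total ring size, e.g. 8 GB for DDR, 128 MB for host memory */
};

/* Bytes that the producer has written but the consumer has not yet read. */
static uint64_t readable_bytes(const struct ring_desc *r)
{
    return (r->rdp <= r->wrp) ? r->wrp - r->rdp
                              : r->size - r->rdp + r->wrp;
}

/* Bytes that can still be written without overwriting unread data. */
static uint64_t writable_bytes(const struct ring_desc *r)
{
    return (r->rdp > r->wrp) ? r->rdp - r->wrp
                             : r->size - r->wrp + r->rdp;
}

/* Advance a pointer by n bytes, wrapping with a remainder operation. */
static uint64_t advance(uint64_t ptr, uint64_t n, uint64_t size)
{
    return (ptr + n) % size;
}

/* One receive-side DMA decision: copy N bytes from DDR to host memory
 * only when both rings allow a full transfer of size N. */
static bool try_rx_dma(struct ring_desc *ddr, struct ring_desc *usr, uint64_t n)
{
    if (readable_bytes(ddr) < n || writable_bytes(usr) <= n)
        return false;
    /* dma_copy(ddr->rdp, usr->wrp, n);  placeholder for the PCIe DMA transfer */
    ddr->rdp = advance(ddr->rdp, n, ddr->size);
    usr->wrp = advance(usr->wrp, n, usr->size);
    return true;
}
```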
As an improvement of the above system, transferring the packets in the send memory of the server to the DDR of the FPGA accelerator card by DMA using a timeout packet-padding mechanism specifically comprises (see the sketch after this list):
when the amount of data to be sent in the send memory of the server reaches the size of one DMA transfer, transferring it to the DDR of the FPGA accelerator card by DMA;
when the residence time of the data to be sent in the send memory of the server exceeds a threshold, performing packet padding to fill the data up to the size of one DMA transfer and sending it to the DDR of the FPGA accelerator card.
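A minimal sketch of the timeout-padding decision on the send path, reusing the hypothetical ring_desc helpers from the previous sketch. The threshold value, the timer source, and the padding routine are illustrative assumptions, not values or functions taken from the patent.

```c
#include <stdint.h>
#include <time.h>

#define TIMEOUT_NS  (200 * 1000)   /* assumed threshold; the patent only says "a threshold" */

/* Hypothetical send-side poll: flush a full DMA block immediately, or, once data
 * has waited longer than TIMEOUT_NS, pad the residue up to one block and flush it. */
static void tx_dma_poll(struct ring_desc *usr_tx, uint64_t n, uint64_t *pending_since_ns)
{
    uint64_t pending = readable_bytes(usr_tx);
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    uint64_t now_ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;

    if (pending >= n) {
        /* dma_copy_to_ddr(usr_tx->rdp, n);  placeholder for the PCIe DMA transfer */
        usr_tx->rdp = advance(usr_tx->rdp, n, usr_tx->size);
        *pending_since_ns = now_ns;
    } else if (pending > 0 && now_ns - *pending_since_ns > TIMEOUT_NS) {
        /* pad_and_dma_to_ddr(usr_tx->rdp, pending, n);  placeholder: send the pending
         * bytes plus padding so the card still receives one full DMA block */
        usr_tx->rdp = advance(usr_tx->rdp, pending, usr_tx->size);
        *pending_since_ns = now_ns;
    }
}
```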
As an improvement of the above system, the specific processing procedure of the packet receiving module is as follows:
extracting a packet from the receive ring buffer of the server at the read pointer usr_rdp; the packet consists of a header followed by a standard Ethernet frame, and the header comprises: a 6-byte preamble, 2 bytes of valid packet-length information, and an 8-byte timestamp;
locating the packet header by the preamble, then extracting the packet-length information pkt_len and checking its validity: if pkt_len is larger than the maximum Ethernet frame length, the packet-length information is wrong, and the header is relocated by skipping 8 bytes from the preamble;
comparing pkt_len with the length of the actual packet data, and if they are consistent, extracting the timestamp of the packet;
comparing pkt_len with the data-area size mbuf_size of the mbuf structure: if pkt_len is smaller than mbuf_size, copying the pkt_len bytes of the packet into the designated mbuf; otherwise, filling the data areas of several mbufs segment by segment in units of 2048 bytes until the data can no longer be split, placing the remaining bytes in the last mbuf, and linking the mbufs together in order; filling the number of mbufs linked to the packet into the nb_segs field of the mbuf;
filling the timestamp information into the designated area of the mbuf.
As an improvement of the above system, copying the pkt_len bytes of the packet into the designated mbuf specifically comprises:
judging whether the current read pointer position usr_rdp, advanced by pkt_len bytes, would cross the end of the preset receive memory of the server; if so, splitting the packet, copying the data at the end of the memory first, and then positioning the read pointer usr_rdp to the first address of the memory to copy the remaining data (a sketch of this receive-side parse and copy is given below).
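Below is a minimal C sketch of the receive-side parsing and copy described above, written against the public DPDK mbuf API. The header layout (6-byte preamble, 2-byte length, 8-byte timestamp) follows the text, but the preamble constant, the little-endian field encoding, and the way the timestamp is handed on (here only parsed into a local variable) are assumptions rather than the driver's actual code; jumbo-frame chaining is omitted for brevity.

```c
#include <stdint.h>
#include <string.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define HDR_LEN      16      /* 6 B preamble + 2 B length + 8 B timestamp */
#define MAX_ETH_LEN  9600    /* maximum accepted frame length, per the text */

/* Parse one packet at offset 'rdp' of the host receive ring and copy it into an
 * mbuf allocated from 'mp'. Returns the new read offset, or 'rdp' unchanged if
 * no valid header is found there. */
static uint64_t rx_one_packet(const uint8_t *ring, uint64_t ring_size, uint64_t rdp,
                              struct rte_mempool *mp, struct rte_mbuf **out)
{
    /* Assumed preamble value; the text gives FB 55 55 55 55 55 (6 bytes). */
    static const uint8_t preamble[6] = {0xFB, 0x55, 0x55, 0x55, 0x55, 0x55};
    uint8_t hdr[HDR_LEN];

    /* Read the 16-byte header, handling wrap-around at the end of the ring. */
    for (unsigned i = 0; i < HDR_LEN; i++)
        hdr[i] = ring[(rdp + i) % ring_size];

    if (memcmp(hdr, preamble, sizeof(preamble)) != 0)
        return rdp;                          /* not at a packet boundary */

    uint16_t pkt_len;
    memcpy(&pkt_len, hdr + 6, sizeof(pkt_len));     /* assumes little-endian field */
    if (pkt_len == 0 || pkt_len > MAX_ETH_LEN)
        return (rdp + 8) % ring_size;        /* bad length: skip 8 B and retry */

    uint64_t ts;
    memcpy(&ts, hdr + 8, sizeof(ts));        /* nanosecond timestamp, per the text */
    (void)ts;                                /* hand-off to the mbuf is driver-specific */

    struct rte_mbuf *m = rte_pktmbuf_alloc(mp);
    if (m == NULL)
        return rdp;

    /* Copy the Ethernet frame, splitting the copy if it crosses the ring end. */
    uint8_t *dst = rte_pktmbuf_mtod(m, uint8_t *);
    uint64_t start = (rdp + HDR_LEN) % ring_size;
    uint64_t first = (start + pkt_len <= ring_size) ? pkt_len : ring_size - start;
    memcpy(dst, ring + start, first);
    if (first < pkt_len)
        memcpy(dst + first, ring, pkt_len - first);

    m->data_len = pkt_len;
    m->pkt_len  = pkt_len;
    m->nb_segs  = 1;
    *out = m;
    return (start + pkt_len) % ring_size;
}
```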
As an improvement of the above system, the specific processing procedure of the packet sending module is as follows (a sketch follows this list):
extracting the packet from the mbuf and adding a 6-byte preamble and 2 bytes of packet-length information at the head;
reading the nb_segs field of the mbuf structure: if nb_segs is 1, copying pkt_len bytes of the packet into the send memory of the server; if nb_segs is greater than 1, extracting the data from the linked mbufs in turn according to the value of nb_segs and the length of each mbuf, and copying them into the send memory of the server.
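A minimal sketch of the send-side encapsulation under the same assumptions as the receive sketch (the preamble constant, little-endian length field, and eight-byte padding value are assumptions). It walks an mbuf chain using nb_segs and data_len, as described above, and writes the framed packet linearly into a caller-supplied send buffer.

```c
#include <stdint.h>
#include <string.h>
#include <rte_mbuf.h>

/* Prepend the 6-byte preamble and 2-byte length, then copy all segments of the
 * mbuf chain into 'dst'. Returns the number of bytes written, rounded up to an
 * 8-byte boundary (the text states packets are eight-byte aligned for the card). */
static size_t tx_encapsulate(const struct rte_mbuf *m, uint8_t *dst, size_t dst_size)
{
    static const uint8_t preamble[6] = {0xFB, 0x55, 0x55, 0x55, 0x55, 0x55}; /* assumed */
    uint16_t pkt_len = (uint16_t)m->pkt_len;
    size_t total = 6 + 2 + pkt_len;
    size_t aligned = (total + 7) & ~(size_t)7;

    if (aligned > dst_size)
        return 0;                                   /* not enough room in send memory */

    memcpy(dst, preamble, 6);
    memcpy(dst + 6, &pkt_len, 2);                   /* assumes little-endian length field */

    size_t off = 8;
    uint16_t nseg = m->nb_segs;
    const struct rte_mbuf *seg = m;
    for (uint16_t i = 0; i < nseg && seg != NULL; i++, seg = seg->next) {
        memcpy(dst + off, rte_pktmbuf_mtod(seg, const uint8_t *), seg->data_len);
        off += seg->data_len;
    }
    memset(dst + off, 0, aligned - off);            /* pad to the 8-byte boundary */
    return aligned;
}
```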
As an improvement of the above system, the system further comprises an accelerator card driver binding module and an accelerator card configuration and initialization module; wherein:
the accelerator card driver binding module binds the FPGA accelerator card to DPDK;
the accelerator card configuration and initialization module initializes the FPGA accelerator card, obtains the FPGA accelerator card information, and configures the FPGA accelerator card.
As an improvement of the above system, the specific processing procedure of the accelerator card driver binding module is as follows:
registering the driver system into the DPDK driver linked list;
registering the device type number of the FPGA accelerator card into the device linked list of DPDK.
Compared with the prior art, the invention has the following advantages:
1. The invention performs dynamic DMA transfers by monitoring the actual network traffic, adopts a timeout packet-padding mechanism, and, through customized packet encapsulation and packet parsing flows, supports zero-packet-loss processing of packets and transmission of jumbo frames;
2. The FPGA accelerator card can send and receive 64-10000 byte packets at line rate on dual optical ports at 2x10 Gbps; with identical server configurations, its packet performance is better than that of an Intel 10-gigabit NIC supporting DPDK; meanwhile, the driver supports nanosecond timestamps, with a resolution better than that of common DPDK 10-gigabit NICs.
Drawings
FIG. 1 is a block diagram of a DPDK driver system based on an FPGA accelerator card according to the present invention;
FIG. 2 is a function call framework diagram of the present invention;
FIG. 3(a) is a DMA module reception flow diagram of the present invention;
FIG. 3(b) is a flow diagram of a packet receipt module of the present invention;
FIG. 4 is a schematic diagram of dual-port transceiving of the FPGA accelerator card of the invention;
FIG. 5(a) is a schematic diagram of the initial state of the read/write pointers of the server's receive ring buffer according to the present invention;
FIG. 5(b) is a schematic diagram of a write to the server's receive ring buffer according to the present invention;
FIG. 5(c) is a schematic diagram of a read from the server's receive ring buffer according to the present invention;
FIG. 5(d) is a schematic diagram of the boundary-crossing case of the server's receive ring buffer according to the present invention;
FIG. 6 is a schematic diagram of the mbuf structure of DPDK according to the present invention.
Detailed Description
The invention provides a method for realizing a DPDK (Data Plane Development Kit) driver for an FPGA accelerator card, which combines the FPGA accelerator card's resources with DPDK's optimized processing functions to achieve high-performance packet processing. With this driver, developers do not need to change the DPDK interface functions they already use when programming upper-layer applications, and the low-level processing is completed by the FPGA network accelerator card. Using the DPDK driver, a developer can configure and manage the FPGA network accelerator card, send and receive packets, and perform other operations on them. When a developer calls the driver function to receive packets, the packets are transferred by DMA from the DDR of the FPGA accelerator card to host memory through the PCIe interface; the driver then parses the packets in host memory, encapsulates them into DPDK mbuf data structures, and fills in the related information. At the same time, the size of each DMA transfer can be adjusted according to the traffic. When sending packets, the driver parses the mbuf data structures, fills in the necessary information, places the packets into host memory, and then transfers them by DMA to the DDR of the FPGA accelerator card through the PCIe interface.
The system comprises an accelerator card driver binding module, an accelerator card configuration and initialization module, a DMA module, a packet receiving module, and a packet sending module. Each packet copied by DMA carries its packet-length information in the last 2 bytes of the preamble field. The FPGA board is bound to DPDK; ring buffers are used for both receiving and sending; at high rates data is processed in units of large data blocks, at low rates a timeout packet-padding mechanism is used, and the data block size can be adjusted dynamically according to the actual network traffic, guaranteeing both high throughput and low latency. Compared with traditional methods, the method achieves line-rate processing of small (64 B) packets while occupying fewer CPU resources.
The DPDK driver for the FPGA network accelerator card provided by the invention enables binding of the FPGA network accelerator card, its configuration and initialization, packet sending and receiving, and other functions through the standard DPDK function interface. When receiving packets, the packets buffered in the DDR of the FPGA accelerator card are first transferred to the server's memory by DMA, and then copied from that memory into DPDK mbuf structures for subsequent processing.
The invention can realize the following functions:
1. and controlling and managing the FPGA accelerator card. The DPDK driver of the FPGA accelerator card can register the driver, so that the binding of the FPGA accelerator card and the DPDK driver is realized, the initialization of the FPGA accelerator card is realized, the accelerator card information is acquired, and the accelerator card is configured.
2. Receiving packets. After the FPGA accelerator card is configured and initialized, packet reception is achieved by calling the driver's packet-receive function interface. First, the receive queue is configured by calling the DPDK driver's function interface; then the FPGA accelerator card is started by calling the function interface rte_eth_dev_start; finally, the packet-receive function interface is called to receive and process packets, and the driver is responsible for placing the data stored in the server memory into DPDK mbuf data structures as packets and filling in the related information. The data in the server memory is transferred from the FPGA accelerator card by DMA.
3. Sending packets. This is roughly the reverse of receiving: the packets in the mbufs to be forwarded are copied to the designated forwarding memory through the packet-send interface function, assembled into a large block of data, and then transferred to the board's DDR by DMA (a usage sketch of these standard DPDK calls follows this list).
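The following is a minimal, hedged sketch of how an upper-layer application would drive such a PMD through the standard DPDK API, as described in items 2 and 3 above. The port and queue numbers, pool sizing, and the forwarding logic are placeholders, and the exact rx/tx configuration the accelerator card expects is not specified in the text.

```c
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* One mempool for rx mbufs; the sizes here are placeholders. */
    struct rte_mempool *mp = rte_pktmbuf_pool_create("mbuf_pool", 8192, 256, 0,
                                                     RTE_MBUF_DEFAULT_BUF_SIZE,
                                                     rte_socket_id());
    uint16_t port = 0;                 /* assumed to be the FPGA accelerator card port */
    struct rte_eth_conf conf = {0};

    /* Configure one rx and one tx queue, set them up, then start the port. */
    rte_eth_dev_configure(port, 1, 1, &conf);
    rte_eth_rx_queue_setup(port, 0, 1024, rte_socket_id(), NULL, mp);
    rte_eth_tx_queue_setup(port, 0, 1024, rte_socket_id(), NULL);
    rte_eth_dev_start(port);           /* the call the text names explicitly */

    struct rte_mbuf *pkts[BURST];
    for (;;) {
        /* Poll-mode receive: the PMD copies packets from host memory into mbufs. */
        uint16_t nb = rte_eth_rx_burst(port, 0, pkts, BURST);
        if (nb == 0)
            continue;
        /* ... application processing on pkts[0..nb-1] ... */

        /* Send them back out; the PMD re-encapsulates and stages them for DMA. */
        uint16_t sent = rte_eth_tx_burst(port, 0, pkts, nb);
        for (uint16_t i = sent; i < nb; i++)
            rte_pktmbuf_free(pkts[i]); /* drop what could not be queued */
    }
    return 0;
}
```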
The specific receive DMA process is: judge whether one DMA transfer can be performed; if so, complete one DMA transfer by reading the device file, the transfer size being determined by the current traffic. The specific send DMA process is: judge whether one DMA transfer can be performed, and if so, complete one DMA transfer by writing the device file.
Simultaneous transmit and receive on both network ports is supported. On top of the transmit/receive flow, the port link state is monitored through a register. When both ports are up, the receive DMA module polls the two receive channels of the board and copies their data into the designated memory spaces, and the send DMA module polls the two forwarding memories and copies their data to the two corresponding DDR regions of the board. A dual-thread mode is added in the test program, and the data exchange between the two groups of memories and the mbufs is bound to two CPU cores respectively, guaranteeing the independence of the two transmit/receive channel groups.
Compared with other FPGA-based high-speed network processing systems, in which replacing the underlying device forces extensive changes to existing application code, the invention makes the FPGA support a DPDK driver, so application code needs no or only minor changes, reducing the complexity of application development. Compared with the transmission performance of a general-purpose 10-gigabit NIC in the same server environment, a dual-port line-rate test with 64 B packets on an Intel 10-gigabit NIC can only guarantee no packet loss within 1 Gbps per port, whereas the invention achieves dual-port 20 Gbps line-rate transmission without packet loss through large-block dynamic DMA transfers and various optimizations along the transmission path.
The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and examples.
Example 1
As shown in FIG. 1, Example 1 of the present invention provides a DPDK driving system based on an FPGA accelerator card, deployed in an X86 server, the system comprising a DMA module, a packet receiving module, and a packet sending module; wherein:
the DMA module dynamically adjusts the size of the DMA transfer data block according to network traffic, transfers packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA, and also transfers packets from the send memory of the server to the DDR of the FPGA accelerator card by DMA, using a timeout packet-padding mechanism;
the packet receiving module parses the packets in the receive memory of the server, extracts the timestamp and packet-length information, and encapsulates them into the mbuf data structures of DPDK;
the packet sending module encapsulates the packets to be sent from the mbuf data structures in a preset format, adds header information, and copies them to the send memory of the server.
The specific principle is as follows:
according to the application program drive detection matching process, an application program running in a DPDK environment firstly registers a drive linked list before the program starts, a register equipment linked list is scanned during initialization, an FPGA accelerator card serving as PCIe equipment can be directly identified on a server, and the DPDK acquires board card equipment information by scanning a Linux system bus equipment directory and hangs the board card equipment information into the equipment linked list; in the DPDPMD driver of the FPGA accelerator card, a RTE _ PMD _ REGISTER _ PCI interface function is utilized to REGISTER a written drive function into a DPDK drive linked list; and REGISTERs the device type number supported by the drive using the RTE _ PMD _ REGISTER _ PCI _ TABLE function. After the program starts, the binding of the FPGA accelerator card and the DPDK PMD driver is completed by calling the rte _ bus _ probe function under rte _ eal _ init to match the device chain table and the corresponding driver in the driver chain table.
FIG. 2 shows the basic functions related to packet transmission and reception that are encapsulated in the driver and their calling relationships. After the driver is successfully registered and bound, the data path between the driver and the FPGA board can be established; it is divided into configuration and initialization of the FPGA accelerator card, starting and stopping the accelerator card, the DMA module, the packet receiving module (RX), and the packet sending module (TX). Next, the specific packet-receive process under the board's DPDK driver is described in detail.
The system uses a double ring buffer structure: the server memory is 128 MB, and the board's DDR ring memory is 8 GB. FIG. 3(a) is the flowchart of the DMA operation in a slave thread. The DMA starts by opening the device file and then entering the slave-thread loop. In the loop, four judgments are made before data is copied in order to guarantee the safety and validity of the transfer: whether the DMA loop should terminate; whether the DDR read pointer ddr_rdp has wrapped back to the initial position so the DDR memory is reused cyclically; whether there is sufficient data in the DDR; and finally whether the host memory space is sufficient. If a condition is not met, the loop is re-entered according to the callback order of the flowchart; when all conditions are met, a DMA transfer of size N is performed from the DDR via the corresponding device file, and then ddr_rdp and usr_wrp are updated according to the data change before entering the next loop.
The specific procedure for judging whether there is sufficient readable data in the DDR: when the read pointer ddr_rdp is less than ddr_wrp, ddr_wrp - ddr_rdp is the readable data size; when ddr_rdp is greater than ddr_wrp, 8G - ddr_rdp + ddr_wrp is the readable data size. If the readable data is larger than one DMA transfer, the buffered data readable in the DDR exceeds what one DMA transfers, and the program can proceed to the next judgment.
Judging whether the host memory has enough writable space: because the host memory uses a ring storage structure, if one DMA transfer were larger than the remaining writable size of the host memory, newly written data would overwrite unread data. When the host memory read pointer usr_rdp is greater than the host memory write pointer usr_wrp, usr_rdp - usr_wrp is the writable space; when usr_rdp is smaller than usr_wrp, 128M - usr_wrp + usr_rdp is the writable space. When the writable space is larger than one DMA transfer, unread data will not be overwritten and one DMA transfer can be performed.
After one DMA transfer, the host ring memory write pointer and the DDR ring memory read pointer are updated as follows: after a DMA transfer of size N, the host memory write pointer usr_wrp is increased by N, and when the increased value exceeds the host memory size of 128 MB, a remainder operation yields the new usr_wrp position; the DDR read pointer ddr_rdp is increased by N, and when the increased value exceeds the DDR memory size of 8 GB, a remainder operation yields the new ddr_rdp position.
The packet receiving module implements data copying and packet reception between the server memory and the DPDK mbuf structures: in eth_dsp_rx, packets are extracted from the contiguous large data blocks in host memory, placed into mbuf structures allocated from the memory pool, and the basic information of each packet is extracted to the designated positions of the mbuf. FIG. 3(b) shows the copy flow for a single packet. The first 512 bits of a packet processed by the FPGA accelerator card contain a 6-byte preamble (FB 55 55 55 55 55) and a two-byte valid packet-length field, followed by 8 bytes of timestamp information and then a standard Ethernet frame. Packets are eight-byte aligned after hardware processing, so the design uses an 8-byte copy-and-detect scheme and checks the packet fields in units of 8 bytes. Since the preamble field appears uniquely at the head of every packet, the packet header is first located by the preamble, the following packet-length information is extracted, and its validity is checked against the maximum packet length (9600 bytes); if it is exceeded, the packet length is wrong and the header is relocated by skipping 8 bytes. Second, because DMA transfers data in blocks, a transfer boundary is not necessarily a complete packet; when the traffic is small and the memory is drained quickly, the last packet in memory may be incomplete and unable to satisfy a full packet-length transfer, so an additional judgment compares the packet length with the actual remaining distance in memory to prevent the pointer from crossing the boundary. After the packet length is verified, the packet's timestamp information is extracted, and pkt_len bytes of data are extracted from memory into mbufs. Before copying, the packet is checked against the size of the mbuf data area (2048 bytes) to decide whether it is a jumbo frame: if not, the pkt_len bytes are copied directly into the designated mbuf; if it is a jumbo frame, the data of length pkt_len fills the data_room of several mbufs segment by segment in units of 2048 bytes until it can no longer be split, the remaining bytes are placed in the last mbuf, and the mbufs are linked together in order. Before copying, the read pointer is also checked to prevent boundary crossing: since the host memory is used as a 128 MB circular buffer, it must be judged whether the current read pointer position usr_rdp advanced by pkt_len would cross the 128 MB boundary; if so, the packet is split, the data at the end of the memory is copied first, and the read pointer is then positioned to the first address of the memory to copy the remaining data. This is the copy process for one packet; the upper-layer application passes the number of packets it expects in one burst to the eth_dsp_rx function through rte_eth_rx_burst, and a for loop copies the specified number of packets in turn into DPDK mbuf structures.
The data forwarding path is the reverse of the receive path: the mbuf data to be forwarded is passed through the rte_eth_tx_burst function to the eth_dsp_tx function, which adds the preamble and packet-length information to the packet and performs eight-byte alignment to facilitate identification and further processing by the board's forwarding logic. Jumbo frames are judged by the mbuf->nb_segs field, which identifies how many mbufs are linked together: a value of 1 means there is no linked mbuf after mbuf->next (not a jumbo frame) and pkt_len bytes can be transferred directly; if it is not 1, the data in the linked mbufs are extracted in turn according to the count and data_len. The memory data is then transferred dynamically to the board DDR using the ring buffer structure. Because the AXI4 bus width of the FPGA board is 512 bits, to avoid packet loss caused by packets lingering on the board, a timeout packet-padding mechanism is added to the DMA module of the forwarding part: when no data has been transferred within a certain time and the data remaining in memory is less than 512 bits, the data is padded.
FIG. 4 shows the construction of dual-port transmit/receive using two threads on top of the single-port test. As shown in the figure, the board DDR is divided into four 4 GB regions that serve the transmit and receive paths of the two optical ports and correspond, through the DMA transfer mechanism, to four memory regions on the host. In the DMA module, the optical port link state is detected through a register; if both ports are up, the receive DMA polls according to the link flag bits and delivers the two groups of receive DDR data to the two corresponding groups of memory regions, and the send DMA polls and transfers the data of the two forwarding memories to the corresponding board DDR spaces. For single-port operation, one transmit/receive queue pair is configured, corresponding to one receive memory and one forwarding memory. To keep the transmission and reception of the two optical ports independent and avoid affecting performance, the test program configures two queues corresponding to different memory spaces for the transmit/receive functions and uses the rte_eal_mp_remote_launch function to run the transmit/receive queue pair of each optical port on its own processor core, which reduces the shared resources and dependencies between the two transmit/receive paths and guarantees high-performance processing of packets on each port.
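A minimal sketch of the per-port thread layout described above: one transmit/receive loop per optical port, each pinned to its own lcore via rte_eal_mp_remote_launch. The loop body and the assumed mapping of port IDs 0 and 1 to the two optical ports are placeholders; note that DPDK releases before 20.11 spell the last argument SKIP_MASTER rather than SKIP_MAIN.

```c
#include <stdint.h>
#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Per-port transmit/receive loop; each worker lcore services one optical port.
 * The lcore-to-port mapping here is an assumption for illustration. */
static int port_loop(void *arg)
{
    (void)arg;
    uint16_t port = (uint16_t)(rte_lcore_id() - 1);   /* lcore 1 -> port 0, lcore 2 -> port 1 */
    struct rte_mbuf *pkts[32];

    for (;;) {
        uint16_t nb = rte_eth_rx_burst(port, 0, pkts, 32);
        if (nb > 0)
            rte_eth_tx_burst(port, 0, pkts, nb);      /* simple forward-back placeholder */
    }
    return 0;
}

/* Launch the loop on every worker lcore, keeping the main lcore free;
 * SKIP_MAIN is the DPDK >= 20.11 spelling. */
int launch_dual_port(void)
{
    return rte_eal_mp_remote_launch(port_loop, NULL, SKIP_MAIN);
}
```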
FIGS. 5(a), (b), (c), and (d) show the ring buffer structure of the memory space. As can be seen from the operation process, the ring buffer allows the read and write modules to run independently while remaining coordinated: the DMA module is responsible for updating the write pointer and monitoring the read pointer to prevent over-writing, the packet receiving module is responsible for updating the read pointer and monitoring the write pointer to prevent over-reading, and both modules must handle copies that cross the 128 MB memory boundary. FIG. 6 is a schematic diagram of the mbuf structure.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A DPDK driving system based on an FPGA accelerator card, deployed in an X86 server, characterized in that the system comprises: a DMA module, a packet receiving module, and a packet sending module; wherein:
the DMA module is used for dynamically adjusting the size of the DMA transfer data block according to network traffic, transferring packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA, and also transferring packets from the send memory of the server to the DDR of the FPGA accelerator card by DMA using a timeout packet-padding mechanism;
the packet receiving module is used for parsing the packets in the receive memory of the server, extracting the timestamp and packet-length information, and encapsulating them into the mbuf data structures of DPDK;
the packet sending module is used for encapsulating the packets to be sent from the mbuf data structures in a preset format, adding header information, and copying them to the send memory of the server.
2. The DPDK driving system based on the FPGA accelerator card of claim 1, wherein the DDR of the FPGA accelerator card is organized as a ring buffer with a write pointer ddr_wrp and a read pointer ddr_rdp; the receive memory and the send memory of the server each use an independent ring buffer, each with its own write pointer and read pointer.
3. The DPDK driving system based on the FPGA accelerator card of claim 2, wherein dynamically adjusting the size of the DMA transfer data block according to network traffic and transferring the packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA specifically comprises:
monitoring the network traffic at the optical ports of the FPGA accelerator card and dynamically adjusting the size of the DMA transfer data block;
monitoring the write pointer ddr_wrp and read pointer ddr_rdp of the DDR and calculating the amount of data to be transferred;
monitoring the write pointer usr_wrp and read pointer usr_rdp of the receive memory of the server and calculating the size of the writable space;
when both the amount of data to be transferred and the writable space satisfy the conditions for one transfer, transferring the packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA.
4. The DPDK driving system based on the FPGA accelerator card of claim 3, wherein transferring the packets from the DDR of the FPGA accelerator card to the receive memory of the server by DMA when both the amount of data to be transferred and the writable space satisfy the conditions for one transfer specifically comprises:
calculating the amount of data to be transferred from the DDR read pointer ddr_rdp and write pointer ddr_wrp: when ddr_rdp is smaller than ddr_wrp, the data between ddr_rdp and ddr_wrp is read; when ddr_rdp is greater than ddr_wrp, ddr_size - ddr_rdp + ddr_wrp bytes are read, where ddr_size is the size of the DDR ring buffer;
if the readable data exceeds the size N of one DMA transfer, reading N bytes starting at ddr_rdp;
calculating the writable space from the read pointer usr_rdp and write pointer usr_wrp of the receive ring buffer of the server: when usr_rdp is greater than usr_wrp, the writable space is usr_rdp - usr_wrp; when usr_rdp is smaller than usr_wrp, the writable space is usr_size - usr_wrp + usr_rdp, where usr_size is the size of the receive ring buffer of the server;
when the writable space is larger than N, copying the read data to the receive ring buffer of the server at the write pointer usr_wrp;
judging whether usr_wrp increased by N exceeds usr_size: if so, taking the remainder to obtain the new usr_wrp position; otherwise, increasing usr_wrp by N;
judging whether ddr_rdp increased by N exceeds ddr_size: if so, taking the remainder to obtain the new ddr_rdp position; otherwise, increasing ddr_rdp by N.
5. The DPDK driving system based on the FPGA accelerator card of claim 4, wherein transferring the packets in the send memory of the server to the DDR of the FPGA accelerator card by DMA using the timeout packet-padding mechanism specifically comprises:
when the amount of data to be sent in the send memory of the server reaches the size of one DMA transfer, transferring it to the DDR of the FPGA accelerator card by DMA;
when the residence time of the data to be sent in the send memory of the server exceeds a threshold, performing packet padding to fill the data up to the size of one DMA transfer and sending it to the DDR of the FPGA accelerator card.
6. The DPDK driving system based on the FPGA accelerator card of claim 5, wherein the specific processing procedure of the packet receiving module is as follows:
extracting a packet from the receive ring buffer of the server at the read pointer usr_rdp; the packet consists of a header and a standard Ethernet frame, and the header comprises: a 6-byte preamble, 2 bytes of valid packet-length information, and an 8-byte timestamp;
locating the packet header by the preamble, then extracting the packet-length information pkt_len and checking its validity: if pkt_len is larger than the maximum Ethernet frame length, the packet-length information is wrong, and the header is relocated by skipping 8 bytes from the preamble;
comparing pkt_len with the length of the actual packet, and if they are consistent, extracting the timestamp information of the packet;
comparing pkt_len with the data-area size mbuf_size of the mbuf structure: if pkt_len is smaller than mbuf_size, copying the pkt_len bytes of the packet into the designated mbuf; otherwise, filling the data areas of several mbufs segment by segment in units of 2048 bytes until the data can no longer be split, placing the remaining bytes in the last mbuf, and linking the mbufs together in order; filling the number of mbufs linked to the packet into the nb_segs field of the mbuf;
filling the timestamp information into the designated area of the mbuf.
7. The DPDK driving system based on the FPGA accelerator card of claim 6, wherein copying the pkt_len bytes of the packet into the designated mbuf specifically comprises:
judging whether the current read pointer position usr_rdp, advanced by pkt_len bytes, would exceed the preset receive memory of the server; if so, splitting the packet, copying the data at the end of the memory first, and then positioning the read pointer usr_rdp to the first address of the memory to copy the remaining data.
8. The DPDK driving system based on the FPGA accelerator card of claim 7, wherein the specific processing procedure of the packet sending module is as follows:
extracting the packet from the mbuf and adding a 6-byte preamble and 2 bytes of packet-length information at the head;
reading the nb_segs field of the mbuf structure: if nb_segs is 1, copying pkt_len bytes of the packet into the send memory of the server; if nb_segs is greater than 1, extracting the data from the linked mbufs in turn according to the value of nb_segs and the length of each mbuf, and copying them into the send memory of the server.
9. The DPDK driving system based on the FPGA accelerator card of claim 1, characterized in that the system further comprises: an accelerator card driver binding module and an accelerator card configuration and initialization module; wherein:
the accelerator card driver binding module is used for binding the FPGA accelerator card to DPDK;
the accelerator card configuration and initialization module is used for initializing the FPGA accelerator card, obtaining the FPGA accelerator card information, and configuring the FPGA accelerator card.
10. The DPDK driving system based on the FPGA accelerator card of claim 8, wherein the specific processing procedure of the accelerator card driver binding module is as follows:
registering the driver system into the DPDK driver linked list;
registering the device type number of the FPGA accelerator card into the device linked list of DPDK.
CN202110500249.9A 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card Active CN113419780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110500249.9A CN113419780B (en) 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110500249.9A CN113419780B (en) 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card

Publications (2)

Publication Number Publication Date
CN113419780A true CN113419780A (en) 2021-09-21
CN113419780B CN113419780B (en) 2023-05-12

Family

ID=77712124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110500249.9A Active CN113419780B (en) 2021-05-08 2021-05-08 DPDK driving system based on FPGA acceleration card

Country Status (1)

Country Link
CN (1) CN113419780B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114095251A (en) * 2021-11-19 2022-02-25 南瑞集团有限公司 SSLVPN realization method based on DPDK and VPP
CN115412502A (en) * 2022-11-02 2022-11-29 之江实验室 Network port expansion and message rapid equalization processing method
CN117806988A (en) * 2024-02-29 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Task execution method, task configuration method, board card and server
CN117806988B (en) * 2024-02-29 2024-05-24 山东云海国创云计算装备产业创新中心有限公司 Task execution method, task configuration method, board card and server

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385379A (en) * 2016-09-14 2017-02-08 杭州迪普科技有限公司 Message caching method and device
CN107436855A (en) * 2016-05-25 2017-12-05 三星电子株式会社 QOS cognition IO management for the PCIE storage systems with reconfigurable multiport
CN112131164A (en) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 Data scheduling method and device applied to acceleration board card, acceleration board card and medium
CN112422448A (en) * 2020-08-21 2021-02-26 苏州浪潮智能科技有限公司 FPGA accelerator card network data transmission method and related components
CN112637080A (en) * 2020-12-14 2021-04-09 中国科学院声学研究所 Load balancing processing system based on FPGA
CN112765054A (en) * 2019-11-01 2021-05-07 中国科学院声学研究所 High-speed data acquisition system and method based on FPGA

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107436855A (en) * 2016-05-25 2017-12-05 三星电子株式会社 QOS cognition IO management for the PCIE storage systems with reconfigurable multiport
CN106385379A (en) * 2016-09-14 2017-02-08 杭州迪普科技有限公司 Message caching method and device
CN112765054A (en) * 2019-11-01 2021-05-07 中国科学院声学研究所 High-speed data acquisition system and method based on FPGA
CN112422448A (en) * 2020-08-21 2021-02-26 苏州浪潮智能科技有限公司 FPGA accelerator card network data transmission method and related components
CN112131164A (en) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 Data scheduling method and device applied to acceleration board card, acceleration board card and medium
CN112637080A (en) * 2020-12-14 2021-04-09 中国科学院声学研究所 Load balancing processing system based on FPGA

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114095251A (en) * 2021-11-19 2022-02-25 南瑞集团有限公司 SSLVPN realization method based on DPDK and VPP
CN114095251B (en) * 2021-11-19 2024-02-13 南瑞集团有限公司 SSLVPN implementation method based on DPDK and VPP
CN115412502A (en) * 2022-11-02 2022-11-29 之江实验室 Network port expansion and message rapid equalization processing method
CN117806988A (en) * 2024-02-29 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Task execution method, task configuration method, board card and server
CN117806988B (en) * 2024-02-29 2024-05-24 山东云海国创云计算装备产业创新中心有限公司 Task execution method, task configuration method, board card and server

Also Published As

Publication number Publication date
CN113419780B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
US5530874A (en) Network adapter with an indication signal mask and an interrupt signal mask
CN101841470B (en) High-speed capturing method of bottom-layer data packet based on Linux
US6233244B1 (en) Method and apparatus for reclaiming buffers
US6526451B2 (en) Method and network device for creating circular queue structures in shared memory
US5299313A (en) Network interface with host independent buffer management
CN113419780A (en) DPDK driving system based on FPGA accelerator card
US20200081850A1 (en) Unified address space for multiple hardware accelerators using dedicated low latency links
WO1998036528A1 (en) Method and apparatus for transmitting multiple copies by replicating data identifiers
EP0960505B1 (en) Method and apparatus for controlling initiation of transmission of data as a function of received data
US10609125B2 (en) Method and system for transmitting communication data
CN112637080B (en) Load balancing processing system based on FPGA
EP2059877B1 (en) Device for processing a stream of data words
US6388989B1 (en) Method and apparatus for preventing memory overrun in a data transmission system
CN106059955A (en) Ethernet real-time packet capturing method based on SOC DMA
CN107888337A (en) A kind of method of FPGA, FPGA processing information, accelerator
CN114356829A (en) Protocol self-adaptive identification, cross-platform and standardization software system based on serial port transceiving
EP2122472A1 (en) Microcontroller with memory trace module
US5948079A (en) System for non-sequential transfer of data packet portions with respective portion descriptions from a computer network peripheral device to host memory
CN115033407B (en) System and method for collecting and identifying flow suitable for cloud computing
CN115296743A (en) Optical fiber communication switching system
CN115756296A (en) Cache management method and device, control program and controller
KR20040110540A (en) Apparatus and method interfacing a data for a network electronic device
US20030210684A1 (en) Packet transceiving method and device
CN112637027B (en) Frame boundary defining device based on UART (universal asynchronous receiver/transmitter), transmitting method and receiving method
Byszuk et al. Implementation of PCIe-SerDes-DDR3 communication in a multi-FPGA data acquisition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant