CN113676421A - Multi-port network message receiving and transmitting method based on PCIe - Google Patents

Multi-port network message receiving and transmitting method based on PCIe

Info

Publication number
CN113676421A
CN113676421A
Authority
CN
China
Prior art keywords
message
page
network
sending
merge
Prior art date
Legal status
Granted
Application number
CN202111237181.6A
Other languages
Chinese (zh)
Other versions
CN113676421B (en)
Inventor
沈文君 (Shen Wenjun)
张富军 (Zhang Fujun)
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202111237181.6A priority Critical patent/CN113676421B/en
Publication of CN113676421A publication Critical patent/CN113676421A/en
Application granted granted Critical
Publication of CN113676421B publication Critical patent/CN113676421B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/70 Admission control; Resource allocation
    • H04L47/72 Admission control; Resource allocation using reservation actions during connection setup
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282 Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/70 Admission control; Resource allocation
    • H04L47/72 Admission control; Resource allocation using reservation actions during connection setup
    • H04L47/722 Admission control; Resource allocation using reservation actions during connection setup at the destination endpoint, e.g. reservation of terminal resources or buffer space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a multi-port network message receiving and transmitting method based on PCIe, which comprises the following steps. S1: the ARM and the FPGA virtualize a plurality of network devices and add port numbers to the original network messages, so that a plurality of network ports can send and receive data simultaneously over a single PCIe channel. S2: for transmission, the ARM merges outgoing network messages into pages, aggregates multiple pages per transfer and falls back to timeout-driven sending; page merging and merged-page sending run in two independent threads connected by two lock-free caches. S3: the ARM binds the processing threads for sending and receiving network messages to fixed CPUs. Because the ARM's DMA transfers operate in units of pages (4096 bytes), the ARM creates two lock-free caches and corresponding processing threads to merge outgoing network messages into pages; the page-aggregation and timeout sending mechanisms effectively reduce the number of DMA transfers and improve network transmission efficiency, while real-time delivery is preserved when network traffic is light.

Description

Multi-port network message receiving and transmitting method based on PCIe
Technical Field
The invention relates to the technical field of computer networks, and in particular to a multi-port network message receiving and transmitting method based on PCIe.
Background
In embedded network devices, scenarios that require many network ports cannot be served by the native ports of a general-purpose ARM processor alone; in such cases additional ports can be provided through the ARM's PCIe channel and an FPGA. For cost-sensitive embedded equipment the FPGA does not provide SR-IOV, so multiple network ports have to be realized with virtual network devices and by modifying the original network messages. When network messages are sent and received frequently, transceiving efficiency must be guaranteed while the real-time behaviour of the messages is also taken into account, so designing an efficient, real-time PCIe-based network message transceiving method is key to the network performance of such a device.
Disclosure of Invention
The object of the invention is to provide a multi-port network message receiving and transmitting method based on PCIe (Peripheral Component Interconnect Express) so as to overcome the deficiencies of the prior art.
To achieve the above object, the invention provides the following technical solution:
The invention discloses a multi-port network message receiving and transmitting method based on PCIe, which comprises the following steps:
S1: the ARM and the FPGA virtualize a plurality of network devices and add port numbers to the original network messages, realizing simultaneous data transceiving by a plurality of network ports over a single PCIe channel;
S2: the ARM adopts a mechanism of merging outgoing network messages into pages, multi-page aggregated sending and timeout-driven sending, and uses two independent threads and two lock-free caches for the message page merging and merged-page sending processes;
S3: the ARM binds the processing threads for sending and receiving network messages to fixed CPUs.
Preferably, step S1 comprises the following sub-steps:
S11: the ARM creates a plurality of virtual network devices in its network driver, the FPGA creates the same number of network devices for the external network ports, and data interaction is carried out through the PCIe channel;
S12: the ARM adds a port number to each network message according to its virtual network device and then inserts the message into the network original sending message lock-free cache queue (a code sketch of this transmit path is given after these sub-steps), comprising the following sub-steps:
S121: the ARM creates a network original sending message lock-free cache queue that can store M message structure addresses;
S122: the ARM determines the port number of the outgoing network message according to the virtual network device and fills it into the vlan_cfi field of the message structure sk_buff;
S123: the modified message structure address is inserted into the network original sending message lock-free cache queue;
S13: the ARM receives network messages from the FPGA, parses out the port number, and delivers the original message to the corresponding virtual network device according to the port number, comprising the following sub-steps:
S131: the ARM applies for a receive page cache containing N pages and sends the physical address of the cache to the FPGA;
S132: the FPGA receives data from the external network ports, adds a port number at the head of each network message, writes the data into the cache and updates its sending-engine state; only one message is written per page so that the ARM can read it conveniently;
S133: the ARM creates a network message receiving thread that cyclically reads the state of the FPGA's sending engine; when a message arrives it obtains the head and tail pointers of the messages in the cache, reads the messages in order, parses out the port number, and delivers the original message with the port number stripped to the corresponding virtual network device.
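As a concrete illustration of sub-steps S12 and S121-S123, the following Linux-kernel-style sketch shows how the transmit hook of one virtual network device could tag an outgoing sk_buff with its port number and push its address into the lock-free send queue. The names pcie_vport, tx_raw_fifo and TX_RAW_QUEUE_LEN, the use of a kfifo as the lock-free cache queue, and the use of skb->vlan_tci as a stand-in for the vlan_cfi field named by the patent are assumptions made for this sketch, not details taken from the patent.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <linux/kfifo.h>

    #define TX_RAW_QUEUE_LEN 1024                 /* M message addresses (power of two) */

    struct pcie_vport {
        struct net_device *ndev;
        u8 port_id;                                /* 0..3 for the four virtual ports */
    };

    /* S121: lock-free queue holding the addresses of raw outgoing messages */
    static DEFINE_KFIFO(tx_raw_fifo, struct sk_buff *, TX_RAW_QUEUE_LEN);

    static netdev_tx_t vport_start_xmit(struct sk_buff *skb, struct net_device *ndev)
    {
        struct pcie_vport *vp = netdev_priv(ndev);

        /* S122: record the port number in the skb (the patent uses the vlan_cfi field) */
        skb->vlan_tci = vp->port_id;

        /* S123: insert the message address into the lock-free send queue */
        if (!kfifo_put(&tx_raw_fifo, skb)) {
            ndev->stats.tx_dropped++;              /* queue full: drop the message */
            dev_kfree_skb_any(skb);
            return NETDEV_TX_OK;
        }
        return NETDEV_TX_OK;
    }

A kfifo is lock-free only for a single producer and a single consumer, which matches the intended pairing of this transmit path with the single merging thread described in step S2.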
Preferably, step S2 comprises the following sub-steps (a set-up code sketch follows this list):
S21: the ARM creates a network message merge page lock-free cache queue that can store P pages, where P < M;
S22: the ARM allocates an array pkg_delay_times of P unsigned chars to store the page merge delay time of each page and an array page_data_len of P unsigned shorts to store the valid data length of each page, and defines a page merge timeout threshold pkg_delay_threshold_times;
S23: the ARM creates a network original sending message merging thread that performs page merging on the network original sending message lock-free cache queue and inserts the merged pages into the network message merge page lock-free cache queue;
S24: a message merge page sending trigger threshold page_threshold_nums is defined, with page_threshold_nums < P, and a message merge page delay trigger threshold delay_threshold_times is defined;
S25: the ARM creates a network message merge page sending thread that processes the message merge pages in the network message merge page lock-free cache queue and transfers the data to the FPGA through DMA.
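The pairing of two independent threads with two lock-free caches described in S21-S25 could be organised roughly as below; the type and variable names (merge_page, merge_ctx, merge_page_fifo, P_MERGE_PAGES) and the reuse of a kfifo as the second lock-free cache queue are illustrative assumptions. The bodies of the two threads are sketched after the corresponding detailed steps further below.

    #include <linux/kfifo.h>
    #include <linux/kthread.h>
    #include <linux/types.h>

    #define MERGE_PAGE_SIZE 4096                   /* the ARM's DMA works in page units */
    #define P_MERGE_PAGES   256                    /* P merge pages, P < M (power of two) */

    struct merge_page {
        void       *vaddr;                         /* 4096-byte page holding merged messages */
        dma_addr_t  dma;                           /* bus address used for the DMA to the FPGA */
    };

    struct merge_ctx {
        struct merge_page pages[P_MERGE_PAGES];    /* backing store, allocated and mapped at probe time */
    };

    /* S21: lock-free queue of merge pages that are ready to be sent to the FPGA */
    static DEFINE_KFIFO(merge_page_fifo, struct merge_page *, P_MERGE_PAGES);

    /* S23/S25: the two independent threads working on the two lock-free queues */
    static struct task_struct *merge_task, *send_task;
    static int pkt_merge_thread_fn(void *arg);     /* fills merge pages (Fig. 4) */
    static int page_send_thread_fn(void *arg);     /* DMAs merge pages to the FPGA (Fig. 5) */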
Preferably, the message merge page delay trigger threshold delay_threshold_times defined in step S24 is greater than the page merge timeout threshold pkg_delay_threshold_times defined in step S22 (both thresholds appear in the data-structure sketch below).
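A possible in-memory layout for one merge page (cf. Fig. 2), together with the bookkeeping arrays and thresholds of S22 and S24, is sketched below. The field widths and the numeric threshold values are assumptions; the patent only fixes the relations page_threshold_nums < P and delay_threshold_times > pkg_delay_threshold_times.

    #include <linux/types.h>

    /* head of one merge page (cf. Fig. 2): a message count followed by the merged entries */
    struct merge_page_hdr {
        __u16 msg_count;                           /* "message number" field of the page */
    } __packed;

    /* header prepended to every message copied into a merge page */
    struct merged_msg_hdr {
        __u16 port_id;                             /* port number taken from the vlan_cfi tag */
        __u16 data_len;                            /* length of the message payload that follows */
    } __packed;

    /* S22: per-page merge delay counters (1 unit = 1 us) and valid data lengths */
    static u8  pkg_delay_times[P_MERGE_PAGES];
    static u16 page_data_len[P_MERGE_PAGES];

    /* S22/S24: timeout and trigger thresholds; the values below are examples only */
    static const u8  pkg_delay_threshold_times = 50;   /* per-page merge timeout, us */
    static const u16 page_threshold_nums       = 8;    /* page count that triggers a DMA send */
    static const u16 delay_threshold_times     = 100;  /* total-delay trigger, us (> 50) */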
Preferably, step S23 comprises the following sub-steps:
S231: detect whether the network message merge page lock-free cache queue is full; once it is detected to be not full, obtain the current network message merge page and detect whether the network original sending message lock-free cache queue has a message to be sent; if there is no message to be sent in the queue, execute step S232, and if there is, execute step S235;
S232: detect whether the valid data length page_data_len of the current network message merge page is 0; if page_data_len = 0, return to step S231, otherwise execute step S233;
S233: add 1 (i.e. 1 µs) to the page merge delay time pkg_delay_times of the current network message merge page, let the network original sending message merging thread execute udelay(1) to sleep for 1 µs, and detect whether the page merge delay time pkg_delay_times of the current page is greater than the page merge timeout threshold pkg_delay_threshold_times; if it is, execute step S234, otherwise return to step S231;
S234: insert the current network message merge page into the network message merge page lock-free cache queue and return to step S231;
S235: read the length of the oldest message in the network original sending message lock-free cache queue and detect whether the current network message merge page has enough remaining space to store the message; if the remaining space is insufficient, execute step S236, and if it is sufficient, execute step S237;
S236: insert the current network message merge page into the network message merge page lock-free cache queue, detect whether the queue is full, and once it is detected to be not full obtain a new current network message merge page and execute step S237;
S237: copy the vlan_cfi field, the data_len field and the message data of the message structure sk_buff into the port number field, the message data length field and the message data content part of the network message merge page respectively, update the message number field of the current merge page, update the corresponding valid data length page_data_len, and release the sk_buff cache of the message and its entry in the network original sending message lock-free cache queue;
S238: detect whether the network original sending message lock-free cache queue still has a message to be sent; if it does, return to step S237, and if it does not, return to step S231.
Preferably, step S25 comprises the following sub-steps:
S251: detect whether the network message merge page lock-free cache queue has message merge pages to be sent; once pages to be sent are detected, execute S252;
S252: calculate the number page_nums of message merge pages to be sent and determine whether it is greater than the merge page sending trigger threshold page_threshold_nums; if page_nums > page_threshold_nums, execute step S255, otherwise execute step S253;
S253: sum the pkg_delay_times values of the message merge pages to be sent and add the thread delay time kthread_delay_times to obtain the total delay time all_delay_times, then judge whether all_delay_times is greater than the message merge page delay trigger threshold delay_threshold_times; if it is, execute step S255, otherwise execute step S254;
S254: the thread executes udelay(1) to delay 1 µs, the thread delay time kthread_delay_times is increased by 1, and the flow returns to step S251;
S255: combine the message merge pages to be sent and transfer them to the FPGA through DMA, set the corresponding valid data lengths page_data_len and page merge delay times pkg_delay_times to 0, set the thread delay time kthread_delay_times to 0, release the resources of the sent message merge pages in the network message merge page lock-free cache queue, and return to step S251.
Preferably, the ARM adopts a six-core processor whose cores are denoted CPU1 to CPU6.
Preferably, step S3 comprises the following sub-steps:
S31: the network original sending message merging thread is bound to CPU4;
S32: the network message merge page sending thread is bound to CPU5;
S33: the network message receiving thread is bound to CPU6.
The invention has the following beneficial effects:
1. by virtualizing four network devices through the ARM and the FPGA and modifying the original network messages, four network ports can send and receive data simultaneously over a single PCIe channel, which effectively reduces cost;
2. because the ARM's DMA transfers operate in units of pages (4096 bytes), the ARM creates two lock-free caches and corresponding processing threads to merge outgoing network messages into pages; the page-aggregation and timeout sending mechanisms effectively reduce the number of DMA transfers and improve network transmission efficiency, while real-time delivery is preserved when traffic is light;
3. the ARM binds the network original sending message merging thread, the network message merge page sending thread and the network message receiving thread to fixed CPUs, which effectively improves network message transceiving performance.
The features and advantages of the present invention will be described in detail by embodiments in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a general architecture diagram of a PCIe-based multiport network of the present invention;
FIG. 2 is a data structure diagram of a network message merge page according to the present invention;
FIG. 3 is a flow chart of an original message sending interface of the present invention;
FIG. 4 is a flow chart of a merging thread of original sending messages of the network according to the present invention;
FIG. 5 is a flow chart of a network message merge page send thread according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood, however, that the description herein of specific embodiments is only intended to illustrate the invention and not to limit the scope of the invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
As shown in fig. 1, the overall framework of the invention, a multi-port network message receiving and transmitting method based on PCIe, specifically comprises the following processing steps:
S1: the ARM and the FPGA virtualize four network devices and add port numbers to the original network messages, so that four network ports send and receive data simultaneously over a single PCIe channel; the specific steps are as follows:
S11: the ARM creates four virtual network devices in its network driver, the FPGA creates four network devices for the external network ports, and data interaction is carried out through the PCIe channel;
S12: the ARM adds a port number to each network message according to its virtual network device and then inserts the message into the network original sending message lock-free cache queue; the specific steps are as follows:
S121: the ARM creates a network original sending message lock-free cache queue that can store M message structure addresses;
S122: the ARM determines the port number of the outgoing network message according to the virtual network device and fills it into the vlan_cfi field of the message structure sk_buff;
S123: the modified message structure address is inserted into the network original sending message lock-free cache queue.
S13: the ARM receives network messages from the FPGA, parses out the port number, and delivers the original message to the corresponding virtual network device according to the port number; the specific steps are as follows:
S131: the ARM applies for a receive page cache containing N pages (4096 bytes each) and sends the physical address of the cache to the FPGA;
S132: the FPGA receives data from the external network ports, adds a port number at the head of each network message, writes the data into the cache and updates its sending-engine state; only one message is written per page so that the ARM can read it conveniently;
S133: the ARM creates a network message receiving thread that cyclically reads the state of the FPGA's sending engine; when a message arrives it obtains the head (rx_head) and tail (rx_tail) of the messages in the cache, reads the messages in order, parses out the port number, and delivers the original message with the port number stripped to the corresponding virtual network device; the specific flow is shown in FIG. 3.
In this step, four network devices are virtualized by the ARM and the FPGA and the original network messages are modified, so that four network ports send and receive data simultaneously over a single PCIe channel, effectively reducing cost; a code sketch of this receive path is given below.
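A rough sketch of the receive thread of S131-S133 follows. The register helpers fpga_engine_has_data(), fpga_rx_head(), fpga_rx_tail() and fpga_ack_pages(), the rx_page_hdr layout and the pcie_rx_ctx structure are hypothetical; only the overall flow (poll the sending-engine state, walk the pages from head to tail, strip the port number, hand the raw frame to the matching virtual network device) follows the text. The pcie_vport structure from the earlier transmit sketch is reused.

    #include <linux/kthread.h>
    #include <linux/netdevice.h>
    #include <linux/etherdevice.h>
    #include <linux/delay.h>

    /* assumed layout of the per-page header written by the FPGA in S132 */
    struct rx_page_hdr {
        __u16 port_id;                             /* port number added by the FPGA */
        __u16 data_len;                            /* length of the frame that follows */
    } __packed;

    struct pcie_rx_ctx {
        u32 nr_pages;                              /* N receive pages shared with the FPGA */
        u8 **rx_pages;                             /* kernel virtual addresses of those pages */
        struct pcie_vport *vports[4];              /* the four virtual netdevs */
    };

    /* hypothetical MMIO helpers for the FPGA sending engine (not from the patent) */
    bool fpga_engine_has_data(struct pcie_rx_ctx *ctx);
    u32  fpga_rx_head(struct pcie_rx_ctx *ctx);
    u32  fpga_rx_tail(struct pcie_rx_ctx *ctx);
    void fpga_ack_pages(struct pcie_rx_ctx *ctx, u32 head, u32 tail);

    static int pcie_rx_thread_fn(void *arg)
    {
        struct pcie_rx_ctx *ctx = arg;

        while (!kthread_should_stop()) {
            u32 head, tail, i;

            if (!fpga_engine_has_data(ctx)) {      /* poll the sending-engine state */
                usleep_range(10, 20);              /* the patent busy-polls; a short sleep is used here */
                continue;
            }
            head = fpga_rx_head(ctx);
            tail = fpga_rx_tail(ctx);

            for (i = head; i != tail; i = (i + 1) % ctx->nr_pages) {
                struct rx_page_hdr *hdr = (struct rx_page_hdr *)ctx->rx_pages[i];
                struct net_device *ndev;
                struct sk_buff *skb;

                if (hdr->port_id >= 4)             /* malformed page, skip it */
                    continue;
                ndev = ctx->vports[hdr->port_id]->ndev;

                skb = netdev_alloc_skb(ndev, hdr->data_len);
                if (!skb)
                    continue;
                /* S133: strip the port header and deliver the original frame */
                skb_put_data(skb, ctx->rx_pages[i] + sizeof(*hdr), hdr->data_len);
                skb->protocol = eth_type_trans(skb, ndev);
                netif_rx(skb);
            }
            fpga_ack_pages(ctx, head, tail);       /* hand the pages back to the FPGA */
        }
        return 0;
    }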
S2: the ARM adopts a mechanism of merging outgoing network messages into pages, multi-page aggregated sending and timeout-driven sending, and uses two independent threads and two lock-free caches for the message page merging and merged-page sending processes; the specific steps are as follows:
S21: the ARM creates a network message merge page lock-free cache queue that can store P pages (4096 bytes each), with P < M;
S22: the ARM allocates an array pkg_delay_times of P unsigned chars to store the page merge delay time of each page and an array page_data_len of P unsigned shorts to store the valid data length of each page, and defines a page merge timeout threshold pkg_delay_threshold_times;
S23: the ARM creates a network original sending message merging thread that performs page merging on the network original sending message lock-free cache queue and inserts the merged pages into the network message merge page lock-free cache queue, as shown in fig. 4 (a condensed code sketch is given after these sub-steps); the specific steps are as follows:
S231: detect whether the network message merge page lock-free cache queue is full; if it is full, continue to execute S231, and if it is not full, execute S232;
S232: obtain the ID of the current network message merge page, denoted i, and detect whether the network original sending message lock-free cache queue has a message to be sent; if not, execute S233, and if so, execute S236;
S233: detect page_data_len[i] of the current network message merge page; if page_data_len[i] = 0, execute S231, otherwise execute S234;
S234: add 1 to pkg_delay_times[i] of the current network message merge page and let the network original sending message merging thread execute udelay(1) to delay 1 µs; then check pkg_delay_times[i]: if pkg_delay_times[i] > pkg_delay_threshold_times, execute S235, otherwise execute S231;
S235: insert the current network message merge page (ID i) into the network message merge page lock-free cache queue and execute S231;
S236: obtain the ID of the oldest message in the network original sending message lock-free cache queue, denoted j, and detect whether the current network message merge page has enough space to store the message; if not, execute S237, and if so, execute S2310;
S237: insert the current network message merge page (ID i) into the network message merge page lock-free cache queue, then execute S238;
S238: detect whether the network message merge page lock-free cache queue is full; if it is full, continue to execute S238, and if it is not full, execute S239;
S239: obtain the ID of the new current network message merge page, denoted k, then execute S2310;
S2310: copy the vlan_cfi field, the data_len field and the message data of the network original message structure sk_buff with ID j into the port number field, the data length field and the message data content of the current network message merge page (ID i or k) respectively, and update the message number field of the current merge page; the resulting page data structure is shown in fig. 2;
update the corresponding value of page_data_len[i] or page_data_len[k], release the sk_buff cache of network original message j and its entry in the network original sending message lock-free cache queue, then execute S2311;
S2311: detect whether the network original sending message lock-free cache queue still has a message to be sent; if not, execute S231, and if so, execute S236.
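A condensed sketch of the merge thread of S231-S2311 (Fig. 4) follows, reusing the queues and structures from the earlier sketches. The handshake that waits for page_data_len to drop back to 0 before a merge page is reused, and the omission of explicit memory barriers beyond READ_ONCE, are simplifications of this example rather than details of the patent.

    #include <linux/kfifo.h>
    #include <linux/kthread.h>
    #include <linux/skbuff.h>
    #include <linux/delay.h>
    #include <linux/string.h>

    static int pkt_merge_thread_fn(void *arg)
    {
        struct merge_ctx *mc = arg;
        u32 i = 0;                                 /* index of the merge page currently being filled */

        while (!kthread_should_stop()) {
            struct sk_buff *skb;
            struct merged_msg_hdr *hdr;
            u8 *dst;

            /* S231: only work while the merge-page queue has room */
            if (kfifo_is_full(&merge_page_fifo)) {
                udelay(1);
                continue;
            }

            /* S232/S236: is a raw message waiting in the send queue? */
            if (!kfifo_get(&tx_raw_fifo, &skb)) {
                if (page_data_len[i] == 0)         /* S233: nothing to age on an empty page */
                    continue;
                udelay(1);                         /* S234: age the partially filled page by 1 us */
                if (++pkg_delay_times[i] > pkg_delay_threshold_times) {
                    kfifo_put(&merge_page_fifo, &mc->pages[i]);   /* S235: flush on timeout */
                    i = (i + 1) % P_MERGE_PAGES;
                    while (READ_ONCE(page_data_len[i]) && !kthread_should_stop())
                        udelay(1);                 /* wait until the send thread drains that page */
                }
                continue;
            }

            /* initialise the page header when starting a fresh page */
            if (page_data_len[i] == 0) {
                ((struct merge_page_hdr *)mc->pages[i].vaddr)->msg_count = 0;
                page_data_len[i] = sizeof(struct merge_page_hdr);
            }

            /* S236-S239: close the current page if the message does not fit */
            if (page_data_len[i] + sizeof(struct merged_msg_hdr) + skb->len > MERGE_PAGE_SIZE) {
                kfifo_put(&merge_page_fifo, &mc->pages[i]);
                i = (i + 1) % P_MERGE_PAGES;       /* switch to the next merge page ("k") */
                while (READ_ONCE(page_data_len[i]) && !kthread_should_stop())
                    udelay(1);
                ((struct merge_page_hdr *)mc->pages[i].vaddr)->msg_count = 0;
                page_data_len[i] = sizeof(struct merge_page_hdr);
            }

            /* S2310: copy port number, length and payload into the merge page */
            dst = (u8 *)mc->pages[i].vaddr + page_data_len[i];
            hdr = (struct merged_msg_hdr *)dst;
            hdr->port_id  = skb->vlan_tci;         /* stand-in for the patent's vlan_cfi tag */
            hdr->data_len = skb->len;
            memcpy(dst + sizeof(*hdr), skb->data, skb->len);
            ((struct merge_page_hdr *)mc->pages[i].vaddr)->msg_count++;
            page_data_len[i] += sizeof(*hdr) + skb->len;
            dev_kfree_skb_any(skb);                /* release the sk_buff and its queue slot */
        }
        return 0;
    }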
S24: define a message merge page sending trigger threshold page_threshold_nums, with page_threshold_nums < P, and a message merge page delay trigger threshold delay_threshold_times, with delay_threshold_times > pkg_delay_threshold_times;
S25: the ARM creates a network message merge page sending thread that processes the message merge pages in the network message merge page lock-free cache queue and transfers the data to the FPGA through DMA, as shown in fig. 5 (a condensed code sketch is given after these sub-steps); the specific steps are as follows:
S251: detect whether the network message merge page lock-free cache queue has message merge pages to be sent; if not, execute S251, otherwise execute S252;
S252: calculate the number page_nums of message merge pages to be sent; if page_nums > page_threshold_nums, execute S255, otherwise execute S253;
S253: sum the pkg_delay_times values of the message merge pages to be sent and add the thread delay time kthread_delay_times to obtain the total delay time all_delay_times; if all_delay_times > delay_threshold_times, execute S255, otherwise execute S254;
S254: the thread executes udelay(1) to delay 1 µs, the kthread_delay_times value is increased by 1, and then S251 is executed;
S255: combine the message merge pages to be sent and transfer them to the FPGA through DMA, set the corresponding page_data_len and pkg_delay_times values to 0, set kthread_delay_times to 0, release the resources of the sent message merge pages in the network message merge page lock-free cache queue, and execute S251.
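A matching sketch of the merge-page sending thread of S251-S255 (Fig. 5) follows. Pages already taken from the lock-free queue are kept in a local batch until either enough pages have accumulated or the total delay exceeds delay_threshold_times; dma_send_pages() is a placeholder for the descriptor set-up and doorbell write, which the patent does not detail.

    #include <linux/kfifo.h>
    #include <linux/kthread.h>
    #include <linux/delay.h>

    /* placeholder for the real descriptor set-up and doorbell write */
    void dma_send_pages(struct merge_page **batch, u32 n);

    static int page_send_thread_fn(void *arg)
    {
        struct merge_ctx *mc = arg;
        static struct merge_page *batch[P_MERGE_PAGES];   /* single send thread, so static is fine */
        u32 page_nums = 0;
        u32 kthread_delay_times = 0;

        while (!kthread_should_stop()) {
            struct merge_page *pg;
            u32 all_delay_times, i;

            /* S251: collect every merge page currently waiting in the lock-free queue */
            while (page_nums < P_MERGE_PAGES && kfifo_get(&merge_page_fifo, &pg))
                batch[page_nums++] = pg;

            if (page_nums == 0) {
                udelay(1);
                continue;
            }

            /* S253: total delay = per-page merge delays + this thread's own delay */
            all_delay_times = kthread_delay_times;
            for (i = 0; i < page_nums; i++)
                all_delay_times += pkg_delay_times[batch[i] - mc->pages];

            /* S252/S253: send when enough pages or enough accumulated delay */
            if (page_nums <= page_threshold_nums &&
                all_delay_times <= delay_threshold_times) {
                udelay(1);                         /* S254: wait 1 us and account for it */
                kthread_delay_times++;
                continue;
            }

            /* S255: DMA the combined pages to the FPGA and reset the bookkeeping */
            dma_send_pages(batch, page_nums);
            for (i = 0; i < page_nums; i++) {
                u32 idx = batch[i] - mc->pages;
                pkg_delay_times[idx] = 0;
                WRITE_ONCE(page_data_len[idx], 0); /* hands the page back to the merge thread */
            }
            page_nums = 0;
            kthread_delay_times = 0;
        }
        return 0;
    }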
In this step, the ARM creates two lock-free caches and corresponding processing threads and performs page merging on outgoing network messages; the page-aggregation and timeout sending mechanisms effectively reduce the number of DMA transfers and improve network transmission efficiency, while real-time delivery is preserved when traffic is light.
S3: the ARM performs CPU binding on the processing threads for sending and receiving network messages; a six-core processor with cores CPU1 to CPU6 is adopted, and the specific steps are as follows:
S31: the network original sending message merging thread is bound to CPU4;
S32: the network message merge page sending thread is bound to CPU5;
S33: the network message receiving thread is bound to CPU6.
In this step, the ARM binds the network original sending message merging thread, the network message merge page sending thread and the network message receiving thread to fixed CPUs, which effectively improves network message transceiving performance; a thread-binding sketch is given below.
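The CPU binding of S31-S33 could be done by creating the three threads with kthread_create() and pinning them with kthread_bind() before the first wakeup, as sketched below. The patent numbers the cores CPU1 to CPU6; whether that maps to Linux cpu ids 0-5 or 1-6 is not stated, so the ids 3, 4 and 5 used here are an assumption.

    #include <linux/kthread.h>
    #include <linux/err.h>

    static struct task_struct *rx_task;            /* merge_task and send_task were declared earlier */

    static int bind_thread(int (*fn)(void *), void *arg, const char *name,
                           unsigned int cpu, struct task_struct **out)
    {
        struct task_struct *t = kthread_create(fn, arg, "%s", name);

        if (IS_ERR(t))
            return PTR_ERR(t);
        kthread_bind(t, cpu);                      /* pin the thread before its first wakeup */
        wake_up_process(t);
        *out = t;
        return 0;
    }

    static int pcie_multiport_start(struct merge_ctx *mc, struct pcie_rx_ctx *rx)
    {
        int ret;

        ret = bind_thread(pkt_merge_thread_fn, mc, "pcie_pkt_merge", 3, &merge_task);  /* S31: "CPU4" */
        if (ret)
            return ret;
        ret = bind_thread(page_send_thread_fn, mc, "pcie_page_send", 4, &send_task);   /* S32: "CPU5" */
        if (ret) {
            kthread_stop(merge_task);
            return ret;
        }
        ret = bind_thread(pcie_rx_thread_fn, rx, "pcie_pkt_rx", 5, &rx_task);          /* S33: "CPU6" */
        if (ret) {
            kthread_stop(send_task);
            kthread_stop(merge_task);
        }
        return ret;
    }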
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A multi-port network message receiving and transmitting method based on PCIe, characterized by comprising the following steps:
S1: the ARM and the FPGA virtualize a plurality of network devices and add port numbers to the original network messages, realizing simultaneous data transceiving by a plurality of network ports over a single PCIe channel;
S2: the ARM adopts a mechanism of merging outgoing network messages into pages, multi-page aggregated sending and timeout-driven sending, and uses two independent threads and two lock-free caches for the message page merging and merged-page sending processes;
S3: the ARM binds the processing threads for sending and receiving network messages to fixed CPUs.
2. The PCIe-based multi-port network message receiving and transmitting method of claim 1, wherein step S1 comprises the following sub-steps:
S11: the ARM creates a plurality of virtual network devices in its network driver, the FPGA creates the same number of network devices for the external network ports, and data interaction is carried out through the PCIe channel;
S12: the ARM adds a port number to each network message according to its virtual network device and then inserts the message into the network original sending message lock-free cache queue, comprising the following sub-steps:
S121: the ARM creates a network original sending message lock-free cache queue that can store M message structure addresses;
S122: the ARM determines the port number of the outgoing network message according to the virtual network device and fills it into the vlan_cfi field of the message structure sk_buff;
S123: the modified message structure address is inserted into the network original sending message lock-free cache queue;
S13: the ARM receives network messages from the FPGA, parses out the port number, and delivers the original message to the corresponding virtual network device according to the port number, comprising the following sub-steps:
S131: the ARM applies for a receive page cache containing N pages and sends the physical address of the cache to the FPGA;
S132: the FPGA receives data from the external network ports, adds a port number at the head of each network message, writes the data into the cache and updates its sending-engine state; only one message is written per page so that the ARM can read it conveniently;
S133: the ARM creates a network message receiving thread that cyclically reads the state of the FPGA's sending engine; when a message arrives it obtains the head and tail pointers of the messages in the cache, reads the messages in order, parses out the port number, and delivers the original message with the port number stripped to the corresponding virtual network device.
3. The PCIe-based multi-port network message receiving and transmitting method of claim 1, wherein step S2 comprises the following sub-steps:
S21: the ARM creates a network message merge page lock-free cache queue that can store P pages, where P < M;
S22: the ARM allocates an array pkg_delay_times of P unsigned chars to store the page merge delay time of each page and an array page_data_len of P unsigned shorts to store the valid data length of each page, and defines a page merge timeout threshold pkg_delay_threshold_times;
S23: the ARM creates a network original sending message merging thread that performs page merging on the network original sending message lock-free cache queue and inserts the merged pages into the network message merge page lock-free cache queue;
S24: a message merge page sending trigger threshold page_threshold_nums is defined, with page_threshold_nums < P, and a message merge page delay trigger threshold delay_threshold_times is defined;
S25: the ARM creates a network message merge page sending thread that processes the message merge pages in the network message merge page lock-free cache queue and transfers the data to the FPGA through DMA.
4. The PCIe-based multi-port network message receiving and transmitting method of claim 3, wherein the message merge page delay trigger threshold delay_threshold_times defined in step S24 is greater than the page merge timeout threshold pkg_delay_threshold_times defined in step S22.
5. The PCIe-based multi-port network message receiving and transmitting method of claim 3, wherein step S23 comprises the following sub-steps:
S231: detect whether the network message merge page lock-free cache queue is full; once it is detected to be not full, obtain the current network message merge page and detect whether the network original sending message lock-free cache queue has a message to be sent; if there is no message to be sent in the queue, execute step S232, and if there is, execute step S235;
S232: detect whether the valid data length page_data_len of the current network message merge page is 0; if page_data_len = 0, return to step S231, otherwise execute step S233;
S233: add 1 (i.e. 1 µs) to the page merge delay time pkg_delay_times of the current network message merge page, let the network original sending message merging thread execute udelay(1) to sleep for 1 µs, and detect whether the page merge delay time pkg_delay_times of the current page is greater than the page merge timeout threshold pkg_delay_threshold_times; if it is, execute step S234, otherwise return to step S231;
S234: insert the current network message merge page into the network message merge page lock-free cache queue and return to step S231;
S235: read the length of the oldest message in the network original sending message lock-free cache queue and detect whether the current network message merge page has enough remaining space to store the message; if the remaining space is insufficient, execute step S236, and if it is sufficient, execute step S237;
S236: insert the current network message merge page into the network message merge page lock-free cache queue, detect whether the queue is full, and once it is detected to be not full obtain a new current network message merge page and execute step S237;
S237: copy the vlan_cfi field, the data_len field and the message data of the message structure sk_buff into the port number field, the message data length field and the message data content part of the network message merge page respectively, update the message number field of the current merge page, update the corresponding valid data length page_data_len, and release the sk_buff cache of the message and its entry in the network original sending message lock-free cache queue;
S238: detect whether the network original sending message lock-free cache queue still has a message to be sent; if it does, return to step S237, and if it does not, return to step S231.
6. The PCIe-based multi-port network message receiving and transmitting method of claim 3, wherein step S25 comprises the following sub-steps:
S251: detect whether the network message merge page lock-free cache queue has message merge pages to be sent; once pages to be sent are detected, execute S252;
S252: calculate the number page_nums of message merge pages to be sent and determine whether it is greater than the merge page sending trigger threshold page_threshold_nums; if page_nums > page_threshold_nums, execute step S255, otherwise execute step S253;
S253: sum the pkg_delay_times values of the message merge pages to be sent and add the thread delay time kthread_delay_times to obtain the total delay time all_delay_times, then judge whether all_delay_times is greater than the message merge page delay trigger threshold delay_threshold_times; if it is, execute step S255, otherwise execute step S254;
S254: the thread executes udelay(1) to delay 1 µs, the thread delay time kthread_delay_times is increased by 1, and the flow returns to step S251;
S255: combine the message merge pages to be sent and transfer them to the FPGA through DMA, set the corresponding valid data lengths page_data_len and page merge delay times pkg_delay_times to 0, set the thread delay time kthread_delay_times to 0, release the resources of the sent message merge pages in the network message merge page lock-free cache queue, and return to step S251.
7. The PCIe-based multi-port network message receiving and transmitting method of claim 1, wherein the ARM adopts a six-core processor whose cores are denoted CPU1 to CPU6.
8. The PCIe-based multi-port network message receiving and transmitting method of claim 7, wherein step S3 comprises the following sub-steps:
S31: the network original sending message merging thread is bound to CPU4;
S32: the network message merge page sending thread is bound to CPU5;
S33: the network message receiving thread is bound to CPU6.
CN202111237181.6A 2021-10-25 2021-10-25 Multi-port network message receiving and transmitting method based on PCIe Active CN113676421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111237181.6A CN113676421B (en) 2021-10-25 2021-10-25 Multi-port network message receiving and transmitting method based on PCIe

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111237181.6A CN113676421B (en) 2021-10-25 2021-10-25 Multi-port network message receiving and transmitting method based on PCIe

Publications (2)

Publication Number Publication Date
CN113676421A 2021-11-19
CN113676421B CN113676421B (en) 2022-01-28

Family

ID=78550976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111237181.6A Active CN113676421B (en) 2021-10-25 2021-10-25 Multi-port network message receiving and transmitting method based on PCIe

Country Status (1)

Country Link
CN (1) CN113676421B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286866A (en) * 2008-05-30 2008-10-15 杭州华三通信技术有限公司 Multicast implementing method and system based on switching network of high-speed peripheral extended interface
CN102185770A (en) * 2011-05-05 2011-09-14 汉柏科技有限公司 Multi-core-architecture-based batch message transmitting and receiving method
CN106998347A (en) * 2016-01-26 2017-08-01 中兴通讯股份有限公司 The apparatus and method of server virtualization network share
CN106897106A (en) * 2017-01-12 2017-06-27 北京三未信安科技发展有限公司 The sequential scheduling method and system of the concurrent DMA of multi-dummy machine under a kind of SR IOV environment
US20190272249A1 (en) * 2018-03-01 2019-09-05 Samsung Electronics Co., Ltd. SYSTEM AND METHOD FOR SUPPORTING MULTI-MODE AND/OR MULTI-SPEED NON-VOLATILE MEMORY (NVM) EXPRESS (NVMe) OVER FABRICS (NVMe-oF) DEVICES
CN110247860A (en) * 2018-03-09 2019-09-17 三星电子株式会社 Multi-mode and/or multiple speed NVMe-oF device
CN108595353A (en) * 2018-04-09 2018-09-28 杭州迪普科技股份有限公司 A kind of method and device of the control data transmission based on PCIe buses
WO2020177252A1 (en) * 2019-03-06 2020-09-10 上海熠知电子科技有限公司 Pcie protocol-based dma controller, and dma data transmission method
CN110545152A (en) * 2019-09-10 2019-12-06 清华大学 upper computer with real-time transmission function in Ethernet and Ethernet system
CN110943941A (en) * 2019-12-06 2020-03-31 北京天融信网络安全技术有限公司 Message receiving method, message sending method, network card and electronic equipment
US20210306302A1 (en) * 2019-12-19 2021-09-30 Xiamen Wangsu Co., Ltd. Datagram processing method, processing unit and vpn server
CN112437028A (en) * 2020-12-10 2021-03-02 福州创实讯联信息技术有限公司 Method and system for expanding multiple network ports of embedded system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨惠 (Yang Hui) et al.: "Thread-affinity buffer management mechanism for multi-core network packet processing systems" (面向多核网络分组处理系统的线程亲和缓冲区管理机制), Journal of National University of Defense Technology (国防科技大学学报) *

Also Published As

Publication number Publication date
CN113676421B (en) 2022-01-28

Similar Documents

Publication Publication Date Title
US11899596B2 (en) System and method for facilitating dynamic command management in a network interface controller (NIC)
US11500689B2 (en) Communication method and apparatus
WO2018076793A1 (en) Nvme device, and methods for reading and writing nvme data
US10911358B1 (en) Packet processing cache
USRE47756E1 (en) High performance memory based communications interface
US7996548B2 (en) Message communication techniques
Flajslik et al. Network Interface Design for Low Latency Request-Response Protocols
US20100169528A1 (en) Interrupt technicques
US8966484B2 (en) Information processing apparatus, information processing method, and storage medium
JPH11175454A (en) Computer system equipped with automatic direct memory access function
CN112650558B (en) Data processing method and device, readable medium and electronic equipment
JPH11175455A (en) Communication method in computer system and device therefor
WO2020000485A1 (en) Nvme-based data writing method, device, and system
WO2020000482A1 (en) Nvme-based data reading method, apparatus and system
CN115934625B (en) Doorbell knocking method, equipment and medium for remote direct memory access
CN117178263A (en) Network-attached MPI processing architecture in SmartNIC
WO2014019511A1 (en) Multicast message replication method and device
CN115248795A (en) Peripheral Component Interconnect Express (PCIE) interface system and method of operating the same
CN113676421B (en) Multi-port network message receiving and transmitting method based on PCIe
US9288163B2 (en) Low-latency packet receive method for networking devices
US10255213B1 (en) Adapter device for large address spaces
CN116601616A (en) Data processing device, method and related equipment
US11785087B1 (en) Remote direct memory access operations with integrated data arrival indication
US20240111702A1 (en) Virtual wire protocol for transmitting side band channels
Binkert Integrated system architectures for high-performance Internet servers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant