CN116866449A - TOE acceleration system for improving application network performance

TOE acceleration system for improving application network performance

Info

Publication number
CN116866449A
CN116866449A (application CN202310832814.0A)
Authority
CN
China
Prior art keywords
toe
receiving
data
network
network card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310832814.0A
Other languages
Chinese (zh)
Inventor
邢钱舰
余锋
王嘉浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202310832814.0A
Publication of CN116866449A

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The application discloses a TOE acceleration system for improving application network performance, comprising an interface replacement module, a TOE network driver, and a TOE network card device. The interface replacement module determines whether an interface related to data transceiving requires TCP offload; calls that require offload are handed to the TOE network driver, while all other calls follow the original system call path into the kernel protocol stack. The TOE network driver processes the send/receive commands issued by the interface replacement module, maintains the corresponding send/receive task queues, and controls the TOE network card device to complete the corresponding operations. The TOE network card device is implemented in an FPGA and handles the protocol work involved in sending and receiving data messages. The TOE acceleration system for improving application network performance can be deployed rapidly without modifying the operating system kernel, and can be used conveniently and quickly to accelerate network transmission in various application programs built on TCP sockets, thereby improving application network performance.

Description

TOE acceleration system for improving application network performance
Technical Field
The application belongs to the technical field of networks, and particularly relates to a TOE acceleration system for improving application network performance.
Background
The accelerating pace of modern technological innovation has brought a rapid growth in data volume; massive amounts of data permeate every aspect of daily life and work, placing ever higher demands on data transmission bandwidth. Ethernet is the most widely used data transmission network today. In Ethernet communication, processing of the network protocol stack is usually performed by the CPU; as network transmission bandwidth keeps increasing, the processor overhead becomes more severe and gradually turns into the bottleneck of high-speed network transmission.
TCP Offload Engine (TOE) technology offloads the processing of the network protocol stack from the CPU to a dedicated processing unit, such as a network interface card (NIC), FPGA, or ASIC, reducing CPU usage and bypassing the operating system. This simplifies the protocol processing path and relieves processor pressure, making TOE an important direction in TCP acceleration research.
Mainstream TOE technology can be divided into Chimney TOE and Full Offload TOE. The former is a partial (semi-) offload technique: only the data transmission path, which consumes the most CPU resources, is offloaded, while functions such as connection establishment, teardown, and connection maintenance continue to be handled by the operating system kernel. The latter requires that every function of the network protocol stack be implemented entirely in hardware. Compared with full offload, the semi-offload approach is easier to design and more compatible with upper-layer applications; with full offload, applications must use custom API interfaces to access the corresponding network functions during development.
At present, the Linux operating system does not provide good support or specifications for TOE technology. Compared with the general Linux TCP/IP protocol stack, the various TOE solutions are proprietary, their actual deployment is complex, and their support for application programs is very limited. Most TOE schemes require modifying the network protocol stack source code, posing a certain risk to system stability and maintainability. Therefore, how to quickly deploy TOE for application programs without modifying the operating system kernel is an urgent problem to be solved, and doing so is of great significance in the field of network technology.
Disclosure of Invention
The application provides a TOE acceleration system for improving application network performance that solves the above technical problems, specifically adopting the following technical scheme:
a TOE acceleration system for improving application network performance, comprising: an interface replacement module, a TOE network driver, and a TOE network card device;
the interface replacement module is arranged in user space of the host and is used for determining whether an interface related to data transceiving requires TCP offload; calls that require offload are processed accordingly and passed to the TOE network driver, while all other calls follow the original system call path into the kernel protocol stack;
the TOE network driver is arranged in kernel space of the host and is used for processing the send/receive commands issued by the interface replacement module, maintaining the corresponding send/receive task queues, and controlling the TOE network card device to complete the corresponding operations;
the TOE network card device is implemented in an FPGA and is used for processing the protocol work involved in sending and receiving data messages.
Further, the interface replacement module replaces the socket interfaces related to data transceiving in the application program through hook functions.
Further, the protocol work handled by the TOE network card device includes out-of-order transmission handling, timer management, flow control and congestion control, and protocol encapsulation and parsing.
Further, the TOE network driver interacts with the TOE network card device through registers and interrupts to control send/receive tasks, transparently forward bypass messages, and configure the hardware state.
Further, the interface replacement module obtains the local IP address from the socket handle in the interface parameters and looks it up in a virtual network port IP hash table generated by the TOE network driver, so as to determine whether the TOE network driver needs to be called to perform TCP offload.
Further, the TOE network driver controls data transceiving on the TOE network card device based on task commands rather than TCP messages, as follows:
a send/receive ring task queue is set up in the TOE network card device for each TCP connection;
each time the data sending interface is called, one command, containing a data address and a length, is issued to the hardware send task queue;
each time the data receiving interface is called, data of the corresponding length is read from a receive buffer formed by splicing multiple receive commands.
Further, the TOE network driver comprises a hardware configuration unit, a bypass processing unit, and a data transceiving unit;
the hardware configuration unit initializes the TOE network card device when the driver is loaded, including register region mapping, virtual network port configuration, interrupt management, and initialization of the FPGA hardware logic;
the bypass processing unit is used for passing low-speed protocol messages that do not require TCP offload to the kernel protocol stack for processing, and for sending messages that the kernel protocol stack needs to transmit to the TOE network card device through the hardware configuration unit; the bypass processing unit synchronously maintains the TCP connection state so that the TOE network card device hardware can be configured correctly at different stages, and parses ARP replies so as to update and maintain the ARP table in the TOE network card device;
the data transceiving unit is used for processing the commands sent by the interface replacement module; it maintains send/receive ring task queues of length 256 in the TOE network card device; the physical address, length, and other information of the data to be sent are written into the send task queue through the command register, and the network card hardware logic sequentially attempts to complete the commands in the send queue; when data is received, the network card hardware logic sequentially places the data at the addresses designated by the commands in the receive task queue.
Further, the data transceiving unit processes the send-related interfaces as follows:
S200: preprocessing: determine whether the connection information is normal and whether the send size is smaller than the send buffer;
S201: after preprocessing passes, actively maintain the send queue once and clean up completed commands;
S202: calculate the remaining buffer size from the head and tail offsets and the size of the send task queue maintained in the driver, together with the total buffer size; proceed to S203 when the remaining buffer size is larger than the size of the data to be sent; otherwise, return an error directly if the send option is set to non-blocking;
S203: copy the application data to the corresponding address in the send buffer; when zero copy is enabled, skip the copy of step S203 and only check whether the data address is correct;
S204: write the data address and size into the send queue of the TOE network card device through the hardware configuration unit, and synchronously maintain the simulated send queue in the driver.
Further, the data transceiving unit processes the receive-related interfaces as follows:
S210: preprocessing: determine whether the connection information is normal; when the receive option is set to MSG_WAITALL, also determine whether the receive size is smaller than the maximum size available for receiving, i.e. the total receive buffer size minus the acknowledged but not yet released part;
S211: calculate, according to the receive window calculation method, the size of the data already received in the receive buffer; proceed to S212 when the received size is larger than the size requested by the current receive call, or when the received size is greater than 0 and the MSG_WAITALL option is not set; exit receiving and return the corresponding error value when the received size is zero and the connection has been disconnected, or when receiving in non-blocking mode; in blocking mode, keep repeating S211 and recalculating the received data size until an exit condition is met;
S212: copy the data received in the receive buffer to the application-designated address; when zero copy is enabled, skip the copy of step S212 and only check whether the data address is correct;
S213: maintain the receive window by shifting the left boundary of the receive buffer to the right; if the left boundary crosses one or more receive commands, write the corresponding number of receive commands into the TOE network card device to keep the buffer size unchanged.
Further, a conventional TCP receive window is simulated by splicing multiple equal-length receive commands with contiguous addresses into a receive buffer; the receive window is dynamically maintained each time data is read out, and the TOE network card device places the data arriving from the network at the designated receive buffer address.
The TOE acceleration system for improving application network performance provided by the application can be deployed rapidly without modifying the operating system kernel, and can be used conveniently and quickly to accelerate network transmission in various application programs built on TCP sockets, thereby improving application network performance.
An advantage of the application is that the TOE acceleration system for improving application network performance uses the network card driver software to manage the sending and receiving of network data while preserving the kernel protocol stack's control over establishing and tearing down TCP connections, so that application programs can manage TCP connections using normal socket interfaces.
Another advantage of the application is that the TOE acceleration system for improving application network performance offloads the TCP protocol processing on the data transmission path to the TOE network card hardware and handles it with FPGA logic, which not only helps exploit the high-speed transmission capability of the network card, increasing network bandwidth and reducing transmission latency, but also greatly reduces CPU utilization and relieves processor pressure.
Drawings
To illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the application, and that a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a TOE acceleration system for improving performance of an application network according to the present application;
FIG. 2 is a schematic diagram of the internal architecture of the TOE network driver of the present application;
FIG. 3a is a flowchart illustrating steps for invoking a data transmission interface according to an embodiment of the present application;
FIG. 3b is a flowchart illustrating steps performed by a data receiving interface call according to another embodiment of the present application;
fig. 4 is a schematic diagram of a receiving window implementation of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
Fig. 1 shows a TOE acceleration system for improving application network performance according to the present application, which includes an interface replacement module, a TOE network driver, and a TOE network card device.
The interface replacement module resides in user space on the Linux host. It determines whether an interface related to data transceiving requires TCP offload; calls that require offload are processed accordingly and passed to the TOE network driver, while all other calls follow the original system call path into the kernel protocol stack.
Specifically, the interface replacement module comprises a driver startup part and an interface forwarding part.
Driver startup: a static variable is set to represent the driver handle; when an interface is first invoked, the driver is opened and the handle is stored, and at the same time the IP addresses of all TOE network ports on the host are detected and stored in a hash table.
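The driver-startup step can be illustrated with a minimal user-space sketch. The device node path /dev/toe, the "toe" interface-name prefix, and the small open-addressing hash set are illustrative assumptions, not details taken from the patent text.

```c
/* Sketch of the driver-startup part of the interface replacement module.
 * Assumptions (not taken from the patent): the TOE driver exposes a
 * character device at /dev/toe, and TOE virtual ports are recognized by
 * the interface-name prefix "toe". */
#include <arpa/inet.h>
#include <fcntl.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define IP_BUCKETS 64                    /* small open-addressing hash set */

static int toe_fd = -1;                  /* static handle representing the driver */
static uint32_t ip_table[IP_BUCKETS];    /* 0 marks an empty slot */
static pthread_once_t toe_once = PTHREAD_ONCE_INIT;

static void ip_table_add(uint32_t ip_be) /* IP in network byte order */
{
    unsigned h = (ip_be * 2654435761u) % IP_BUCKETS;
    while (ip_table[h] != 0 && ip_table[h] != ip_be)
        h = (h + 1) % IP_BUCKETS;        /* table assumed never full in this sketch */
    ip_table[h] = ip_be;
}

int toe_ip_lookup(uint32_t ip_be)        /* is this a TOE port address? */
{
    unsigned h = (ip_be * 2654435761u) % IP_BUCKETS;
    while (ip_table[h] != 0) {
        if (ip_table[h] == ip_be)
            return 1;
        h = (h + 1) % IP_BUCKETS;
    }
    return 0;
}

static void toe_startup(void)
{
    struct ifaddrs *list, *ifa;

    toe_fd = open("/dev/toe", O_RDWR);   /* open the driver once, keep the handle */
    if (getifaddrs(&list) != 0)
        return;
    for (ifa = list; ifa; ifa = ifa->ifa_next) {
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strncmp(ifa->ifa_name, "toe", 3) != 0)   /* TOE virtual ports only */
            continue;
        ip_table_add(((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr);
    }
    freeifaddrs(list);
}

/* Called at the top of every hooked interface: the first call opens the driver. */
int toe_ensure_started(void)
{
    pthread_once(&toe_once, toe_startup);
    return toe_fd;
}
```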
Interface forwarding: the local (source) IP is first obtained from the socket handle; if it is found in the hash table, the function parameters are packed and the corresponding interface of the TOE network driver is invoked through an ioctl call, otherwise the original system call path is taken. The forwarded interfaces include recv, send, writev, sendfile, epoll_create, epoll_ctl, epoll_wait, and so on; custom interfaces such as get_recv_addr and get_send_addr are additionally provided for the zero-copy feature, to obtain the address and length of the network card data buffer.
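A minimal sketch of the interface-forwarding step for send(), written as an LD_PRELOAD hook, is shown below. The ioctl request TOE_IOC_SEND and struct toe_xfer_req are assumed, illustrative definitions of the driver ABI (the patent does not specify one), and toe_ensure_started()/toe_ip_lookup() are the helpers from the previous sketch.

```c
/* Sketch of the interface-forwarding part as an LD_PRELOAD hook for send().
 * struct toe_xfer_req and TOE_IOC_SEND are illustrative assumptions about the
 * driver's ioctl ABI; toe_ensure_started()/toe_ip_lookup() come from the
 * startup sketch above. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

struct toe_xfer_req {            /* packed function parameters handed to the driver */
    int         sockfd;
    const void *buf;
    size_t      len;
    int         flags;
    ssize_t     ret;             /* byte count filled in by the driver */
};
#define TOE_IOC_SEND _IOWR('t', 1, struct toe_xfer_req)

int toe_ensure_started(void);
int toe_ip_lookup(uint32_t ip_be);

typedef ssize_t (*send_fn)(int, const void *, size_t, int);

ssize_t send(int sockfd, const void *buf, size_t len, int flags)
{
    static send_fn real_send;
    struct sockaddr_in local;
    socklen_t slen = sizeof(local);
    int toe_fd = toe_ensure_started();

    if (!real_send)                              /* resolve libc's send() once */
        real_send = (send_fn)dlsym(RTLD_NEXT, "send");

    /* Does the socket's local IP belong to a TOE virtual port? */
    if (toe_fd >= 0 &&
        getsockname(sockfd, (struct sockaddr *)&local, &slen) == 0 &&
        local.sin_family == AF_INET &&
        toe_ip_lookup(local.sin_addr.s_addr)) {
        struct toe_xfer_req req = { sockfd, buf, len, flags, 0 };
        if (ioctl(toe_fd, TOE_IOC_SEND, &req) == 0)
            return req.ret;                      /* offloaded path */
    }
    return real_send(sockfd, buf, len, flags);   /* original system call path */
}
```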
The TOE network driver resides in kernel space on the Linux host. It processes the send/receive commands issued by the interface replacement module, maintains the corresponding send/receive task queues, and controls the TOE network card device to complete the corresponding operations. The interface replacement module communicates with the TOE network driver using ioctl calls.
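On the kernel side, the ioctl entry point could dispatch these commands roughly as follows. The command numbers, the request structure, and the handler names mirror the illustrative user-space sketch above and are assumptions rather than the patent's actual interface.

```c
/* Kernel-side sketch of how the TOE network driver could dispatch ioctl
 * commands from the interface replacement module. Command numbers, the
 * request structure and the handler names are illustrative assumptions. */
#include <linux/fs.h>
#include <linux/ioctl.h>
#include <linux/module.h>
#include <linux/uaccess.h>

struct toe_xfer_req {
    int                 sockfd;
    const void __user  *buf;
    size_t              len;
    int                 flags;
    ssize_t             ret;
};
#define TOE_IOC_SEND _IOWR('t', 1, struct toe_xfer_req)
#define TOE_IOC_RECV _IOWR('t', 2, struct toe_xfer_req)

long toe_handle_send(struct toe_xfer_req *req);   /* data transceiving unit */
long toe_handle_recv(struct toe_xfer_req *req);

static long toe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct toe_xfer_req req;

    if (copy_from_user(&req, (void __user *)arg, sizeof(req)))
        return -EFAULT;

    switch (cmd) {
    case TOE_IOC_SEND:
        if (toe_handle_send(&req))
            return -EIO;
        break;
    case TOE_IOC_RECV:
        if (toe_handle_recv(&req))
            return -EIO;
        break;
    default:
        return -ENOTTY;                /* unknown command: not handled here */
    }

    /* hand the resulting byte count back through req.ret */
    if (copy_to_user((void __user *)arg, &req, sizeof(req)))
        return -EFAULT;
    return 0;
}

static const struct file_operations toe_fops = {
    .owner          = THIS_MODULE,
    .unlocked_ioctl = toe_ioctl,
};
```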
The TOE network card device is implemented in the FPGA by FPGA hardware logic. It processes the protocol work involved in sending and receiving data messages, including heavy tasks such as out-of-order transmission handling, timer management, flow control, congestion control, and protocol encapsulation and parsing. The TOE network card device uses a single TCP working module to manage all TCP connections uniformly and caches the per-connection information separately, thereby achieving concurrent transmission over multiple TCP connections with high resource utilization and improving system scalability.
In the embodiment of the application, the TOE network driver interacts with the TOE network card device through registers and interrupts to control send/receive tasks, transparently forward bypass messages, and configure the hardware state.
In an embodiment of the present application, the interface replacement module replaces the socket interfaces related to data transceiving in the application program through hook functions. Specifically, the replaced data-transceiving APIs include, but are not limited to, recv, send, writev, readv, and sendfile. The interface replacement module obtains the local IP address from the socket handle in the interface parameters and queries the virtual network port IP hash table generated by the TOE network driver to determine whether to call the TOE network driver to perform TCP offload.
In the embodiment of the application, the TOE network driver controls data transceiving on the TOE network card device based on task commands rather than TCP messages, as follows:
A send/receive ring task queue is set up in the TOE network card device for each TCP connection.
Each time the data sending interface is called, one command, containing a data address and a length, is issued to the hardware send task queue.
Each time the data receiving interface is called, data of the corresponding length is read from a receive buffer formed by splicing multiple receive commands.
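A compact C sketch of such a per-connection command ring is given below; the 256-entry length follows the description, while the field names and the register-write helper are illustrative assumptions.

```c
/* Sketch of the per-connection send ring of 256 task commands, each holding a
 * physical data address and a length. toe_write_cmd_reg() stands in for the
 * hardware configuration unit's register write; names are illustrative. */
#include <stdint.h>

#define TOE_RING_LEN 256u

struct toe_cmd {
    uint64_t phys_addr;   /* physical address of the data to transfer */
    uint32_t len;         /* number of bytes */
};

struct toe_ring {
    struct toe_cmd cmd[TOE_RING_LEN];
    uint32_t head;        /* next free slot, advanced by the driver */
    uint32_t tail;        /* oldest outstanding slot, advanced as hardware completes */
};

void toe_write_cmd_reg(const struct toe_cmd *c);   /* provided by the HW config unit */

static uint32_t ring_used(const struct toe_ring *r)
{
    return (r->head - r->tail + TOE_RING_LEN) % TOE_RING_LEN;
}

/* One call of the data-sending interface issues exactly one command. */
int toe_ring_push(struct toe_ring *r, uint64_t phys_addr, uint32_t len)
{
    if (ring_used(r) == TOE_RING_LEN - 1)          /* keep one slot free */
        return -1;                                 /* ring full */
    r->cmd[r->head] = (struct toe_cmd){ phys_addr, len };
    toe_write_cmd_reg(&r->cmd[r->head]);           /* mirror the command to hardware */
    r->head = (r->head + 1) % TOE_RING_LEN;
    return 0;
}

/* Reclaim 'done' commands reported complete by hardware (cf. step S201). */
void toe_ring_complete(struct toe_ring *r, uint32_t done)
{
    r->tail = (r->tail + done) % TOE_RING_LEN;
}
```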
As shown in fig. 2, in the embodiment of the present application, the TOE network driver includes a hardware configuration unit, a bypass processing unit, and a data transceiving unit.
The hardware configuration unit initializes the TOE network card device when the driver is loaded, including register region mapping, virtual network port configuration, interrupt management, and initialization of the FPGA hardware logic. The hardware configuration unit also serves as the bridge for interacting with the hardware, assisting the other units of the network driver in completing their functions.
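The initialization work attributed to the hardware configuration unit could look roughly like the following kernel-module sketch; the PCI BAR index, the MSI interrupt setup, and all names are illustrative assumptions rather than the patent's actual driver code.

```c
/* Kernel-module sketch of the hardware configuration unit's initialization at
 * driver load: map the register BAR and set up an MSI interrupt. The BAR
 * index, IRQ handler and all names are illustrative assumptions. */
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/pci.h>

struct toe_dev {
    struct pci_dev *pdev;
    void __iomem   *regs;         /* mapped register region */
};

static irqreturn_t toe_irq(int irq, void *data)
{
    /* acknowledge completions, wake blocked receivers, hand bypass
     * packets to the kernel protocol stack, and so on */
    return IRQ_HANDLED;
}

static int toe_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct toe_dev *toe;
    int ret;

    toe = devm_kzalloc(&pdev->dev, sizeof(*toe), GFP_KERNEL);
    if (!toe)
        return -ENOMEM;
    toe->pdev = pdev;

    ret = pcim_enable_device(pdev);
    if (ret)
        return ret;

    toe->regs = pcim_iomap(pdev, 0, 0);                   /* register area mapping, BAR 0 */
    if (!toe->regs)
        return -ENOMEM;

    ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI); /* interrupt management */
    if (ret < 0)
        return ret;

    ret = devm_request_irq(&pdev->dev, pci_irq_vector(pdev, 0),
                           toe_irq, 0, "toe", toe);
    if (ret)
        return ret;

    pci_set_drvdata(pdev, toe);
    /* remaining steps: virtual network port registration and FPGA logic
     * initialization through toe->regs */
    return 0;
}
```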
The bypass processing unit passes low-speed protocol messages that do not require TCP offload up to the kernel protocol stack for processing, and sends messages that the kernel protocol stack needs to transmit down to the TOE network card device through the hardware configuration unit. The bypass processing unit synchronously maintains the TCP connection state so that the TOE network card device hardware can be configured correctly at different stages, and parses ARP replies so as to update and maintain the ARP table in the TOE network card device.
The bypass processing unit processes messages according to their type. Specifically, for TCP messages in the connection establishment and teardown stages, such as SYN, FIN, and RST, a TCP connection state is synchronously maintained in the driver before the message is submitted to the kernel, so that the TOE network card device hardware can be configured correctly at the different stages. It should be noted that, since the intervening data transmission is offloaded by the TOE after the connection is established, the SEQ and ACK sequence numbers maintained by the kernel at that point no longer match the actual transmission; for the kernel to correctly recognize the TCP packets associated with teardown, the SEQ and ACK fields of those packets must be recorded and rewritten, and the checksum must be updated accordingly so that verification still succeeds. In addition, when the bypass processing unit receives an ARP reply, it must also parse the message and maintain the ARP table in the TOE network card device.
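The sequence-number translation just described can be sketched as follows; the per-connection deltas and the elided checksum helper are illustrative assumptions, and a real implementation would also have to update the TCP checksum, for example incrementally per RFC 1624.

```c
/* Simplified sketch of the SEQ/ACK translation for bypassed teardown packets
 * (FIN/RST). The per-connection deltas and toe_fixup_checksum() are
 * illustrative; the checksum adjustment itself is elided here. */
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <stdint.h>

struct toe_conn_sync {
    uint32_t seq_delta;   /* bytes sent by the TOE that the kernel never saw */
    uint32_t ack_delta;   /* bytes received by the TOE that the kernel never saw */
};

void toe_fixup_checksum(struct tcphdr *th);   /* checksum adjustment, elided */

/* Outgoing teardown segment built by the kernel: advance SEQ/ACK so they
 * match what the offloaded data path actually transferred on the wire. */
void toe_bypass_tx_fixup(struct tcphdr *th, const struct toe_conn_sync *s)
{
    if (th->fin || th->rst) {
        th->seq     = htonl(ntohl(th->seq)     + s->seq_delta);
        th->ack_seq = htonl(ntohl(th->ack_seq) + s->ack_delta);
        toe_fixup_checksum(th);
    }
}

/* Incoming teardown segment headed for the kernel: shift SEQ/ACK back into
 * the kernel's pre-offload sequence space so the kernel recognizes it. */
void toe_bypass_rx_fixup(struct tcphdr *th, const struct toe_conn_sync *s)
{
    if (th->fin || th->rst) {
        th->seq     = htonl(ntohl(th->seq)     - s->ack_delta);
        th->ack_seq = htonl(ntohl(th->ack_seq) - s->seq_delta);
        toe_fixup_checksum(th);
    }
}
```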
The data transceiving unit is used for processing the commands sent by the interface replacement module. It maintains send/receive ring task queues of length 256 in the TOE network card device; the physical address, length, and other information of the data to be sent are written into the send task queue through the command register, and the network card hardware logic sequentially attempts to complete the commands in the send queue; when data is received, the network card hardware logic sequentially places the data at the addresses designated by the commands in the receive task queue.
Specifically, when the application program calls a socket interface related to data sending, the interface is replaced through the hook function, whether TCP offload is required is judged from the IP address corresponding to the socket handle, and if so, the corresponding interface of the TOE network driver is entered through an ioctl call. Next, as shown in Fig. 3a, in the embodiment of the present application, the data transceiving unit processes the send-related interfaces as follows:
S200: preprocessing: determine whether the connection information is normal and whether the send size is smaller than the send buffer.
S201: after preprocessing passes, the send queue is actively maintained once and completed commands are cleaned up. This step obtains the number of send commands completed in the interval by computing the difference between two successive readings of the tail value in the send task queue register, and uses it to keep the simulated send task queue in the driver consistent with the ring queue in the TOE network card device hardware.
S202: calculate the remaining buffer size from the head and tail offsets and the size of the send task queue maintained in the driver, together with the total buffer size; proceed to S203 when the remaining buffer size is larger than the size of the data to be sent; otherwise, return an error directly if the send option is set to non-blocking.
S203: copy the application data to the corresponding address in the send buffer; when zero copy is enabled, skip the copy of step S203 and only check whether the data address is correct.
S204: write the data address and size into the send queue of the TOE network card device through the hardware configuration unit, and synchronously maintain the simulated send queue in the driver.
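Steps S200 through S204 can be condensed into the following hedged C sketch; the connection structure, helper functions, and error codes are illustrative and follow common Linux conventions rather than the patent's actual implementation.

```c
/* Condensed sketch of the send path S200-S204. The connection structure,
 * helper functions and error codes are illustrative assumptions. */
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct toe_conn {
    int       established;        /* connection information */
    int       nonblocking;
    int       zero_copy;
    uint8_t  *send_buf;           /* driver-owned send buffer */
    uint32_t  send_buf_size;
    uint32_t  buf_head, buf_tail; /* byte offsets mirroring the send ring */
};

void     toe_reap_completed(struct toe_conn *c);                            /* S201 */
int      toe_hw_post_send(struct toe_conn *c, uint64_t addr, uint32_t len); /* S204 */
uint64_t toe_buf_phys(const struct toe_conn *c, uint32_t off);

long toe_send(struct toe_conn *c, const void *data, uint32_t len)
{
    uint32_t used, free_bytes;

    /* S200: preprocessing */
    if (!c->established || len > c->send_buf_size)
        return -EINVAL;

    /* S201: reap completed send commands once */
    toe_reap_completed(c);

    /* S202: remaining buffer from the head/tail offsets and the total size */
    used = (c->buf_head - c->buf_tail + c->send_buf_size) % c->send_buf_size;
    free_bytes = c->send_buf_size - used;
    if (free_bytes <= len)
        return -EAGAIN;   /* non-blocking error; the blocking wait loop is elided */

    /* S203: copy into the send buffer unless zero copy is enabled, in which
     * case the data already sits there (address from get_send_addr) and only
     * an address check would be needed */
    if (!c->zero_copy)
        memcpy(c->send_buf + c->buf_head, data, len);   /* wraparound elided */

    /* S204: post address + length to the hardware send queue, mirror in driver */
    if (toe_hw_post_send(c, toe_buf_phys(c, c->buf_head), len))
        return -EIO;
    c->buf_head = (c->buf_head + len) % c->send_buf_size;
    return len;
}
```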
When the application program calls a socket interface related to data receiving, the interface is replaced through the hook function, whether TCP offload is required is judged from the IP address corresponding to the socket handle, and if so, the corresponding interface of the TOE network driver is entered through an ioctl call. Next, as shown in Fig. 3b, in the embodiment of the present application, the data transceiving unit processes the receive-related interfaces as follows:
S210: preprocessing: determine whether the connection information is normal; when the receive option is set to MSG_WAITALL, also determine whether the receive size is smaller than the maximum size available for receiving, i.e. the total receive buffer size minus the acknowledged but not yet released part.
S211: calculate, according to the receive window calculation method, the size of the data already received in the receive buffer; proceed to S212 when the received size is larger than the size requested by the current receive call, or when the received size is greater than 0 and the MSG_WAITALL option is not set; exit receiving and return the corresponding error value when the received size is zero and the connection has been disconnected, or when receiving in non-blocking mode; in blocking mode, keep repeating S211 and recalculating the received data size until an exit condition is met.
S212: copy the data received in the receive buffer to the application-designated address; when zero copy is enabled, skip the copy of step S212 and only check whether the data address is correct.
S213: maintain the receive window by shifting the left boundary of the receive buffer to the right; if the left boundary crosses one or more receive commands, write the corresponding number of receive commands into the TOE network card device to keep the buffer size unchanged.
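Steps S210 through S213 can likewise be condensed into a hedged C sketch; the window helpers correspond to the Fig. 4 formulas discussed below, and all names, error codes, and the blocking behaviour are illustrative assumptions.

```c
/* Condensed sketch of the receive path S210-S213. The window helpers
 * correspond to the Fig. 4 formulas; names and error codes are illustrative. */
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>   /* MSG_WAITALL */

struct toe_rconn {
    int      connected;          /* may be cleared by the bypass unit on FIN/RST */
    int      nonblocking;
    int      zero_copy;
    uint32_t recv_buf_size;      /* size of the spliced receive-command buffer */
    uint32_t acked_unreleased;   /* acknowledged but not yet released part */
};

uint32_t toe_rwnd_available(struct toe_rconn *c);              /* area (2) in Fig. 4 */
void     toe_rwnd_read(struct toe_rconn *c, void *dst, uint32_t len, int zero_copy);
void     toe_rwnd_advance(struct toe_rconn *c, uint32_t len);  /* S213: refill commands */

long toe_recv(struct toe_rconn *c, void *buf, uint32_t len, int flags)
{
    uint32_t avail;

    /* S210: preprocessing */
    if (!c->connected)
        return -ENOTCONN;
    if ((flags & MSG_WAITALL) &&
        len > c->recv_buf_size - c->acked_unreleased)
        return -EINVAL;           /* can never be satisfied by a single call */

    /* S211: recompute the window until an exit condition holds */
    for (;;) {
        avail = toe_rwnd_available(c);
        if (avail >= len || (avail > 0 && !(flags & MSG_WAITALL)))
            break;                            /* enough data: go on to S212 */
        if (avail == 0 && !c->connected)
            return 0;                         /* peer closed while waiting */
        if (c->nonblocking)
            return -EAGAIN;
        /* blocking mode: loop and recompute (sleeping on the completion
         * interrupt is elided in this sketch) */
    }

    /* S212: copy out to the application address (skipped under zero copy) */
    if (avail > len)
        avail = len;
    toe_rwnd_read(c, buf, avail, c->zero_copy);

    /* S213: shift the window's left boundary and hand refill commands to hardware */
    toe_rwnd_advance(c, avail);
    return avail;
}
```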
In the embodiment of the application, a conventional TCP receive window is simulated by splicing multiple equal-length receive commands with contiguous addresses into a receive buffer; the receive window is dynamically maintained each time data is read out, and the TOE network card device places the data arriving from the network at the designated receive buffer address.
Fig. 4 illustrates the receive window calculations used in the flowchart of the receive-related interfaces of the data transceiving unit; the calculation and adjustment of the receive window in steps S210, S211, and S213 are explained in detail below.
In the embodiment of the application, the receive window is a ring task queue formed by splicing multiple receive commands with contiguous addresses and equal lengths. For ease of presentation, Fig. 4 flattens the ring task queue and fixes the number of receive commands at 64.
Specifically, rwq in Fig. 4 denotes the receive task queue; rwq_tail, rwq_cur, and rwq_head are three pointers into the receive task queue: task commands are added at rwq_head and removed at rwq_tail, and rwq_cur denotes the command currently being executed. The ACKED region represents the portion acknowledged by TCP and already received by the application, ACKED NOT RECEIVED represents the portion acknowledged but not yet received by the application, and NOT ACKED represents the remaining portion of the buffer.
In Fig. 4, area (1) is part of the ACKED region and records the portion of rwq_tail already received, denoted tail_rwq_recv_done. Areas (2) and (3) together are the portion acknowledged but not yet received by the application, where (3) denotes the acknowledged part within rwq_cur, denoted cur_rwq_recv_done. tail_rwq_recv_done and cur_rwq_recv_done may be zero, and rwq_tail may equal rwq_cur; when rwq_tail = rwq_cur, cur_rwq_recv_done is always greater than or equal to tail_rwq_recv_done.
Area (2) is the size of the received data in the receive buffer to be calculated in step S211, namely (rwq_cur - rwq_tail) × rwq_size + cur_rwq_recv_done - tail_rwq_recv_done.
Area (4) is the remaining receive buffer size to be calculated in step S210, namely (rwq_head - rwq_cur) × rwq_size - cur_rwq_recv_done.
Note that rwq is a ring queue, so the pointer differences in the two formulas above may be negative; 256 is added and the result is taken modulo 256 to ensure the difference is non-negative.
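Written out in C with the wraparound handling this note describes, the two formulas become the following (field names follow the description; rwq_size is the number of bytes covered by one receive command):

```c
/* The two receive-window formulas above, with the modulo-256 wraparound the
 * note describes (pointer differences on the ring may otherwise be negative). */
#include <stdint.h>

#define RWQ_LEN 256u   /* receive commands in the ring task queue */

struct rwq_state {
    uint32_t rwq_tail;            /* oldest command still held by software */
    uint32_t rwq_cur;             /* command currently being filled by hardware */
    uint32_t rwq_head;            /* next command slot handed to hardware */
    uint32_t rwq_size;            /* bytes covered by one receive command */
    uint32_t tail_rwq_recv_done;  /* bytes of rwq_tail already read by the application */
    uint32_t cur_rwq_recv_done;   /* bytes of rwq_cur already acknowledged */
};

static uint32_t rwq_diff(uint32_t a, uint32_t b)
{
    return (a - b + RWQ_LEN) % RWQ_LEN;   /* keep the ring difference non-negative */
}

/* Area (2): data already received in the buffer, used in step S211. */
uint32_t rwq_received_bytes(const struct rwq_state *s)
{
    return rwq_diff(s->rwq_cur, s->rwq_tail) * s->rwq_size
           + s->cur_rwq_recv_done - s->tail_rwq_recv_done;
}

/* Area (4): remaining receive buffer, checked in step S210. */
uint32_t rwq_remaining_bytes(const struct rwq_state *s)
{
    return rwq_diff(s->rwq_head, s->rwq_cur) * s->rwq_size
           - s->cur_rwq_recv_done;
}
```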
In step S213, the left boundary of the receive buffer needs to be shifted to the right to maintain the receive buffer size, i.e. the rwq_tail pointer in Fig. 4 is moved to the right. This pointer value is maintained by the driver software and differs from the value inside the actual TOE network card device, because in the hardware logic rwq_tail advances automatically upon ACK acknowledgement, whereas in software it must wait until the application calls the receive interface and reads the data before being updated.
Preferably, a plurality of receive window-related registers may be added to simplify CPU computation.
The foregoing has shown and described the basic principles, principal features and advantages of the application. It will be appreciated by persons skilled in the art that the above embodiments are not intended to limit the application in any way, and that all technical solutions obtained by means of equivalent substitutions or equivalent transformations fall within the scope of the application.

Claims (10)

1. A TOE acceleration system for improving application network performance, comprising: an interface replacement module, a TOE network driver, and a TOE network card device;
wherein the interface replacement module is arranged in user space of a host and is used for determining whether an interface related to data transceiving requires TCP offload; calls that require offload are processed accordingly and passed to the TOE network driver, while all other calls follow the original system call path into the kernel protocol stack;
the TOE network driver is arranged in kernel space of the host and is used for processing the send/receive commands issued by the interface replacement module, maintaining the corresponding send/receive task queues, and controlling the TOE network card device to complete the corresponding operations;
the TOE network card device is implemented in an FPGA and is used for processing the protocol work involved in sending and receiving data messages.
2. The TOE acceleration system for improving performance of an application network of claim 1,
wherein the interface replacement module replaces the socket interfaces related to data transceiving in the application program through hook functions.
3. The TOE acceleration system for improving performance of an application network of claim 1,
wherein the protocol work processed by the TOE network card device includes out-of-order transmission handling, timer management, flow control and congestion control, and protocol encapsulation and parsing.
4. The TOE acceleration system for improving performance of an application network of claim 1,
wherein the TOE network driver interacts with the TOE network card device through registers and interrupts to control send/receive tasks, transparently forward bypass messages, and configure the hardware state.
5. The TOE acceleration system for improving performance of an application network of claim 1,
wherein the interface replacement module obtains the local IP address from the socket handle in the interface parameters and looks it up in a virtual network port IP hash table generated by the TOE network driver, so as to determine whether the TOE network driver needs to be called to perform TCP offload.
6. The TOE acceleration system for improving performance of an application network of claim 1,
wherein the TOE network driver controls data transceiving on the TOE network card device based on task commands rather than TCP messages, as follows:
a send/receive ring task queue is set up in the TOE network card device for each TCP connection;
each time the data sending interface is called, one command, containing a data address and a length, is issued to the hardware send task queue;
each time the data receiving interface is called, data of the corresponding length is read from a receive buffer formed by splicing multiple receive commands.
7. The TOE acceleration system for improving performance of an application network of claim 6,
wherein the TOE network driver comprises a hardware configuration unit, a bypass processing unit, and a data transceiving unit;
the hardware configuration unit initializes the TOE network card device when the driver is loaded, including register region mapping, virtual network port configuration, interrupt management, and initialization of the FPGA hardware logic;
the bypass processing unit is used for passing low-speed protocol messages that do not require TCP offload to the kernel protocol stack for processing, and for sending messages that the kernel protocol stack needs to transmit to the TOE network card device through the hardware configuration unit; the bypass processing unit synchronously maintains the TCP connection state so that the TOE network card device hardware can be configured correctly at different stages, and parses ARP replies so as to update and maintain the ARP table in the TOE network card device;
the data transceiving unit is used for processing the commands sent by the interface replacement module; it maintains send/receive ring task queues of length 256 in the TOE network card device; the physical address, length, and other information of the data to be sent are written into the send task queue through the command register, and the network card hardware logic sequentially attempts to complete the commands in the send queue; when data is received, the network card hardware logic sequentially places the data at the addresses designated by the commands in the receive task queue.
8. The TOE acceleration system for improving performance of an application network of claim 7,
wherein the data transceiving unit processes the send-related interfaces as follows:
S200: preprocessing: determine whether the connection information is normal and whether the send size is smaller than the send buffer;
S201: after preprocessing passes, actively maintain the send queue once and clean up completed commands;
S202: calculate the remaining buffer size from the head and tail offsets and the size of the send task queue maintained in the driver, together with the total buffer size; proceed to S203 when the remaining buffer size is larger than the size of the data to be sent; otherwise, return an error directly if the send option is set to non-blocking;
S203: copy the application data to the corresponding address in the send buffer; when zero copy is enabled, skip the copy of step S203 and only check whether the data address is correct;
S204: write the data address and size into the send queue of the TOE network card device through the hardware configuration unit, and synchronously maintain the simulated send queue in the driver.
9. The TOE acceleration system for improving performance of an application network of claim 7,
wherein the data transceiving unit processes the receive-related interfaces as follows:
S210: preprocessing: determine whether the connection information is normal; when the receive option is set to MSG_WAITALL, also determine whether the receive size is smaller than the maximum size available for receiving, i.e. the total receive buffer size minus the acknowledged but not yet released part;
S211: calculate, according to the receive window calculation method, the size of the data already received in the receive buffer; proceed to S212 when the received size is larger than the size requested by the current receive call, or when the received size is greater than 0 and the MSG_WAITALL option is not set; exit receiving and return the corresponding error value when the received size is zero and the connection has been disconnected, or when receiving in non-blocking mode; in blocking mode, keep repeating S211 and recalculating the received data size until an exit condition is met;
S212: copy the data received in the receive buffer to the application-designated address; when zero copy is enabled, skip the copy of step S212 and only check whether the data address is correct;
S213: maintain the receive window by shifting the left boundary of the receive buffer to the right; if the left boundary crosses one or more receive commands, write the corresponding number of receive commands into the TOE network card device to keep the buffer size unchanged.
10. The TOE acceleration system for improving performance of an application network of claim 9,
the method uses the equal length receiving orders with continuous addresses to splice as receiving buffer area to simulate traditional TCP receiving window, and dynamically maintains the receiving window every time the data is read out, and the TOE network card device puts the data from the transmission network into the designated receiving buffer area address.
CN202310832814.0A, filed 2023-07-08 (priority date 2023-07-08), TOE acceleration system for improving application network performance, status Pending, published as CN116866449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310832814.0A CN116866449A (en) 2023-07-08 2023-07-08 TOE acceleration system for improving application network performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310832814.0A CN116866449A (en) 2023-07-08 2023-07-08 TOE acceleration system for improving application network performance

Publications (1)

Publication Number Publication Date
CN116866449A true CN116866449A (en) 2023-10-10

Family

ID=88218702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310832814.0A Pending CN116866449A (en) 2023-07-08 2023-07-08 TOE acceleration system for improving application network performance

Country Status (1)

Country Link
CN (1) CN116866449A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117112029A (en) * 2023-10-24 2023-11-24 上海芯联芯智能科技有限公司 Instruction execution method and device
CN117112029B (en) * 2023-10-24 2024-03-12 上海芯联芯智能科技有限公司 Instruction execution method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination