CN116489250A - Intra-stack zero-copy transmission path method based on a shared-memory communication mode - Google Patents

Intra-stack zero-copy transmission path method based on a shared-memory communication mode

Info

Publication number
CN116489250A
CN116489250A (application CN202310443027.7A)
Authority
CN
China
Prior art keywords
data
data packet
network
memory
protocol stack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310443027.7A
Other languages
Chinese (zh)
Inventor
李健
裘鹏泽
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202310443027.7A priority Critical patent/CN116489250A/en
Publication of CN116489250A publication Critical patent/CN116489250A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/544 Buffers; Shared memory; Pipes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H04L 65/75 Media network packet handling
    • H04L 65/762 Media network packet handling at the source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/14 Session management
    • H04L 67/141 Setup of application sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L 69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H04L 69/162 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an intra-stack zero-copy transmission path method based on a shared-memory communication mode, and relates to the technical field of data communication. The copy from the shared memory into the packet buffer is replaced by writing a reference to the data in the shared memory into the packet buffer, eliminating the data copy in the packet buffer area. Without changing the application side, so that the network protocol stack remains transparent to and loosely coupled with upper-layer applications, the original transmission path inside the network protocol stack is transformed into a zero-copy transmission path. Eliminating the copy inside the protocol stack removes the duplication of data in the LLC (last-level cache), saves LLC space, and improves the network performance of the protocol stack when transmitting large files under high load.

Description

Intra-stack zero-copy transmission path method based on a shared-memory communication mode
Technical Field
The invention relates to the technical field of data communication, in particular to an intra-stack zero-copy transmission path method based on a shared memory communication mode.
Background
Today, people like to acquire information and communicate through short videos and streaming media, and correspondingly video streaming makes up a considerable share of Internet traffic. This enormous traffic generates revenue for video platforms, but it also poses a significant challenge: a platform needs sufficient hardware and software capacity to sustain it.
Thanks to the rapid development of hardware in recent years, network hardware keeps getting faster and bandwidth keeps growing; 100G and even 400G network cards have already appeared. Meanwhile, it has been found that the processing speed of the software on the host side gradually fails to keep pace with the hardware and becomes the bottleneck of overall performance. As the bridge between upper-layer applications and the underlying network hardware, the network protocol stack has an important influence on overall network performance.
A simplified video streaming scenario can be viewed as a client receiving a large file sent by the server over a long-lived connection. With large files, long-lived connections and huge aggregate traffic, the traditional kernel-mode network protocol stack is inefficient: using it incurs context switches between user mode and kernel mode, copies of data across the user/kernel boundary, and other overheads. Researchers have therefore explored user-mode network protocol stacks, which bypass the kernel-mode stack to improve network performance.
A user-mode network protocol stack implements the protocol stack inside a user-mode process, avoiding context switches between user mode and kernel mode; at the same time, compared with kernel code, user-mode programs are easier to develop, test, deploy and update iteratively.
Common user-mode protocol stacks fall into two categories: the library OS (LibOS) mode and the microkernel mode.
A LibOS-mode user-mode protocol stack places the protocol stack and the application in the same process, and the application performs network communication through function calls. Compared with traditional system calls, function calls are lighter and save the cost of context switching. However, this tightly couples application code to protocol stack code, requires additional development for different applications, and generalizes poorly across applications. The coupling also binds updates and upgrades of the protocol stack to the application: the whole process must be restarted to update or upgrade.
A microkernel-mode user-mode protocol stack deploys the network protocol stack as a separate process and communicates with external applications through shared memory to provide them with network services. Under this design the application and the protocol stack are loosely coupled, the stack is applicable to more programs with good generality, and it can be upgraded independently of the applications.
Because a microkernel-mode user-mode network protocol stack runs in a different process from the application, it must provide network services through inter-process communication, and the common inter-process mechanism is shared-memory-based communication. However, the shared-memory communication mode introduces multiple copies of the data.
Taking an application's send operation as an example: the data to be sent resides in the application's private memory, so the application copies it into the shared memory, where the network protocol stack can access it. After the protocol stack obtains the data, it must encapsulate it into packets. Packetization is performed in the packet buffer, so the protocol stack copies the shared-memory data into the packet buffer and then prepends a packet header. Finally, the protocol stack hands the packet to the network card, which sends it out.
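For concreteness, the two copies on this original path can be sketched as follows. This is a minimal illustration; every function and type name here (shm_ring_reserve, pktbuf_alloc, and so on) is hypothetical, not the actual VPP API.

```c
/* Minimal sketch of the original two-copy send path. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct { uint8_t data[2048]; size_t len; } pkt_t;

extern void  *shm_ring_reserve(size_t len);                /* shared-memory tx queue */
extern pkt_t *pktbuf_alloc(void);                          /* packet buffer pool     */
extern size_t build_header(void *dst, size_t payload_len); /* write protocol header  */
extern void   nic_send(pkt_t *pb);                         /* hand to the NIC (DMA)  */

void app_send(const void *priv, size_t len) {
    void *shm = shm_ring_reserve(len);
    memcpy(shm, priv, len);                /* copy 1: private memory -> shared memory */
    /* ...write a send event into the I/O event queue to notify the stack... */
}

void stack_send(const void *shm, size_t len) {
    pkt_t *pb = pktbuf_alloc();
    size_t hlen = build_header(pb->data, len);  /* protocol header goes first */
    memcpy(pb->data + hlen, shm, len);     /* copy 2: shared memory -> packet buffer */
    pb->len = hlen + len;
    nic_send(pb);                          /* NIC fetches the contiguous packet */
}
```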
In a large-file transfer scenario the transferred data blocks are generally large, so memory copies consume many CPU cycles and become a significant overhead.
Furthermore, analyzing the impact of memory copies from the perspective of the processor microarchitecture shows that in large-file transfer scenarios they consume not only a large number of CPU cycles but also considerable LLC (last-level cache) resources.
FIG. 1 analyzes the transmission flow of the original VPP network protocol stack. Starting on the application side, the application copies the data from its private memory into the shared memory and notifies the protocol stack. On the Intel x86 architecture, when the application and the protocol stack run on different CPU cores of the same socket, the CPU caches the shared-memory data in the LLC so that the different cores can share it. The protocol stack then copies the data in the shared memory into the packet buffer, prepends a packet header, and hands the packet to the driver; the driver constructs descriptors, writes them into the descriptor ring, and notifies the network card, which sends the packet via Direct Memory Access (DMA). However, as network bandwidth keeps growing, the network card must finish processing each packet in ever shorter time, and the latency of accessing main memory directly is too high. Some vendors therefore add Direct Cache Access (DCA) capability to I/O devices such as network cards, for example Intel's Data Direct I/O (DDIO), letting the I/O device access the LLC directly to reduce read/write latency and improve throughput. Consequently, the CPU also caches the packet in the LLC for the network card to access.
Based on the above analysis, the data copies performed by the application and the protocol stack on the transmit path together produce two copies of the data: one copy of the application data sits in the shared memory, and the other is encapsulated with a packet header into a packet. Although the data structures of the application data and the packet differ, the main content of both is the application's transmit data, and both copies occupy the LLC. The space of the LLC is limited: the LLC on one socket is only tens of MB, far smaller than physical memory, and on some common processors, such as the Intel Xeon series, it cannot be extended. As the amount of data sent by an application grows, the data uses more memory space, and the duplicate copies enlarge this footprint, so the LLC is quickly exhausted. Once it is exhausted, additional data and copies can only be placed in memory; the memory copy then takes more CPU cycles to complete, and the network card needs more time to wait for data to be read from memory. Slow memory copies and slow packet transmission by the network card reduce the packet rate of the protocol stack, and performance drops.
Experiments observed the LLC usage of the application and the protocol stack under different numbers of network connections; the results, together with the LLC miss rate of the protocol stack CPU, are plotted in FIG. 2 and FIG. 3. With fewer than 400 network connections, the application and the protocol stack occupy more and more of the LLC; beyond 400 connections, they occupy almost all of it, while the LLC store miss rate of the protocol stack rises rapidly from 12% to 80%. The store operations of the protocol stack on the LLC mainly originate from the memory copy from the shared memory into the packet buffer. The rising LLC store miss rate delays the memory copy, lowers the packet processing rate of the protocol stack, and reduces its network performance.
Accordingly, those skilled in the art are working to develop an intra-stack zero-copy transmission path method based on a shared-memory communication mode: without changing the application side, so that the network protocol stack remains transparent to and loosely coupled with upper-layer applications, the original transmission path inside the protocol stack is transformed into a zero-copy transmission path. Eliminating the copy inside the protocol stack removes the duplication of data in the LLC, saves LLC space, and improves the network performance of the protocol stack when transmitting large files under high load.
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problem solved by the present invention is how to eliminate the memory copy on the internal transmission path of the network protocol stack in a zero-copy manner, so that the LLC store misses generated by the memory copy no longer reduce the packet processing rate, while also eliminating the duplication of data in the LLC, so that the saved LLC space can cache more transmit data and the performance of the protocol stack improves.
In order to achieve the above object, the present invention provides an intra-stack zero-copy transmission path method based on a shared-memory communication mode, comprising the following steps:
step 1, an event polling module of the network protocol stack receives an application's I/O request from the I/O event queue by polling, and then the event polling module forwards the request to a session module for processing;
step 2, after receiving the send request, the session module finds the corresponding network session, locates the data transmit queue according to the transmit-queue information recorded in the network session, determines the position and length of the data to be sent, and then invokes the memory connection module, passing it the information about the data to be sent;
step 3, after the memory connection module obtains the information about the data in the transmit queue, it calculates the packet buffer resources needed, allocates them from a packet buffer resource pool, and then stores references to the data into the packet buffers, replacing the original operation of copying the data into the packet buffers; in this process, one extra packet buffer is allocated;
step 4, after the memory connection module writes the references into the packet buffers, the packet buffers are handed to the session module; the session module passes the packet buffers down to a transport/network layer for encapsulation;
step 5, the transport/network layer receives the session information and the packet buffers provided by the session module and finds the corresponding network connection, and the network protocol stack generates a packet header; in order to quickly distinguish whether a packet buffer stores a reference to shared-memory data, the network protocol stack stores packet headers and data references in separate packet buffers and adds a flag to each packet buffer storing a data reference; the extra packet buffer allocated in advance in step 3 is used here: the network protocol stack writes the generated packet header into the reserved packet buffer and then links the packet buffers carrying the data references behind the packet buffer carrying the header, so that the group of packet buffers together represents an encapsulated packet whose content is scattered;
step 6, after packet encapsulation is completed, the session module notifies the event polling module that the packet buffers are ready, and the network protocol stack notifies the network card driver;
step 7, the network card driver extracts from each packet buffer the packet header, or the address and length of the data in the shared memory, and fills descriptors into the descriptor ring; for linked packet buffers, the driver follows 'next', extracts the information, fills in descriptors, and links them through a field in the descriptor to obtain a chain of descriptors, which it writes into the transmit descriptor ring; on a network card supporting the SGL feature, the network card reads the packet header and the data from the packet buffer and the data transmit queue respectively according to the descriptor chain, assembles a contiguous packet inside the card, and sends it; after transmission completes, the network card marks the descriptor chain as completed;
step 8, after the network card driver harvests the completed descriptor chain, the network protocol stack hands the packet buffers to the memory separation module;
and step 9, the memory separation module checks the packet buffers one by one; for the flagged packet buffers holding data references, it deletes the reference information, restoring them to ordinary packet buffers, which are then handed to the packet buffer resource pool for reclamation.
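As a concrete illustration of the chain of packet buffers that steps 3 to 5 build, the following sketch shows one possible buffer layout. It is only an illustration: the field names, the flag, and the inline header space are assumptions, not the actual definition of VPP's vlib_buffer_t. Later sketches in this description reuse these names.

```c
/* Sketch of an extended packet buffer that holds either packet bytes
 * (the header) or merely a reference to data living in shared memory.
 * Hypothetical layout; VPP's real vlib_buffer_t differs. */
#include <stdint.h>
#include <stddef.h>

#define PKTBUF_F_DATA_REF (1u << 0)    /* buffer holds a reference, not data */

typedef struct pktbuf {
    struct pktbuf *next;       /* chains buffers into one scattered packet   */
    uint32_t       flags;      /* PKTBUF_F_DATA_REF marks reference buffers  */
    void          *ref_addr;   /* address of the data in shared memory       */
    uint32_t       ref_len;    /* length of that data fragment               */
    uint32_t       data_len;   /* bytes valid in data[] (header buffers)     */
    uint8_t        data[128];  /* inline space, used only for the header     */
} pktbuf_t;
```

One encapsulated packet is then the chain header buffer -> reference buffer -> ... -> reference buffer, exactly the header-then-data-reference order described in step 5.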
Further, the method is developed on the basis of the open-source user-mode network protocol stack VPP.
Further, the upper-layer application provides Web services to the outside by communicating with the network protocol stack through the shared memory.
Further, the network protocol stack comprises a memory connection module, a memory separation module and a memory management module.
Further, the memory connection module determines how many network packets the data needs to be split into for transmission; then, for each network packet, the memory connection module performs the traversal, preparation and connection steps to connect the shared-memory data to the packet buffers.
Further, in the traversal, the module counts how many blocks the data spans, with the data in each block as one fragment, so that the memory connection module knows how many packet buffer resources to allocate.
Further, in the preparation, the memory connection module allocates a group of packet buffers from the packet buffer resource pool, one more than the number of data fragments obtained by the traversal, and reserves the first buffer of the group for the subsequent transport/network layer to add the packet header.
Further, in the connection, the memory connection module sequentially fills the address and length of each traversed data fragment into the group of packet buffers, completing the connection between the data in the shared memory and the packet buffers, and then points the previous packet buffer's 'next' at the current one, chaining the buffers into a linked list.
Further, in step 5, the network protocol stack generates a packet header according to the network transmission protocol.
Further, in step 5, the information in the packet header generated by the network protocol stack includes the network connection and the length of the data.
In the preferred embodiment of the invention, on the transmit path, the data copy from the shared memory into the packet buffer of the network protocol stack produces two identical pieces of data, one in the shared memory and one in the packet buffer, and both occupy the LLC. The invention eliminates the memory copy from the shared memory into the packet buffer in a zero-copy manner: instead of copying, a reference to the data in the shared memory is written into the packet buffer, which eliminates the data copy in the packet buffer area.
The data in the shared memory belongs to the application layer and is incompatible with the data structures of the transport/network layer below it, so realizing zero copy must still solve the problem of getting application-layer data into packets the network card can send. The invention adds support for data references: the originally contiguous packet becomes discrete, and the network card's features are used to transmit the discrete packet. The data structure of the packet buffer is extended so that a packet buffer stores only a packet header or a data reference, and several packet buffers are chained into a linked list in header-then-data-reference order to represent one discrete packet. The discrete packet is handed to a network card supporting the SGL feature, and the card gathers it into one contiguous packet and sends it out.
In the original network protocol stack, a packet buffer can be reclaimed by the packet buffer pool immediately after transmission completes. Now, however, packet buffers store references to application-layer data; depending on the reliability design of the transport protocol, that data may need to be retained, and it is ultimately released by the management unit in the shared memory. The problem of correctly reclaiming both the packet buffers and the data must therefore be solved. The invention adds a memory separation module to keep the memory release of the application layer and the transport/network layer independent and to ensure both are reclaimed correctly. Before a packet buffer is released and reclaimed by its management unit, the memory separation module deletes the reference in the packet buffer, severing the connection between the shared memory and the packet buffer; thereafter the packet buffer pool and the management unit in the shared memory each manage their own memory.
Compared with the prior art, the invention has the following substantial features and advantages:
1. The invention eliminates the memory copy, so the LLC store misses generated by the copy no longer hurt the performance of the network protocol stack, and its network performance improves. The LLC resources occupied by the duplicate copy in the original packet buffer area are saved, the problem of the same data existing twice in the LLC is solved, LLC usage is reduced, and the saved LLC space can cache more transmit data.
2. The invention connects the shared memory with the packet buffers and encapsulates the data into complete packets without modifying the data structures of the application layer. The network card then reads the packet header directly from the transport layer, reads the data directly from the application layer, and sends the packet out.
3. The invention avoids coupling the packet buffer management logic with the shared-memory data management logic, and guarantees the correctness of releasing shared-memory data under the zero-copy mode.
4. The invention makes the network protocol stack process network packets faster; at the same time, the protocol stack saves the considerable CPU resources previously spent on memory copies, which can be used to process more network packets, improving the performance of the protocol stack.
The conception, specific structure, and technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, features, and effects of the present invention.
Drawings
FIG. 1 is the transmission flow of the original VPP network protocol stack;
FIG. 2 shows the LLC usage of the application and the VPP protocol stack under different numbers of long connections;
FIG. 3 shows the CPU LLC miss rate of the protocol stack under different numbers of long connections;
FIG. 4 is the overall flow framework diagram;
FIG. 5 is the flow chart of the memory connection module;
FIG. 6 is the transmission flow when the intra-stack transmission path operates in zero-copy mode.
Detailed Description
The following describes preferred embodiments of the present invention with reference to the accompanying drawings, so that its technical content becomes clearer and easier to understand. The present invention may be embodied in many different forms of embodiments, and the protection scope of the invention is not limited to the embodiments described herein.
In the drawings, like structural elements are referred to by like reference numerals and components having similar structure or function are referred to by like reference numerals. The dimensions and thickness of each component shown in the drawings are arbitrarily shown, and the present invention is not limited to the dimensions and thickness of each component. The thickness of the components is exaggerated in some places in the drawings for clarity of illustration.
In view of the problem that the memory copy on the transmit path produces two copies of the data, so that when the volume of transmitted data is large the LLC is exhausted and the protocol stack's memory copy suffers a large number of LLC store misses that lower the packet processing rate, the invention addresses the technical problem of eliminating the memory copy on the internal transmission path of the network protocol stack in a zero-copy manner, preventing the LLC store misses generated by the copy from reducing the packet processing rate, and at the same time eliminating the duplication of data in the LLC, so that the saved LLC space can cache more transmit data and the performance of the protocol stack improves.
To this end, the present invention provides an intra-stack zero-copy transmission path based on a shared-memory communication mode. The invention is developed on the open-source user-mode network protocol stack VPP; an upper-layer application such as Nginx provides Web services by communicating with the protocol stack through the shared memory. The developed protocol stack contains a memory connection module, a memory separation module and a memory management module; the overall architecture is shown in FIG. 4.
On the upper-layer application side, when the application calls a socket send interface, the VPP library loaded into the application via LD_PRELOAD translates the send call into shared-memory-based inter-process communication (a sketch of this interposition follows the two steps below). The communication consists of two main steps, as shown in FIG. 4:
[1] The application writes a send event into the I/O event queue, informing the network protocol stack of the send request.
[2] The application copies the data from its private memory into the data transmit queue in the shared memory, handing the data over to the network protocol stack.
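A minimal sketch of what such LD_PRELOAD interposition can look like; the helpers fd_is_vcl_session, shm_txq_write and io_evq_push_send_event are hypothetical stand-ins for the preloaded library's internals, not real VPP symbols.

```c
/* Build as a shared library and preload it: LD_PRELOAD=./libshim.so ./app
 * Hypothetical sketch; VPP's real LD_PRELOAD library is more involved. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/socket.h>

extern int    fd_is_vcl_session(int fd);                        /* hypothetical */
extern size_t shm_txq_write(int fd, const void *buf, size_t n); /* step [2]     */
extern void   io_evq_push_send_event(int fd, size_t n);         /* step [1]     */

ssize_t send(int sockfd, const void *buf, size_t len, int flags) {
    if (fd_is_vcl_session(sockfd)) {
        size_t n = shm_txq_write(sockfd, buf, len); /* data into shared memory */
        io_evq_push_send_event(sockfd, n);          /* tell the stack's poller */
        return (ssize_t)n;
    }
    /* not one of our sessions: fall through to the libc implementation */
    ssize_t (*real_send)(int, const void *, size_t, int) =
        (ssize_t (*)(int, const void *, size_t, int))dlsym(RTLD_NEXT, "send");
    return real_send(sockfd, buf, len, flags);
}
```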
If zero copy were implemented on the application side, the protocol stack would need to read the data to be sent directly from the application's private memory, which requires that memory to be visible to the protocol stack. A malicious application could then pollute the protocol stack's memory space through buffer overflows, and from there the space of other applications, creating security problems. If, to improve security, the application exposed only the memory holding the data being sent and revoked the sharing promptly after transmission completes, then for every send the protocol stack would have to modify page tables twice to establish and tear down the mapping. Modifying page tables must be done in the kernel, and doing so for every send incurs significant overhead. Therefore, the original copy-based send logic on the application side is retained, avoiding both the security risk of exposing private application memory and the heavy cost of frequently modifying page tables.
In FIG. 4, the overall send flow on the network protocol stack side is as follows:
(1)(2) The event polling module of the network protocol stack receives the application's I/O request from the I/O event queue by polling, and forwards the request to the session module for processing.
(3) After receiving the request, the session module finds the corresponding network session, locates the data transmit queue according to the transmit-queue information recorded in the network session, determines the position and length of the data to be sent, and then invokes the memory connection module, passing it the information about the data to be sent; the memory connection module processes the data in the data transmit queue.
(4) After the memory connection module obtains the information about the data in the transmit queue, it calculates the packet buffer resources required, allocates them from the packet buffer resource pool, and then stores references to the data into the packet buffers, replacing the original operation of copying the data into the packet buffers. In this process the module allocates one extra packet buffer for step (7); the detailed flow of the memory connection module is described further below.
(5)(6) After writing the references into the packet buffers, the memory connection module hands the group of packet buffers to the session module. At this point the packet buffers contain only references pointing at application-layer data; a packet header must still be added according to the corresponding network transmission protocol to complete encapsulation, so the session module passes the packet buffers down to the transport/network layer.
(7) The transport/network layer receives the session information and the packet buffers provided by the session module above, finds the corresponding network connection, and the protocol stack generates a packet header from the specific transport protocol and information such as the network connection and the data length. In order to quickly distinguish whether a packet buffer stores a reference to shared-memory data, the protocol stack stores packet headers and data references in separate packet buffers and flags the buffers holding data references. This is why one extra packet buffer was allocated in advance in step (4): the protocol stack writes the generated header into the reserved buffer and links the buffers carrying the data references behind the header buffer, so that the group of packet buffers together represents one encapsulated packet whose content is scattered.
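A sketch of this encapsulation step, reusing the hypothetical pktbuf_t layout sketched earlier; build_tcp_ip_header and the opaque connection handle are assumptions for illustration, not VPP functions.

```c
/* Write the generated header into the reserved first buffer of the chain
 * (sketch; the reference buffers behind it were flagged when connected). */
extern size_t build_tcp_ip_header(void *dst, const void *conn, size_t payload_len);

pktbuf_t *encapsulate(pktbuf_t *chain, const void *conn, size_t payload_len) {
    /* the first buffer was reserved empty by the preparation step */
    chain->data_len =
        (uint32_t)build_tcp_ip_header(chain->data, conn, payload_len);
    /* the buffers after it carry PKTBUF_F_DATA_REF and point into shared
     * memory; the chain now represents one scattered, encapsulated packet */
    return chain;
}
```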
(8)(9)(10) After packet encapsulation is completed, the session module notifies the event polling module that the packet buffers are ready, and the protocol stack then notifies the network card driver.
(11) The network card driver extracts from each packet buffer either the packet header or the address and length of the data in the shared memory, and fills descriptors into the descriptor ring; it traverses linked packet buffers, extracting the information and filling in descriptors, links the descriptors through fields in the descriptor to obtain a descriptor chain, and writes the chain into the transmit descriptor ring.
(a)(b) On a network card supporting the SGL feature, the card reads the packet header and the data from the packet buffer and the data transmit queue respectively according to the descriptor chain, assembles a contiguous packet inside the card, and sends it out. After transmission completes, the card marks the descriptor chain as completed.
(12)(13) In the original design, after receiving the completed descriptors the network card driver would directly let the packet buffer resource pool reclaim the corresponding packet buffers. Now, however, a packet buffer may store references to data in the shared memory, and it must be ensured that the memory occupied by that data is reclaimed correctly; introducing the data management of the upper application layer into the management logic of the packet buffer resource pool would break the independence between layers. To avoid coupling the memory management logic of the application layer and the transport/network layer, the protocol stack hands the packet buffers to the memory separation module before they are given to the packet buffer resource pool for reclamation.
(14) The memory separation module checks the packet buffers one by one; for the flagged buffers holding data references, it deletes the reference information, restoring them to ordinary packet buffers, which are then handed to the packet buffer resource pool for reclamation.
The memory connection module is the key component of the protocol stack: it eliminates the memory copy from the shared memory into the packet buffer present in the original stack. While eliminating this copy, the design aims to: 1. preserve the clean layered design of the VPP protocol stack and avoid tight coupling between layers; 2. keep modifications to any layer's data structures and functional logic, and their performance impact, as small as possible.
Based on these considerations, the protocol stack forgoes modifying the data structure of the data transmit queue in the shared memory to make it compatible with the packet buffer structure, avoiding coupling between the application-layer and transport/network-layer data structures. Likewise, sharing the transport/network layer's packet buffer resource pool with the application for storing application-layer data is unsatisfactory. The memory connection module must therefore encapsulate the transport/network-layer and application-layer data together without copying, and hand the result to the network card for transmission. To this end, the packet buffer data structure of the transport/network layer is extended so that a packet buffer can reference memory outside its own data structure, and the scatter-gather linked-list structure is used to encapsulate the packet.
The main work of the memory connection module is as follows: first, from the length of the application data and the packet MSS provided by the session module (for TCP, the window size is also considered), it determines how many network packets the data must be split into for transmission; then, for each network packet, it performs the three steps of traversal, preparation and connection to connect the shared-memory data to packet buffers. Specifically, the traversal, preparation and connection workflow of the memory connection module is shown in FIG. 5:
(1) Traversal: the data transmit queue is organized as a linked list of blocks; the storage space of different blocks is not contiguous, and data written by the application may span multiple blocks. From the address and length of the data and the information of each block, the module calculates how many blocks the data spans and treats the data in each block as one fragment, so that it knows how many packet buffer resources to allocate in the next step.
(2) Preparation: the module allocates a group of packet buffers from the packet buffer resource pool, one more than the number of data fragments obtained in step (1), and reserves the first buffer of the group for the transport/network layer to later add the packet header.
(3) Connection: the module fills the address and length of each data fragment obtained in step (1) into the group of packet buffers in order, completing the connection between the shared-memory data and the packet buffers. The module then points each preceding buffer's 'next' at the current one, chaining the buffers into a linked list.
Together with the packet buffer reserved in advance for the packet header, these packet buffers form a scatter-gather linked list representing a network packet that has not yet been encapsulated, with each element of the list pointing to one fragment of the packet.
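In code, the three steps might look roughly as follows, again under the hypothetical pktbuf_t layout sketched earlier; shm_block_t and pktbuf_pool_alloc are assumptions for illustration.

```c
/* Sketch of traversal / preparation / connection (hypothetical types;
 * reuses the pktbuf_t sketched earlier). */
typedef struct shm_block {
    struct shm_block *next;   /* the transmit queue is a linked list of blocks */
    uint8_t          *base;   /* start of this block's data storage            */
    size_t            len;    /* capacity of this block                        */
} shm_block_t;

extern pktbuf_t *pktbuf_pool_alloc(void);

pktbuf_t *connect_data(shm_block_t *blk, size_t off, size_t len) {
    /* (1) traversal: count how many block-resident fragments the data spans */
    size_t nfrag = 0, o = off, l = len;
    for (shm_block_t *b = blk; b && l > 0; b = b->next) {
        size_t take = (b->len - o < l) ? b->len - o : l;
        nfrag++; l -= take; o = 0;
    }
    /* (2) preparation: a real pool would allocate nfrag + 1 buffers at once;
     * the first buffer stays empty, reserved for the packet header */
    (void)nfrag;
    pktbuf_t *head = pktbuf_pool_alloc(), *prev = head;
    head->flags = 0;
    head->data_len = 0;
    /* (3) connection: fill each buffer with one fragment's address and
     * length, chaining the buffers through 'next' */
    o = off; l = len;
    for (shm_block_t *b = blk; b && l > 0; b = b->next) {
        size_t take = (b->len - o < l) ? b->len - o : l;
        pktbuf_t *pb = pktbuf_pool_alloc();
        pb->flags    = PKTBUF_F_DATA_REF;
        pb->ref_addr = b->base + o;
        pb->ref_len  = (uint32_t)take;
        prev->next   = pb;
        prev = pb; l -= take; o = 0;
    }
    prev->next = NULL;
    return head;   /* scatter-gather list: [header slot] -> data references */
}
```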
After the memory connection module completes the connection, the scatter-gather list representing a not-yet-encapsulated network packet is handed to the subsequent transport/network layer. The protocol processing nodes of the transport/network layer traverse the packet buffers in the list to obtain the data length, fill information such as the network connection and the data length into the protocol header, and encapsulate a complete packet.
The driver then traverses the group of packet buffers, generates a descriptor for each packet buffer, and points each descriptor's 'next' at the following one to obtain a descriptor chain.
The descriptor chain is then handed to the network card; by virtue of the card's SGL feature, the card fetches all fragments according to their address and length information, joins them into one contiguous network packet inside the card, and sends it out.
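A sketch of this driver step follows. The descriptor layout here is generic for illustration; real NIC descriptor formats and their fragment-chaining encoding are vendor-specific.

```c
/* Sketch: turn a pktbuf chain into transmit descriptors for an SGL-capable NIC. */
typedef struct {
    uint64_t addr;    /* IOVA of one fragment                              */
    uint32_t len;     /* fragment length                                   */
    uint32_t flags;   /* TXD_F_MORE: more fragments of this packet follow  */
} txdesc_t;

#define TXD_F_MORE (1u << 0)

extern txdesc_t *txring_next_slot(void);     /* next free transmit ring entry */
extern uint64_t  iova_of(const void *vaddr); /* via the shared-memory DMA map */
extern void      txring_doorbell(void);      /* notify the NIC                */

void post_packet(pktbuf_t *chain) {
    for (pktbuf_t *b = chain; b != NULL; b = b->next) {
        txdesc_t *d = txring_next_slot();
        if (b->flags & PKTBUF_F_DATA_REF) {  /* data stays in shared memory    */
            d->addr = iova_of(b->ref_addr);
            d->len  = b->ref_len;
        } else {                             /* header bytes live in the buffer */
            d->addr = iova_of(b->data);
            d->len  = b->data_len;
        }
        d->flags = b->next ? TXD_F_MORE : 0; /* chain fragments into one packet */
    }
    txring_doorbell();  /* the NIC gathers all fragments and sends one packet */
}
```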
Thus, by extending the packet buffer data structure and using the scatter-gather linked list, several packet buffers can represent one fully encapsulated network packet whose actual memory is discrete, achieving the goal of eliminating the copy from the shared memory into the packet buffer. With the help of a network card supporting the SGL feature, the protocol stack sends data out from the application layer without any copy while preserving the integrity of the network packet.
In the original implementation, once the network card has sent out the network packet held in a packet buffer, the buffer can be reclaimed by the packet buffer resource pool; if a sent packet is lost, a reliable network connection such as TCP copies the data from the application layer again for retransmission. The zero-copy design, however, uses the shared-memory data of the application layer directly as the content of the network packet, so the protocol stack must not only send the data out but also manage and reclaim both the space occupied by the data and the packet buffers.
When managing the reclamation of shared-memory data and packet buffers, the following must be guaranteed to be correct: 1. packet buffers are reclaimed correctly after the network card finishes sending and can be used normally afterwards; 2. the data in the shared memory is reclaimed at the correct time; for reliable network transmission protocols such as TCP, the data should be reclaimed only after it is certain to have reached the client; 3. whether or not a reliable transport protocol is used at the transport/network layer, the content of the shared memory must remain unchanged until reclamation, ensuring that retransmitted content is correct.
To guarantee correctness while avoiding, as far as possible, coupling between the memory management logic of the application layer and that of the transport/network layer, a memory separation module is designed to manage and reclaim the packet buffers that the network card has finished sending and the associated shared-memory data. The memory separation module is placed before the packet buffer resource pool's reclamation, and the logic that handles references to shared-memory data inside packet buffers is placed within it, keeping the management logic of the packet buffer resource pool decoupled from the data management and reclamation logic of the shared memory. The workflow of the memory separation module is as follows:
When the network card hands the packet buffers to the memory separation module, the module inspects them, finds the flagged buffers holding references to shared-memory data, and deletes the data reference information and the flag, restoring each to an ordinary packet buffer without data references. These packet buffers can subsequently be handed to the packet buffer resource pool for reclamation. This not only lets packet buffers be reclaimed and reused, but also prevents other protocol stack functions from modifying application data through a stale reference when a recycled packet buffer is written.
To satisfy the reliability of the network transmission protocol and ensure that data is reclaimed at the correct time, a callback function is added to the packet buffer. In the connection stage of the memory connection module, different functions can be installed as callbacks for packet buffers of different transport protocols; the memory separation module invokes the callback, which decides according to the corresponding protocol whether to notify the application layer to reclaim the data. Taking TCP as an example, completion of the network card send does not guarantee that the client has received the data, so the corresponding callback retains the data, which is reclaimed only after the client's ACK is received.
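A sketch of the separation step together with such a per-protocol callback; the extended buffer fields, tcp_on_nic_tx_done and pktbuf_pool_free are assumptions for illustration.

```c
/* Sketch: memory separation with a per-transport-protocol reclamation callback. */
typedef void (*reclaim_cb_t)(void *session, void *ref_addr, uint32_t ref_len);

typedef struct {
    pktbuf_t     buf;         /* the buffer as sketched earlier     */
    reclaim_cb_t on_tx_done;  /* installed in the connection stage  */
    void        *session;     /* owning session, for the callback   */
} pktbuf_zc_t;

extern void pktbuf_pool_free(pktbuf_t *b);

/* TCP's callback: NIC completion does not mean delivery, so the data is
 * retained in shared memory and reclaimed only when the covering ACK arrives. */
void tcp_on_nic_tx_done(void *session, void *ref_addr, uint32_t ref_len) {
    (void)session; (void)ref_addr; (void)ref_len;  /* retain until ACKed */
}

void separate_and_recycle(pktbuf_zc_t *zb) {
    pktbuf_t *b = &zb->buf;
    if (b->flags & PKTBUF_F_DATA_REF) {
        if (zb->on_tx_done)                /* the protocol decides reclamation */
            zb->on_tx_done(zb->session, b->ref_addr, b->ref_len);
        b->ref_addr = NULL;                /* sever the shared-memory link */
        b->ref_len  = 0;
        b->flags   &= ~PKTBUF_F_DATA_REF;  /* an ordinary buffer again */
    }
    pktbuf_pool_free(b);                   /* back to the resource pool */
}
```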
On the new transmission path the data of a network packet resides in the shared memory, so the protocol stack must extend the original memory management module with the function of managing DMA mappings for the shared memory. Given that send requests are very frequent under high network load, creating a DMA mapping for each piece of data and deleting it after the send completes would bring a large amount of overhead. Therefore, when an application starts, the memory management module establishes a DMA mapping for the entire shared memory; when the application exits, the module deletes the shared memory's DMA mapping again. If the DMA mapping fails because of limited system resources, for example because the number of pages in the mapping exceeds the system limit, the shared memory is marked and the session module falls back to the original memory copy from the shared memory into the packet buffer.
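As one concrete possibility, with a VFIO-backed user-mode driver the whole-region mapping could be established once at application start roughly as follows. This is a sketch: the VFIO container setup is elided, and the IOVA choice and error handling are simplified.

```c
/* Sketch: map the entire shared-memory region for DMA once at application
 * start, and unmap it once at application exit (VFIO type1 IOMMU API). */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int shm_dma_map(int container_fd, void *base, uint64_t size, uint64_t iova) {
    struct vfio_iommu_type1_dma_map map;
    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uintptr_t)base;  /* process virtual address of the region  */
    map.iova  = iova;             /* device-visible address for descriptors */
    map.size  = size;
    /* may fail under limited system resources (e.g. locked-page limits);
     * the caller then marks the region so the session module falls back to
     * the original memory copy */
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

int shm_dma_unmap(int container_fd, uint64_t iova, uint64_t size) {
    struct vfio_iommu_type1_dma_unmap unmap;
    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = sizeof(unmap);
    unmap.iova  = iova;
    unmap.size  = size;
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}
```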
Technical effects: after the transmission path inside the network protocol stack is changed to zero copy, the process by which data travels from the application through the protocol stack and finally to the network card is re-analyzed, as shown in FIG. 6. Compared with the original flow of FIG. 1, steps 1 and 2 executed on the application side are unchanged, and the data written into the shared memory is cached in the LLC. On the protocol stack side, the memory connection module allocates packet buffers and writes references to the shared-memory data into them. The protocol stack then generates the corresponding packet header for the data block and writes it into another packet buffer. The driver generates several descriptors and writes them into the descriptor ring; the network card DMAs the descriptors and then the packet header and the data in turn. Compared with FIG. 1, sending one network packet now costs the network card more DMA operations, but today's network card hardware is very capable: in the large-file transfer scenario, the overhead of gathering the scattered packet headers and data inside the card has little influence on the card's performance and does not turn the card into the bottleneck of overall performance.
In this process the shared memory and the packet buffers still occupy the LLC, but the payload that filled the packet buffers in FIG. 1 is replaced by data references. The protocol stack saves the large amount of LLC space that the original payload used, needing only a very small amount for the data references (and because an allocated packet buffer is used by only one thread of the protocol stack, the references can be cached in the L1 and L2 caches), achieving the goal of saving LLC resources. Moreover, the protocol stack no longer writes large volumes of data into the LLC, avoiding the mass of LLC store misses caused by memory copies, so it can process network packets faster; at the same time, the considerable CPU resources previously spent on memory copies can be used to process more network packets, improving the performance of the protocol stack.
The foregoing describes the preferred embodiments of the present invention in detail. It should be understood that one of ordinary skill in the art can make numerous modifications and variations in accordance with the concept of the invention without creative effort. Therefore, all technical solutions that a person skilled in the art can obtain through logic analysis, reasoning, or limited experiments on the basis of the prior art and the inventive concept shall fall within the scope of protection defined by the claims.

Claims (10)

1. An intra-stack zero-copy transmission path method based on a shared-memory communication mode, characterized by comprising the following steps:
step 1, an event polling module of the network protocol stack receives an application's I/O request from the I/O event queue by polling, and then the event polling module forwards the request to a session module for processing;
step 2, after receiving the send request, the session module finds the corresponding network session, locates the data transmit queue according to the transmit-queue information recorded in the network session, determines the position and length of the data to be sent, and then invokes the memory connection module, passing it the information about the data to be sent;
step 3, after the memory connection module obtains the information about the data in the transmit queue, it calculates the packet buffer resources needed, allocates them from a packet buffer resource pool, and then stores references to the data into the packet buffers, replacing the original operation of copying the data into the packet buffers; in this process, one extra packet buffer is allocated;
step 4, after the memory connection module writes the references into the packet buffers, the packet buffers are handed to the session module; the session module passes the packet buffers down to a transport/network layer for encapsulation;
step 5, the transport/network layer receives the session information and the packet buffers provided by the session module and finds the corresponding network connection, and the network protocol stack generates a packet header; in order to quickly distinguish whether a packet buffer stores a reference to shared-memory data, the network protocol stack stores packet headers and data references in separate packet buffers and adds a flag to each packet buffer storing a data reference; the extra packet buffer allocated in advance in step 3 is used here: the network protocol stack writes the generated packet header into the reserved packet buffer and then links the packet buffers carrying the data references behind the packet buffer carrying the header, so that the group of packet buffers together represents an encapsulated packet whose content is scattered;
step 6, after packet encapsulation is completed, the session module notifies the event polling module that the packet buffers are ready, and the network protocol stack notifies the network card driver;
step 7, the network card driver extracts from each packet buffer the packet header, or the address and length of the data in the shared memory, and fills descriptors into the descriptor ring; for linked packet buffers, the driver follows 'next', extracts the information, fills in descriptors, and links them through a field in the descriptor to obtain a chain of descriptors, which it writes into the transmit descriptor ring; on a network card supporting the SGL feature, the network card reads the packet header and the data from the packet buffer and the data transmit queue respectively according to the descriptor chain, assembles a contiguous packet inside the card, and sends it; after transmission completes, the network card marks the descriptor chain as completed;
step 8, after the network card driver harvests the completed descriptor chain, the network protocol stack hands the packet buffers to the memory separation module;
and step 9, the memory separation module checks the packet buffers one by one; for the flagged packet buffers holding data references, it deletes the reference information, restoring them to ordinary packet buffers, which are then handed to the packet buffer resource pool for reclamation.
2. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 1, wherein the method is developed on the basis of the open-source user-mode network protocol stack VPP.
3. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 1, wherein the upper-layer application provides Web services to the outside by communicating with the network protocol stack through the shared memory.
4. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 1, wherein the network protocol stack comprises a memory connection module, a memory separation module and a memory management module.
5. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 1, wherein the memory connection module determines how many network packets the data needs to be split into for transmission; then, for each network packet, the memory connection module performs the traversal, preparation and connection steps to connect the shared-memory data to the packet buffers.
6. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 5, wherein in the traversal, the module counts how many blocks the data spans, with the data in each block as one fragment, so that the memory connection module knows how many packet buffer resources to allocate.
7. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 5, wherein in the preparation, the memory connection module allocates a group of packet buffers from the packet buffer resource pool, one more than the number of data fragments obtained by the traversal, and reserves the first buffer of the group for later use by the transport/network layer to add the packet header.
8. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 5, wherein in the connection, the memory connection module sequentially fills the address and length of each traversed data fragment into the group of packet buffers, completing the connection between the shared-memory data and the packet buffers, and then points the previous packet buffer's 'next' at the current one, chaining the buffers into a linked list.
9. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 1, wherein in step 5, the network protocol stack generates the packet header according to the network transmission protocol.
10. The intra-stack zero-copy transmission path method based on a shared-memory communication mode as claimed in claim 1, wherein in step 5, the information in the packet header generated by the network protocol stack includes the network connection and the length of the data.
CN202310443027.7A 2023-04-23 2023-04-23 Intra-stack zero-copy transmission path method based on a shared-memory communication mode Pending CN116489250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310443027.7A CN116489250A (en) Intra-stack zero-copy transmission path method based on a shared-memory communication mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310443027.7A CN116489250A (en) Intra-stack zero-copy transmission path method based on a shared-memory communication mode

Publications (1)

Publication Number Publication Date
CN116489250A (en) 2023-07-25

Family

ID=87226362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310443027.7A Pending Intra-stack zero-copy transmission path method based on a shared-memory communication mode

Country Status (1)

Country Link
CN (1) CN116489250A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination