WO2023030178A1 - Communication method based on a user-mode protocol stack and corresponding apparatus - Google Patents

Communication method based on a user-mode protocol stack and corresponding apparatus

Info

Publication number
WO2023030178A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/115019
Other languages
English (en)
French (fr)
Inventor
陆志浩
黄黎明
吴长冶
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023030178A1 publication Critical patent/WO2023030178A1/zh

Classifications

    • H: ELECTRICITY
        • H04: ELECTRIC COMMUNICATION TECHNIQUE
            • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
                    • H04L 41/40: Arrangements for maintenance, administration or management of data switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities
                • H04L 69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
                    • H04L 69/16: Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
                    • H04L 69/30: Definitions, standards or architectural aspects of layered protocol stacks
                • H04L 9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
                    • H04L 9/40: Network security protocols

Definitions

  • The present application relates to the field of computer technology, and in particular to a communication method based on a user-mode protocol stack and a corresponding apparatus.
  • The interaction between application threads and network hardware is usually implemented through a kernel protocol stack or a user-mode protocol stack.
  • However, the kernel protocol stack must frequently switch context between kernel mode and user mode while processing input/output (IO) data. As a result, the capabilities of the existing kernel protocol stack can no longer fully release the IO capabilities of network hardware.
  • In this context, the user-mode protocol stack is a common and effective technical means.
  • The original intention of the user-mode protocol stack is to bypass the kernel and allow applications to interact with hardware more directly.
  • Current user-mode protocol stacks usually place the user-mode protocol-stack thread and the application thread in the same thread context, which avoids the overhead of thread switching.
  • However, this design binds the threads of the user-mode protocol stack to the threads of the application, resulting in insufficient versatility.
  • An embodiment of the present application provides a communication method based on a user-mode protocol stack, which is used to improve the versatility of the user-mode protocol stack.
  • Embodiments of the present application also provide corresponding devices, computer-readable storage media, computer program products, and the like.
  • A first aspect of the present application provides a communication method based on a user-mode protocol stack, applied to a server.
  • The server includes an application layer, a user-mode protocol stack, and a hardware layer.
  • A target application of the application layer corresponds to at least one W thread, where a W thread is a thread used to process data of the target application.
  • The user-mode protocol stack includes a plurality of N threads, a routing module, and transmission control protocol hash tables in one-to-one correspondence with the plurality of N threads, where an N thread is a user-mode protocol-stack thread.
  • The hardware layer includes a plurality of non-uniform memory access (NUMA) nodes and a network card, where the plurality of N threads correspond to the plurality of NUMA nodes.
  • The method includes: obtaining a first correspondence through the routing module, where the first correspondence is the correspondence between the listening file descriptor (FD) of a first W thread and multiple shadow FDs, the multiple shadow FDs being generated by the plurality of N threads, one per N thread.
  • The communication method based on the user-mode protocol stack provided by the present application can be applied to a non-uniform memory access (NUMA) system. A NUMA system usually includes a plurality of NUMA nodes, and each NUMA node usually includes multiple processing cores, memory, input/output (IO) resources, and the like.
  • A processing core may also be referred to as a central processing unit (CPU) core, or CPU for short.
  • The server may be a physical server, a virtual machine (VM), or a container.
  • The client may be a terminal device, a virtual machine, or a container.
  • The application layer can include multiple applications, and the target application can be one of them.
  • The target application can correspond to one W thread or multiple W threads. If the target application corresponds to one W thread, that W thread can complete the listening, waiting, data-processing, and other functions. If the target application corresponds to multiple W threads, those W threads can respectively complete the listening, waiting, and data-processing functions. Of course, one thread can also perform two or more functions; for example, one of the W threads can complete both the waiting and data-processing functions.
  • "A plurality" in this application means two or more, and can also be described as "at least two".
  • Each N thread has its own transmission control protocol (TCP) hash table, and the TCP hash table includes the information required by that N thread to execute the TCP protocol.
  • Each N thread corresponds to a NUMA node, and the correspondence between N threads and NUMA nodes can be configured during server initialization.
  • A NUMA node usually includes multiple processing cores, and an N thread can be bound to one of the processing cores.
  • The routing module can be a software package with routing functions, such as a software development kit (SDK) or a data plane development kit (DPDK).
  • the routing module includes a first correspondence and a second correspondence.
  • The first correspondence may be referred to as an FD shadow table; it is the correspondence between the listening FD of the first W thread that initiates the listening operation and the shadow FD corresponding to each N thread.
  • The shadow table can take the form of the listening FD corresponding to shadow FD1, shadow FD2, ..., shadow FDn.
  • A shadow FD is an FD that the operating system does not perceive; the operating system only perceives the listening FD of the first W thread.
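  • The first correspondence can be pictured as a small mapping structure. The following Python sketch is purely illustrative (names such as `ShadowTable` are not from the patent; a real implementation would live inside the user-mode stack, typically in C):

```python
class ShadowTable:
    """First correspondence: one listening FD -> one shadow FD per N thread.

    The OS only ever sees the listening FD of the first W thread; the
    shadow FDs exist purely inside the user-mode routing module.
    """
    def __init__(self):
        self._table = {}  # listen_fd -> {n_thread_id: shadow_fd}

    def register(self, listen_fd, shadow_fds):
        # shadow_fds: mapping of N-thread id -> shadow FD
        self._table[listen_fd] = dict(shadow_fds)

    def shadow_fd(self, listen_fd, n_thread_id):
        return self._table[listen_fd][n_thread_id]

    def listen_fd_of(self, shadow_fd):
        # Reverse lookup: which listening FD does a shadow FD belong to?
        for lfd, shadows in self._table.items():
            if shadow_fd in shadows.values():
                return lfd
        return None

# Example: listening FD 7 shadowed by three N threads
table = ShadowTable()
table.register(7, {0: 101, 1: 102, 2: 103})
```

The reverse lookup is what lets the routing module associate an event on an N thread's shadow FD back to the W thread's listening FD.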
  • The second correspondence may be called a routing table. The routing table records the correspondence between each N thread and its connection FDs, including the correspondence between the target N thread and the corresponding connection FD.
  • The listening FD is the FD related to the listening operation of the first W thread.
  • A connection FD is an FD generated by an N thread for a TCP connection established between the client and the server; each TCP connection has exactly one connection FD.
  • A connection FD corresponds to the N thread that established the TCP connection; that N thread is referred to as the target N thread.
  • Through the first correspondence (the shadow table) in the routing module, the server can find the association from N thread to W thread and thereby pass the connection FD; it can then use the second correspondence (the routing table) in the routing module to determine the target N thread used in the communication process, thereby completing the communication.
  • This application does not need to establish a binding relationship between W threads and N threads in advance, nor does it require multiple N threads to share one TCP hash table, so W threads and N threads are decoupled, which improves the versatility of the user-mode protocol stack. In addition, because the kernel is not involved, the W threads and N threads do not need to perform context switching, which also improves the performance of the user-mode protocol stack.
  • The above step of obtaining the first correspondence through the routing module includes: receiving, through the routing module, the listening operation initiated by the first W thread, and generating a listening FD for the first W thread; initiating, through the routing module, a listening operation to each of the plurality of N threads to obtain multiple shadow FDs in one-to-one correspondence with the plurality of N threads; and establishing, through the routing module, the correspondence between the listening FD and the multiple shadow FDs to obtain the first correspondence.
  • The server periodically initiates a listening operation through the first W thread of the target application, so as to detect whether there is relevant data of the target application to be received.
  • When the first W thread initiates a listening operation, the routing module initiates a listening operation to each N thread accordingly, so that a shadow table from the first W thread to each N thread can be established for the subsequent communication process. There is no need to pre-bind the correspondence between W threads and N threads, which improves the versatility of the user-mode protocol stack.
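  • The fan-out of one listen() call to every N thread can be sketched as follows. This is a minimal simulation: the per-thread `listen` callbacks, the FD counter, and all class names are illustrative assumptions, not the patent's implementation:

```python
import itertools

_fd_counter = itertools.count(100)

class NThread:
    """Stand-in for one user-mode protocol-stack thread."""
    def __init__(self, tid):
        self.tid = tid

    def listen(self, addr):
        # Each N thread starts listening and returns its own shadow FD,
        # which the operating system never sees.
        return next(_fd_counter)

class RoutingModule:
    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.fd_shadow_table = {}   # first correspondence

    def on_listen(self, addr):
        # 1. Generate a listening FD for the first W thread.
        listen_fd = next(_fd_counter)
        # 2. Fan the listen operation out to every N thread.
        shadows = {t.tid: t.listen(addr) for t in self.n_threads}
        # 3. Record listening FD <-> shadow FDs (the first correspondence).
        self.fd_shadow_table[listen_fd] = shadows
        return listen_fd

router = RoutingModule([NThread(i) for i in range(4)])
lfd = router.on_listen(("0.0.0.0", 3306))
```

The W thread only ever receives `lfd`; the four shadow FDs stay inside the routing module.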
  • The network card includes at least one network card queue. The above step of obtaining the second correspondence through the routing module includes: obtaining, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, where the communication connection is established based on a link-establishment request from the client received by a first network card queue, the first network card queue being one of the at least one network card queue; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
  • The network card usually includes multiple network card queues. One network card queue corresponds to one N thread, and this correspondence is not pre-configured but can be established while the communication connection is being set up. If the first network card queue receives the link-establishment request from the client, and the network card selects the target N thread for that request according to its own logic, then the correspondence between the first network card queue and the target N thread is established. The second correspondence is stored in the routing module, so that for subsequent communications identified by the connection FD, the corresponding target N thread can be determined through the routing module to complete the communication process, improving the flexibility of communication.
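  • A sketch of how the second correspondence (the routing table) could be populated when a link-establishment request arrives on a NIC queue. The NIC's thread-selection logic is reduced here to a simple modulo on the queue id, which is an assumption for illustration only (the patent leaves the selection logic to the network card):

```python
class Router:
    def __init__(self, num_n_threads):
        self.num_n_threads = num_n_threads
        self.routing_table = {}      # second correspondence: conn_fd -> target N thread
        self._next_conn_fd = 200

    def on_connect_request(self, nic_queue_id):
        # The NIC selects a target N thread for this connection according
        # to its own logic; here simply queue id modulo thread count.
        target = nic_queue_id % self.num_n_threads
        # The target N thread generates a connection FD for the new TCP
        # connection; each TCP connection has exactly one connection FD.
        conn_fd = self._next_conn_fd
        self._next_conn_fd += 1
        self.routing_table[conn_fd] = target
        return conn_fd, target

    def target_of(self, conn_fd):
        return self.routing_table[conn_fd]

r = Router(num_n_threads=4)
fd_a, t_a = r.on_connect_request(nic_queue_id=6)
fd_b, t_b = r.on_connect_request(nic_queue_id=1)
```

Every later read or write carrying `fd_a` can now be routed to N thread `t_a` with a single table lookup.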
  • The above step of communicating with the client through the routing module based on the first correspondence and the second correspondence includes: passing, through the routing module and based on the correspondence in the first correspondence between the shadow FD of the target N thread and the listening FD of the first W thread, the connection FD of the target N thread to the first W thread; and communicating with the client through the routing module based on the connection FD and the second correspondence.
  • The connection FD can be passed to the first W thread through the shadow table, so that the relevant W threads of the target application can use the connection FD for subsequent operations; the routing module can also determine, from the connection FD carried in operations initiated by a W thread, the corresponding target N thread to perform the related operations and complete the communication process.
  • Communicating with the client through the routing module based on the connection FD and the second correspondence includes: receiving a poll or epoll (extended poll) event initiated by a second W thread, where the poll/epoll event includes the connection FD, the connection FD having been passed from the first W thread to the second W thread, the second W thread entering a dormant state after initiating the poll/epoll event, and the second W thread being one of the multiple W threads corresponding to the target application; determining, through the routing module and according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for the wake-up event related to the target N thread; and, after the second W thread is woken up, executing, through the routing module and according to the second correspondence, a read operation or a write operation related to the target N thread.
  • For example, in the MySQL thread model, a master thread is responsible for listening (listen), a new TCP connection is handed over to an auth thread, and the final SQL request is handed over to a worker thread.
  • In this case the first W thread needs to pass the connection FD to the second W thread; the second W thread triggers the poll/epoll event and then goes to sleep, waiting for the wake-up event of the target N thread once the relevant data arrives. After the second W thread is woken up, the subsequent communication process is executed. In this way, the power consumed by keeping the second W thread active can be reduced without affecting the communication process, improving system performance.
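  • The sleep/wake handshake described above can be sketched with a per-connection event object. This is a simplified model (a `threading.Event` stands in for the wake-up mechanism; the real stack would use futex-like primitives, and all names are illustrative):

```python
import threading

class Connection:
    def __init__(self, conn_fd, target_n_thread):
        self.conn_fd = conn_fd
        self.target_n_thread = target_n_thread
        self.ready = threading.Event()   # wake-up event for waiting W threads
        self.inbox = []

def w_thread_wait(conn, results):
    # Second W thread: after registering a poll/epoll event it sleeps
    # until the wake-up event associated with the target N thread fires.
    conn.ready.wait()
    results.append(conn.inbox.pop(0))

conn = Connection(conn_fd=200, target_n_thread=2)
results = []
waiter = threading.Thread(target=w_thread_wait, args=(conn, results))
waiter.start()

# Wake-up agent thread: data has arrived for conn_fd 200; it wakes the
# W thread so the target N thread itself never has to enter the kernel.
conn.inbox.append(b"SELECT 1")
conn.ready.set()
waiter.join()
```

Delegating the `set()` call to a separate wake-up agent thread is what keeps the polling N thread in user mode, matching the rationale given below.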
  • The method further includes: waking up the second W thread through a wake-up agent thread associated with the target N thread.
  • Waking the second W thread through the wake-up agent thread associated with the target N thread prevents the target N thread itself from entering the kernel (system) state, so that the target N thread can remain in the running state, thereby reducing network delay in the communication process.
  • The method further includes: allocating a receive queue and a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread. The receive queue is used to record the memory addresses of data related to read operations, and the send queue is used to record the memory addresses of data related to write operations.
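  • The per-connection queue allocation can be sketched as follows. NUMA-local memory is modeled as a plain dictionary per node, and the address arithmetic is purely illustrative:

```python
class NumaNode:
    """Stand-in for one NUMA node's local memory."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}          # address -> bytes
        self._next_addr = 0x1000

    def alloc(self, data):
        addr = self._next_addr
        self._next_addr += max(len(data), 1)
        self.memory[addr] = data
        return addr

def setup_connection_queues(conn_fd, target_node, queues):
    # Allocate the receive and send queues for this connection FD in the
    # memory of the NUMA node that hosts the target N thread. The queues
    # hold memory addresses, not the data itself.
    queues[conn_fd] = {
        "node": target_node,
        "recv": [],   # addresses of data awaiting read operations
        "send": [],   # addresses of data awaiting transmission
    }

queues = {}
node2 = NumaNode(2)
setup_connection_queues(200, node2, queues)
addr = node2.alloc(b"payload")
queues[200]["recv"].append(addr)
```

Because the queues record addresses into node-local memory, a W thread running on the same node can consume the data without a cross-node copy.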
  • The above step of executing, through the routing module and according to the second correspondence, the read operation related to the target N thread includes: receiving, through the routing module, a read operation initiated by the second W thread or a third W thread, the read operation carrying the connection FD, where the third W thread is one of the multiple W threads corresponding to the target application and the connection FD is passed to the third W thread by the second W thread; obtaining, through the routing module and according to the connection FD, the memory address of first data from the receive queue associated with the connection FD, where the first data is data received from the client by the first network card queue associated with the target N thread, the first network card queue being the network card queue that received the link-establishment request sent by the client; and obtaining the first data according to its memory address, and passing the first data to the second W thread or the third W thread for processing.
  • The read operation can be initiated directly by the second W thread, or the second W thread can pass the connection FD to the third W thread, which then initiates it.
  • In the MySQL example, the second W thread can be an auth thread and the third W thread can be a worker thread.
  • The target N thread can obtain the memory address of the first data from the corresponding receive queue according to the connection FD in the read operation, then fetch the first data from memory and transfer it to the buffer of the second W thread or the third W thread, which processes the first data.
  • The routing module determines the corresponding target N thread by the connection FD to complete this processing, which improves the efficiency of data reading.
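  • The read path, resolved end to end, can be sketched like this (the flat dictionaries standing in for the routing table, receive queues, and memory are illustrative assumptions):

```python
def handle_read(conn_fd, routing_table, recv_queues, memory):
    """Read path: resolve the connection FD to its target N thread's
    receive queue, take the first data's memory address from that queue,
    then hand the data to the calling W thread (returned here)."""
    target = routing_table[conn_fd]          # second correspondence
    queue = recv_queues[(target, conn_fd)]
    if not queue:
        return None                          # nothing received yet
    addr = queue.pop(0)                      # memory address of first data
    return memory.pop(addr)                  # copy into the W thread's buffer

routing_table = {200: 2}
memory = {0x1000: b"hello from client"}
recv_queues = {(2, 200): [0x1000]}

data = handle_read(200, routing_table, recv_queues, memory)
```

Note that the W thread never touches the NIC or the N thread directly; the connection FD plus two table lookups are enough.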
  • The above step of executing, through the routing module and according to the second correspondence, the write operation related to the target N thread includes: receiving, through the routing module, a write operation initiated by the second W thread or the third W thread, the write operation carrying the connection FD and second data, where the third W thread is one of the multiple W threads corresponding to the target application and the connection FD is passed by the second W thread to the third W thread; writing, through the routing module and according to the connection FD, the second data into the memory corresponding to the target N thread, and writing the memory address of the second data into the send queue corresponding to the connection FD; and, when the target N thread polls the memory address of the second data in the send queue, sending the second data in memory to the network card.
  • The relationship between the second W thread and the third W thread can be understood with reference to the read operation above.
  • The routing module determines the corresponding target N thread according to the connection FD, writes the second data into the memory corresponding to the target N thread, and then writes the memory address of the second data into the send queue corresponding to the connection FD. When the target N thread polls the memory address of the second data in the send queue, it sends the second data in memory to the first network card queue of the network card, and the first network card queue sends the second data to the client.
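  • The write path mirrors the read path: the W thread side enqueues an address, and the polling N thread side drains it. A minimal sketch (address arithmetic and data structures are illustrative):

```python
def handle_write(conn_fd, payload, routing_table, node_memory, send_queues):
    """W-thread side: copy the data into the target N thread's NUMA-local
    memory and record its address in the connection's send queue."""
    target = routing_table[conn_fd]
    addr = max(node_memory[target], default=0x1000) + 0x100
    node_memory[target][addr] = payload
    send_queues[conn_fd].append(addr)

def n_thread_poll_and_send(conn_fd, routing_table, node_memory, send_queues, nic):
    """N-thread side: poll the send queue; for each recorded address,
    hand the data in memory to the NIC queue for transmission."""
    target = routing_table[conn_fd]
    while send_queues[conn_fd]:
        addr = send_queues[conn_fd].pop(0)
        nic.append(node_memory[target].pop(addr))

routing_table = {200: 2}
node_memory = {2: {}}
send_queues = {200: []}
nic = []

handle_write(200, b"response", routing_table, node_memory, send_queues)
n_thread_poll_and_send(200, routing_table, node_memory, send_queues, nic)
```

Only an address crosses from the W thread to the N thread; the payload itself stays in node-local memory until the NIC consumes it.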
  • Before performing the read operation or the write operation, the method further includes: binding the second W thread or the third W thread to a processing core of the NUMA node where the target N thread is located.
  • With the second W thread or the third W thread bound to a processing core in the NUMA node where the target N thread is located, the TCP communication process can be completed without crossing NUMA nodes, and the target N thread, the second W thread, and the third W thread share the data in node-local memory, thereby increasing communication speed and reducing network delay.
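  • On Linux, this kind of binding could be done with CPU affinity calls. The sketch below uses `os.sched_setaffinity`; the node-to-core mapping is a hard-coded assumption (a real system would read it from `/sys/devices/system/node/`), and the whole function is illustrative rather than the patent's mechanism:

```python
import os

# Illustrative mapping: which processing cores belong to which NUMA node.
NODE_CORES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def bind_to_node(node_id, pid=0):
    """Pin the calling thread/process (pid 0 = self) to the cores of the
    NUMA node hosting the target N thread, so the W thread and N thread
    share node-local memory without crossing NUMA boundaries."""
    cores = NODE_CORES[node_id]
    # Only request cores that actually exist on this machine.
    available = cores & set(range(os.cpu_count() or 1))
    if hasattr(os, "sched_setaffinity") and available:
        os.sched_setaffinity(pid, available)
    return available

bound = bind_to_node(0)
```

In C, the equivalent would be `pthread_setaffinity_np` with a `cpu_set_t` built from the node's core list.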
  • The memory in the NUMA node corresponding to the target N thread is huge-page memory.
  • Configuring huge-page memory reduces the probability of a cache lookup miss.
  • A second aspect of the present application provides a communication method based on a user-mode protocol stack, applied to a client.
  • The client includes an application layer, a user-mode protocol stack, and a hardware layer.
  • A target application of the application layer corresponds to at least one W thread, where a W thread is a thread used to process data of the target application; the user-mode protocol stack includes a plurality of N threads, a routing module, and transmission control protocol hash tables in one-to-one correspondence with the plurality of N threads, where an N thread is a user-mode protocol-stack thread; and the hardware layer includes a plurality of non-uniform memory access (NUMA) nodes, where the plurality of N threads correspond to the plurality of NUMA nodes.
  • The method includes: obtaining a target correspondence through the routing module, where the target correspondence includes the correspondence between a connection file descriptor (FD) and a target N thread, the target N thread being the N thread selected by the routing module for the first W thread that initiates the connection operation.
  • Features of the second aspect that are the same as those of the first aspect can be understood with reference to the explanation of the first aspect.
  • The correspondence between the connection FD and the target N thread is established through the routing module, so that in the subsequent communication process the corresponding target N thread can be determined from the connection FD and subsequent communication operations can be performed without binding W threads to N threads, thereby improving the versatility of the user-mode protocol stack.
  • The W threads and N threads do not need to perform context switching, which also improves the performance of the user-mode protocol stack.
  • The above step of obtaining the target correspondence through the routing module includes: receiving, through the routing module, the connection operation initiated by the first W thread, selecting the target N thread from the multiple N threads for the connection operation, and generating a connection FD for the first W thread; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
  • The above step of communicating with the server through the routing module based on the target correspondence includes: determining, through the routing module and according to the connection FD, the NUMA node and network card queue corresponding to the target N thread; and sending a link-establishment request and first data to the server through the NUMA node and network card queue corresponding to the target N thread.
  • After the NUMA node and network card queue corresponding to the target N thread are determined according to the connection FD, the method further includes: allocating a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the send queue being used to record the memory addresses of data related to write operations.
  • The above step of sending the first data to the server through the NUMA node and network card queue corresponding to the target N thread includes: receiving, through the routing module, a write operation initiated by a second W thread, the write operation carrying the connection FD and the first data, where the second W thread is one of the multiple W threads corresponding to the target application and the connection FD is passed from the first W thread to the second W thread; writing, through the routing module and according to the connection FD, the first data into the memory corresponding to the target N thread, and writing the memory address of the first data into the send queue corresponding to the connection FD; and, when the target N thread polls the memory address of the first data in the send queue, sending the first data in memory to the network card.
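  • The client-side difference from the server side is that the routing module itself selects the target N thread at connect() time. A minimal sketch (round-robin selection and all names are illustrative assumptions; the patent does not specify the selection policy):

```python
class ClientRouter:
    """Client side: when the first W thread calls connect(), the routing
    module selects a target N thread and records conn FD -> N thread
    (the 'target correspondence')."""
    def __init__(self, num_n_threads):
        self.num_n_threads = num_n_threads
        self.target_table = {}
        self._rr = 0              # simple round-robin selection
        self._next_fd = 300

    def on_connect(self, server_addr):
        target = self._rr % self.num_n_threads
        self._rr += 1
        conn_fd = self._next_fd
        self._next_fd += 1
        self.target_table[conn_fd] = target
        return conn_fd

    def route(self, conn_fd):
        # Later writes resolve conn FD -> target N thread, and from the
        # target N thread to its NUMA node and network card queue.
        return self.target_table[conn_fd]

cr = ClientRouter(num_n_threads=2)
fd1 = cr.on_connect(("10.0.0.1", 3306))
fd2 = cr.on_connect(("10.0.0.1", 3306))
```

After `on_connect`, the write path is the same enqueue-address-and-poll scheme as on the server side.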
  • Before performing the write operation, the method further includes: binding the second W thread to a processing core in the NUMA node where the target N thread is located.
  • For features of any possible implementation of the second aspect that are the same as those of the first aspect or any possible implementation of the first aspect, refer to the explanation of the first aspect or the corresponding implementation of the first aspect.
  • A third aspect of the present application provides a server that has the function of implementing the method of the first aspect or any possible implementation of the first aspect.
  • This function may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions, for example a first processing unit, a second processing unit, and a third processing unit; these three processing units may be implemented by one or more processing units.
  • A fourth aspect of the present application provides a client that has the function of implementing the method of the second aspect or any possible implementation of the second aspect.
  • This function may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions, for example a first processing unit and a second processing unit; these two units may be implemented by one processing unit.
  • A fifth aspect of the present application provides a computer device that includes at least one processor, a memory, an input/output (I/O) interface, and computer-executable instructions that are stored in the memory and operable on the processor. When the computer-executable instructions are executed by the processor, the processor executes the method of the first aspect or any possible implementation of the first aspect.
  • A sixth aspect of the present application provides a computer device that includes at least one processor, a memory, an input/output (I/O) interface, and computer-executable instructions that are stored in the memory and operable on the processor. When the computer-executable instructions are executed by the processor, the processor executes the method of the second aspect or any possible implementation of the second aspect.
  • A seventh aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions. When the computer-executable instructions are executed by one or more processors, the one or more processors execute the method of the first aspect or any possible implementation of the first aspect.
  • An eighth aspect of the present application provides a computer program product storing one or more computer-executable instructions. When the computer-executable instructions are executed by one or more processors, the one or more processors execute the method of the second aspect or any possible implementation of the second aspect.
  • A ninth aspect of the present application provides a chip system that includes at least one processor, the at least one processor being configured to support the server in implementing the functions of the first aspect or any possible implementation of the first aspect.
  • The chip system may further include a memory for storing necessary program instructions and data of the server.
  • The chip system may consist of chips, or may include chips and other discrete devices.
  • A tenth aspect of the present application provides a chip system that includes at least one processor, the at least one processor being configured to support the client in implementing the functions of the second aspect or any possible implementation of the second aspect.
  • The chip system may further include a memory for storing necessary program instructions and data of the client.
  • The chip system may consist of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a communication scenario between a server and a client.
  • Fig. 2 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a client provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a communication method based on a user-mode protocol stack provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another embodiment of a communication method based on a user-mode protocol stack provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another embodiment of a communication method based on a user mode protocol stack provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another embodiment of a communication method based on a user mode protocol stack provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another embodiment of a communication method based on a user mode protocol stack provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a client provided by an embodiment of the present application.
  • Fig. 12 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • An embodiment of the present application provides a communication method based on a user-mode protocol stack, which is used to improve the versatility of the user-mode protocol stack.
  • Embodiments of the present application also provide corresponding devices, computer-readable storage media, computer program products, and the like. Each will be described in detail below.
  • the communication method based on the user mode protocol stack provided in the embodiment of the present application can be applied to the scenario of communication between the client and the server as shown in FIG. 1 .
  • The server and the client can communicate using the transmission control protocol (TCP).
  • The server may be a physical server, a virtual machine (VM), or a container.
  • The client may be a terminal device, a virtual machine, or a container.
  • the server can be any form of physical machine.
  • A terminal device is also called user equipment (UE).
  • The terminal device can be a personal computer (PC), a mobile phone, a tablet computer (pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical care, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a wireless terminal in the Internet of Things (IoT), and the like.
  • both the client and the server of this application are computer devices; the computer device allocates resources using a non-uniform memory access (NUMA) architecture, and a user-mode protocol stack is installed on the computer device.
  • the system structure of the computer device provided by the embodiment of the present application can be understood by referring to FIG. 2 .
  • a system structure of a computer device includes an application layer, a user-mode protocol stack, and a hardware layer.
  • the application layer may include one or more applications; the target application is one of them and corresponds to at least one W thread, where a W thread is a thread that processes the target application's data.
  • the user-mode protocol stack includes multiple N threads, a routing module, and TCP hash tables corresponding to the multiple N threads. That is to say, each N thread has its own TCP hash table (hash table), which contains the information that the N thread needs to execute the TCP protocol.
  • the routing module can be a software package with routing functions, such as a software development kit (SDK) or the Data Plane Development Kit (DPDK).
  • the routing module is responsible for hooking interface operations, including socket(), listen(), bind(), connect(), poll(), epoll(), send(), recv(), and similar operations.
  • the routing module can route between W threads and N threads according to the corresponding relationships it maintains.
  • the TCP hash table is used to maintain and manage TCP-related information, including establish, bind, and listen state, the TCP control block (TCB), FDs, etc.
  • the hardware layer includes multiple non-uniform memory access (NUMA) nodes and a network card; each NUMA node usually includes multiple processing cores and memory, which may be huge-page memory.
  • a network card may include multiple network card queues.
  • the processing core may also be referred to as a central processing unit (central processing unit, CPU) core, or CPU for short.
  • an N thread can be configured for each NUMA node.
  • the correspondence between network card queues and N threads can be pre-configured, or selected by the network card according to its own logic during the process of establishing a communication connection.
  • the routing module can hook portable operating system interface (POSIX) calls to determine the type of operation, such as a listening operation, a connection operation, a read operation, or a write operation.
  • the correspondence among N threads, NUMA nodes, and network card queues shown in FIG. 2 is just an example; practical applications are not limited to the correspondence shown in FIG. 2.
  • an FD is an index created by the kernel to efficiently manage opened files; it points to an opened file, and all system calls that perform I/O operations go through an FD.
  • when an FD is passed in as a parameter, the entry corresponding to the FD is first looked up in the file descriptor table and the handle of the corresponding opened file is taken out; then, according to the file handle, the inode the file points to is found in the system file descriptor table, so as to locate the real location of the file and perform the I/O operation.
  • the corresponding relationship maintained by the routing module will be introduced respectively from the server side and the client side.
  • the routing module maintains the first corresponding relationship and the second corresponding relationship.
  • the first correspondence may be called a file descriptor (FD) shadow table, which is the correspondence between the listening FD of the first W thread that initiates the listening operation and the shadow FD corresponding to each N thread. If there are n N threads, the FD shadow table can take the form of the listening FD corresponding to shadow FD1, shadow FD2, ..., shadow FDn.
  • a shadow FD is an FD that the operating system does not perceive; the operating system perceives only the listening FD of the first W thread.
  • the second correspondence may be called an FD routing table, and the FD routing table records the correspondence between each N thread and the corresponding connection FD, including the correspondence between the target N thread and the corresponding connection FD.
  • the first W thread is one of at least one W thread
  • the target N thread is one of multiple N threads.
  • the server will obtain the first correspondence and the second correspondence through the routing module, and communicate with the client based on the first correspondence and the second correspondence through the routing module.
  • the listening FD refers to the FD related to the listening operation of the first W thread
  • the connection FD refers to the FD generated by the N thread for establishing the TCP connection between the client and the server.
  • One TCP connection has one Connect FD.
  • the connection FD corresponds to the N thread that establishes the TCP connection.
  • the N thread that establishes the TCP connection is called the target N thread.
  • the acquisition process of the first correspondence may include: receiving, through the routing module, the listening operation initiated by the first W thread and generating a listening FD for the first W thread; initiating, through the routing module, the listening operation to the multiple N threads respectively to obtain multiple shadow FDs corresponding one-to-one to the multiple N threads; and establishing, through the routing module, the correspondence between the listening FD and the multiple shadow FDs to obtain the first correspondence.
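  • the listen fan-out above can be modeled with a short sketch. This is illustrative only and not part of the claimed embodiments: the class names and the toy FD allocator are hypothetical; one listen() from the first W thread yields a listening FD, the routing module replays the listen on every N thread, and the resulting shadow FDs are recorded against the listening FD.

```python
import itertools

_fd_counter = itertools.count(100)  # toy FD allocator, not real kernel FDs

class NThread:
    """Stand-in for a user-mode protocol stack thread."""
    def __init__(self, name):
        self.name = name
    def listen(self):
        # each N thread creates its own listening socket -> a shadow FD
        return next(_fd_counter)

class RoutingModule:
    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.fd_shadow_table = {}   # listening FD -> {shadow FD: N thread}

    def hook_listen(self):
        listening_fd = next(_fd_counter)            # FD returned to the W thread
        shadows = {t.listen(): t for t in self.n_threads}
        self.fd_shadow_table[listening_fd] = shadows
        return listening_fd
```

  • the operating system only ever sees the single listening FD handed back to the first W thread; the shadow FDs live purely inside the routing module's table.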
  • the acquisition process of the second correspondence may include: obtaining, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, where the communication connection is established based on the link establishment request sent by the client and received by the first network card queue, and the first network card queue is one of the at least one network card queue; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
  • the above process of communicating with the client based on the first correspondence and the second correspondence may include: passing, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence in the first correspondence between the shadow FD corresponding to the target N thread and the listening FD corresponding to the first W thread; and communicating with the client, through the routing module, based on the connection FD and the second correspondence.
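  • the accept-and-hand-up path can be sketched as follows (an illustrative model; the method and table names are hypothetical): when the target N thread produces a connection FD, the routing module records it in the FD routing table, then uses the shadow table in reverse to find which listening FD (and so which first W thread) the N thread's shadow FD belongs to.

```python
class RoutingModule:
    def __init__(self, fd_shadow_table):
        # listening FD -> {shadow FD: N thread name}
        self.fd_shadow_table = fd_shadow_table
        self.fd_routing_table = {}   # connection FD -> N thread name

    def on_new_connection(self, shadow_fd, conn_fd, n_thread):
        # second correspondence: remember which N thread owns this connection
        self.fd_routing_table[conn_fd] = n_thread
        # first correspondence, used backwards: shadow FD -> listening FD
        for listening_fd, shadows in self.fd_shadow_table.items():
            if shadow_fd in shadows:
                return listening_fd   # connection FD is delivered to this W thread
        raise KeyError("unknown shadow FD")
```

  • the return value stands for delivering the connection FD to the first W thread that owns the listening FD.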
  • the communication with the client based on the connection FD and the second correspondence through the routing module may include: receiving, through the routing module, the poll/epoll event initiated by the second W thread, where the poll/epoll event includes the connection FD, the connection FD is passed from the first W thread to the second W thread, the second W thread goes to sleep after initiating the poll/epoll event, and the second W thread is one of the multiple W threads corresponding to the target application; determining, through the routing module and according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for the wake-up event related to the target N thread; and after the second W thread is awakened, performing, through the routing module and according to the second correspondence, a read operation or a write operation related to the target N thread.
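  • the poll/epoll routing step can be sketched with one wake-up event per N thread (a minimal model; the class name and the use of threading.Event are illustrative assumptions, not the actual mechanism): the routing module looks the connection FD up in the FD routing table and blocks only on the event belonging to the target N thread.

```python
import threading

class EpollRouter:
    def __init__(self, fd_routing_table):
        self.fd_routing_table = fd_routing_table              # conn FD -> N thread name
        self.wake = {n: threading.Event() for n in set(fd_routing_table.values())}

    def epoll_wait(self, conn_fd, timeout=1.0):
        n_thread = self.fd_routing_table[conn_fd]             # second correspondence
        return self.wake[n_thread].wait(timeout)              # sleep until that N thread wakes us

    def wake_from(self, n_thread):
        # called when the N thread has data ready for one of its connections
        self.wake[n_thread].set()
```

  • for example, a W thread calling `epoll_wait(30)` sleeps until `wake_from("N2")` fires, if connection FD 30 maps to N thread 2 in the routing table.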
  • the method may further include: allocating a receiving queue and a sending queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, where the receiving queue records memory addresses of data related to read operations and the sending queue records memory addresses of data related to write operations.
  • the above performing, through the routing module, of the read operation related to the target N thread may include: receiving, through the routing module, the read operation initiated by the second W thread or the third W thread, where the read operation carries the connection FD, the third W thread is one of the multiple W threads corresponding to the target application, and when the third W thread initiates the read operation, the connection FD is passed from the second W thread to the third W thread; obtaining, through the routing module and according to the connection FD, the memory address of the first data from the receiving queue associated with the connection FD, where the first data is the data received from the client by the first network card queue associated with the target N thread, and the first network card queue is the network card queue that received the link establishment request sent by the client; and acquiring the first data according to its memory address and passing it to the second W thread or the third W thread for processing.
  • the above performing, through the routing module, of the write operation related to the target N thread may include: receiving, through the routing module, the write operation initiated by the second W thread or the third W thread, where the write operation carries the connection FD and the second data, the third W thread is one of the multiple W threads corresponding to the target application, and when the third W thread initiates the write operation, the connection FD is passed from the second W thread to the third W thread; writing, through the routing module and according to the connection FD, the second data into the memory corresponding to the target N thread, and writing the memory address of the second data into the sending queue corresponding to the connection FD; and when the target N thread polls the memory address of the second data in the sending queue, sending the second data in the memory to the network card.
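  • the read and write paths above exchange memory addresses rather than copying data. The sketch below models this (illustratively; a dict stands in for the NUMA node's huge-page memory, and integer keys stand in for memory addresses): the Rx/Tx queues hold only addresses, and whichever side consumes an address fetches the data from shared memory.

```python
from collections import deque

class Connection:
    def __init__(self):
        self.memory = {}          # NUMA-node memory: address -> bytes
        self.rx = deque()         # addresses of data received, awaiting read()
        self.tx = deque()         # addresses of data queued for send()
        self._next_addr = 0

    def _alloc(self, data):
        addr = self._next_addr
        self._next_addr += 1
        self.memory[addr] = data
        return addr

    def on_receive(self, data):
        # N-thread side: NIC queue delivered data; record its address in Rx
        self.rx.append(self._alloc(data))

    def read(self):
        # W-thread side: pop an address from Rx and fetch the data it points to
        return self.memory[self.rx.popleft()]

    def write(self, data):
        # W-thread side: store data in shared memory, record its address in Tx
        self.tx.append(self._alloc(data))

    def poll_send(self):
        # N-thread side: poll Tx; hand the data at that address to the NIC
        return self.memory[self.tx.popleft()] if self.tx else None
```

  • because only addresses cross the queues, the W thread and the N thread never copy the payload between them, matching the shared-memory intent of placing both on the same NUMA node.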
  • the method may further include: binding the second W thread or the third W thread to a processing core in the NUMA node where the target N thread is located.
  • the user-mode protocol stack can also include wake-up proxy threads; for example, P thread 1, P thread 2, ..., P thread n in FIG. 3 are wake-up proxy threads, and each N thread corresponds to one wake-up proxy thread, for example: N thread 1 corresponds to P thread 1, N thread 2 corresponds to P thread 2, ..., N thread n corresponds to P thread n.
  • before performing a read operation or a write operation, the second W thread is woken up by the wake-up proxy thread associated with the target N thread.
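  • the point of the wake-up proxy can be sketched as follows (an illustrative model with hypothetical names): the N thread never makes the potentially blocking wake-up call itself, so its polling loop is never descheduled; it only enqueues a request, and the dedicated P thread performs the actual blocking wake-up of the sleeping W thread.

```python
import threading, queue

def run_demo():
    woken = threading.Event()          # stands in for the sleeping W thread
    wake_requests = queue.Queue()      # N thread -> P thread mailbox

    def p_thread():
        # wake-up proxy: the blocking call happens here, not in the N thread
        conn_fd = wake_requests.get()
        woken.set()                    # "wake the W thread waiting on conn_fd"

    threading.Thread(target=p_thread, daemon=True).start()
    wake_requests.put(30)              # N thread: non-blocking enqueue, keeps polling
    return woken.wait(1.0)
```

  • keeping the N thread out of blocking system calls is what lets it stay in the running state, as the embodiments describe for P thread 2 waking W thread 2.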
  • the server can find the association from an N thread back to a W thread through the first correspondence (the shadow table) in the routing module, thereby passing on the connection FD, and can then determine the target N thread used in the communication process through the second correspondence (the routing table) in the routing module, thereby completing the communication process.
  • this application does not need to establish a binding relationship between W threads and N threads in advance, nor does it need multiple N threads to share one TCP hash table; W threads and N threads are decoupled, which improves the versatility of the user-mode protocol stack. In addition, because no kernel operations are involved, W threads and N threads do not need to perform context switches, which also improves the performance of the user-mode protocol stack.
  • the routing module in the client maintains a target correspondence, which includes the correspondence between the connection file descriptor FD and the target N thread.
  • the target N thread is the N thread selected by the routing module for the first W thread that initiates the connection operation; the first W thread is one of the at least one W thread, and the target N thread is one of the multiple N threads.
  • the client will obtain the target correspondence, and then communicate with the server based on the target correspondence.
  • the acquisition process of the target correspondence may include: receiving, through the routing module, the connection operation initiated by the first W thread, selecting the target N thread from the multiple N threads for the connection operation, and generating a connection FD for the first W thread; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
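  • the client-side connect path can be sketched as follows. This is a hypothetical model: the round-robin selection policy is an assumption for illustration (a real implementation might, for instance, prefer a NUMA-local N thread); the routing module picks a target N thread, generates a connection FD, and records the target correspondence.

```python
import itertools

class ClientRouter:
    def __init__(self, n_threads):
        self._pick = itertools.cycle(n_threads)   # selection policy: round-robin (assumed)
        self._fds = itertools.count(100)          # toy connection-FD allocator
        self.fd_routing_table = {}                # connection FD -> N thread

    def hook_connect(self):
        n_thread = next(self._pick)               # target N thread for this connect
        conn_fd = next(self._fds)                 # connection FD handed to the W thread
        self.fd_routing_table[conn_fd] = n_thread
        return conn_fd
```

  • every later read/write on the returned connection FD is routed to the same target N thread via the table.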
  • the above communication with the server based on the target correspondence through the routing module may include: determining, through the routing module and according to the connection FD, the NUMA node and the network card queue corresponding to the target N thread; and sending the link establishment request and the first data to the server through the NUMA node and the network card queue corresponding to the target N thread.
  • the method may also include: allocating a sending queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, where the sending queue records memory addresses of data related to write operations.
  • the above sending of the first data to the server through the NUMA node and the network card queue corresponding to the target N thread includes: receiving, through the routing module, the write operation initiated by the second W thread, where the write operation carries the connection FD and the first data, the second W thread is one of the multiple W threads corresponding to the target application, and when the second W thread initiates the write operation, the connection FD is passed from the first W thread to the second W thread; writing, through the routing module and according to the connection FD, the first data into the memory corresponding to the target N thread, and writing the memory address of the first data into the sending queue corresponding to the connection FD; and when the target N thread polls the memory address of the first data in the sending queue, sending the first data in the memory to the network card.
  • before the write operation is performed, the method also includes: binding the second W thread to a processing core in the NUMA node where the target N thread is located.
  • the client can determine the target N thread used in the communication process through the target correspondence (the routing table) in the routing module, thereby completing the communication process.
  • there is no need to pre-establish a binding relationship between W threads and N threads, nor for multiple N threads to share one TCP hash table, thereby improving the versatility of the user-mode protocol stack. Because no kernel operations are involved, W threads and N threads do not need to perform context switches, which also improves the performance of the user-mode protocol stack.
  • N threads are usually deployed according to the number of NUMA nodes in the hardware layer: generally, one N thread is deployed per NUMA node, and each N thread is bound to a processing core in the corresponding NUMA node. W threads may also be bound to NUMA nodes in advance, but this is not necessary; if a W thread is not bound, it can be bound later, during the subsequent process of establishing a communication connection or processing data, based on a load-balancing strategy or a performance-optimization strategy, which is not limited in this application.
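  • the NUMA-aware binding can be sketched as a lookup from the target N thread to the cores of its node (the topology and names below are hypothetical examples): on Linux the final binding step would be an affinity call such as `os.sched_setaffinity`, omitted here to keep the sketch portable.

```python
# Hypothetical topology: NUMA node -> processing cores, N thread -> its node.
NUMA_CORES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
N_THREAD_NODE = {"N1": 0, "N2": 1}

def cores_for_w_thread(target_n_thread):
    """Cores a W thread may run on so it stays NUMA-local to its target N thread."""
    return NUMA_CORES[N_THREAD_NODE[target_n_thread]]
```

  • binding the W thread to one of these cores lets it share the node's memory with the target N thread and avoids cross-node data copies.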
  • the working process includes:
  • the server initiates a listening (listen) operation through W thread 1.
  • W thread 1 may also be referred to as the first W thread.
  • the routing module receives the listening operation initiated by W thread 1, and generates a listening FD for W thread 1.
  • the routing module initiates a listening operation to multiple N threads respectively.
  • the multiple N threads are respectively N thread 1, N thread 2, . . . , N thread n.
  • Each N thread generates a shadow FD for the corresponding listening operation, and feeds the shadow FD back to the routing module.
  • the routing module establishes an FD shadow table for the listening FD of W thread 1 and the shadow FDs of n N threads, that is, the first corresponding relationship.
  • the shadow table can be expressed in the form shown in Table 1 below:
  • Table 1 FD shadow table

    listening FD | shadow FD1, shadow FD2, ..., shadow FDn
  • Table 1 is only one form of expression of the FD shadow table; this embodiment of the application does not limit the form of expression of the FD shadow table, and any other form that can indicate the correspondence between the listening FD and the shadow FDs can be used in this application. The FD shadow table shown in FIG. 5 is one such representation.
  • the network card queue 1 in the network card receives the link establishment request sent by the client.
  • the link establishment request may be a TCP SYN message.
  • the network card selects N thread 2 as the target N thread according to its own configured logic.
  • N thread 2 generates a connection FD (connect FD) for the TCP connection; in FIG. 5 this connection FD is connection FD2.
  • the connection FD2 is returned to W thread 1 through the routing module, and the correspondence between connection FD2 and N thread 2 is added to the FD routing table.
  • the process of returning connection FD2 to W thread 1 can be: determine that the shadow FD corresponding to N thread 2 is shadow FD2; from the FD shadow table it can be determined that shadow FD2 corresponds to the listening FD of W thread 1, so connection FD2 is passed to W thread 1.
  • the routing table in the embodiment of the present application can be understood by referring to Table 2.
  • Table 2 FD routing table

    connection FD2 | N thread 2
    ...            | ...
  • Table 2 is only an example of the FD routing table, and the FD routing table may also have other representation forms or corresponding relationships, which are not limited in this embodiment of the present application.
  • W thread 1 passes the connection FD2 to W thread 2.
  • W thread 2 may be referred to as a second W thread.
  • FIG. 5 shows the scene where the target application corresponds to multiple threads. If the target application corresponds to a single thread, this step does not need to be performed, and the epoll/poll operation can be initiated directly by W thread 1.
  • W thread 2 initiates an epoll/poll operation according to connection FD2 and then goes into a dormant state; the epoll/poll operation includes connection FD2.
  • after the routing module receives the epoll/poll operation, it determines from connection FD2 and the FD routing table that connection FD2 corresponds to N thread 2, and waits for the epoll/poll wake-up event from N thread 2.
  • a receiving queue and a sending queue are allocated for connection FD2 in the memory of NUMA node 2 corresponding to N thread 2.
  • the expression form of connection FDn and its corresponding receiving queue and sending queue can be understood by referring to Table 3:

    Receive queue n (Rx)       | Send queue n (Tx)
    the memory address of data | the memory address of data
    ...                        | ...
  • each connection FD corresponds to a receiving queue and a sending queue.
  • the value of n in Table 3 above can be understood as a variable, and different values correspond to different connection FDs.
  • the form of the receiving queue and sending queue of connection FD2 above can be understood as taking n in Table 3 to be 2.
  • the receiving queue Rx is used to record memory addresses of data related to read operations
  • the sending queue Tx is used to record memory addresses of data related to write operations.
  • the working process includes:
  • the network card queue 1 in the network card receives the first data.
  • the first data may be TCP data.
  • the network card queue 1 in the network card writes the first data into the memory of the NUMA node 2 corresponding to the N thread 2 .
  • the N thread 2 After polling the first data in the memory, the N thread 2 writes the memory address of the first data into the receiving queue corresponding to the connection FD2. As shown in Table 4:
    Receive queue 2 (Rx)                 | Send queue 2 (Tx)
    the memory address of the first data |
    ...                                  |
  • N thread 2 wakes up W thread 2 through P thread 2 .
  • P thread 2 is the wake-up proxy thread of N thread 2. Waking up W thread 2 through P thread 2 prevents N thread 2 from entering the kernel state, so that N thread 2 can always remain in the running state, which improves network performance and reduces network delay.
  • W thread 2 passes connection FD2 to W thread n.
  • W thread n may be referred to as the third W thread.
  • if the target application corresponds to a single thread, step S35 need not be executed.
  • W thread n initiates a read operation; connection FD2 is included in this read operation.
  • the routing module takes over the read operation initiated by the W thread n, and determines that the connection FD2 corresponds to the N thread 2 according to the connection FD2 and the FD routing table.
  • W thread n initiates a write operation; the write operation includes connection FD2 and the second data.
  • the routing module takes over the write operation initiated by the W thread n, and determines that the connection FD2 corresponds to the N thread 2 according to the connection FD2 and the FD routing table.
  • the network card queue 1 sends the second data to the client.
  • the process of performing the write operation also requires the wake-up operation and the operation of passing connection FD2, which can be understood with reference to S34 and S35 in FIG. 6.
  • the communication process described above in FIG. 6 and FIG. 7 may also include binding W thread 2 and W thread 3 to processing cores of NUMA node 2, so that W thread 2, W thread 3, and N thread 2 can share memory without copying data across NUMA nodes, which improves communication efficiency and reduces network delay.
  • the working process includes:
  • the client initiates a connection (connect) operation through the W thread 3 .
  • the routing module receives the connection operation initiated by W thread 3, selects N thread 2 from the multiple N threads as the target N thread for this connection operation of W thread 3, and generates a connection FD2 (connect FD) for W thread 3.
  • the routing module transfers the connection FD2 to the W thread 3.
  • the routing module adds the corresponding relationship between the N thread 2 and the connection FD2 to the FD routing table.
  • the write operation process includes:
  • W thread 3 passes connection FD2 to W thread n.
  • W thread n initiates a write operation; the write operation includes connection FD2 and the first data.
  • the routing module takes over the write operation initiated by W thread n, and determines from connection FD2 and the FD routing table that connection FD2 corresponds to N thread 2.
  • the network card queue 2 sends the first data to the server.
  • the process of the client's read operation can be understood with reference to the aforementioned process of the server's read operation, except that no wake-up operation is needed and the first data is replaced with the second data.
  • the server includes an application layer, a user-mode protocol stack, and a hardware layer; the target application of the application layer corresponds to at least one W thread, and a W thread is a thread that processes the target application's data.
  • the user-mode protocol stack includes multiple N threads, a routing module, and TCP hash tables corresponding to the multiple N threads.
  • the N threads are user-mode protocol stack threads, and the hardware layer includes multiple non-uniform memory access (NUMA) nodes and a network card, where the multiple N threads correspond one-to-one to the multiple NUMA nodes; the server also includes:
  • the first processing unit 701 is configured to obtain the first correspondence through the routing module; the first correspondence includes the correspondence between the listening file descriptor FD of the first W thread and multiple shadow FDs, the multiple shadow FDs are generated one-to-one for the multiple N threads, and the first W thread is one of the at least one W thread.
  • the second processing unit 702 is configured to obtain the second correspondence through the routing module; the second correspondence includes the correspondence between the target N thread and the connection FD, and the target N thread is the N thread selected by the network card from the multiple N threads when the communication connection with the client is established.
  • the third processing unit 703 is configured to communicate with the client based on the first correspondence obtained by the first processing unit 701 and the second correspondence obtained by the second processing unit 702 through the routing module.
  • the server can find the association from an N thread back to a W thread through the first correspondence (the shadow table) in the routing module, thereby passing on the connection FD, and can then determine the target N thread used in the communication process through the second correspondence (the routing table) in the routing module, thereby completing the communication process.
  • this application does not need to establish a binding relationship between W threads and N threads in advance, nor does it need multiple N threads to share one TCP hash table; W threads and N threads are decoupled, which improves the versatility of the user-mode protocol stack. In addition, because no kernel operations are involved, W threads and N threads do not need to perform context switches, which also improves the performance of the user-mode protocol stack.
  • the first processing unit 701 is configured to: receive, through the routing module, the listening operation initiated by the first W thread and generate a listening FD for the first W thread; initiate, through the routing module, the listening operation to the multiple N threads respectively to obtain multiple shadow FDs corresponding one-to-one to the multiple N threads; and establish, through the routing module, the correspondence between the listening FD and the multiple shadow FDs to obtain the first correspondence.
  • the network card includes at least one network card queue.
  • the second processing unit 702 is configured to: obtain, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, where the communication connection is established based on the link establishment request sent by the client and received by the first network card queue, and the first network card queue is one of the at least one network card queue; and establish, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
  • the third processing unit 703 is configured to: pass, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence in the first correspondence between the shadow FD corresponding to the target N thread and the listening FD corresponding to the first W thread; and communicate with the client, through the routing module, based on the connection FD and the second correspondence.
  • the third processing unit 703 is configured to: receive, through the routing module, the poll/epoll event initiated by the second W thread, where the poll/epoll event includes the connection FD, the connection FD is passed from the first W thread to the second W thread, the second W thread goes to sleep after initiating the poll/epoll event, and the second W thread is one of the multiple W threads corresponding to the target application; determine, through the routing module and according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for the wake-up event related to the target N thread; and after the second W thread is awakened, perform, through the routing module and according to the second correspondence, a read operation or a write operation related to the target N thread.
  • the third processing unit 703 is further configured to wake up the second W thread through a wakeup agent thread associated with the target N thread.
  • the third processing unit 703 is further configured to allocate a receiving queue and a sending queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, where the receiving queue records memory addresses of data related to read operations and the sending queue records memory addresses of data related to write operations.
  • the third processing unit 703 is configured to: receive, through the routing module, the read operation initiated by the second W thread or the third W thread, where the read operation carries the connection FD, the third W thread is one of the multiple W threads corresponding to the target application, and when the third W thread initiates the read operation, the connection FD is passed from the second W thread to the third W thread; obtain, through the routing module and according to the connection FD, the memory address of the first data from the receiving queue associated with the connection FD, where the first data is the data received from the client by the first network card queue associated with the target N thread, and the first network card queue is the network card queue that received the link establishment request sent by the client; and acquire the first data according to its memory address and pass it to the second W thread or the third W thread for processing.
  • the third processing unit 703 is configured to: receive, through the routing module, the write operation initiated by the second W thread or the third W thread, where the write operation carries the connection FD and the second data, the third W thread is one of the multiple W threads corresponding to the target application, and when the third W thread initiates the write operation, the connection FD is passed from the second W thread to the third W thread; write, through the routing module and according to the connection FD, the second data into the memory corresponding to the target N thread, and write the memory address of the second data into the sending queue corresponding to the connection FD; and when the target N thread polls the memory address of the second data in the sending queue, send the second data in the memory to the network card.
  • the third processing unit 703 is further configured to bind the second W thread or the third W thread to a processing core in the NUMA node where the target N thread is located.
  • the memory in the NUMA node corresponding to the target N thread is huge-page memory.
  • the client 80 includes an application layer, a user-mode protocol stack, and a hardware layer; the target application of the application layer corresponds to at least one W thread, the W thread being a thread for processing the data of the target application; the user-mode protocol stack includes multiple N threads, a routing module, and transmission control protocol hash tables in one-to-one correspondence with the multiple N threads, the N thread being a user-mode protocol-stack thread;
  • the hardware layer includes multiple non-uniform memory access NUMA nodes, the multiple N threads corresponding one-to-one to the multiple NUMA nodes; the client 80 also includes:
  • the first processing unit 801 is configured to obtain the target correspondence through the routing module, the target correspondence including the correspondence between the connection file descriptor FD and the target N thread, the target N thread being the N thread selected by the routing module for the first W thread that initiates the connect operation, the first W thread being one of the at least one W thread, and the target N thread being one of the multiple N threads;
  • the second processing unit 802 is configured to communicate with the server based on the target correspondence through the routing module.
  • the client can determine the target N thread used in the communication process through the target correspondence (routing table) in the routing module, thereby completing the communication process.
  • there is no need to pre-establish a binding between W threads and N threads, nor for multiple N threads to share one TCP hash table, which improves the versatility of the user-mode protocol stack.
  • because no kernel operations are involved, W threads and N threads do not need to perform context switches, which also improves the performance of the user-mode protocol stack.
  • the first processing unit 801 is configured to: receive a connection operation initiated by the first W thread through the routing module, select a target N thread from multiple N threads for the connection operation, and generate a connection FD for the first W thread; The corresponding relationship between the target N threads and the connection FD is established through the routing module to obtain the target corresponding relationship.
  • the second processing unit 802 is configured to: determine, through the routing module according to the connection FD, the NUMA node and network card queue corresponding to the target N thread; and send a link-establishment request and first data to the server through the NUMA node and network card queue corresponding to the target N thread.
  • the second processing unit 802 is further configured to allocate a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, and the send queue is used to record the memory address of data related to the write operation.
  • the second processing unit 802 is configured to: receive, through the routing module, the write operation initiated by the second W thread, the write operation carrying the connection FD and the first data, the second W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the first W thread to the second W thread; write, through the routing module according to the connection FD, the first data into the memory corresponding to the target N thread, and write the memory address of the first data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the first data in the send queue, send the first data in the memory to the network card.
  • the second processing unit 802 is further configured to bind the second W thread to the processing core of the NUMA node where the target N thread is located before performing the write operation.
  • the server 70 and the client 80 described above can be understood by referring to the corresponding content of the previous method embodiments, and will not be repeated here.
  • FIG. 12 is a schematic diagram of a possible logical structure of a computer device 90 provided by an embodiment of the present application.
  • the computer device 90 includes: a plurality of NUMA nodes 900 and a network card 910 , and each NUMA node includes a plurality of processors 901 , memory 902 and a bus 903 .
  • the processor 901 and the memory 902 are connected to each other through a bus 903 .
  • the processor 901 is used to control and manage the actions of the computer device 90 , for example, the processor 901 is used to execute the steps in FIG. 5 to FIG. 9 .
  • the communication interface 902 is used to support the computer device 90 to communicate.
  • the memory 902 is used to store program code and data of the computer device 90 and to provide memory space for process groups. The network card is used to communicate with other devices.
  • the processor 901 may be a central processing unit, a general processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor 901 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the bus 903 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus, etc.
  • a computer-readable storage medium is also provided, in which computer-executable instructions are stored; when the processor of a device executes the computer-executable instructions, the device performs the steps in FIG. 5 to FIG. 9 above.
  • a computer program product includes computer-executable instructions stored in a computer-readable storage medium; when the processor of a device executes the computer-executable instructions, the device performs the steps in FIG. 5 to FIG. 9 above.
  • a system-on-a-chip is further provided; the system-on-a-chip includes a processor, and the processor is configured to support an apparatus for memory management in implementing the steps in FIG. 5 to FIG. 9 above.
  • the system-on-a-chip may further include a memory for storing necessary program instructions and data of the server or client.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division into units is only a logical functional division; in actual implementation there may be other division methods, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the embodiment of the present application is essentially or the part that contributes to the prior art or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , including several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods in the various embodiments of the embodiments of the present application.
  • the aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

This application discloses a communication method based on a user-mode protocol stack, applied to a computer device using a NUMA architecture; the computer device may be a server or a client. The computer device includes an application layer, a user-mode protocol stack, and a hardware layer. A target application of the application layer corresponds to at least one W thread. The user-mode protocol stack includes multiple N threads, a routing module, and transmission control protocol hash tables in one-to-one correspondence with the multiple N threads. The hardware layer includes multiple NUMA nodes and a network card, the multiple N threads corresponding one-to-one to the multiple NUMA nodes. The method includes: obtaining, through the routing module, a shadow table between the listening FD of one W thread and a shadow FD of each N thread, then obtaining a routing table between connection FDs and N threads, passing the connection FD through the shadow table, and then communicating using the connection FD, thereby decoupling W threads from N threads and improving the versatility of the user-mode protocol stack.

Description

Communication method based on user-mode protocol stack, and corresponding apparatus
This application claims priority to Chinese Patent Application No. 202111017331.2, filed with the China National Intellectual Property Administration on August 31, 2021 and entitled "Communication method based on user-mode protocol stack and corresponding apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to a communication method based on a user-mode protocol stack and a corresponding apparatus.
Background
Interaction between application threads and network hardware in a computer system is usually implemented through a kernel protocol stack or a user-mode protocol stack. In recent years the input/output (IO) capability of network hardware has advanced greatly, while a kernel protocol stack must frequently switch context between kernel mode and user mode when processing IO data, so the existing kernel protocol stack can no longer fully release the IO capability of the network hardware. Among the various techniques for accelerating network IO, the user-mode protocol stack is a common and effective one.
A user-mode protocol stack is designed to bypass the kernel and let the application interact with the hardware more directly. Current user-mode protocol stacks usually place the protocol-stack thread and the application thread in the same thread context, which avoids thread-switching overhead. However, this design binds the protocol-stack threads to the application threads, which limits versatility.
Summary
Embodiments of this application provide a communication method based on a user-mode protocol stack, used to improve the versatility of the user-mode protocol stack. Embodiments of this application also provide corresponding devices, computer-readable storage media, computer program products, and the like.
A first aspect of this application provides a communication method based on a user-mode protocol stack, applied to a server. The server includes an application layer, the user-mode protocol stack, and a hardware layer. A target application of the application layer corresponds to at least one W thread, a W thread being a thread for processing data of the target application. The user-mode protocol stack includes multiple N threads, a routing module, and transmission control protocol (TCP) hash tables in one-to-one correspondence with the multiple N threads, an N thread being a user-mode protocol-stack thread. The hardware layer includes multiple non-uniform memory access NUMA nodes and a network card, the multiple N threads corresponding one-to-one to the multiple NUMA nodes. The method includes: obtaining a first correspondence through the routing module, the first correspondence including the correspondence between the listening file descriptor (FD) of a first W thread and multiple shadow FDs generated one-to-one for the multiple N threads, the first W thread being one of the at least one W thread; obtaining a second correspondence through the routing module, the second correspondence including the correspondence between a target N thread and a connection FD, the target N thread being the one of the multiple N threads selected by the network card when the communication connection with a client is established; and communicating with the client through the routing module based on the first correspondence and the second correspondence.
The communication method based on a user-mode protocol stack provided by this application can be applied to a non-uniform memory access (NUMA) system. A NUMA system usually includes multiple NUMA nodes, and each NUMA node usually includes multiple processing cores, memory, and input/output (IO) resources. In this application, a processing core may also be called a central processing unit (CPU) core, or CPU for short.
In this application, the server may include a physical server, a virtual machine (VM), or a container. The client may include a terminal device, a virtual machine, or a container.
In this application, the application layer may include multiple applications, and the target application may be one of them. The target application may correspond to one W thread or multiple W threads. If the target application corresponds to one W thread, that W thread can perform listening, waiting, data processing, and other functions. If the target application corresponds to multiple W threads, the listening, waiting, and data-processing functions can be performed by different W threads; of course, one thread may also perform two or more of these functions, for example one W thread may perform both the waiting and the data-processing functions.
"Multiple" in this application means two or more, and may also be described as "at least two".
In this application, in the user-mode protocol stack each N thread has a TCP hash table containing the information the N thread needs to execute the TCP protocol. Each N thread corresponds to one NUMA node; the correspondence between N threads and NUMA nodes may be configured when the server is initialized. A NUMA node usually includes multiple processing cores, and an N thread may be bound to one of them. The routing module may be a software package with routing functionality, such as a software development kit (SDK) or the Data Plane Development Kit (DPDK). The routing module contains the first correspondence and the second correspondence. The first correspondence may be called the FD shadow table: the correspondence between the listening FD of the first W thread that initiated the listen operation and the shadow FD corresponding to each N thread. With n N threads, the shadow table may take the form of the listening FD corresponding to shadow FD1, shadow FD2, ..., shadow FDn. A shadow FD is an FD the operating system is not aware of; the operating system perceives only the listening FD of the first W thread. The second correspondence may be called the routing table, which records the correspondence between each N thread and its connection FD, including the correspondence between the target N thread and its connection FD.
In this application, the listening FD is the FD related to the listen operation of the first W thread, and a connection FD is the FD generated by an N thread for establishing a TCP connection between the client and the server; one TCP connection has one connection FD. The connection FD corresponds to the N thread that established the TCP connection; in this application, the N thread that establishes the TCP connection is called the target N thread.
In this application, during TCP communication between the client and the server, the server uses the first correspondence (the shadow table) in the routing module to look up the association from an N thread to the corresponding W thread and thereby pass the connection FD, and then uses the second correspondence (the routing table) through the routing module to determine the target N thread used in the communication process, thereby completing the communication. As can be seen from the above solution, this application requires neither a pre-established binding between W threads and N threads nor multiple N threads sharing one TCP hash table, so W threads and N threads are decoupled, which improves the versatility of the user-mode protocol stack. In addition, because no kernel operations are involved, W threads and N threads do not need context switches, which also improves the performance of the user-mode protocol stack.
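The two correspondences described above can be modeled as two small lookup tables. The following sketch is illustrative only, not the patent's implementation; names such as `RoutingModule`, `register_listen`, and `n_thread_for` are assumptions introduced for the example. It shows how a routing module could record one listening FD against per-N-thread shadow FDs and route each connection FD to its target N thread.

```python
# Minimal sketch (assumed names, not the patent's code) of the two mappings
# the routing module maintains on the server side.
class RoutingModule:
    def __init__(self):
        # First correspondence ("FD shadow table"): listening FD of the
        # W thread that called listen() -> one shadow FD per N thread.
        self.shadow_table = {}      # listen_fd -> {n_thread_id: shadow_fd}
        # Second correspondence ("FD routing table"): connection FD -> the
        # target N thread that established the connection.
        self.routing_table = {}     # conn_fd -> n_thread_id

    def register_listen(self, listen_fd, shadow_fds):
        self.shadow_table[listen_fd] = dict(shadow_fds)

    def register_connection(self, conn_fd, n_thread_id):
        self.routing_table[conn_fd] = n_thread_id

    def n_thread_for(self, conn_fd):
        # Every later read/write carrying conn_fd is routed through here.
        return self.routing_table[conn_fd]

rm = RoutingModule()
rm.register_listen(listen_fd=10, shadow_fds={1: 101, 2: 102, 3: 103})
rm.register_connection(conn_fd=20, n_thread_id=2)
print(rm.n_thread_for(20))  # -> 2
```

Because the W thread never holds a direct reference to an N thread, either side can be replaced independently; only the two tables tie them together.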
In a possible implementation of the first aspect, the step of obtaining the first correspondence through the routing module includes: receiving, through the routing module, the listen operation initiated by the first W thread and generating the listening FD for the first W thread; initiating, through the routing module, a listen operation to each of the multiple N threads to obtain multiple shadow FDs in one-to-one correspondence with the multiple N threads; and establishing, through the routing module, the correspondence between the listening FD and the multiple shadow FDs to obtain the first correspondence.
In this implementation, for the target application the server periodically initiates listen operations through the first W thread to check whether data related to the target application is to be received. When the first W thread initiates one listen operation, the routing module initiates a listen operation to every N thread accordingly. In this way a shadow table from the first W thread to every N thread can be established for the subsequent communication process, without pre-binding W threads to N threads, which improves the versatility of the user-mode protocol stack.
In a possible implementation of the first aspect, the network card includes at least one network card queue, and the step of obtaining the second correspondence through the routing module includes: obtaining, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, the communication connection being established based on a link-establishment request sent by the client and received by a first network card queue, the first network card queue being one of the at least one network card queue; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
In this implementation, the network card usually includes multiple queues, each corresponding to one N thread; this correspondence is not preconfigured but can be established while the communication connection is being set up. If the first network card queue receives a link-establishment request from the client and the network card selects the target N thread for that request according to its own logic, the second correspondence between the first network card queue and the target N thread is established. The second correspondence is saved in the routing module, so that for all subsequent communication under this connection FD the routing module can determine the corresponding target N thread and complete the communication process, improving communication flexibility.
In a possible implementation of the first aspect, the step of communicating with the client through the routing module based on the first correspondence and the second correspondence includes: passing, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence in the first correspondence between the shadow FD of the target N thread and the listening FD of the first W thread; and communicating with the client through the routing module based on the connection FD and the second correspondence.
In this implementation, the connection FD can be passed to the first W thread via the shadow table, so that the relevant W threads of the target application can use this connection FD to perform subsequent operations, and the routing module can determine the corresponding target N thread from the connection FD carried in other operations initiated by W threads to perform the related operations and complete the communication process.
In a possible implementation of the first aspect, when the target application corresponds to multiple W threads, communicating with the client through the routing module based on the connection FD and the second correspondence includes: receiving, through the routing module, a wait (poll) event or an extended wait (epoll) event initiated by a second W thread, the poll/epoll event including the connection FD, the connection FD being passed from the first W thread to the second W thread, the second W thread going to sleep after initiating the poll/epoll event, the second W thread being one of the multiple W threads corresponding to the target application; determining, through the routing module according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for a wakeup event related to the target N thread; and after the second W thread is woken up, performing, through the routing module according to the second correspondence, a read operation or a write operation related to the target N thread.
In this implementation, the target application may correspond to multiple W threads. For example, in the MySQL thread model a master thread performs the listening, a newly established TCP connection is handed to an auth thread, and the final SQL request is handled by a worker thread. In this case the first W thread passes the connection FD to the second W thread, which triggers the poll/epoll event, then goes to sleep and waits for the wakeup event of the target N thread once the relevant data arrives. After being woken up, the second W thread carries out the subsequent communication. This reduces the power consumed by keeping the second W thread active without affecting the communication process, improving system performance.
In a possible implementation of the first aspect, the method further includes: waking up the second W thread through a wakeup proxy thread associated with the target N thread.
In this implementation, waking up the second W thread through the wakeup proxy thread associated with the target N thread prevents the target N thread from entering the system state, so the target N thread can stay in the running state, which reduces network latency during communication.
In a possible implementation of the first aspect, after the connection FD is determined to correspond to the target N thread according to the second correspondence, the method further includes: allocating a receive queue and a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the receive queue being used to record memory addresses of data related to read operations, and the send queue being used to record memory addresses of data related to write operations.
In this implementation, associating the receive queue and send queue with the connection FD allows the corresponding target N thread to be located quickly, improving system performance during communication.
In a possible implementation of the first aspect, the step of performing, through the routing module according to the second correspondence, a read operation related to the target N thread includes: receiving, through the routing module, a read operation initiated by the second W thread or a third W thread, the read operation carrying the connection FD, the third W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the second W thread to the third W thread when the read operation is initiated by the third W thread; obtaining, through the routing module according to the connection FD, the memory address of first data from the receive queue associated with the connection FD, the first data being data received from the client by the first network card queue associated with the target N thread, the first network card queue being the network card queue that received the link-establishment request sent by the client; and obtaining the first data according to its memory address and passing it to the second W thread or the third W thread for processing.
In this implementation, the read operation may be initiated directly by the second W thread, or the second W thread may pass the connection FD to the third W thread, which then initiates the read; in a MySQL scenario, the second W thread may be the auth thread and the third W thread a worker thread. For the read path, after the first network card queue receives the first data sent by the client, it stores the first data in the memory of the NUMA node associated with the corresponding target N thread, and the memory address of the first data is stored in the receive queue associated with the connection FD. Then, after the second or third W thread initiates the read operation, the target N thread can use the connection FD carried in the read operation to obtain the memory address of the first data from the corresponding receive queue, read the first data from memory, and copy it into the buffer of the second or third W thread, which processes it. In this process of the server reading the client's data, the routing module determines the corresponding target N thread through the connection FD to complete the processing, which improves read efficiency.
In a possible implementation of the first aspect, the step of performing, through the routing module according to the second correspondence, a write operation related to the target N thread includes: receiving, through the routing module, a write operation initiated by the second W thread or the third W thread, the write operation carrying the connection FD and second data, the third W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the second W thread to the third W thread when the write operation is initiated by the third W thread; writing, through the routing module according to the connection FD, the second data into the memory corresponding to the target N thread, and writing the memory address of the second data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the second data in the send queue, sending the second data in memory to the network card.
In this implementation, the relationship between the second W thread and the third W thread can be understood with reference to the read operation above. For the write path, the routing module determines the corresponding target N thread from the connection FD, writes the second data into the memory corresponding to that target N thread, and writes the memory address of the second data into the send queue corresponding to the connection FD. When the target N thread polls the memory address of the second data in the send queue, it sends the second data in memory to the first network card queue of the network card, which sends the second data to the client.
In a possible implementation of the first aspect, before the read operation or write operation is performed, the method further includes: binding the second W thread or the third W thread to a processing core in the NUMA node where the target N thread is located.
In this implementation, binding the second W thread or the third W thread to a processing core in the NUMA node where the target N thread is located allows the TCP communication to be completed without crossing NUMA nodes, and the target N thread shares the data in memory with the second and third W threads, which increases communication speed and reduces network latency.
In a possible implementation of the first aspect, the memory in the NUMA node corresponding to the target N thread is huge-page memory.
In this implementation, using huge-page memory reduces the probability of cache-lookup misses.
A second aspect of this application provides a communication method based on a user-mode protocol stack, applied to a client. The client includes an application layer, the user-mode protocol stack, and a hardware layer. A target application of the application layer corresponds to at least one W thread, a W thread being a thread for processing data of the target application. The user-mode protocol stack includes multiple N threads, a routing module, and TCP hash tables in one-to-one correspondence with the multiple N threads, an N thread being a user-mode protocol-stack thread. The hardware layer includes multiple non-uniform memory access NUMA nodes, the multiple N threads corresponding one-to-one to the multiple NUMA nodes. The method includes: obtaining a target correspondence through the routing module, the target correspondence including the correspondence between a connection file descriptor FD and a target N thread, the target N thread being the N thread selected by the routing module for a first W thread that initiates a connect operation, the first W thread being one of the at least one W thread, and the target N thread being one of the multiple N threads; and communicating with the server through the routing module based on the target correspondence.
Features of the second aspect that are the same as those of the first aspect can be understood with reference to the explanations of the first aspect. On the client, the routing module establishes the correspondence between the connection FD and the target N thread, so that in subsequent communication the corresponding target N thread can be determined from the connection FD to perform subsequent operations, without binding W threads to N threads, which improves the versatility of the user-mode protocol stack. In addition, because no kernel operations are involved, W threads and N threads do not need context switches, which also improves the performance of the user-mode protocol stack.
In a possible implementation of the second aspect, the step of obtaining the target correspondence through the routing module includes: receiving, through the routing module, the connect operation initiated by the first W thread, selecting the target N thread from the multiple N threads for the connect operation, and generating the connection FD for the first W thread; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
In a possible implementation of the second aspect, the step of communicating with the server through the routing module based on the target correspondence includes: determining, through the routing module according to the connection FD, the NUMA node and network card queue corresponding to the target N thread; and sending a link-establishment request and first data to the server through the NUMA node and network card queue corresponding to the target N thread.
In a possible implementation of the second aspect, after the NUMA node and network card queue corresponding to the target N thread are determined according to the connection FD, the method further includes: allocating a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the send queue being used to record memory addresses of data related to write operations.
In a possible implementation of the second aspect, the step of sending the first data to the server through the NUMA node and network card queue corresponding to the target N thread includes: receiving, through the routing module, a write operation initiated by a second W thread, the write operation carrying the connection FD and the first data, the second W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the first W thread to the second W thread when the write operation is initiated by the second W thread; writing, through the routing module according to the connection FD, the first data into the memory corresponding to the target N thread, and writing the memory address of the first data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the first data in the send queue, sending the first data in memory to the network card.
In a possible implementation of the second aspect, before the write operation is performed, the method further includes: binding the second W thread to a processing core in the NUMA node where the target N thread is located.
Features of any possible implementation of the second aspect that are the same as those of the first aspect or any of its possible implementations can be understood with reference to the explanations of the first aspect or of that implementation.
A third aspect of this application provides a server having the function of implementing the method of the first aspect or any possible implementation thereof. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function, for example a first processing unit, a second processing unit, and a third processing unit; these three processing units may be implemented by one processing unit or by multiple processing units.
A fourth aspect of this application provides a client having the function of implementing the method of the second aspect or any possible implementation thereof. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function, for example a first processing unit and a second processing unit; these two units may be implemented by one processing unit.
A fifth aspect of this application provides a computer device including at least one processor, a memory, an input/output (I/O) interface, and computer-executable instructions stored in the memory and runnable on the processor; when the computer-executable instructions are executed by the processor, the processor performs the method of the first aspect or any possible implementation thereof.
A sixth aspect of this application provides a computer device including at least one processor, a memory, an input/output (I/O) interface, and computer-executable instructions stored in the memory and runnable on the processor; when the computer-executable instructions are executed by the processor, the processor performs the method of the second aspect or any possible implementation thereof.
A seventh aspect of this application provides a computer-readable storage medium storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, one or more processors perform the method of the first aspect or any possible implementation thereof.
An eighth aspect of this application provides a computer program product storing one or more computer-executable instructions; when the computer-executable instructions are executed by one or more processors, the one or more processors perform the method of the second aspect or any possible implementation thereof.
A ninth aspect of this application provides a chip system including at least one processor, the at least one processor being configured to support a server in implementing the functions involved in the first aspect or any possible implementation thereof. In one possible design, the chip system may further include a memory for storing program instructions and data necessary for the server. The chip system may consist of chips, or may include chips and other discrete devices.
A tenth aspect of this application provides a chip system including at least one processor, the at least one processor being configured to support a client in implementing the functions involved in the second aspect or any possible implementation thereof. In one possible design, the chip system may further include a memory for storing program instructions and data necessary for the client. The chip system may consist of chips, or may include chips and other discrete devices.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a communication scenario between a server and a client;
FIG. 2 is a schematic structural diagram of a computer device provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a server provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a client provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an embodiment of the communication method based on a user-mode protocol stack provided by an embodiment of this application;
FIG. 6 is a schematic diagram of another embodiment of the communication method based on a user-mode protocol stack provided by an embodiment of this application;
FIG. 7 is a schematic diagram of another embodiment of the communication method based on a user-mode protocol stack provided by an embodiment of this application;
FIG. 8 is a schematic diagram of another embodiment of the communication method based on a user-mode protocol stack provided by an embodiment of this application;
FIG. 9 is a schematic diagram of another embodiment of the communication method based on a user-mode protocol stack provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a server provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of a client provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of a computer device provided by an embodiment of this application.
Detailed Description
The following describes the embodiments of this application with reference to the accompanying drawings. Clearly, the described embodiments are only some rather than all of the embodiments of this application. A person of ordinary skill in the art will appreciate that, as technology evolves and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and so on in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable in appropriate circumstances, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
The embodiments of this application provide a communication method based on a user-mode protocol stack, used to improve the versatility of the user-mode protocol stack. The embodiments of this application also provide corresponding devices, computer-readable storage media, computer program products, and the like. These are described in detail below.
The communication method based on a user-mode protocol stack provided by the embodiments of this application can be applied to the client-server communication scenario shown in FIG. 1. The server and the client may communicate using the transmission control protocol (TCP). The server may include a physical server, a virtual machine (VM), or a container; the client may include a terminal device, a virtual machine, or a container.
A server may be a physical machine of any form.
A terminal device (which may also be called user equipment (UE)) is a device with wireless transceiver capability that may be deployed on land (indoor or outdoor, handheld or vehicle-mounted), on water (for example on a ship), or in the air (for example on aircraft, balloons, and satellites). The terminal device may be a personal computer (PC), a mobile phone, a tablet (pad), a computer with wireless transceiver capability, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a wireless terminal in the internet of things (IoT), and so on.
Both the client and the server of this application are computer devices whose resources are organized as a non-uniform memory access (NUMA) system, and on which a user-mode protocol stack is installed.
Whether for the client or the server, the system structure of the computer device provided by the embodiments of this application can be understood with reference to FIG. 2.
As shown in FIG. 2, a system structure of the computer device provided by an embodiment of this application includes an application layer, a user-mode protocol stack, and a hardware layer.
The application layer may include one or more applications, and the target application may be one of them. The target application corresponds to at least one W thread, a W thread being a thread for processing data of the target application.
The user-mode protocol stack includes multiple N threads, a routing module, and TCP hash tables in one-to-one correspondence with the multiple N threads. That is, each N thread has a TCP hash table containing the information the N thread needs to execute the TCP protocol. The routing module may be a software package with routing functionality, such as a software development kit (SDK) or the Data Plane Development Kit (DPDK). The routing module is responsible for hooking interface operations (including socket(), listen(), bind(), connect(), poll() events, extended epoll() events, send(), recv(), and so on). The routing module can route between W threads and N threads according to the correspondences. The TCP hash table is used to maintain and manage TCP-related information, including establish, bind, listen, the TCP control block (TCB), FDs, and so on.
The hardware layer includes multiple non-uniform memory access NUMA nodes and a network card. Each NUMA node usually includes multiple processing cores and memory, which may be huge-page memory. The network card may include multiple network card queues. A processing core may also be called a central processing unit (CPU) core, or CPU for short.
When the computer device is initialized or its resources are configured, one N thread may be configured for each NUMA node. The relationship between network card queues and N threads may be preconfigured, or may be selected by the network card according to its own logic while a communication connection is being established.
It should be noted that a portable operating system interface (POSIX) may be configured between the application layer and the user-mode protocol stack; the routing module can hook POSIX to determine the type of operation, such as listen, connect, read, and write operations.
The correspondences between N threads, NUMA nodes, and network card queues shown in FIG. 2 are only an example; practical applications are not limited to the correspondences shown in FIG. 2.
The correspondences maintained by the routing module shown in FIG. 2 differ slightly between the server and the client, but both involve file descriptors (FDs), which are introduced first.
In a Linux system everything is treated as a file. When a process or thread opens an existing file or creates a new file, the kernel returns an FD to the process or thread. An FD is an index the kernel creates for efficiently managing opened files; it points to an opened file, and every system call performing I/O goes through an FD. When an I/O operation is needed, the FD is passed in as a parameter; the entry for the FD is first looked up in the file descriptor table to obtain the handle of the corresponding opened file, the inode the file points to is then found in the system file descriptor table according to the file handle, and the real location of the file is thereby determined so the I/O operation can be performed.
The correspondences maintained by the routing module are introduced below for the server and the client respectively.
As shown in FIG. 3, on the server the routing module maintains a first correspondence and a second correspondence. The first correspondence may be called the file descriptor (FD) shadow table: the correspondence between the listening FD of the first W thread that initiated the listen operation and the shadow FD corresponding to each N thread. With n N threads, the FD shadow table may take the form of the listening FD corresponding to shadow FD1, shadow FD2, ..., shadow FDn. A shadow FD is an FD the operating system is not aware of; the operating system perceives only the listening FD of the first W thread. The second correspondence may be called the FD routing table, which records the correspondence between each N thread and its connection FD, including the correspondence between the target N thread and its connection FD. The first W thread is one of the at least one W thread, and the target N thread is one of the multiple N threads.
During communication, the server obtains the first correspondence and the second correspondence through the routing module and communicates with the client through the routing module based on them.
In the embodiments of this application, the listening FD is the FD related to the listen operation of the first W thread, and a connection FD is the FD generated by an N thread for establishing a TCP connection between the client and the server; one TCP connection has one connection FD. The connection FD corresponds to the N thread that established the TCP connection, which is called the target N thread in the embodiments of this application.
The first correspondence may be obtained as follows: receiving, through the routing module, the listen operation initiated by the first W thread and generating the listening FD for the first W thread; initiating, through the routing module, a listen operation to each of the multiple N threads to obtain multiple shadow FDs in one-to-one correspondence with the multiple N threads; and establishing, through the routing module, the correspondence between the listening FD and the multiple shadow FDs to obtain the first correspondence.
The second correspondence may be obtained as follows: obtaining, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, the communication connection being established based on a link-establishment request sent by the client and received by a first network card queue, the first network card queue being one of the at least one network card queue; and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
Communicating with the client based on the first and second correspondences may include: passing, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence in the first correspondence between the shadow FD of the target N thread and the listening FD of the first W thread; and communicating with the client through the routing module based on the connection FD and the second correspondence.
When the target application corresponds to multiple W threads, communicating with the client through the routing module based on the connection FD and the second correspondence may include: receiving, through the routing module, a poll event or extended epoll event initiated by the second W thread, the poll/epoll event including the connection FD, the connection FD being passed from the first W thread to the second W thread, the second W thread going to sleep after initiating the poll/epoll event, the second W thread being one of the multiple W threads corresponding to the target application; determining, through the routing module according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for a wakeup event related to the target N thread; and after the second W thread is woken up, performing, through the routing module according to the second correspondence, a read operation or write operation related to the target N thread.
After the connection FD is determined to correspond to the target N thread according to the second correspondence, the method may further include: allocating a receive queue and a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the receive queue being used to record memory addresses of data related to read operations, and the send queue being used to record memory addresses of data related to write operations.
Performing, through the routing module according to the second correspondence, a read operation related to the target N thread may include: receiving, through the routing module, a read operation initiated by the second W thread or the third W thread, the read operation carrying the connection FD, the third W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the second W thread to the third W thread when the read operation is initiated by the third W thread; obtaining, through the routing module according to the connection FD, the memory address of first data from the receive queue associated with the connection FD, the first data being data received from the client by the first network card queue associated with the target N thread, the first network card queue being the network card queue that received the link-establishment request sent by the client; and obtaining the first data according to its memory address and passing it to the second W thread or the third W thread for processing.
Performing, through the routing module according to the second correspondence, a write operation related to the target N thread may include: receiving, through the routing module, a write operation initiated by the second W thread or the third W thread, the write operation carrying the connection FD and second data, the third W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the second W thread to the third W thread when the write operation is initiated by the third W thread; writing, through the routing module according to the connection FD, the second data into the memory corresponding to the target N thread, and writing the memory address of the second data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the second data in the send queue, sending the second data in memory to the network card.
Before the read or write operation is performed, the method may further include: binding the second W thread or the third W thread to a processing core in the NUMA node where the target N thread is located.
In addition, on the server the user-mode protocol stack may further include wakeup proxy threads. In FIG. 3, P thread 1, P thread 2, ..., P thread n are wakeup proxy threads, and each N thread corresponds to one wakeup proxy thread: N thread 1 to P thread 1, N thread 2 to P thread 2, ..., N thread n to P thread n. In the embodiments of this application, before a read or write operation is performed, the second W thread is woken up through the wakeup proxy thread associated with the target N thread.
In this application, during TCP communication between the client and the server, the server uses the first correspondence (the shadow table) in the routing module to look up the association from an N thread to the corresponding W thread and thereby pass the connection FD, and then uses the second correspondence (the routing table) through the routing module to determine the target N thread used in the communication process, thereby completing the communication. As can be seen from the above solution, this application requires neither a pre-established binding between W threads and N threads nor multiple N threads sharing one TCP hash table, so W threads and N threads are decoupled, which improves the versatility of the user-mode protocol stack. In addition, because no kernel operations are involved, W threads and N threads do not need context switches, which also improves the performance of the user-mode protocol stack.
On the client, as shown in FIG. 4, the routing module maintains a target correspondence, which includes the correspondence between a connection file descriptor FD and a target N thread. The target N thread is the N thread selected by the routing module for the first W thread that initiates the connect operation; the first W thread is one of the at least one W thread, and the target N thread is one of the multiple N threads.
During communication, the client obtains the target correspondence and then communicates with the server based on it.
The target correspondence may be obtained as follows: receiving, through the routing module, the connect operation initiated by the first W thread, selecting the target N thread from the multiple N threads for the connect operation, and generating the connection FD for the first W thread;
and establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
Communicating with the server through the routing module based on the target correspondence may include: determining, through the routing module according to the connection FD, the NUMA node and network card queue corresponding to the target N thread; and sending a link-establishment request and first data to the server through the NUMA node and network card queue corresponding to the target N thread.
Moreover, after the NUMA node and network card queue corresponding to the target N thread are determined according to the connection FD, the method may further include: allocating a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the send queue being used to record memory addresses of data related to write operations.
Sending the first data to the server through the NUMA node and network card queue corresponding to the target N thread includes: receiving, through the routing module, a write operation initiated by a second W thread, the write operation carrying the connection FD and the first data, the second W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the first W thread to the second W thread when the write operation is initiated by the second W thread; writing, through the routing module according to the connection FD, the first data into the memory corresponding to the target N thread, and writing the memory address of the first data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the first data in the send queue, sending the first data in memory to the network card.
Before the write operation is performed, the method further includes: binding the second W thread to a processing core in the NUMA node where the target N thread is located.
In the embodiments of this application, during TCP communication between the client and the server, the client can determine, through the target correspondence (routing table) in the routing module, the target N thread used in the communication process and thereby complete the communication. There is no need to pre-establish a binding between W threads and N threads, nor for multiple N threads to share one TCP hash table, which improves the versatility of the user-mode protocol stack; in addition, because no kernel operations are involved, W threads and N threads do not need context switches, which also improves the performance of the user-mode protocol stack.
The differences between the server and the client have been described above. The following describes, with reference to the drawings, the server-side workflow and the client-side workflow during TCP connection establishment and TCP data processing.
It should be noted that both the server and the client need to deploy N threads and configure resources before establishing TCP connections and processing data. N threads are usually deployed according to the number of NUMA nodes in the hardware layer, generally one N thread per NUMA node, and each N thread is bound to one processing core of its NUMA node. W threads may also be bound to NUMA nodes in advance, or left unbound; if unbound, W threads can be bound later during connection establishment or data processing according to a load-balancing or performance-optimization policy, which is not limited in this application.
The TCP connection establishment and TCP data processing of the server and client are described below.
I. Server-side workflow during TCP connection establishment.
As shown in FIG. 5, the workflow includes:
S10. The server initiates a listen operation through W thread 1.
W thread 1 may also be called the first W thread.
S11. The routing module receives the listen operation initiated by W thread 1 and generates a listening FD for W thread 1.
S12. The routing module initiates a listen operation to each of the multiple N threads.
In FIG. 5 the multiple N threads are N thread 1, N thread 2, ..., N thread n. Of course, there may also be only two N threads; FIG. 5 shows only an example.
S13. Each N thread generates a shadow FD for its listen operation and returns the shadow FD to the routing module.
S14. The routing module builds the FD shadow table, i.e. the first correspondence, from the listening FD of W thread 1 and the shadow FDs of the n N threads.
The shadow table may take the form shown in Table 1 below:
Table 1: FD shadow table
Listening FD of W thread 1  Shadow FD1, Shadow FD2, ..., Shadow FDn
Of course, Table 1 is only one representation of the FD shadow table; the representation of the FD shadow table is not limited in the embodiments of this application, and any other form that can express the correspondence between the listening FD and the shadow FDs can serve as the FD shadow table of this application, such as the representation of the FD shadow table shown in FIG. 5.
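Steps S10-S14 can be sketched as a listen fan-out. The sketch below is illustrative only (names such as `handle_listen` and the FD numbering are assumptions, not the patent's implementation): a single listen() from W thread 1 is fanned out to every N thread, and the shadow FDs they return are recorded against the one listening FD the operating system actually sees.

```python
import itertools

# Simple shared FD counter standing in for the kernel's/stack's FD allocation.
_fds = itertools.count(100)

def n_thread_listen(n_thread_id):
    # S13: each N thread performs its own listen and gets an OS-invisible
    # shadow FD.
    return next(_fds)

def handle_listen(n_thread_ids):
    listen_fd = next(_fds)                      # S11: listening FD for W thread 1
    shadow_fds = {nid: n_thread_listen(nid)     # S12/S13: one shadow FD per N thread
                  for nid in n_thread_ids}
    return listen_fd, shadow_fds                # S14: one row of the FD shadow table

listen_fd, shadows = handle_listen([1, 2, 3])
print(len(shadows))  # -> 3
```

The point of the fan-out is that no N thread is chosen in advance: whichever N thread the NIC later selects already holds a shadow FD tied back to the same listening FD.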
S15. Network card queue 1 in the network card receives a link-establishment request sent by the client.
The link-establishment request may be a TCP SYN packet.
S16. The network card selects N thread 2 as the target N thread according to its own configured logic.
S17. N thread 2 establishes the TCP connection and generates a connection FD.
In FIG. 5 this connection FD is connection FD2.
S18. Through the routing module, connection FD2 is returned to W thread 1, and the correspondence between connection FD2 and N thread 2 is added to the FD routing table.
Returning connection FD2 to W thread 1 through the routing module may proceed as follows: determine that the shadow FD corresponding to N thread 2 is shadow FD2, determine through the FD shadow table that shadow FD2 corresponds to the listening FD of W thread 1, and thereby pass connection FD2 to W thread 1.
The routing table in the embodiments of this application can be understood with reference to Table 2.
Table 2: FD routing table
Connection FD  N thread
Connection FD1  N thread 1
Connection FD2  N thread 2
Connection FDn  N thread n
It should be noted that Table 2 is only one example of the FD routing table; the FD routing table may also have other representations or correspondences, which are not limited in the embodiments of this application.
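The hand-back of step S18 can be sketched as a reverse lookup through the FD shadow table. This is a hypothetical sketch, not the patent's code; `deliver_conn_fd` and the table shapes are assumed names. After the target N thread (here N thread 2) creates a connection FD, the routing module finds which listening FD its shadow FD belongs to and delivers the connection FD to the W thread that owns that listening FD.

```python
# shadow_table: listening FD -> {n_thread_id: shadow_fd} (the FD shadow table)
# listen_fd_owner: listening FD -> the W thread that initiated the listen
def deliver_conn_fd(shadow_table, listen_fd_owner, target_n_thread, conn_fd):
    for listen_fd, shadows in shadow_table.items():
        if target_n_thread in shadows:
            # The shadow FD of the target N thread belongs to this listen,
            # so the connection FD is handed to the owning W thread.
            return listen_fd_owner[listen_fd], conn_fd
    raise KeyError("no shadow FD registered for N thread %d" % target_n_thread)

shadow_table = {10: {1: 101, 2: 102, 3: 103}}   # listening FD 10 of W thread 1
owner = {10: "W1"}
print(deliver_conn_fd(shadow_table, owner, 2, 20))  # -> ('W1', 20)
```

Only this one lookup ties the NIC's choice of N thread back to the application side; everything afterwards goes through the FD routing table instead.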
S19. W thread 1 passes connection FD2 to W thread 2.
W thread 2 may be called the second W thread.
FIG. 5 shows the scenario in which the target application corresponds to multiple threads; if the target application is single-threaded, this step is not needed and W thread 1 initiates the epoll/poll operation directly.
S20. W thread 2 initiates an epoll/poll operation according to connection FD2 and then goes to sleep; the epoll/poll operation includes connection FD2.
S21. After receiving the epoll/poll operation, the routing module determines from connection FD2 and the FD routing table that connection FD2 corresponds to N thread 2, and then waits for an epoll/poll wakeup event from N thread 2.
In addition, after determining that connection FD2 corresponds to N thread 2, the routing module also allocates a receive queue and a send queue for connection FD2 in the memory of NUMA node 2 corresponding to N thread 2.
In the embodiments of this application, the representation of connection FDn and its corresponding receive queue and send queue can be understood with reference to Table 3.
Table 3: Receive queue and send queue corresponding to connection FDn
Receive queue n (Rx)  Send queue n (Tx)
memory addresses of data  memory addresses of data
In the embodiments of this application, each connection FD corresponds to one receive queue and one send queue. The value n in Table 3 can be understood as a variable, with different values corresponding to different connection FDs; for example, the table of the receive and send queues of connection FD2 above can be understood as Table 3 with n = 2.
The receive queue Rx is used to record memory addresses of data related to read operations, and the send queue Tx is used to record memory addresses of data related to write operations.
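The per-connection queues of Table 3 can be modeled as two address lists per connection FD. The sketch below is illustrative (class and function names are assumptions): both queues hold memory addresses of data, never the data itself, which is what lets the N thread and the W threads exchange data through shared NUMA-node memory.

```python
from collections import deque

class ConnQueues:
    """Receive/send queues for one connection FD (cf. Table 3)."""
    def __init__(self):
        self.rx = deque()   # addresses of data awaiting read operations
        self.tx = deque()   # addresses of data awaiting transmission

queues_by_fd = {}

def alloc_queues(conn_fd):
    # Allocated in the memory of the target N thread's NUMA node in the
    # patent; here just a Python object standing in for that allocation.
    queues_by_fd[conn_fd] = ConnQueues()
    return queues_by_fd[conn_fd]

q = alloc_queues(20)        # e.g. connection FD2
q.rx.append(0x7f000000)     # the N thread records where the NIC wrote data
print(len(q.rx), len(q.tx))  # -> 1 0
```

Keeping only addresses in the queues means the payload is written exactly once into node-local memory and then shared, rather than copied between threads.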
II. Server-side workflow during TCP data processing.
As shown in FIG. 6, the workflow includes:
S31. Network card queue 1 in the network card receives first data.
The first data may be TCP data.
S32. Network card queue 1 writes the first data into the memory of NUMA node 2 corresponding to N thread 2.
S33. After polling the first data in memory, N thread 2 writes the memory address of the first data into the receive queue corresponding to connection FD2, as shown in Table 4:
Table 4: Receive queue and send queue corresponding to connection FD2
Receive queue 2 (Rx)  Send queue 2 (Tx)
memory address of the first data
S34. N thread 2 wakes up W thread 2 through P thread 2.
P thread 2 is the wakeup proxy thread of N thread 2. Waking W thread 2 through P thread 2 prevents N thread 2 from entering the system state, so N thread 2 can stay in the running state, which improves network performance and reduces network latency.
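The wakeup-proxy idea of S34 can be sketched with a request queue between the two threads. This is an assumed model, not the patent's implementation: the polling N thread never makes a potentially blocking wakeup call itself; it only enqueues a request, and the P thread performs the wake on its behalf, so the N thread can stay in its polling loop.

```python
import queue
import threading

wake_requests = queue.Queue()
w_thread_events = {20: threading.Event()}   # sleeping W thread keyed by conn FD

def p_thread():
    # The proxy (P) thread absorbs the blocking part of the wakeup.
    while True:
        conn_fd = wake_requests.get()
        if conn_fd is None:
            return                           # shutdown signal for the sketch
        w_thread_events[conn_fd].set()       # actually wake the W thread

def n_thread_notify(conn_fd):
    # From the N thread's point of view this is a non-blocking enqueue.
    wake_requests.put(conn_fd)

proxy = threading.Thread(target=p_thread)
proxy.start()
n_thread_notify(20)                          # N thread 2 signals data for FD2
w_thread_events[20].wait(timeout=1)          # W thread 2 wakes up here
wake_requests.put(None)
proxy.join()
print(w_thread_events[20].is_set())  # -> True
```

The design choice mirrored here is latency-driven: any syscall made directly by the N thread would take it off its busy-poll loop, so the blocking work is delegated.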
S35. W thread 2 passes connection FD2 to W thread n.
W thread n may be called the third W thread.
If W thread 2 can initiate the read operation itself, step S35 may be skipped.
S36. If W thread n initiates a read operation, subsequent steps S37, S38, and S39 are performed.
The read operation includes connection FD2.
S37. The routing module takes over the read operation initiated by W thread n and determines from connection FD2 and the FD routing table that connection FD2 corresponds to N thread 2.
S38. Through N thread 2, in the memory of NUMA node 2, the memory address of the first data is obtained from receive queue 2 corresponding to connection FD2, and the first data is read from memory according to that address.
S39. The first data is copied into the buffer corresponding to W thread n, and W thread n processes the first data.
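Steps S36-S39 can be sketched as a single read-path function. The sketch below is illustrative only; `memory`, `rx_queue`, and `handle_read` are assumed stand-ins for the NUMA node's memory, the per-connection receive queue, and the routing module's read handling.

```python
from collections import deque

memory = {0x1000: b"first data"}       # address -> payload in NUMA node 2
rx_queue = {20: deque([0x1000])}       # receive queue of connection FD2

def handle_read(conn_fd, w_thread_buffer):
    # S37: conn_fd resolved to the target N thread (implicit here);
    # S38: pop the memory address of the first data from the Rx queue.
    addr = rx_queue[conn_fd].popleft()
    # S39: copy the data into the reading W thread's buffer.
    w_thread_buffer.append(memory[addr])
    return len(memory[addr])

buf = []
n = handle_read(20, buf)
print(n, buf[0])  # -> 10 b'first data'
```

Note that the only copy is the final one into the W thread's buffer; up to that point only the address moves between threads.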
The write-operation procedure of the embodiments of this application can be understood with reference to FIG. 7. As shown in FIG. 7:
S40. If W thread n initiates a write operation, subsequent steps S41, S42, S43, and S44 are performed.
The write operation includes connection FD2 and second data.
S41. The routing module takes over the write operation initiated by W thread n and determines from connection FD2 and the FD routing table that connection FD2 corresponds to N thread 2.
S42. The second data is written into the memory of NUMA node 2 corresponding to N thread 2, and the memory address of the second data is written into the send queue corresponding to FD2, as shown in Table 5:
Table 5: Receive queue and send queue corresponding to connection FD2
Receive queue 2 (Rx)  Send queue 2 (Tx)
  memory address of the second data
S43. When N thread 2 polls send queue 2, it sends the second data to network card queue 1 according to the memory address of the second data.
S44. Network card queue 1 sends the second data to the client.
In addition, the write-operation procedure also requires the wakeup operation and the FD2-passing operation, which can be understood with reference to S34 and S35 in FIG. 6.
The communication procedures described in FIG. 6 and FIG. 7 may further include binding W thread 2 and W thread n to processing cores of NUMA node 2, so that W thread 2, W thread n, and N thread 2 can share the data in memory without copying data across NUMA nodes, which improves communication efficiency and reduces network latency.
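The binding just described can be sketched as choosing a core within the target N thread's NUMA node. The node-to-core map below is a made-up example and `pick_core_for_w_thread` is an assumed name; on Linux the actual pinning could be done with `os.sched_setaffinity`, which is only illustrated in a comment here.

```python
numa_cores = {1: {0, 1, 2, 3}, 2: {4, 5, 6, 7}}   # NUMA node -> core ids
n_thread_core = {2: 4}                            # N thread 2 is pinned to core 4

def pick_core_for_w_thread(target_n_thread):
    # Find the NUMA node hosting the target N thread's core.
    node = next(n for n, cores in numa_cores.items()
                if n_thread_core[target_n_thread] in cores)
    # Any core of the same node keeps memory local; avoid the N thread's own
    # core so polling and application work do not contend for it.
    return min(numa_cores[node] - {n_thread_core[target_n_thread]})

core = pick_core_for_w_thread(2)
print(core)  # -> 5
# os.sched_setaffinity(0, {core})  # actual pinning on Linux (not run here)
```

Keeping the W thread on the same node as the N thread is what makes the address-passing queues above a zero-copy handoff rather than a cross-node transfer.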
III. Client-side workflow during communication connection establishment.
As shown in FIG. 8, the workflow includes:
S50. The client initiates a connect operation through W thread 3.
S51. The routing module receives the connect operation initiated by W thread 3, selects N thread 2 from the multiple N threads as the target N thread for the connect operation of W thread 3, and generates connection FD2 for W thread 3.
S52. The routing module passes connection FD2 to W thread 3.
S53. The routing module adds the correspondence between N thread 2 and connection FD2 to the FD routing table.
S54. A receive queue and a send queue are allocated for connection FD2 in the memory of NUMA node 2 corresponding to N thread 2.
This procedure can be understood with reference to the relevant server-side content above and is not repeated here.
S55. A link-establishment request is sent to the server through NUMA node 2 and network card queue 2 corresponding to N thread 2.
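Steps S50-S53 can be sketched as a connect handler in the routing module. This is an assumed model: the patent does not specify the selection policy, so the sketch uses simple round-robin as one possible choice, and `handle_connect` and the FD numbering are illustrative names.

```python
import itertools

_fds = itertools.count(20)
_rr = itertools.cycle([1, 2, 3])              # available N threads (round-robin,
                                              # one possible selection policy)
fd_routing_table = {}                         # connection FD -> target N thread

def handle_connect():
    target_n = next(_rr)                      # S51: choose the target N thread
    conn_fd = next(_fds)                      # S51: generate the connection FD
    fd_routing_table[conn_fd] = target_n      # S53: record in the FD routing table
    return conn_fd                            # S52: returned to the W thread

fd = handle_connect()
print(fd, fd_routing_table[fd])  # -> 20 1
```

Unlike the server side, no shadow table is needed here: the routing module itself picks the N thread, so the connection FD is born already routed.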
IV. Client-side workflow during TCP data processing.
As shown in FIG. 9, the write procedure includes:
S60. W thread 3 passes connection FD2 to W thread n.
S61. W thread n initiates a write operation.
The write operation includes connection FD2 and the first data.
S62. The routing module takes over the write operation initiated by W thread n and determines from connection FD2 and the FD routing table that connection FD2 corresponds to N thread 2.
S63. The first data is written into the memory of NUMA node 2 corresponding to N thread 2, and the memory address of the first data is written into the send queue corresponding to FD2, which can be understood with reference to Table 5 above.
S64. When N thread 2 polls send queue 2, it sends the first data to network card queue 2 according to the memory address of the first data.
S65. Network card queue 2 sends the first data to the server.
The client-side read procedure can be understood with reference to the server-side read procedure described above, except that the wakeup operation is not needed and the first data is replaced by the second data.
To verify the performance of the user-mode protocol stack provided by the embodiments of this application during communication, engineers ran repeated experiments processing MySQL requests with both the solution of this application and the prior-art solution. The results show that with the solution of this application, linearity stays close to 1 for up to 160 TCP connections, and good performance is maintained even beyond 240 connections, whereas with the prior-art solution, severe multi-thread memory-access contention appears once the number of TCP connections reaches 40 and performance drops sharply. The comparison between the effect of this application's solution and that of the prior art shows that the solution of this application effectively reduces multi-thread memory-access contention and improves the performance of the computer device during communication.
The communication method based on a user-mode protocol stack has been described above; the server and the client provided by the embodiments of this application are described below with reference to the drawings.
As shown in FIG. 10, in an embodiment of the server 70 provided by the embodiments of this application, the server includes an application layer, a user-mode protocol stack, and a hardware layer; a target application of the application layer corresponds to at least one W thread, a W thread being a thread for processing data of the target application; the user-mode protocol stack includes multiple N threads, a routing module, and TCP hash tables in one-to-one correspondence with the multiple N threads, an N thread being a user-mode protocol-stack thread; the hardware layer includes multiple non-uniform memory access NUMA nodes and a network card, the multiple N threads corresponding one-to-one to the multiple NUMA nodes. The server further includes:
a first processing unit 701, configured to obtain a first correspondence through the routing module, the first correspondence including the correspondence between the listening file descriptor FD of a first W thread and multiple shadow FDs generated one-to-one for the multiple N threads, the first W thread being one of the at least one W thread;
a second processing unit 702, configured to obtain a second correspondence through the routing module, the second correspondence including the correspondence between a target N thread and a connection FD, the target N thread being the one of the multiple N threads selected by the network card when the communication connection with the client is established;
a third processing unit 703, configured to communicate with the client through the routing module based on the first correspondence obtained by the first processing unit 701 and the second correspondence obtained by the second processing unit 702.
In this application, during TCP communication between the client and the server, the server uses the first correspondence (the shadow table) in the routing module to look up the association from an N thread to the corresponding W thread and thereby pass the connection FD, and then uses the second correspondence (the routing table) through the routing module to determine the target N thread used in the communication process, thereby completing the communication. As can be seen from the above solution, this application requires neither a pre-established binding between W threads and N threads nor multiple N threads sharing one TCP hash table, so W threads and N threads are decoupled, which improves the versatility of the user-mode protocol stack. In addition, because no kernel operations are involved, W threads and N threads do not need context switches, which also improves the performance of the user-mode protocol stack.
Optionally, the first processing unit 701 is configured to: receive, through the routing module, the listen operation initiated by the first W thread and generate the listening FD for the first W thread; initiate, through the routing module, a listen operation to each of the multiple N threads to obtain multiple shadow FDs in one-to-one correspondence with the multiple N threads; and establish, through the routing module, the correspondence between the listening FD and the multiple shadow FDs to obtain the first correspondence.
Optionally, the network card includes at least one network card queue, and the second processing unit 702 is configured to: obtain, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, the communication connection being established based on a link-establishment request sent by the client and received by a first network card queue, the first network card queue being one of the at least one network card queue; and establish, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
Optionally, the third processing unit 703 is configured to: pass, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence in the first correspondence between the shadow FD of the target N thread and the listening FD of the first W thread; and communicate with the client through the routing module based on the connection FD and the second correspondence.
Optionally, when the target application corresponds to multiple W threads, the third processing unit 703 is configured to: receive, through the routing module, a poll/epoll event initiated by the second W thread, the poll/epoll event including the connection FD, the connection FD being passed from the first W thread to the second W thread, the second W thread going to sleep after initiating the poll/epoll event, the second W thread being one of the multiple W threads corresponding to the target application; determine, through the routing module according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for a wakeup event related to the target N thread; and after the second W thread is woken up, perform, through the routing module according to the second correspondence, a read operation or write operation related to the target N thread.
Optionally, the third processing unit 703 is further configured to wake up the second W thread through the wakeup proxy thread associated with the target N thread.
Optionally, the third processing unit 703 is further configured to allocate a receive queue and a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the receive queue being used to record memory addresses of data related to read operations, and the send queue being used to record memory addresses of data related to write operations.
Optionally, the third processing unit 703 is configured to: receive, through the routing module, a read operation initiated by the second W thread or the third W thread, the read operation carrying the connection FD, the third W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the second W thread to the third W thread when the read operation is initiated by the third W thread; obtain, through the routing module according to the connection FD, the memory address of the first data from the receive queue associated with the connection FD, the first data being data received from the client by the first network card queue associated with the target N thread, the first network card queue being the network card queue that received the link-establishment request sent by the client; and obtain the first data according to its memory address and pass it to the second W thread or the third W thread for processing.
Optionally, the third processing unit 703 is configured to: receive, through the routing module, a write operation initiated by the second W thread or the third W thread, the write operation carrying the connection FD and the second data, the third W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the second W thread to the third W thread when the write operation is initiated by the third W thread; write, through the routing module according to the connection FD, the second data into the memory corresponding to the target N thread, and write the memory address of the second data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the second data in the send queue, send the second data in memory to the network card.
Optionally, the third processing unit 703 is further configured to bind the second W thread or the third W thread to a processing core in the NUMA node where the target N thread is located.
Optionally, the memory in the NUMA node corresponding to the target N thread is huge-page memory.
As shown in FIG. 11, in an embodiment of the client 80 provided by the embodiments of this application, the client 80 includes an application layer, a user-mode protocol stack, and a hardware layer; a target application of the application layer corresponds to at least one W thread, a W thread being a thread for processing data of the target application; the user-mode protocol stack includes multiple N threads, a routing module, and TCP hash tables in one-to-one correspondence with the multiple N threads, an N thread being a user-mode protocol-stack thread; the hardware layer includes multiple non-uniform memory access NUMA nodes, the multiple N threads corresponding one-to-one to the multiple NUMA nodes. The client 80 further includes:
a first processing unit 801, configured to obtain a target correspondence through the routing module, the target correspondence including the correspondence between the connection file descriptor FD and the target N thread, the target N thread being the N thread selected by the routing module for the first W thread that initiates the connect operation, the first W thread being one of the at least one W thread, and the target N thread being one of the multiple N threads;
a second processing unit 802, configured to communicate with the server through the routing module based on the target correspondence.
In the embodiments of this application, during TCP communication between the client and the server, the client can determine, through the target correspondence (routing table) in the routing module, the target N thread used in the communication process and thereby complete the communication. There is no need to pre-establish a binding between W threads and N threads, nor for multiple N threads to share one TCP hash table, which improves the versatility of the user-mode protocol stack; in addition, because no kernel operations are involved, W threads and N threads do not need context switches, which also improves the performance of the user-mode protocol stack.
Optionally, the first processing unit 801 is configured to: receive, through the routing module, the connect operation initiated by the first W thread, select the target N thread from the multiple N threads for the connect operation, and generate the connection FD for the first W thread; and establish, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
Optionally, the second processing unit 802 is configured to: determine, through the routing module according to the connection FD, the NUMA node and network card queue corresponding to the target N thread; and send a link-establishment request and the first data to the server through the NUMA node and network card queue corresponding to the target N thread.
Optionally, the second processing unit 802 is further configured to allocate a send queue for the connection FD in the memory of the NUMA node corresponding to the target N thread, the send queue being used to record memory addresses of data related to write operations.
Optionally, the second processing unit 802 is configured to: receive, through the routing module, the write operation initiated by the second W thread, the write operation carrying the connection FD and the first data, the second W thread being one of the multiple W threads corresponding to the target application, and the connection FD being passed from the first W thread to the second W thread when the write operation is initiated by the second W thread; write, through the routing module according to the connection FD, the first data into the memory corresponding to the target N thread, and write the memory address of the first data into the send queue corresponding to the connection FD; and when the target N thread polls the memory address of the first data in the send queue, send the first data in memory to the network card.
Optionally, the second processing unit 802 is further configured to bind the second W thread to a processing core in the NUMA node where the target N thread is located before the write operation is performed.
The server 70 and the client 80 described above can be understood with reference to the corresponding content of the foregoing method embodiments and are not repeated here.
FIG. 12 is a schematic diagram of a possible logical structure of a computer device 90 provided by an embodiment of this application. The computer device 90 includes multiple NUMA nodes 900 and a network card 910; each NUMA node includes multiple processors 901, a memory 902, and a bus 903. The processor 901 and the memory 902 are connected to each other through the bus 903. In the embodiments of this application, the processor 901 is configured to control and manage the actions of the computer device 90; for example, the processor 901 is configured to perform the steps in FIG. 5 to FIG. 9. The communication interface 902 is configured to support the computer device 90 in communicating. The memory 902 is configured to store the program code and data of the computer device 90 and to provide memory space for process groups. The network card is used to communicate with other devices.
The processor 901 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 901 may also be a combination implementing computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 903 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 12, but this does not mean there is only one bus or one type of bus.
In another embodiment of this application, a computer-readable storage medium is further provided, in which computer-executable instructions are stored; when a processor of a device executes the computer-executable instructions, the device performs the steps in FIG. 5 to FIG. 9 above.
In another embodiment of this application, a computer program product is further provided, including computer-executable instructions stored in a computer-readable storage medium; when a processor of a device executes the computer-executable instructions, the device performs the steps in FIG. 5 to FIG. 9 above.
In another embodiment of this application, a chip system is further provided, including a processor configured to support an apparatus for memory management in implementing the steps in FIG. 5 to FIG. 9 above. In one possible design, the chip system may further include a memory for storing program instructions and data necessary for the server or client. The chip system may consist of chips, or may include chips and other discrete devices.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of the embodiments of this application.
It can be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto.

Claims (41)

  1. A communication method based on a user-mode protocol stack, applied to a server, wherein the server comprises an application layer, the user-mode protocol stack, and a hardware layer; a target application at the application layer corresponds to at least one W thread, the W thread being a thread for processing data of the target application; the user-mode protocol stack comprises a plurality of N threads, a routing module, and Transmission Control Protocol (TCP) hash tables in one-to-one correspondence with the plurality of N threads, the N threads being user-mode protocol stack threads; the hardware layer comprises a plurality of non-uniform memory access (NUMA) nodes and a network interface card (NIC), the plurality of N threads being in one-to-one correspondence with the plurality of NUMA nodes; and the method comprises:
    obtaining a first correspondence through the routing module, wherein the first correspondence comprises a correspondence between a listening file descriptor (FD) of a first W thread and a plurality of shadow FDs, the plurality of shadow FDs are generated one-to-one for the plurality of N threads, and the first W thread is one of the at least one W thread;
    obtaining a second correspondence through the routing module, wherein the second correspondence comprises a correspondence between a target N thread and a connection FD, and the target N thread is the one of the plurality of N threads selected by the NIC when a communication connection with a client is established;
    communicating with the client through the routing module based on the first correspondence and the second correspondence.
  2. The communication method according to claim 1, wherein the obtaining a first correspondence through the routing module comprises:
    receiving, through the routing module, a listen operation initiated by the first W thread, and generating the listening FD for the first W thread;
    initiating, through the routing module, listen operations to the plurality of N threads respectively to obtain the plurality of shadow FDs corresponding to the plurality of N threads, the plurality of shadow FDs being in one-to-one correspondence with the plurality of N threads;
    establishing, through the routing module, the correspondence between the listening FD and the plurality of shadow FDs to obtain the first correspondence.
  3. The communication method according to claim 1 or 2, wherein the NIC comprises at least one NIC queue, and the obtaining a second correspondence through the routing module comprises:
    obtaining, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, wherein the communication connection is established based on a connection establishment request sent by the client and received by a first NIC queue, the first NIC queue being one of the at least one NIC queue;
    establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
  4. The communication method according to any one of claims 1-3, wherein the communicating with the client through the routing module based on the first correspondence and the second correspondence comprises:
    passing, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence, in the first correspondence, between the shadow FD corresponding to the target N thread and the listening FD corresponding to the first W thread;
    communicating with the client through the routing module based on the connection FD and the second correspondence.
  5. The communication method according to claim 4, wherein when the target application corresponds to a plurality of W threads, the communicating with the client through the routing module based on the connection FD and the second correspondence comprises:
    receiving, through the routing module, a wait (poll) or extended wait (epoll) event initiated by a second W thread, wherein the poll/epoll event includes the connection FD, the connection FD is passed from the first W thread to the second W thread, the second W thread enters a sleep state after initiating the poll/epoll event, and the second W thread is one of the plurality of W threads corresponding to the target application;
    determining, through the routing module according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for a wakeup event related to the target N thread;
    after the second W thread is woken up, performing, through the routing module according to the second correspondence, a read operation or a write operation related to the target N thread.
  6. The communication method according to claim 5, wherein the method further comprises:
    waking up the second W thread through a wakeup proxy thread associated with the target N thread.
  7. The communication method according to claim 5 or 6, wherein after the determining that the connection FD corresponds to the target N thread according to the second correspondence, the method further comprises:
    allocating, in the memory of the NUMA node corresponding to the target N thread, a receive queue and a send queue for the connection FD, wherein the receive queue is used to record memory addresses of data related to read operations, and the send queue is used to record memory addresses of data related to write operations.
  8. The communication method according to claim 7, wherein the performing, through the routing module according to the second correspondence, a read operation related to the target N thread comprises:
    receiving, through the routing module, a read operation initiated by the second W thread or a third W thread, wherein the read operation carries the connection FD, the third W thread is one of the plurality of W threads corresponding to the target application, and when the read operation is initiated by the third W thread, the connection FD is passed from the second W thread to the third W thread;
    obtaining, through the routing module according to the connection FD, a memory address of first data from the receive queue associated with the connection FD, wherein the first data is data received from the client by a first NIC queue associated with the target N thread, the first NIC queue being the NIC queue that receives the connection establishment request sent by the client;
    obtaining the first data according to the memory address of the first data, and passing the first data to the second W thread or the third W thread for processing.
  9. The communication method according to claim 7, wherein the performing, through the routing module according to the second correspondence, a write operation related to the target N thread comprises:
    receiving, through the routing module, a write operation initiated by the second W thread or a third W thread, wherein the write operation carries the connection FD and second data, the third W thread is one of the plurality of W threads corresponding to the target application, and when the write operation is initiated by the third W thread, the connection FD is passed from the second W thread to the third W thread;
    writing, through the routing module according to the connection FD, the second data into the memory corresponding to the target N thread, and writing the memory address of the second data in the memory into the send queue corresponding to the connection FD;
    when the target N thread polls the memory address of the second data in the send queue, sending the second data in the memory to the NIC.
  10. The communication method according to claim 8 or 9, wherein before the read operation or the write operation is performed, the method further comprises:
    binding the second W thread or the third W thread to a processing core in the NUMA node where the target N thread resides.
  11. The communication method according to any one of claims 7-10, wherein the memory in the NUMA node corresponding to the target N thread is huge page memory.
  12. A communication method based on a user-mode protocol stack, applied to a client, wherein the client comprises an application layer, the user-mode protocol stack, and a hardware layer; a target application at the application layer corresponds to at least one W thread, the W thread being a thread for processing data of the target application; the user-mode protocol stack comprises a plurality of N threads, a routing module, and Transmission Control Protocol (TCP) hash tables in one-to-one correspondence with the plurality of N threads, the N threads being user-mode protocol stack threads; the hardware layer comprises a plurality of non-uniform memory access (NUMA) nodes, the plurality of N threads being in one-to-one correspondence with the plurality of NUMA nodes; and the method comprises:
    obtaining a target correspondence through the routing module, wherein the target correspondence comprises a correspondence between a connection file descriptor (FD) and a target N thread, the target N thread is an N thread selected by the routing module for a first W thread that initiates a connect operation, the first W thread is one of the at least one W thread, and the target N thread is one of the plurality of N threads;
    communicating with a server through the routing module based on the target correspondence.
  13. The communication method according to claim 12, wherein the obtaining a target correspondence through the routing module comprises:
    receiving, through the routing module, the connect operation initiated by the first W thread, selecting the target N thread from the plurality of N threads for the connect operation, and generating the connection FD for the first W thread;
    establishing, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
  14. The communication method according to claim 12, wherein the communicating with a server through the routing module based on the target correspondence comprises:
    determining, through the routing module according to the connection FD, a NUMA node and a NIC queue corresponding to the target N thread;
    sending a connection establishment request and first data to the server through the NUMA node and the NIC queue corresponding to the target N thread.
  15. The communication method according to any one of claims 12-14, wherein after the determining, according to the connection FD, the NUMA node and the NIC queue corresponding to the target N thread, the method further comprises:
    allocating, in the memory of the NUMA node corresponding to the target N thread, a send queue for the connection FD, the send queue being used to record memory addresses of data related to write operations.
  16. The communication method according to claim 15, wherein the sending first data to the server through the NUMA node and the NIC queue corresponding to the target N thread comprises:
    receiving, through the routing module, a write operation initiated by a second W thread, wherein the write operation carries the connection FD and the first data, the second W thread is one of a plurality of W threads corresponding to the target application, and when the write operation is initiated by the second W thread, the connection FD is passed from the first W thread to the second W thread;
    writing, through the routing module according to the connection FD, the first data into the memory corresponding to the target N thread, and writing the memory address of the first data in the memory into the send queue corresponding to the connection FD;
    when the target N thread polls the memory address of the first data in the send queue, sending the first data in the memory to the NIC.
  17. The communication method according to claim 16, wherein before the write operation is performed, the method further comprises: binding the second W thread to a processing core in the NUMA node where the target N thread resides.
  18. A server, wherein the server comprises an application layer, a user-mode protocol stack, and a hardware layer; a target application at the application layer corresponds to at least one W thread, the W thread being a thread for processing data of the target application; the user-mode protocol stack comprises a plurality of N threads, a routing module, and Transmission Control Protocol (TCP) hash tables in one-to-one correspondence with the plurality of N threads, the N threads being user-mode protocol stack threads; the hardware layer comprises a plurality of non-uniform memory access (NUMA) nodes and a network interface card (NIC), the plurality of N threads being in one-to-one correspondence with the plurality of NUMA nodes; and the server further comprises:
    a first processing unit, configured to obtain a first correspondence through the routing module, wherein the first correspondence comprises a correspondence between a listening file descriptor (FD) of a first W thread and a plurality of shadow FDs, the plurality of shadow FDs are generated one-to-one for the plurality of N threads, and the first W thread is one of the at least one W thread;
    a second processing unit, configured to obtain a second correspondence through the routing module, wherein the second correspondence comprises a correspondence between a target N thread and a connection FD, and the target N thread is the one of the plurality of N threads selected by the NIC when a communication connection with a client is established;
    a third processing unit, configured to communicate with the client through the routing module based on the first correspondence obtained by the first processing unit and the second correspondence obtained by the second processing unit.
  19. The server according to claim 18, wherein the first processing unit is configured to:
    receive, through the routing module, a listen operation initiated by the first W thread, and generate the listening FD for the first W thread;
    initiate, through the routing module, listen operations to the plurality of N threads respectively to obtain the plurality of shadow FDs corresponding to the plurality of N threads, the plurality of shadow FDs being in one-to-one correspondence with the plurality of N threads;
    establish, through the routing module, the correspondence between the listening FD and the plurality of shadow FDs to obtain the first correspondence.
  20. The server according to claim 18, wherein the NIC comprises at least one NIC queue, and the second processing unit is configured to:
    obtain, through the routing module, the connection FD generated by the target N thread for establishing the communication connection, wherein the communication connection is established based on a connection establishment request sent by the client and received by a first NIC queue, the first NIC queue being one of the at least one NIC queue;
    establish, through the routing module, the correspondence between the target N thread and the connection FD to obtain the second correspondence.
  21. The server according to any one of claims 18-20, wherein the third processing unit is configured to:
    pass, through the routing module, the connection FD corresponding to the target N thread to the first W thread based on the correspondence, in the first correspondence, between the shadow FD corresponding to the target N thread and the listening FD corresponding to the first W thread;
    communicate with the client through the routing module based on the connection FD and the second correspondence.
  22. The server according to claim 21, wherein when the target application corresponds to a plurality of W threads, the third processing unit is configured to:
    receive, through the routing module, a wait (poll) or extended wait (epoll) event initiated by a second W thread, wherein the poll/epoll event includes the connection FD, the connection FD is passed from the first W thread to the second W thread, the second W thread enters a sleep state after initiating the poll/epoll event, and the second W thread is one of the plurality of W threads corresponding to the target application;
    determine, through the routing module according to the second correspondence, that the connection FD corresponds to the target N thread, so as to wait for a wakeup event related to the target N thread;
    after the second W thread is woken up, perform, through the routing module according to the second correspondence, a read operation or a write operation related to the target N thread.
  23. The server according to claim 22, wherein
    the third processing unit is further configured to wake up the second W thread through a wakeup proxy thread associated with the target N thread.
  24. The server according to claim 22 or 23, wherein
    the third processing unit is further configured to allocate, in the memory of the NUMA node corresponding to the target N thread, a receive queue and a send queue for the connection FD, wherein the receive queue is used to record memory addresses of data related to read operations, and the send queue is used to record memory addresses of data related to write operations.
  25. The server according to claim 24, wherein the third processing unit is configured to:
    receive, through the routing module, a read operation initiated by the second W thread or a third W thread, wherein the read operation carries the connection FD, the third W thread is one of the plurality of W threads corresponding to the target application, and when the read operation is initiated by the third W thread, the connection FD is passed from the second W thread to the third W thread;
    obtain, through the routing module according to the connection FD, a memory address of first data from the receive queue associated with the connection FD, wherein the first data is data received from the client by a first NIC queue associated with the target N thread, the first NIC queue being the NIC queue that receives the connection establishment request sent by the client;
    obtain the first data according to the memory address of the first data, and pass the first data to the second W thread or the third W thread for processing.
  26. The server according to claim 24, wherein the third processing unit is configured to:
    receive, through the routing module, a write operation initiated by the second W thread or a third W thread, wherein the write operation carries the connection FD and second data, the third W thread is one of the plurality of W threads corresponding to the target application, and when the write operation is initiated by the third W thread, the connection FD is passed from the second W thread to the third W thread;
    write, through the routing module according to the connection FD, the second data into the memory corresponding to the target N thread, and write the memory address of the second data in the memory into the send queue corresponding to the connection FD;
    when the target N thread polls the memory address of the second data in the send queue, send the second data in the memory to the NIC.
  27. The server according to claim 25 or 26, wherein
    the third processing unit is further configured to bind the second W thread or the third W thread to a processing core in the NUMA node where the target N thread resides.
  28. A client, wherein the client comprises an application layer, a user-mode protocol stack, and a hardware layer; a target application at the application layer corresponds to at least one W thread, the W thread being a thread for processing data of the target application; the user-mode protocol stack comprises a plurality of N threads, a routing module, and Transmission Control Protocol (TCP) hash tables in one-to-one correspondence with the plurality of N threads, the N threads being user-mode protocol stack threads; the hardware layer comprises a plurality of non-uniform memory access (NUMA) nodes, the plurality of N threads being in one-to-one correspondence with the plurality of NUMA nodes; and the client further comprises:
    a first processing unit, configured to obtain a target correspondence through the routing module, wherein the target correspondence comprises a correspondence between a connection file descriptor (FD) and a target N thread, the target N thread is an N thread selected by the routing module for a first W thread that initiates a connect operation, the first W thread is one of the at least one W thread, and the target N thread is one of the plurality of N threads;
    a second processing unit, configured to communicate with a server through the routing module based on the target correspondence.
  29. The client according to claim 28, wherein the first processing unit is configured to:
    receive, through the routing module, the connect operation initiated by the first W thread, select the target N thread from the plurality of N threads for the connect operation, and generate the connection FD for the first W thread;
    establish, through the routing module, the correspondence between the target N thread and the connection FD to obtain the target correspondence.
  30. The client according to claim 28, wherein the second processing unit is configured to:
    determine, through the routing module according to the connection FD, a NUMA node and a NIC queue corresponding to the target N thread;
    send a connection establishment request and first data to the server through the NUMA node and the NIC queue corresponding to the target N thread.
  31. The client according to any one of claims 28-30, wherein
    the second processing unit is further configured to allocate, in the memory of the NUMA node corresponding to the target N thread, a send queue for the connection FD, the send queue being used to record memory addresses of data related to write operations.
  32. The client according to claim 31, wherein the second processing unit is configured to:
    receive, through the routing module, a write operation initiated by a second W thread, wherein the write operation carries the connection FD and the first data, the second W thread is one of a plurality of W threads corresponding to the target application, and when the write operation is initiated by the second W thread, the connection FD is passed from the first W thread to the second W thread;
    write, through the routing module according to the connection FD, the first data into the memory corresponding to the target N thread, and write the memory address of the first data in the memory into the send queue corresponding to the connection FD;
    when the target N thread polls the memory address of the first data in the send queue, send the first data in the memory to the NIC.
  33. The client according to claim 32, wherein
    the second processing unit is further configured to bind, before the write operation is performed, the second W thread to a processing core in the NUMA node where the target N thread resides.
  34. A computing device, comprising one or more processors and a computer-readable storage medium storing a computer program;
    wherein when the computer program is executed by the one or more processors, the method according to any one of claims 1-11 is implemented.
  35. A computing device, comprising one or more processors and a computer-readable storage medium storing a computer program;
    wherein when the computer program is executed by the one or more processors, the method according to any one of claims 12-17 is implemented.
  36. A chip system, comprising one or more processors, wherein the one or more processors are invoked to perform the method according to any one of claims 1-11.
  37. A chip system, comprising one or more processors, wherein the one or more processors are invoked to perform the method according to any one of claims 12-17.
  38. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by one or more processors, the method according to any one of claims 1-11 is implemented.
  39. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by one or more processors, the method according to any one of claims 12-17 is implemented.
  40. A computer program product, comprising a computer program, wherein when the computer program is executed by one or more processors, the method according to any one of claims 1-11 is implemented.
  41. A computer program product, comprising a computer program, wherein when the computer program is executed by one or more processors, the method according to any one of claims 12-17 is implemented.
PCT/CN2022/115019 2021-08-31 2022-08-26 Communication method based on user-mode protocol stack and corresponding apparatus WO2023030178A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111017331.2 2021-08-31
CN202111017331.2A CN115766044A (zh) 2021-08-31 Communication method based on user-mode protocol stack and corresponding apparatus

Publications (1)

Publication Number Publication Date
WO2023030178A1 true WO2023030178A1 (zh) 2023-03-09

Family

ID=85331921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115019 WO2023030178A1 (zh) 2021-08-31 2022-08-26 一种基于用户态协议栈的通信方法及相应装置

Country Status (2)

Country Link
CN (1) CN115766044A (zh)
WO (1) WO2023030178A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117041379A (zh) * 2023-07-10 2023-11-10 中科驭数(北京)科技有限公司 Method and apparatus for simultaneously listening for new connections on a user-mode protocol stack and a kernel-mode protocol stack
CN117041379B (zh) * 2023-07-10 2024-04-19 中科驭数(北京)科技有限公司 Method and apparatus for simultaneously listening for new connections on a user-mode protocol stack and a kernel-mode protocol stack

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015143904A1 (zh) * 2014-03-28 2015-10-01 华为技术有限公司 Method for managing parallel user-mode protocol stacks and protocol stack system
CN109617833A (zh) * 2018-12-25 2019-04-12 深圳市任子行科技开发有限公司 NAT data auditing method and system for a multithreaded user-mode network protocol stack system
CN110493329A (zh) * 2019-08-08 2019-11-22 西藏宁算科技集团有限公司 Concurrent push service method and system based on a user-mode protocol stack
CN111143062A (zh) * 2019-12-19 2020-05-12 上海交通大学 Balanced partitioning strategy of a user-mode protocol stack for external load processes
CN111314311A (zh) * 2020-01-19 2020-06-19 苏州浪潮智能科技有限公司 Method, system, device and medium for improving switch performance
CN111934894A (zh) * 2019-05-13 2020-11-13 烽火通信科技股份有限公司 Method and system for managing wireless network interfaces based on DPDK



Also Published As

Publication number Publication date
CN115766044A (zh) 2023-03-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22863316; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022863316; Country of ref document: EP; Effective date: 20240311)
NENP Non-entry into the national phase (Ref country code: DE)