CN115866010A - RDMA connection establishing method and device - Google Patents

Info

Publication number
CN115866010A
Authority
CN
China
Prior art keywords
visocket
rdma
socket
layer
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310153021.6A
Other languages
Chinese (zh)
Other versions
CN115866010B (en)
Inventor
田万廷
刘运渠
官华伯
赵凝霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Weishi Technology Co ltd
Original Assignee
Jiangsu Weishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Weishi Technology Co ltd
Priority to CN202310153021.6A
Publication of CN115866010A
Application granted
Publication of CN115866010B
Legal status: Active

Landscapes

  • Computer And Data Communications (AREA)
  • Communication Control (AREA)

Abstract

A server establishes a TCP connection with a client in a ViSocket layer. The ViSocket layer is formed when a server control process calls the ViSocket function library, a pre-built function library that integrates TCP system call functions and RDMA information transmission functions; the layer converts general TCP/IP system calls into transmission operations that RDMA can recognize. The server exchanges RDMA link establishment information with the client over the TCP connection of the ViSocket layer, and establishes an RDMA connection with the client according to the obtained RDMA link establishment information. Embodiments of the disclosure can automatically convert a socket into an RDMA connection, providing higher bandwidth and smaller delay jitter while consuming close to zero host CPU computing power.

Description

RDMA connection establishing method and device
Technical Field
The present disclosure relates to the field of information processing, and in particular to a method and an apparatus for establishing a Remote Direct Memory Access (RDMA) connection.
Background
Compared with the Transmission Control Protocol (TCP), RDMA bypasses the software protocol stack and offloads network operations to hardware, which effectively increases network bandwidth and reduces network latency and Central Processing Unit (CPU) load. How to combine TCP and RDMA has therefore become a problem that urgently needs to be solved.
In the related art, common approaches include the Mellanox open-source Messaging Accelerator (VMA) and Shared Memory Communications over RDMA (SMC-R).
However, VMA does not reduce the CPU overhead of processing the transport protocol, and its delay jitter is still much larger than that of RDMA. SMC-R loses much of the performance of RDMA because of user-space/kernel-space switching, and because it adds the AF_SMC address family, application code that deals with address families must be modified.
Disclosure of Invention
Embodiments of the disclosure provide a method and a device for establishing an RDMA connection that can automatically convert a socket into an RDMA connection without modifying application code, with higher bandwidth, smaller delay jitter, and close to zero host CPU computing power consumed.
In one aspect, an embodiment of the present disclosure provides an RDMA connection establishment method, including:
a server establishes a TCP connection with a client in a ViSocket layer, wherein the ViSocket layer is formed when a server control process calls a ViSocket function library, the ViSocket function library is a pre-built function library that integrates TCP system call functions and RDMA information transmission functions, and the ViSocket layer is used to convert general TCP system calls into transmission operations that RDMA can recognize;
the server exchanges RDMA link establishment information with the client through the TCP connection of the ViSocket layer;
and the server establishes RDMA connection with the client according to the obtained RDMA link establishment information.
Before the server establishes a TCP connection with the client in the ViSocket layer, the method further includes:
when an application main process corresponding to the application program of the server runs, the application main process calls the ViSocket function library to form a ViSocket layer corresponding to the application main process;
when the application main process calls the listening socket creation function, the ViSocket layer corresponding to the application main process intercepts the call, creates the listening socket, and returns it to the application main process.
Establishing the TCP connection between the server and the client in the ViSocket layer comprises:
the application main process listens for a TCP connection request from a client through the listening socket;
when the application main process detects the TCP connection request and calls the communication socket creation function to accept it, the ViSocket layer corresponding to the application main process intercepts the call and accepts the TCP connection request of the client;
and the ViSocket layer corresponding to the application main process establishes a TCP connection with the client according to the TCP connection request, creates a TCP communication socket, and returns it to the application main process.
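The interception of the listening socket creation and of the accept call described above can be pictured with a minimal Python sketch. The class name `ViSocketListener`, its `connections` table, and the `rdma_established` flag are hypothetical names introduced only for illustration; the source does not describe how the ViSocket layer is implemented, and this wrapper merely models the idea of a layer that creates the real sockets and tracks them on behalf of the application.

```python
import socket


class ViSocketListener:
    """Hypothetical stand-in for the ViSocket layer's handling of a listening socket."""

    def __init__(self, host="127.0.0.1", port=0):
        # Intercepted "listening socket creation": the layer creates the real
        # socket itself and exposes a handle to the application.
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._sock.bind((host, port))
        self._sock.listen()
        # Per-connection state kept by the layer (assumed structure).
        self.connections = {}

    @property
    def address(self):
        return self._sock.getsockname()

    def accept(self):
        # Intercepted "communication socket creation": accept the TCP request,
        # record the new TCP communication socket, and return it to the caller.
        conn, peer = self._sock.accept()
        self.connections[conn.fileno()] = {"peer": peer, "rdma_established": False}
        return conn, peer

    def close(self):
        self._sock.close()


def accept_demo():
    listener = ViSocketListener()
    client = socket.create_connection(listener.address, timeout=5)
    conn, peer = listener.accept()
    state = dict(listener.connections[conn.fileno()])
    same_peer = peer == client.getsockname()
    for s in (conn, client):
        s.close()
    listener.close()
    return state, same_peer
```

In this sketch the application only ever sees ordinary socket objects, while the layer keeps the bookkeeping it needs later to decide whether an RDMA connection already exists for a given TCP communication socket.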
The server exchanging RDMA link establishment information with the client through the TCP connection of the ViSocket layer, and establishing the RDMA connection with the client according to the obtained RDMA link establishment information, comprises:
when the application main process calls a data receiving function or a data sending function through its TCP communication socket, the ViSocket layer corresponding to the application main process intercepts the call;
when the ViSocket layer corresponding to the application main process determines that no RDMA connection has been established with the client, it notifies the client, through its TCP communication socket, to send an RDMA link establishment request;
and the ViSocket layer of the application main process receives, over the TCP connection, the RDMA link establishment information corresponding to the RDMA link establishment request from the client, establishes the RDMA connection with the client according to that information, and creates an RDMA communication socket.
After the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
when the application main process calls a data receiving function or a data sending function through a TCP communication socket of the application main process, a ViSocket layer corresponding to the application main process intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the application main process judges that the RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the application main process receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
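The decision made on every intercepted send call (set up RDMA on first use, then route data over it) can be modelled as a small state machine. The sketch below is an illustration under stated assumptions: `FakeTransport`, `ViSocketLayerSketch`, the `RDMA_LINK_REQUEST` control message, and the `rdma_factory` callback are all hypothetical, no real RDMA hardware or verbs calls are involved, and the source does not state whether the data of the first call travels over RDMA once the connection is up (this sketch assumes it does).

```python
class FakeTransport:
    """Records what is sent; stands in for a TCP socket or an RDMA connection."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send(self, data):
        self.sent.append(data)
        return len(data)


class ViSocketLayerSketch:
    """Hypothetical model of the per-process layer's send path."""

    def __init__(self, tcp, rdma_factory):
        self.tcp = tcp                    # TCP communication socket
        self.rdma_factory = rdma_factory  # assumed helper: link info -> RDMA channel
        self.rdma = None                  # RDMA communication socket, once created

    def _establish_rdma(self):
        # Notify the client over TCP to start RDMA link establishment
        # (message name is an assumption, not taken from the source).
        self.tcp.send(b"RDMA_LINK_REQUEST")
        # In a real system the link information would be read back from the
        # TCP connection; a placeholder dictionary is used here.
        link_info = {"qp_num": 0x22}
        self.rdma = self.rdma_factory(link_info)

    def send(self, data):
        # Intercepted data sending function.
        if self.rdma is None:
            self._establish_rdma()
        return self.rdma.send(data)


def send_demo():
    tcp = FakeTransport("tcp")
    layer = ViSocketLayerSketch(tcp, lambda info: FakeTransport("rdma"))
    layer.send(b"first")
    layer.send(b"second")
    return tcp.sent, layer.rdma.sent
```

After the first call, the TCP connection carries only the control exchange in this model, and all application payloads go through the RDMA channel.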
The server exchanges RDMA link establishment information with the client through TCP connection of a ViSocket layer, and establishes RDMA connection with the client according to the obtained RDMA link establishment information, and the method comprises the following steps:
the application main process calls a process copy function to copy itself into a work subprocess and runs the work subprocess, wherein the work subprocess holds copies of the listening socket and the TCP communication socket of the application main process, which serve as the listening socket and the TCP communication socket of the work subprocess;
when the work subprocess runs, the work subprocess calls a ViSocket function library to run to form a ViSocket layer corresponding to the work subprocess;
when a work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the work subprocess, a ViSocket layer corresponding to the work subprocess intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the working subprocess judges that RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the working subprocess informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and the ViSocket layer corresponding to the work subprocess receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information, and establishes RDMA communication sockets.
After the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, the method further comprises the following steps:
when the work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the work subprocess, a ViSocket layer corresponding to the work subprocess intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the working subprocess judges that the RDMA connection is established between the working subprocess and the client, the ViSocket layer corresponding to the working subprocess receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
After the work subprocess calls the ViSocket function library to form the ViSocket layer corresponding to the work subprocess, the method further includes:
when the work subprocess calls the socket closing function to close its listening socket, the ViSocket layer corresponding to the work subprocess intercepts the call and closes the work subprocess's reference to the listening socket.
After the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
when the application main process calls the socket closing function to close its TCP communication socket, the ViSocket layer corresponding to the application main process intercepts the call and closes the application main process's reference to the TCP communication socket.
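The effect of closing only a process's own reference can be illustrated with a small reference-tracking model. This is a sketch, not the patented implementation: `SharedSocket`, `fork_refs`, and `close_ref` are hypothetical names, and the process labels are plain strings. It models the situation after a process copy, where the main process and the work subprocess both hold references to the listening socket and the TCP communication socket, and each then closes the one it does not need.

```python
class SharedSocket:
    """Hypothetical model of a socket shared by several processes after a fork."""

    def __init__(self, name):
        self.name = name
        self.holders = set()  # processes that still hold a reference

    @property
    def is_open(self):
        # The underlying socket stays open while at least one reference remains.
        return bool(self.holders)


def fork_refs(sockets, parent, child):
    # A process copy duplicates every reference held by the parent.
    for sock in sockets:
        if parent in sock.holders:
            sock.holders.add(child)


def close_ref(sock, process):
    # An intercepted close drops only the calling process's reference.
    sock.holders.discard(process)


def refs_demo():
    listening = SharedSocket("listening")
    comm = SharedSocket("tcp-communication")
    listening.holders.add("main")
    comm.holders.add("main")

    fork_refs([listening, comm], parent="main", child="worker")
    close_ref(listening, "worker")  # the work subprocess does not accept connections
    close_ref(comm, "main")         # the main process does not serve this client
    return listening, comm
```

Both sockets remain open in this model, each owned by exactly one process, which matches the division of labour described above.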
The method for establishing the TCP connection between the server and the client in the ViSocket layer comprises the following steps:
the application main process calls a process copy function to copy itself into at least one work subprocess and runs all of the resulting work subprocesses, wherein each work subprocess holds a copy of the listening socket of the application main process, which serves as the listening socket of that work subprocess;
when each work subprocess runs, it calls the ViSocket function library to form a ViSocket layer corresponding to the work subprocess, and listens for a TCP connection request from a client through its listening socket;
the work subprocess that detects a TCP connection request from a client through its listening socket is taken as the target work subprocess; when the target work subprocess calls the communication socket creation function to accept the TCP connection request of the client, the ViSocket layer corresponding to the target work subprocess intercepts the call and accepts the TCP connection request;
and the ViSocket layer corresponding to the target work subprocess establishes TCP connection with the client according to the TCP connection request, and creates a TCP communication socket to return to the target work subprocess.
The server exchanges RDMA link establishment information with the client through the TCP connection of the ViSocket layer, and establishes RDMA connection with the client according to the obtained RDMA link establishment information, and the method comprises the following steps:
when the target work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the target work subprocess, a ViSocket layer corresponding to the target work subprocess intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the target work subprocess judges that RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the target work subprocess informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and the ViSocket layer corresponding to the target work subprocess receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information, and establishes RDMA communication sockets.
After the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
when the target work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the target work subprocess, a ViSocket layer corresponding to the target work subprocess intercepts the call of the data receiving function or the data sending function;
and when the ViSocket layer corresponding to the target work subprocess judges that the RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the target work subprocess receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
After the application main process calls the process copy function to copy itself into at least one work subprocess, the method further includes:
when the application main process calls the socket closing function to close its listening socket, the ViSocket layer corresponding to the application main process intercepts the call and closes the application main process's reference to the listening socket.
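The pre-fork arrangement described above, in which a work subprocess inherits the listening socket and the main process then closes its own reference, can be tried out with ordinary sockets. The sketch below assumes a Unix-like system where `os.fork` is available, uses plain TCP on the loopback interface, and involves no ViSocket or RDMA code; the function name `prefork_demo` and the reply text are illustrative only.

```python
import os
import socket


def prefork_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    address = listener.getsockname()

    pid = os.fork()
    if pid == 0:
        # Work subprocess: it inherited a copy of the listening socket
        # and accepts the client's TCP connection request itself.
        status = 1
        try:
            conn, _ = listener.accept()
            conn.sendall(b"handled-by-worker")
            conn.close()
            status = 0
        finally:
            os._exit(status)

    # Main process: close only its own reference to the listening socket.
    # The socket stays usable because the work subprocess still holds it.
    listener.close()

    chunks = []
    with socket.create_connection(address, timeout=5) as client:
        while True:
            chunk = client.recv(64)
            if not chunk:
                break
            chunks.append(chunk)
    _, wait_status = os.waitpid(pid, 0)
    return b"".join(chunks), wait_status
```

The client connection succeeds even though the main process has already closed the listening socket, because closing a reference in one process does not tear down the socket while another process still holds it.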
After the ViSocket layer corresponding to the target work subprocess receives, over the TCP connection, the RDMA link establishment information corresponding to the RDMA link establishment request from the client and establishes the RDMA connection with the client according to that information, the method further includes:
when the target work subprocess calls the socket closing function to close its TCP communication socket, the ViSocket layer corresponding to the target work subprocess intercepts the call and closes the target work subprocess's reference to the TCP communication socket.
In yet another aspect, an embodiment of the present disclosure further provides a server, including a memory and a processor, where the memory is used to store an executable program;
the processor is configured to read and execute the executable program to implement the RDMA connection establishment method described above.
Compared with the prior art, the RDMA connection establishment method provided by embodiments of the disclosure adds a ViSocket layer between the kernel layer and the application layer of the server, detects the TCP connection in the ViSocket layer, and exchanges RDMA link establishment information over the TCP connection to establish the RDMA connection. The socket is thus converted into an RDMA connection automatically, so that, without modifying application code, bandwidth is improved, delay jitter is reduced, and the host CPU computing power consumed approaches zero.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the disclosure. Other advantages of the disclosure may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the disclosed embodiments and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
Fig. 1 is a schematic flowchart of an RDMA connection establishment method according to an embodiment of the present disclosure;
Fig. 2 is a schematic structural diagram of the software and hardware protocol stack hierarchy on a server according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another RDMA connection establishment method according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a function call process of an RDMA connection establishment method according to an embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of an RDMA connection establishment method according to another embodiment of the present disclosure;
Fig. 6 is a schematic diagram of a function call process of another RDMA connection establishment method according to an embodiment of the present disclosure.
Detailed Description
The present disclosure describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of the embodiments described in this disclosure. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present disclosure includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements of the present disclosure that have been disclosed may also be combined with any conventional features or elements to form unique aspects as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other aspects to form yet another unique aspect as defined by the claims. Thus, it should be understood that any features shown and/or discussed in this disclosure may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present disclosure.
An embodiment of the present disclosure provides an RDMA connection establishment method, as shown in fig. 1, the method includes:
Step 101: a server establishes a TCP connection with a client in a ViSocket layer, wherein the ViSocket layer is formed when a server control process calls a ViSocket function library, and the ViSocket function library is a pre-built function library that integrates TCP system call functions and RDMA information transmission functions and is used to convert general TCP system calls into transmission operations that RDMA can recognize;
Step 102: the server exchanges RDMA link establishment information with the client through the TCP connection of the ViSocket layer;
Step 103: the server establishes an RDMA connection with the client according to the obtained RDMA link establishment information.
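The exchange of link establishment information over an already established connection can be sketched in Python as follows. The source does not define what the RDMA link establishment information contains; the fields used here (queue pair number, packet sequence number, and a 16-byte GID) are assumptions based on what verbs-based programs commonly exchange out of band, and `LinkInfo`, `recv_exact`, and `exchange_link_info` are hypothetical names. A local `socket.socketpair()` stands in for the TCP connection of the ViSocket layer, and the final RDMA connection step is not shown.

```python
import socket
import struct
from dataclasses import dataclass

# Assumed wire format: qp_num (u32), psn (u32), gid (16 bytes), network byte order.
_FMT = "!II16s"
_SIZE = struct.calcsize(_FMT)


@dataclass(frozen=True)
class LinkInfo:
    qp_num: int
    psn: int
    gid: bytes

    def pack(self) -> bytes:
        return struct.pack(_FMT, self.qp_num, self.psn, self.gid)

    @classmethod
    def unpack(cls, raw: bytes) -> "LinkInfo":
        return cls(*struct.unpack(_FMT, raw))


def recv_exact(sock, n):
    # Stream sockets may return short reads, so loop until n bytes arrive.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during exchange")
        buf += chunk
    return buf


def exchange_link_info(sock, local):
    # Send the local information, then read the peer's information.
    sock.sendall(local.pack())
    return LinkInfo.unpack(recv_exact(sock, _SIZE))


def exchange_demo():
    server_sock, client_sock = socket.socketpair()
    server_info = LinkInfo(0x11, 100, b"S" * 16)
    client_info = LinkInfo(0x22, 200, b"C" * 16)
    # The client sends first so the single-threaded demo never blocks.
    client_sock.sendall(client_info.pack())
    seen_by_server = exchange_link_info(server_sock, server_info)
    seen_by_client = LinkInfo.unpack(recv_exact(client_sock, _SIZE))
    server_sock.close()
    client_sock.close()
    return seen_by_server, seen_by_client
```

Once each side holds the other's information, a real implementation would use it to bring up the RDMA connection; that step depends on the RDMA hardware and library and is outside the scope of this sketch.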
Illustratively, the ViSocket layer is located between the kernel layer and the application layer of the server.
Modern enterprise data centers and cloud data centers deploy large numbers of commodity servers to run compute, storage, network, database, and other services. The servers scale horizontally to provide elastic computing power and are connected through a high-speed network to form one huge computer. The multi-core processors in the servers follow Moore's law and benefit from advances in semiconductor processes, with core counts on the order of 100. To ensure the computational efficiency of such data centers, two points are key. First, a high-speed, low-latency network reduces the latency of message communication across servers, so that servers that are physically loosely coupled are tightly coupled in performance, which fully releases the efficiency of the server cluster. Second, the network stack on the server side must deliver data efficiently to each application process running on the multi-core processor, reducing loss along the way, latency, and host CPU usage, so that the computing efficiency of the multi-core processor is fully released. Data centers have long interconnected servers using Ethernet. Within a server, messages are transmitted through the TCP/IP network stack implemented in the Linux kernel, and the programming interface for applications is the socket. A socket is a logical concept: it represents the endpoint through which each of two applications accesses a communication connection when they communicate over a network. In terms of position, a socket connects upward to the application process and downward to the network protocol stack; it is the interface through which an application communicates via a network protocol and interacts with the network protocol stack. Host operating systems such as Unix, Linux, and Windows have standardized socket operations into a standard set of APIs, so that applications requiring network communication are portable between operating systems.
In much of the application software and many of the systems in a data center, the network communication layer uses sockets for inter-process messaging across the network. To make full use of the processing capacity of the multi-core processors on the server, the application main process (parent process) creates multiple work subprocesses (child processes) through the process replication (fork) mechanism, and each subprocess handles TCP connections on different sockets.
InfiniBand RDMA is used for server-side interconnect of High Performance Computing (HPC) clusters and offers very high throughput and very low latency. The RDMA network card on the server side, together with its network stack, bypasses the operating system and the processor, so applications on the host send and receive messages with zero copies, and the host processor cores spend very little on network communication. The programming interface used by applications has been standardized as the verbs API. Because RDMA has so far been applied essentially only in high-performance computing, the network communication layers and frameworks in that field use the verbs interface.
RoCE (RDMA over Converged Ethernet) is a technology for implementing RDMA over existing Ethernet networks. RoCEv1 carries RDMA directly over Ethernet and can only be deployed in a layer-2 network: its packet structure adds a layer-2 Ethernet header to the original IB-architecture packet, and RoCE packets are identified by Ethertype 0x8915. RoCEv1 allows RDMA to run on existing Ethernet with performance and latency close to InfiniBand, without upgrading the existing network infrastructure to expensive InfiniBand, which saves significant cost. RoCEv2 carries RDMA over UDP/IP and can be deployed in a layer-3 network: its packet structure adds a User Datagram Protocol (UDP) header, an IP header, and a layer-2 Ethernet header to the original IB-architecture packet, and RoCE packets are identified by UDP destination port 4791. Because RoCEv2 is based on standard Ethernet (Ethernet PHY/MAC), network-layer (IP), and transport-layer (UDP) protocols, RoCEv2 traffic can be routed through conventional network routers. RoCEv2 thus lets data center servers continue to be interconnected over Ethernet, remains compatible with the existing data center network infrastructure, and can take full advantage of rapid Ethernet link bandwidth upgrades to 10G/25G/40G/100G/200G/400G/800G or higher. Leading companies in the industry, such as Mellanox (Nvidia), Intel, and Broadcom, have designed and mass-produced a series of RNICs (RDMA Network Interface Cards) that support RoCEv2; these products are mature and stable and are gradually entering the market.
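The two identification rules mentioned above (Ethertype 0x8915 for RoCEv1, UDP destination port 4791 for RoCEv2) can be expressed as a short frame classifier. This is a simplified sketch that assumes untagged Ethernet II frames and IPv4 only; VLAN tags and IPv6 are not handled, and the helper `build_frame` exists only to construct minimal test frames.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915
ROCE_V2_UDP_PORT = 4791
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ETH_HEADER_LEN = 14


def classify_frame(frame: bytes) -> str:
    """Return "RoCEv1", "RoCEv2", or "other" for an untagged Ethernet II frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCEv1"
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HEADER_LEN + 20:
        return "other"
    # IPv4 header length is the low nibble of the first byte, in 32-bit words.
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    protocol = frame[ETH_HEADER_LEN + 9]
    if protocol != IPPROTO_UDP or len(frame) < ETH_HEADER_LEN + ihl + 8:
        return "other"
    # The UDP destination port is the second 16-bit field of the UDP header.
    (dst_port,) = struct.unpack_from("!H", frame, ETH_HEADER_LEN + ihl + 2)
    return "RoCEv2" if dst_port == ROCE_V2_UDP_PORT else "other"


def build_frame(ethertype, udp_dst_port=None):
    """Build a minimal frame with zeroed addresses, for illustration only."""
    eth = b"\x00" * 12 + struct.pack("!H", ethertype)
    if udp_dst_port is None:
        return eth
    ip = bytes([0x45]) + b"\x00" * 8 + bytes([IPPROTO_UDP]) + b"\x00" * 10
    udp = struct.pack("!HHHH", 49152, udp_dst_port, 8, 0)
    return eth + ip + udp
```

The classifier makes the deployment difference concrete: a RoCEv1 frame is recognizable only at layer 2, while a RoCEv2 packet is an ordinary UDP/IP packet that routers can forward.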
Once data center servers adopt RoCEv2, the network layer and the transport layer are offloaded to the RNIC, which solves the problems of the TCP/IP protocol stack consuming host CPU computing power and of the communication latency this causes. Packets exchanged between application processes across the network are delivered directly to the application process on the server. One critical step remains: the industry has accumulated a massive body of mature, stable application software and systems that use the socket interface, which is completely incompatible with the RDMA verbs interface, and modifying that software would consume enormous manpower and resources. The network communication layer is software infrastructure and the core underlying support of large distributed application software; any performance loss, missing function, compatibility problem, or security risk is multiplied by the number of servers and ultimately affects the entire data center.
As shown in Fig. 2, the software and hardware protocol stack layers on the server side are as follows. TCP/IP over Ethernet includes: an Ethernet link layer (Ethernet Link Layer), an Internet Protocol layer (IP), a transport layer (TCP/UDP), a socket layer (Socket Layer), and an application layer (Applications). RoCEv2 includes: an Ethernet link layer (Ethernet Link Layer), an Internet Protocol layer (IP), a transport layer (TCP/UDP), an IB transport protocol layer (IB Transport Protocol), an interface layer (verbs library), and an application layer (Applications). InfiniBand includes: an IB link layer (IB Link Layer), an IB network layer (IB Network Layer), an IB transport protocol layer (IB Transport Protocol), an interface layer (verbs library), and an application layer (Applications).
In the related art, the Mellanox open-source Messaging Accelerator (VMA) can improve the performance of message-based and streaming applications. VMA automatically intercepts the application's socket calls without modifying the application and transparently redirects them to an embedded lightweight user-mode protocol stack, completely bypassing the kernel network stack. This reduces user-space/kernel-space switching, but it does not reduce the CPU overhead of processing the transport protocol, so host CPU usage drops only slightly, and as bandwidth increases, CPU usage increases as well. Delay jitter is lower than with the kernel network stack but still much larger than with RDMA. The ViSocket of the present invention also performs transparent conversion in user space, requires no modification of the application, and additionally supports automatic socket conversion in more complex process pools. Compared with VMA, the invention automatically converts the socket into an RDMA connection, with higher bandwidth, smaller delay jitter, and close to zero host CPU computing power consumed.
Although RDMA promises considerable network performance improvements, transparently improving the network performance of existing TCP applications with RDMA is difficult, because using an RDMA network relies on a new set of semantic interfaces, including the ibverbs and rdmacm interfaces (hereinafter collectively referred to as verbs). Compared with the traditional POSIX socket interface, the verbs interface has more functions and is closer to hardware semantics. Existing TCP network applications built on the POSIX socket interface would have to be heavily modified to enjoy the performance benefits of RDMA, at huge cost. It is therefore desirable to keep using the socket interface on top of an RDMA network, so that existing socket applications can use RDMA services transparently.
Shared Memory Communication over RDMA (SMC-R) is a kernel network protocol that is based on RDMA technology and compatible with the socket interface; it was proposed by IBM and contributed to the Linux kernel in 2017. SMC-R helps TCP network applications transparently use RDMA to obtain high-bandwidth, low-latency network communication services. SMC-R, which is open-sourced and deployed in production environments by IBM and Alibaba, has entered the Linux kernel code tree. The code, about 20,000 lines, was submitted to Linux by Ursula Braun of IBM in January 2017. SMC-R is a kernel protocol stack that sits alongside TCP/IP, is compatible with the socket interface toward the upper layer, and uses RDMA at the bottom layer to perform shared memory communication. Its design intent is to provide transparent RDMA service for TCP applications while preserving the key functions of the TCP/IP ecosystem. SMC-R therefore converts TCP/IP sockets into RDMA connections automatically inside the kernel, has the best compatibility with TCP/IP, and can exploit the performance advantages of RDMA. Its disadvantages are that switching between user space and kernel space remains and that the RDMA performance loss is large. In addition, SMC-R runs in the kernel in parallel with the traditional TCP/IP network stack and requires the additional AF_SMC address family, so application code that deals with the address family has to be modified. Because the SMC-R code runs in the kernel, problems in code quality, exception handling and the like greatly affect the stability of the entire host system. Compared with SMC-R, the ViSocket of the invention requires no modification of the application at all and has a smaller fault domain; its bandwidth is higher, its delay jitter is smaller, and its host CPU computing power consumption is close to zero.
The RDMA connection establishing method provided by the embodiment of the disclosure aims to solve the mismatch between data center applications that communicate through sockets and the verbs interface of the software stack that accompanies the RDMA network card. In a specific implementation, a network card supporting RoCEv2 RDMA is required.
The RDMA connection establishment method provided by the embodiment of the disclosure adds a ViSocket layer between the kernel layer and the application layer of the server, detects the TCP connection in the ViSocket layer, and exchanges RDMA link establishment information over that TCP connection to establish an RDMA connection, thereby automatically converting the socket into an RDMA connection. Bandwidth is thus improved, delay jitter is reduced, and host CPU computing power consumption is brought close to zero, all without modifying application code. The method is suitable for scenarios with ample resources and high requirements on message processing speed, and enables efficient processing of RDMA messages.
In an exemplary embodiment, before the server establishes a TCP connection with the client in the ViSocket layer, the method further includes:
firstly, when the application main process corresponding to an application program of the server runs, the application main process calls the ViSocket function library to form a ViSocket layer corresponding to the application main process;
secondly, when the application main process calls the creation function of the listening socket, the ViSocket layer corresponding to the application main process intercepts the call, creates the listening socket, and returns the listening socket to the application main process.
Illustratively, the listening socket creation function is used to create a listening socket and includes the socket, bind and listen functions; the listening socket created is socket_L.
In an exemplary embodiment, the server establishes a TCP connection with the client in a ViSocket layer, including:
firstly, the application main process listens for a TCP connection request from a client through the listening socket;
secondly, when the application main process detects the TCP connection request and calls the creation function of a communication socket to accept the TCP connection request of the client, the ViSocket layer corresponding to the application main process intercepts the call of the creation function of the communication socket and receives the TCP connection request of the client;
and thirdly, the ViSocket layer corresponding to the application main process establishes a TCP connection with the client according to the TCP connection request, creates a TCP communication socket, and returns it to the application main process.
Illustratively, the creation function of the communication socket includes: an accept function.
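The listening and accepting steps above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class `ViSocketLayer` and its method names are hypothetical stand-ins for the interception layer (the disclosure does not say how the calls are intercepted), and the layer here simply wraps the standard `socket` module on the loopback interface and records which sockets it has handed back to the application.

```python
import socket


class ViSocketLayer:
    """Hypothetical stand-in for the per-process ViSocket layer."""

    def __init__(self):
        self.listening = set()  # file descriptors of listening sockets
        self.tcp_only = set()   # accepted sockets with no RDMA connection yet

    def socket_bind_listen(self, host, port, backlog=8):
        # Corresponds to the intercepted socket(), bind() and listen() calls.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, port))
        s.listen(backlog)
        self.listening.add(s.fileno())
        return s

    def accept(self, listener):
        # Corresponds to the intercepted accept(): only TCP is set up here.
        conn, addr = listener.accept()
        self.tcp_only.add(conn.fileno())
        return conn, addr


layer = ViSocketLayer()
socket_L = layer.socket_bind_listen("127.0.0.1", 0)   # port 0: any free port
client = socket.create_connection(socket_L.getsockname())
socket_W, peer = layer.accept(socket_L)

fd_W = socket_W.fileno()
client_addr = client.getsockname()
for s in (client, socket_W, socket_L):
    s.close()
```

In this sketch the accepted socket is only marked as "TCP only"; nothing RDMA-related happens at accept time, which mirrors the deferred establishment described in the following embodiments.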
In an exemplary embodiment, the server exchanges RDMA link establishment information with the client through a TCP connection of a ViSocket layer, and establishes an RDMA connection with the client according to the obtained RDMA link establishment information, including:
firstly, when the application host process calls a data receiving function or a data sending function through a TCP communication socket of the application host process, a ViSocket layer corresponding to the application host process intercepts the call of the data receiving function or the data sending function;
secondly, when the ViSocket layer corresponding to the application main process judges that RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the application main process informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and finally, the ViSocket layer of the application main process receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information and establishes an RDMA communication socket.
Illustratively, the data receiving function includes the recv function, and the data sending function includes the send function.
Illustratively, in the process that the application host process calls the polling function for polling the data receiving request or the data sending request of at least one client for the first time, after the RDMA connection is established and the non-blocked RDMA communication socket is obtained, the ViSocket layer corresponding to the application host process directly receives or sends data through the non-blocked RDMA communication socket and the established RDMA connection.
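The deferred establishment described above, where the first intercepted send or recv triggers the exchange of RDMA link establishment information over the existing TCP connection, might be sketched as follows. This is a simulation under assumptions, not the disclosed implementation: `ViSocketConn`, the `START_RDMA` notification text, and the fields in the link information (`qp_num`, `gid`) are all hypothetical, a `socket.socketpair()` stands in for the accepted TCP connection, and a Python list stands in for the RDMA data path.

```python
import json
import socket
import threading


def read_line(sock):
    # Read one newline-terminated control message from the TCP connection.
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the control connection")
        buf += chunk
    return buf


class ViSocketConn:
    """Hypothetical per-connection record kept by the ViSocket layer."""

    def __init__(self, tcp_sock):
        self.tcp = tcp_sock       # TCP communication socket from accept()
        self.rdma = None          # link info once the RDMA connection exists
        self.sent_via_rdma = []   # stand-in for the real RDMA data path

    def _establish_rdma(self):
        # Notify the client over TCP to send its RDMA link establishment request.
        self.tcp.sendall(b"START_RDMA\n")
        # Receive the client's link establishment information over TCP.
        self.rdma = json.loads(read_line(self.tcp))

    def send(self, data):
        # The intercepted send(): establish RDMA only if not yet established.
        if self.rdma is None:
            self._establish_rdma()
        self.sent_via_rdma.append(data)
        return len(data)


def fake_client(sock):
    # Simulated client: waits for the notification, replies with made-up info.
    if read_line(sock) == b"START_RDMA\n":
        info = {"qp_num": 17, "gid": "fe80::1"}
        sock.sendall(json.dumps(info).encode() + b"\n")


server_end, client_end = socket.socketpair()
t = threading.Thread(target=fake_client, args=(client_end,))
t.start()
conn = ViSocketConn(server_end)
conn.send(b"hello")   # first call triggers the deferred establishment
conn.send(b"world")   # later calls skip straight to the RDMA path
t.join()
server_end.close()
client_end.close()
```

The single `if self.rdma is None` check is what separates the first call from all later ones in this sketch.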
In an exemplary embodiment, after the server establishes an RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
firstly, when the application host process calls a data receiving function or a data sending function through a TCP communication socket of the application host process, a ViSocket layer corresponding to the application host process intercepts the call of the data receiving function or the data sending function;
secondly, when the ViSocket layer corresponding to the application main process judges that RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the application main process receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
In an exemplary embodiment, the server exchanges RDMA link establishment information with the client through a TCP connection of the ViSocket layer, and establishes an RDMA connection with the client according to the obtained RDMA link establishment information, including:
firstly, the application main process calls a process copy function to copy itself to obtain a work sub-process, and the work sub-process is run; wherein the work sub-process holds a listening socket and a TCP communication socket obtained by copying the listening socket and the TCP communication socket corresponding to the application main process;
secondly, when the work subprocess runs, the work subprocess calls a ViSocket function library to run to form a ViSocket layer corresponding to the work subprocess;
thirdly, when a work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the work subprocess, a ViSocket layer corresponding to the work subprocess intercepts the call of the data receiving function or the data sending function;
next, when the ViSocket layer corresponding to the working subprocess judges that RDMA connection is not established between the working subprocess and the client, the ViSocket layer corresponding to the working subprocess informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and finally, the ViSocket layer corresponding to the worker subprocess receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information and establishes an RDMA communication socket.
Illustratively, the process copy function includes: fork function.
For example, the application host process calls a process copy function to copy itself to obtain one or more work sub-processes, and when multiple work sub-processes are obtained by copying, each work sub-process performs the above operations.
Illustratively, in the process that the worker sub-process calls the polling function for polling the data receiving request or the data sending request of at least one client for the first time, after the RDMA connection is established and the non-blocked RDMA communication socket is obtained, the ViSocket layer corresponding to the worker sub-process directly receives or sends data through the established RDMA connection and the non-blocked RDMA communication socket.
In an exemplary embodiment, after the server establishes an RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
firstly, when the work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the work subprocess, a ViSocket layer corresponding to the work subprocess intercepts the call of the data receiving function or the data sending function;
secondly, when the ViSocket layer corresponding to the working subprocess judges that RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the working subprocess receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
In an exemplary embodiment, after the work sub-process calls the ViSocket function library to form the ViSocket layer corresponding to the work sub-process, the method further includes:
when the work subprocess calls the socket closing function to close the listening socket, the ViSocket layer corresponding to the work subprocess intercepts the call of the socket closing function used for closing the listening socket and closes the reference of the listening socket corresponding to the work subprocess.
In an exemplary embodiment, after the server establishes an RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
when the application host process calls the socket closing function to close the TCP communication socket of the application host process, the ViSocket layer corresponding to the application host process intercepts the call of the socket closing function to close the TCP communication socket and closes the reference of the TCP communication socket of the application host process.
According to the RDMA connection establishing method provided by the embodiment of the disclosure, because one or more work sub-processes are created through fork, multiple clients can be served in parallel and multiple RDMA connections can be established with those clients; and because the ViSocket layer defers the RDMA connection, which could have been established in the main process, to be established in the sub-process, the problem that an RDMA connection cannot be used across processes is avoided.
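The point about deferring establishment to the sub-process can be illustrated with a POSIX-only Python sketch. It assumes `os.fork` is available; the function `first_send` and the variable `rdma_owner_pid` are hypothetical, a `socket.socketpair()` stands in for socket_W and the client, and "establishing RDMA" is reduced to recording which process created the state.

```python
import os
import socket

rdma_owner_pid = None  # the main process never creates RDMA state


def first_send(sock):
    """Hypothetical first-use hook: create RDMA state in the calling process."""
    global rdma_owner_pid
    if rdma_owner_pid is None:
        rdma_owner_pid = os.getpid()
    sock.sendall(str(rdma_owner_pid).encode())


# socket_W plays the accepted TCP communication socket; client_end the client.
socket_W, client_end = socket.socketpair()
main_pid = os.getpid()

child_pid = os.fork()
if child_pid == 0:
    # Work sub-process: inherits socket_W and establishes on first use.
    try:
        first_send(socket_W)
    finally:
        os._exit(0)

os.waitpid(child_pid, 0)
reported = int(client_end.recv(64))   # pid that owns the RDMA state
socket_W.close()
client_end.close()
```

Because the state is created after fork, it belongs only to the work sub-process; the main process's copy of `rdma_owner_pid` stays `None`.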
In an exemplary embodiment, the server establishes a TCP connection with the client in a ViSocket layer, including:
firstly, the application main process calls a process copy function to copy itself to obtain at least one work sub-process, and runs all the obtained work sub-processes; wherein each work sub-process holds a listening socket obtained by copying the listening socket and the TCP communication socket corresponding to the application main process;
secondly, when each work sub-process runs, the work sub-process calls the ViSocket function library to form a ViSocket layer corresponding to the work sub-process, and the work sub-process listens for a TCP connection request from a client through its listening socket;
thirdly, the work sub-process that detects a TCP connection request from a client through its listening socket is taken as the target work sub-process, and when the target work sub-process calls the communication socket creation function to accept the TCP connection request of the client, the ViSocket layer corresponding to the target work sub-process intercepts the call of the communication socket creation function and receives the TCP connection request;
and finally, the ViSocket layer corresponding to the target work subprocess establishes TCP connection with the client according to the TCP connection request, creates a TCP communication socket and returns the TCP communication socket to the target work subprocess.
In an exemplary embodiment, the server exchanges RDMA link establishment information with the client through a TCP connection of the ViSocket layer, and establishes an RDMA connection with the client according to the obtained RDMA link establishment information, including:
firstly, when the target work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the target work subprocess, a ViSocket layer corresponding to the target work subprocess intercepts the call of the data receiving function or the data sending function;
secondly, when the ViSocket layer corresponding to the target work subprocess judges that RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the target work subprocess informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and finally, the ViSocket layer corresponding to the target work subprocess receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information, and establishes RDMA communication sockets.
Illustratively, in the process that the target work subprocess calls the polling function for polling the data receiving request or the data sending request of at least one client for the first time, after an RDMA connection is established and a non-blocked RDMA communication socket is obtained, the ViSocket layer corresponding to the target work subprocess directly receives or sends data through the established RDMA connection and through the non-blocked RDMA communication socket.
In an exemplary embodiment, after the server establishes an RDMA connection with the client according to the obtained RDMA link establishment information, the method further includes:
firstly, when the target work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the target work subprocess, a ViSocket layer corresponding to the target work subprocess intercepts the call of the data receiving function or the data sending function;
secondly, when the ViSocket layer corresponding to the target work subprocess judges that the RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the target work subprocess receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
In an exemplary embodiment, after the application host process calls the process copy function to copy itself to obtain at least one work sub-process, the method further includes:
when the application main process calls the socket closing function to close its listening socket, the ViSocket layer corresponding to the application main process intercepts the call of the socket closing function used for closing the listening socket and closes the reference to the listening socket of the application main process.
In an exemplary embodiment, after the ViSocket layer corresponding to the target work sub-process receives, through the TCP connection, the RDMA link establishment information corresponding to the RDMA link establishment request from the client and establishes an RDMA connection with the client according to the RDMA link establishment information, the method further includes:
when the target work subprocess calls the socket closing function to close the TCP communication socket of the target work subprocess, the ViSocket layer corresponding to the target work subprocess intercepts the call of the socket closing function used for closing the TCP communication socket and closes the reference of the TCP communication socket of the target work subprocess.
According to the RDMA connection establishing method provided by the embodiment of the disclosure, because one or more work sub-processes are forked in advance, a request from a client can be processed directly as soon as it is received, which shortens the request processing time and increases the request processing speed.
An embodiment of the present disclosure further provides an RDMA connection establishment method, as shown in FIG. 3, including:
201. The application main process calls socket/bind/listen to create the listening socket socket_L; the ViSocket layer corresponding to the application main process (hereinafter abbreviated as "ViSocket") intercepts the related calls at the bottom layer and returns the created listening socket socket_L to the upper-layer application main process.
202a. The application main process calls the accept function to accept a client connection (three-way handshake); the ViSocket intercepts the call at the bottom layer and waits for a client connection request.
203. Client 1 asynchronously sends a connection request to the server.
202b. The ViSocket receives the client connection request, establishes a synchronous connection, generates the TCP communication socket socket_W, and returns it to the application main process.
204. The application main process calls fork to copy out work sub-process 1 and adds it to the process pool; work sub-process 1 closes its reference to the listening socket socket_L and carries and maintains the communication socket socket_W1.
205. As work sub-process 1 runs, the ViSocket layer corresponding to work sub-process 1 (hereinafter referred to as "ViSocket1") is generated; ViSocket1 intercepts the close call and closes the reference to the listening socket socket_L.
206. The application main process closes the communication socket socket_W; the ViSocket intercepts the call and closes the reference to socket_W.
Steps 205 and 206 have no required order; they are executed asynchronously in their respective processes according to operating system scheduling.
207. Work sub-process 1 calls a recv/send-type function for the first time to receive or send data, which triggers deferred link establishment. ViSocket1 intercepts this call.
208. ViSocket1 notifies the client through the communication socket socket_W1 to start link establishment.
209. The ViSocket of the client and ViSocket1 exchange RDMA link establishment information through a three-way handshake (REQ/REP/RTU) or a similar mechanism, and the RDMA connection between the client and the server is established. The RDMA communication socket socket_RDMA is generated.
210. Client 1 actually sends/receives the first batch of message data over the RDMA connection using the communication socket socket_RDMA.
211. ViSocket1 actually receives/sends the first batch of message data over the RDMA connection and returns to work sub-process 1, which actually completes the first transfer of message data. Subsequent message transfers of work sub-process 1 do not need to repeat the deferred link establishment steps 207/208/209 and are completed directly through steps 210/211.
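The REQ/REP/RTU exchange of step 209 can be sketched as a three-message handshake carried over the already established TCP connection. The sketch below is an assumption-laden simulation: the JSON framing, the helper names `send_msg` and `recv_msg`, and the link information fields (`qp_num`, `psn`) are hypothetical, and a `socket.socketpair()` replaces the real TCP connection. Following step 208, the client is taken to be the side that sends REQ.

```python
import json
import socket
import threading


def send_msg(sock, msg):
    # One JSON object per line as a simple, hypothetical wire format.
    sock.sendall(json.dumps(msg).encode() + b"\n")


def recv_msg(sock):
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        buf += chunk
    return json.loads(buf)


def client_handshake(sock, local_info, out):
    send_msg(sock, {"type": "REQ", "info": local_info})   # request
    rep = recv_msg(sock)                                   # reply
    if rep["type"] != "REP":
        raise ValueError("expected REP")
    send_msg(sock, {"type": "RTU"})                        # ready to use
    out["client_peer"] = rep["info"]


def server_handshake(sock, local_info, out):
    req = recv_msg(sock)
    if req["type"] != "REQ":
        raise ValueError("expected REQ")
    send_msg(sock, {"type": "REP", "info": local_info})
    rtu = recv_msg(sock)
    if rtu["type"] != "RTU":
        raise ValueError("expected RTU")
    out["server_peer"] = req["info"]


server_sock, client_sock = socket.socketpair()
result = {}
client_info = {"qp_num": 101, "psn": 7}   # made-up values
server_info = {"qp_num": 202, "psn": 9}   # made-up values
t = threading.Thread(target=client_handshake,
                     args=(client_sock, client_info, result))
t.start()
server_handshake(server_sock, server_info, result)
t.join()
server_sock.close()
client_sock.close()
```

After the handshake each side holds the other's link information, which is the precondition for steps 210 and 211.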
In the RDMA connection establishment method provided by the embodiment of the present disclosure, the main process acts as the server, creates a listening socket, and continuously listens for and accepts client requests. The client initiates a connection to the server, which causes the application main process to return from accept; at this point the TCP connection between the client and the server has been established. The server generates the working socket socket_W as the communication endpoint through which the server process accesses the TCP connection.
To make full use of the multiple processor cores of the server and handle connection requests from more clients, the application main process creates multiple work sub-processes through fork to form a process pool. The work sub-processes run in parallel; each is responsible for receiving, on its own working socket socket_W1, the request messages sent by the client over the TCP connection, processing them, and sending the corresponding response message after completing the application-specific task. When the application main process calls fork, the operating system copies socket_W, and the copy socket_W1 points to the same TCP connection.
The ViSocket intercepts the accept call of the application main process, does not actually start the RDMA link establishment flow, and directly returns a synchronous socket to the application, recording inside the ViSocket that socket_W points to this synchronous socket.
The working socket socket_W1 carried by any work sub-process that the application main process generates through fork is in fact a synchronous socket that cannot yet be used by the work sub-process to send and receive data, because the RDMA connection corresponding to this socket has not been established. The ViSocket defers the RDMA connection, which could have been established in the main process, to be established in the sub-process, thereby avoiding the problem that an RDMA connection cannot be used across processes.
The ViSocket intercepts the first send/recv-type call with which the work sub-process sends or receives data, checks the socket, finds that it is a synchronous socket, and immediately starts the flow for establishing the RDMA connection. Once RDMA link establishment succeeds, the ViSocket records the socket internally as a normally working socket_RDMA that points to the RDMA connection, and sends and receives the data normally over RDMA. The ViSocket also intercepts the subsequent send/recv-type calls of the work sub-process, and the data is sent and received normally through the socket_RDMA to which socket_W1 points.
The ViSocket intercepts the close-type call of the work sub-process and starts the flow for releasing the RDMA connection. Once the RDMA link has been torn down successfully, it clears the internal records related to socket_W1 and returns from the close-type call of the work sub-process.
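The bookkeeping described in the preceding paragraphs (record a synchronous socket at accept, upgrade it on the first send/recv, release it on close) amounts to a small per-socket state table. The Python sketch below models only that table; the class `ViSocketRecords`, its method names, and the event labels written to `log` are hypothetical, and no real RDMA or socket operations are performed.

```python
class ViSocketRecords:
    """Hypothetical internal record table of one ViSocket layer."""

    def __init__(self):
        self.records = {}  # fd -> "synchronous" or "socket_RDMA"
        self.log = []      # actions the layer would carry out, in order

    def on_accept(self, fd):
        # accept() is intercepted: no RDMA link establishment yet.
        self.records[fd] = "synchronous"

    def on_send_recv(self, fd):
        # First send/recv on a synchronous socket triggers establishment.
        if self.records[fd] == "synchronous":
            self.log.append(("establish_rdma", fd))
            self.records[fd] = "socket_RDMA"
        self.log.append(("rdma_io", fd))

    def on_close(self, fd):
        # close() is intercepted: release RDMA if present, then clear records.
        if self.records.get(fd) == "socket_RDMA":
            self.log.append(("release_rdma", fd))
        self.records.pop(fd, None)


vs = ViSocketRecords()
vs.on_accept(5)      # 5 is an arbitrary example descriptor number
vs.on_send_recv(5)   # first use: establish, then transfer
vs.on_send_recv(5)   # later use: transfer only
vs.on_close(5)
```

Only the first `on_send_recv` call logs an establishment action, matching the statement that later transfers skip the deferred link establishment steps.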
Correspondingly, the function call flow of the RDMA connection establishment method is shown in FIG. 4. The application main process calls the listening socket creation function to create socket_L, calls the binding function bind, and then calls the socket purpose setting function listen to designate the created socket for listening; it then calls accept, the creation function of the TCP communication socket, to accept the TCP connection request of a client, calls the process copy function fork to copy itself and obtain at least one work sub-process, and calls the socket closing function close@socket_W to close its TCP communication socket.
Each work sub-process calls the socket closing function close@socket_L to close its listening socket, calls the data receiving or data sending function send/recv to receive or send data, and calls the process closing function close to end the work sub-process.
An embodiment of the present disclosure further provides an RDMA connection establishment method, as shown in FIG. 5, including:
301. The application main process calls socket/bind/listen to create the listening socket socket_L; the ViSocket layer corresponding to the application main process (hereinafter abbreviated as "ViSocket") intercepts the related calls at the bottom layer and returns the created listening socket socket_L to the upper-layer application main process.
302. The application main process calls the fork function to copy out work sub-process 1 and adds it to the process pool; work sub-process 1 starts running carrying socket_L1, a reference to the listening socket, and as it runs the ViSocket layer corresponding to work sub-process 1 (hereinafter referred to as "ViSocket1") is generated.
303a. Work sub-process 1 calls the accept function to accept a client connection; ViSocket1 intercepts the call at the bottom layer and waits for a client connection request.
304. Client 1 asynchronously sends a connection request to the server.
303b. ViSocket1 receives the client connection request, establishes a synchronous connection, generates the communication socket socket_W, and returns it to work sub-process 1.
305. The application main process closes the listening socket socket_L; the ViSocket of the application main process intercepts this call and closes the reference to socket_L.
306. Work sub-process 1 receives or sends data for the first time, which triggers deferred link establishment; ViSocket1 intercepts the call.
307. ViSocket1 notifies the client through the communication socket socket_W to start link establishment.
308. ViSocket1 and the ViSocket of the client exchange RDMA link establishment information through a three-way handshake (REQ/REP/RTU) or a similar mechanism, and the RDMA connection between the client and the server is established. The RDMA communication socket socket_RDMA is generated.
309. Client 1 actually sends/receives the first batch of message data over the RDMA connection.
310. ViSocket1 actually receives/sends the first batch of message data over the RDMA connection and returns to work sub-process 1, which actually completes the first transfer of message data. Subsequent message transfers of work sub-process 1 do not need to repeat the deferred link establishment steps 306/307/308 and are completed directly through steps 309/310.
In the RDMA connection establishment method provided by the embodiment of the present disclosure, the main process acts as the server and creates the listening socket; the client initiates a connection to the server, which causes the work sub-process to return from accept, at which point the TCP connection between the client and the server has been established. The work sub-process on the server generates the working socket socket_W as the communication endpoint through which it accesses the TCP connection.
To make full use of the multiple processor cores of the server and handle connection requests from more clients, the application main process creates multiple work sub-processes through fork to form a process pool. The work sub-processes run in parallel; each is responsible for accepting client requests on its own listening socket socket_L1 and generating a working socket socket_W. Through socket_W it receives the request messages sent by the client over the TCP connection, processes the application-specific task, and then sends the corresponding response message. The socket_L1 of a sub-process is copied from socket_L by the operating system when the application main process calls fork, and points to the same listening port.
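The pre-fork process pool described above, in which every work sub-process inherits a copy of the listening socket and calls accept itself, can be sketched with plain TCP on a POSIX system. The sketch assumes `os.fork` and the loopback interface are available; it shows only the TCP side of the flow (no ViSocket interception and no RDMA), and each worker serves exactly one connection so that the example terminates.

```python
import os
import socket

# Main process: create socket_L before forking.
socket_L = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
socket_L.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
socket_L.listen(8)
port = socket_L.getsockname()[1]

worker_pids = []
for _ in range(2):
    pid = os.fork()
    if pid == 0:
        # Work sub-process: its inherited copy plays the role of socket_L1.
        try:
            socket_W, _addr = socket_L.accept()
            socket_W.sendall(str(os.getpid()).encode())
            socket_W.close()
        finally:
            os._exit(0)
    worker_pids.append(pid)

# Main process closes its own reference; the workers' copies stay open.
socket_L.close()

served_by = []
for _ in range(2):
    with socket.create_connection(("127.0.0.1", port)) as client:
        served_by.append(int(client.recv(64)))

for pid in worker_pids:
    os.waitpid(pid, 0)
```

Closing socket_L in the main process does not stop the workers from accepting, because each fork gave them their own reference to the same listening port.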
The ViSocket intercepts the accept call of the work sub-process, does not actually start the RDMA link establishment flow, and directly returns a synchronous socket to the application, recording inside the ViSocket that socket_W points to this synchronous socket.
The working socket socket_W that any work sub-process generated by the application main process through fork obtains from accept is in fact a synchronous socket that cannot yet be used by the work sub-process to send and receive data, because the RDMA connection corresponding to this socket has not been established. The ViSocket defers the RDMA connection that the work sub-process could establish until the connection is first used to send or receive data. Most applications do not, after a work sub-process has sent or received data on a connection, fork a further sub-process that continues to use or shares that connection. Given this, and given that an RDMA connection itself cannot be used across processes, the first use of the connection is the best point in time for deferred link establishment.
The ViSocket intercepts the first send/recv-type call with which the work sub-process sends or receives data, detects that the socket is a synchronous socket, and immediately starts the flow for establishing the RDMA connection. Once RDMA link establishment succeeds, the ViSocket records the socket internally as a normally working socket_RDMA that points to the RDMA connection, and sends and receives the data normally over RDMA. The ViSocket also intercepts the subsequent send/recv-type calls of the work sub-process, and the data is sent and received normally through the socket_RDMA to which socket_W points.
The ViSocket intercepts the close-type call of the work sub-process and starts the flow for releasing the RDMA connection. Once the RDMA link has been torn down successfully, it clears the internal records related to socket_W and returns from the close-type call of the work sub-process.
Correspondingly, the function call flow of the RDMA connection establishment method is shown in FIG. 6. The application main process calls the listening socket creation function to create socket_L, calls the binding function bind, and then calls the socket purpose setting function listen to designate the created socket for listening; it then calls the process copy function fork to generate at least one work sub-process, and calls close to end the process.
Each target work sub-process calls accept, the creation function of the TCP communication socket, to accept the TCP connection request of a client, calls the data receiving or data sending function send/recv to receive or send data, calls the process closing function close to end the work sub-process, and calls the socket closing function close@socket_L to close its non-blocking listening socket.
The embodiment of the present disclosure further provides a server, which includes a memory and a processor, where the memory is used for storing an executable program;
the processor is configured to read and execute the executable program to implement the RDMA connection establishment method according to any of the above embodiments.
It should be understood that the processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the processing performed by the terminal device may be performed by instructions in the form of hardware integrated logic circuits or software in the processor. That is, the steps of the method disclosed in the embodiments of the present disclosure may be implemented by a hardware processor, or implemented by a combination of hardware and software modules in a processor. The software module may be located in a storage medium such as a random access memory, a flash memory, a read only memory, a programmable read only memory or an electrically erasable programmable memory, a register, etc. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.

Claims (15)

1. An RDMA connection establishment method, comprising:
a server establishing a TCP connection with a client at a ViSocket layer, wherein the ViSocket layer is formed by a server control process calling a ViSocket function library, the ViSocket function library is a pre-established function library that integrates TCP system call functions and RDMA information transmission functions, and the ViSocket layer is used for converting general TCP system calls into transmission operations recognizable by RDMA;
the server exchanging RDMA link establishment information with the client through the TCP connection of the ViSocket layer;
and the server establishing an RDMA connection with the client according to the obtained RDMA link establishment information.
2. The method of claim 1, wherein before the server establishes the TCP connection with the client at the ViSocket layer, the method further comprises:
when an application main process corresponding to the application program of the server runs, the application main process calls the ViSocket function library to form a ViSocket layer corresponding to the application main process;
when the application main process calls the listening socket creation function, the ViSocket layer corresponding to the application main process intercepts the call, creates the listening socket, and returns the listening socket to the application main process.
3. The method of claim 2, wherein the server establishing a TCP connection with the client at the ViSocket layer comprises:
the application main process listens for a TCP connection request from a client through the listening socket;
when the application main process detects the TCP connection request and calls the communication socket creation function to accept the TCP connection request of the client, the ViSocket layer corresponding to the application main process intercepts the call of the communication socket creation function and accepts the TCP connection request of the client;
and the ViSocket layer corresponding to the application main process establishes a TCP connection with the client according to the TCP connection request, creates a TCP communication socket, and returns it to the application main process.
4. The method of claim 3, wherein the server exchanges RDMA link establishment information with the client over a TCP connection of a ViSocket layer and establishes an RDMA connection with the client according to the obtained RDMA link establishment information, comprising:
when the application main process calls a data receiving function or a data sending function through a TCP communication socket of the application main process, a ViSocket layer corresponding to the application main process intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the application main process judges that the RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the application main process informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and the ViSocket layer of the application main process receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information and establishes an RDMA communication socket.
5. The method of claim 3 or 4, wherein after the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, the method further comprises:
when the application main process calls a data receiving function or a data sending function through a TCP communication socket of the application main process, a ViSocket layer corresponding to the application main process intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the application main process judges that the RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the application main process receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
6. The method of claim 3, wherein the server exchanges RDMA link establishment information with the client over a TCP connection of a ViSocket layer and establishes an RDMA connection with the client according to the obtained RDMA link establishment information, comprising:
the application main process calls a process copy function to copy itself to obtain a work subprocess, and runs the work subprocess; wherein the work subprocess comprises a listening socket and a TCP communication socket obtained by copying the listening socket and the TCP communication socket of the application main process;
when the work subprocess runs, the work subprocess calls the ViSocket function library to form a ViSocket layer corresponding to the work subprocess;
when a work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the work subprocess, a ViSocket layer corresponding to the work subprocess intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the working subprocess judges that RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the working subprocess informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and the ViSocket layer corresponding to the work subprocess receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information and establishes an RDMA communication socket.
7. The method of claim 6, wherein after the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, further comprising:
when the work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the work subprocess, a ViSocket layer corresponding to the work subprocess intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the working subprocess judges that the RDMA connection is established between the working subprocess and the client, the ViSocket layer corresponding to the working subprocess receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
8. The method of claim 6, wherein after the work subprocess calls the ViSocket function library to form the ViSocket layer corresponding to the work subprocess, the method further comprises:
when the work subprocess calls the socket closing function to close the listening socket of the work subprocess, the ViSocket layer corresponding to the work subprocess intercepts the call of the socket closing function used for closing the listening socket and closes the work subprocess's reference to the listening socket.
9. The method of claim 6, wherein after the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, the method further comprises:
when the application main process calls the socket closing function to close the TCP communication socket of the application main process, the ViSocket layer corresponding to the application main process intercepts the call of the socket closing function used for closing the TCP communication socket and closes the application main process's reference to the TCP communication socket.
10. The method of claim 2, wherein the server establishes a TCP connection with the client in a ViSocket layer, and the method comprises:
the application main process calls a process copy function to copy itself to obtain at least one work subprocess, and runs all the obtained work subprocesses; wherein each work subprocess comprises a listening socket obtained by copying the listening socket of the application main process;
when each work subprocess runs, the work subprocess calls the ViSocket function library to form a ViSocket layer corresponding to the work subprocess, and the work subprocess listens for a TCP connection request from a client through its listening socket;
a work subprocess that detects a TCP connection request from a client through its listening socket is taken as a target work subprocess; when the target work subprocess calls the communication socket creation function to accept the TCP connection request of the client, the ViSocket layer corresponding to the target work subprocess intercepts the call of the communication socket creation function and accepts the TCP connection request;
and the ViSocket layer corresponding to the target work subprocess establishes TCP connection with the client according to the TCP connection request, and creates a TCP communication socket to return to the target work subprocess.
11. The method of claim 10, wherein the server exchanges RDMA link establishment information with the client over a TCP connection of a ViSocket layer and establishes an RDMA connection with the client according to the obtained RDMA link establishment information, comprising:
when the target work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the target work subprocess, a ViSocket layer corresponding to the target work subprocess intercepts the call of the data receiving function or the data sending function;
when the ViSocket layer corresponding to the target work subprocess judges that RDMA connection is not established between the ViSocket layer and the client, the ViSocket layer corresponding to the target work subprocess informs the client to send an RDMA link establishment request through a TCP communication socket of the ViSocket layer;
and the ViSocket layer corresponding to the target work subprocess receives RDMA link establishment information corresponding to the RDMA link establishment request from the client through TCP connection, establishes RDMA connection with the client according to the RDMA link establishment information and establishes an RDMA communication socket.
12. The method of claim 11, wherein after the server establishes the RDMA connection with the client according to the obtained RDMA link establishment information, further comprising:
when the target work subprocess calls a data receiving function or a data sending function through a TCP communication socket of the target work subprocess, a ViSocket layer corresponding to the target work subprocess intercepts the call of the data receiving function or the data sending function;
and when the ViSocket layer corresponding to the target work subprocess judges that the RDMA connection is established between the ViSocket layer and the client, the ViSocket layer corresponding to the target work subprocess receives or sends data through the RDMA communication socket of the ViSocket layer and the established RDMA connection.
13. The method of claim 10, wherein after the application main process calls the process copy function to copy itself to obtain at least one work subprocess, the method further comprises:
when the application main process calls the socket closing function to close the listening socket of the application main process, the ViSocket layer corresponding to the application main process intercepts the call of the socket closing function used for closing the listening socket and closes the application main process's reference to the listening socket.
14. The method of claim 11, wherein the ViSocket layer corresponding to the target worker sub-process receives RDMA link establishment information corresponding to the RDMA link establishment request from the client over a TCP connection, and further comprising, after establishing an RDMA connection with the client according to the RDMA link establishment information:
when the target work subprocess calls the socket closing function to close the TCP communication socket of the target work subprocess, the ViSocket layer corresponding to the target work subprocess intercepts the call of the socket closing function used for closing the TCP communication socket and closes the reference of the TCP communication socket of the target work subprocess.
15. A server, comprising a memory and a processor, wherein the memory is used for storing an executable program;
the processor is operable to read and execute the executable program to implement the RDMA connection establishment method of any of claims 1-14.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310153021.6A CN115866010B (en) 2023-02-22 2023-02-22 RDMA connection establishment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310153021.6A CN115866010B (en) 2023-02-22 2023-02-22 RDMA connection establishment method and device

Publications (2)

Publication Number Publication Date
CN115866010A true CN115866010A (en) 2023-03-28
CN115866010B CN115866010B (en) 2023-05-26

Family

ID=85658721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310153021.6A Active CN115866010B (en) 2023-02-22 2023-02-22 RDMA connection establishment method and device

Country Status (1)

Country Link
CN (1) CN115866010B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117834709A (en) * 2024-01-04 2024-04-05 天津大学 Method for directly transferring data between functions of server-oriented non-perception computing scene

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130332767A1 (en) * 2012-06-12 2013-12-12 International Business Machines Corporation Redundancy and load balancing in remote direct memory access communications
CN103746977A (en) * 2013-12-27 2014-04-23 东软熙康健康科技有限公司 Connection method and device for Linux server
US20170085683A1 (en) * 2015-09-21 2017-03-23 International Business Machines Corporation Protocol selection for transmission control protocol/internet protocol (tcp/ip)
US20210377345A1 (en) * 2018-11-09 2021-12-02 Microsoft Technology Licensing, Llc Establishment of socket connection in user space
CN111327639A (en) * 2020-03-19 2020-06-23 刘奇峰 Socket communication method and device
CN113064846A (en) * 2021-04-14 2021-07-02 中南大学 Zero-copy data transmission method based on Rsockets protocol

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
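MASTERT-J: "RDMA high-performance communication library based on Java (II): Java Socket Over RDMA" *
马梯恩: "Analysis and implementation of the user-level communication function library vi socket" *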

Also Published As

Publication number Publication date
CN115866010B (en) 2023-05-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant