WO2019219184A1 - Receiving device and transmitting device for TCP communication - Google Patents

Receiving device and transmitting device for TCP communication

Info

Publication number
WO2019219184A1
Authority
WO
WIPO (PCT)
Prior art keywords
identifier
tcp
receiving device
transmitting device
tcp segment
Prior art date
Application number
PCT/EP2018/062720
Inventor
Victor Gissin
Elena Gurevich
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2018/062720 (WO2019219184A1)
Priority to CN201880093503.4A (CN112154633B)
Publication of WO2019219184A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/163 In-band adaptation of TCP data exchange; In-band control procedures

Definitions

  • the present invention relates generally to Transmission Control Protocol (TCP) communications.
  • the present invention specifically presents a receiving device for communicating via TCP, a transmitting device for communicating via TCP, and a system including the two devices.
  • the present invention thereby relates to the use, exchange and configuration of an identifier via TCP segments, which identifier allows directly accessing a connection context of the connection between the transmitting device and the receiving device at the receiving device.
  • TCP/Internet Protocol (TCP/IP) is a native transport choice for stream-semantics oriented services.
  • the TCP/IP protocol is ideally suited for reliable end-to-end communications over a heterogeneous network infrastructure. It provides simple and universal data stream abstraction and connection management. Further, it provides high scalability and simplicity to the infrastructure required for its deployment. Additionally, it provides flexibility and extensibility, particularly because there are plenty of end-to-end congestion and flow control algorithms addressing changing data center networking requirements like DCTCP, MPTCP, ICTCP etc.
  • a disadvantage is, however, that a TCP/IP implementation suffers from high Central Processing Unit (CPU) utilization and prolonged data processing.
  • RoCE RDMA over Converged Ethernet
  • RoCE is a native transport choice for memory-semantics oriented services.
  • RoCE is ideally suited for Remote Direct Memory Access (RDMA) operations over a lossless network infrastructure.
  • RoCE provides low CPU utilization, low latency, and highest throughput over the lossless network infrastructure.
  • a disadvantage, however, is that RoCE fails to scale over heterogeneous network infrastructures (even with the newly developed Data Center Quantized Congestion Notification (DCQCN)). Further, it is very sensitive to packet loss, hence it requires a Priority-based Flow Control (PFC) deployment. Further, it involves a complicated network management, congestion spreading, and deadlocks.
  • TCP/IP stack processing may be implemented in a Smart Network Interface Controller (NIC) card, also referred to as TCP/IP Offload Engine (TOE) bringing the following gains:
  • TOE processes the TCP protocol’s stack in the Smart NIC, thus leaving the CPU cycles for the user application.
  • TOE implements Zero-copy of the send buffers, taking the data directly from the user buffers.
  • TOE implements Zero-copy of the receive buffers, putting the data directly to the user buffers using a “pre-posted buffers” mechanism.
  • TOE avoids User Space/Kernel crossing delays by implementing User Space driver schemas.
  • TOE is able to fully or partially offload Upper Layer applications (e.g. DIF calculation for iSCSI, CRC calculation for iWARP, TLS inline processing).
  • TOE retains the ability of the robust protocol to service high-scale applications over heterogeneous network.
  • TOE scalability is not limited by the capacity of the Smart NIC and allows “caching” mechanisms, where the TCP connection may be processed by Kernel and TOE in rotation.
  • TOE implementation based on the programmable Smart NIC retains the flexibility to incorporate mature and emerging developments of TCP/IP protocol.
  • a transmission processing latency of TOE is comparable with RoCE, as is shown in the following table.
  • the complexity of the protocol - and in turn the complexity of the protocol’s processing - is not the comparison factor.
  • the complexity reflects the underlying network infrastructure and the load pattern - the same conditions require similar solutions.
  • the TOE/TCP connection context is looked up (by binary tree, hash, etc.) using the 5-tuple (296 bits) carried in the segment.
  • the 5-tuple based lookup of the TOE/TCP connection context is a huge disadvantage, particularly for highly scalable applications, for example when thousands or millions of TCP connections are required.
  • the lookup latency significantly harms the performance of such applications.
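The difference between the two access patterns can be sketched as follows. This is an illustration only, not the patented implementation: the names `FiveTuple`, `register`, `lookup_by_tuple` and `lookup_by_index` are hypothetical, and a Python dict stands in for the hash or tree structure a real TOE would use. The 296-bit key size is assumed to correspond to an IPv6 5-tuple.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    """Conventional lookup key (assumed IPv6): addresses, ports, protocol."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


# 2 x 128-bit address + 2 x 16-bit port + 8-bit protocol = 296 bits.
FIVE_TUPLE_BITS = 128 + 128 + 16 + 16 + 8

# Conventional path: the context is found by hashing the whole 5-tuple.
contexts_by_tuple: dict = {}

# Direct path (assumed layout): contexts live in a table addressed by index.
context_table: list = []


def register(key: FiveTuple, context: dict) -> int:
    """Store a context; the returned index could serve as an identifier."""
    contexts_by_tuple[key] = context
    context_table.append(context)
    return len(context_table) - 1


def lookup_by_tuple(key: FiveTuple) -> Optional[dict]:
    """Multi-stage lookup: hash the key, then compare it against the entry."""
    return contexts_by_tuple.get(key)


def lookup_by_index(index: int) -> Optional[dict]:
    """Direct access: a bounds check followed by one table read."""
    if 0 <= index < len(context_table):
        return context_table[index]
    return None
```

Both functions return the same context object; the direct path skips hashing and comparing the full key for every received segment.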
  • the present invention aims to improve conventional TCP/TOE communication.
  • the present invention has in particular the objective to combine the advantages of TCP/TOE and RoCE. That is, the present invention aims for the possibility of a faster access of the connection context in TCP/TOE. To this end, the present invention has the goal to provide capability of a direct access of the connection context in
  • the present invention proposes a receiving device and transmitting device for TCP communications, which use, exchange and configure an identifier that allows direct access of the connection context at the receiving device.
  • a first aspect of the present invention provides a receiving device for communicating via TCP, the receiving device being configured to receive a TCP segment carrying an identifier from a transmitting device, obtain the identifier from the received TCP segment, and directly access a connection context at the receiving device based on the identifier, wherein the connection context relates to a connection between the receiving device and the transmitting device.
  • the receiving device obtains the identifier from multiple or each TCP segment, which the transmitting device sends. Since the identifier allows the receiving device to directly access the connection context at the receiving device, it can obtain the connection context much faster than in the existing TCP/TOE implementation. Nevertheless, all advantages of TCP/TOE remain. Thus, the advantages of TCP/TOE and RoCE are combined. As a consequence of the fast access to the connection context, the receiving device of the first aspect provides significant performance advantages, particularly for highly scalable applications.
  • the receiving device is configured to obtain the identifier from a determined TCP option of the received TCP segment, particularly of a TCP header of the received TCP segment.
  • the TCP option can be efficiently used to carry an identifier to the receiving device.
  • the receiving device is configured to extract a key associated with the connection context from the identifier, and directly access the connection context using the key.
  • the receiving device can very quickly and efficiently obtain the connection context by using the key.
  • the receiving device is configured to extract an address of the connection context and/or an index of a table maintaining the connection context at the receiving device from the identifier, and directly access the connection context according to the address and/or index.
  • the receiving device can very quickly and efficiently obtain the connection context by means of the address or the index, respectively.
  • the receiving device is configured to extract information indicating whether the connection between the transmitting device and the receiving device has been offloaded from a host TCP processing stack to a TOE at the receiving device from the identifier.
  • the receiving device is configured to extract information identifying a service type of an upper layer protocol and/or information identifying a processing type for the TCP segment from the identifier.
  • the receiving device is configured to, during an exchange of control information for configuring the connection between the transmitting device and the receiving device, insert identifier set information into a TCP segment, particularly into a determined TCP option in the TCP segment, and transmit said TCP segment to the transmitting device.
  • the receiving device is in particular configured to insert the identifier set information according to capability information, which the receiving device previously received from the transmitting device. For instance, the transmitting device may inform the receiving device how long an identifier may be (capability information). The receiving device accordingly adjusts to this length information, and sends the identifier set information to the transmitting device.
  • the identifier set information includes information enabling the transmitting device to obtain the identifier.
  • the transmitting device can preferably insert the identifier into each TCP segment sent to the receiving device, thereby allowing the receiving device to directly access the connection context.
  • the receiving device is further configured to, during the exchange of control information, insert capability information into a TCP segment, particularly into a determined TCP option in the TCP segment, and transmit said TCP segment to the transmitting device.
  • the receiving device may on the one hand inform the transmitting device, whether it requests to participate in the scheme of the invention, i.e. whether it is configured to extract an identifier from a received TCP segment or not.
  • the transmitting device may thus be informed that the receiving device is also able to insert an identifier related to a connection context at the transmitting device into a TCP segment sent to the transmitting device, i.e. that it can participate in the communication as both “receiving device” and “transmitting device” according to the invention, respectively.
  • the capability information includes a maximum supported identifier size.
  • the transmitting device can be informed about a maximum size of an identifier, which may be maintained at the receiving device and can be used by the receiving device when acting as “transmitting device”.
  • a second aspect of the present invention provides a transmitting device for communicating via TCP, the transmitting device being configured to insert an identifier into a TCP segment and transmit the TCP segment carrying the identifier to a receiving device, wherein the identifier includes information that enables the receiving device to directly access a connection context at the receiving device, wherein the connection context relates to a connection between the transmitting device and the receiving device.
  • the transmitting device enables high scalability of applications even for thousands or millions of TCP connections, without significantly harming the application performance.
  • the transmitting device is configured to insert the identifier into a determined TCP option in the TCP segment, particularly of a TCP header of the TCP segment.
  • the TCP option can be efficiently used to transmit the identifier from the transmitting device to the receiving device. No large additional overhead is thereby created.
  • the structure of the identifier is opaque to the transmitting device.
  • the transmitting device does not know the “meaning” of the identifier, i.e. whether it relates to a connection context at the receiving device or not, and simply inserts the identifier “blindly” into a TCP segment, preferably each TCP segment sent to the receiving device.
  • the identifier is pre-configured at the transmitting device.
  • the identifier may be stored at the transmitting device upon its configuration or setup.
  • the transmitting device is configured to, during an exchange of control information for configuring the connection between the transmitting device and the receiving device, receive a TCP segment carrying identifier set information from the receiving device, and obtain the identifier, particularly from a determined TCP option of the received TCP segment, based on the received identifier set information.
  • the transmitting device can also obtain the identifier from the receiving device, for subsequent insertion into every TCP segment transmitted to the receiving device.
  • the transmitting device is configured to maintain a determined variable per established connection for storing the obtained identifier together with a size and/or a validity state of the identifier.
  • a third aspect of the present invention provides a system for communicating via TCP, the system comprising a receiving device according to the first aspect or any of its implementation forms, and a transmitting device according to the second aspect or any of its implementation forms.
  • the system of the third aspect achieves the advantages described above for the receiving device of the first aspect, and the transmitting device of the second aspect, respectively.
  • a fourth aspect of the present invention provides a method for a receiving device communicating via TCP, the method comprising receiving a TCP segment carrying an identifier from a transmitting device, obtaining the identifier from the received TCP segment, and directly accessing a connection context at the receiving device based on the identifier, wherein the connection context relates to a connection between the receiving device and the transmitting device.
  • the method of the fourth aspect can be extended by implementation forms, which correspond to the implementation forms of the receiving device of the first aspect. Accordingly the method of the fourth aspect achieves all advantages and effects of the device of the first aspect and its implementation forms.
  • a fifth aspect of the present invention provides a method for a transmitting device communicating via TCP, the method comprising inserting an identifier into a TCP segment and transmitting the TCP segment carrying the identifier to a receiving device, wherein the identifier includes information enabling the receiving device to directly access a connection context at the receiving device, wherein the connection context relates to a connection between the transmitting device and the receiving device.
  • the method of the fifth aspect can be extended by implementation forms, which correspond to the implementation forms of the transmitting device of the second aspect. Accordingly the method of the fifth aspect achieves all advantages and effects of the device of the second aspect and its implementation forms.
  • a sixth aspect of the present invention provides a method for automatic management and configuration of an identifier between a transmitting device and a receiving device by exchanging control information, the method comprising: inserting, by the transmitting device, identifier capability information into a TCP segment, particularly into a determined TCP option in the TCP segment, and transmitting said TCP segment to the receiving device, inserting, by the receiving device, identifier set information into a TCP segment, particularly into a determined TCP option in the TCP segment, and transmitting said TCP segment to the transmitting device, and receiving, by the transmitting device, the TCP segment carrying the identifier set information and obtaining the identifier based on the received identifier set information.
  • the method of the sixth aspect allows the transmitting device and receiving device to exchange an identifier, which can be then used for faster TCP segment processing enabled by direct access to the connection context at the receiving device. It has to be noted that all devices, elements, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities.
  • FIG. 1 shows a receiving device according to an embodiment of the present invention.
  • FIG. 2 shows a transmitting device according to an embodiment of the present invention.
  • FIG. 3 shows an overview of conventional TCP (left), conventional RoCE (right) and the scheme of the present invention achieving combined advantages of TCP and RoCE (middle).
  • FIG. 4 shows a procedure carried out by two end-points A and B (receiving device and transmitting device according to embodiments of the present invention).
  • FIG. 5 shows a TCP option format for the identifier.
  • FIG. 6 shows a TCP segment layout comprising the determined identifier TCP option incorporated with EDO option.
  • FIG. 7 shows a TCP option format for identifier set information (top) and shows a TCP option format for capability information (bottom).
  • FIG. 8 shows a procedure carried out by two end-points A and B (receiving device and transmitting device) according to embodiments of the present invention.
  • FIG. 9 shows a procedure carried out by two end-points A and B (receiving device and transmitting device) according to embodiments of the present invention.
  • FIG. 10 shows a TCP option format for identifier set information (top) and shows an alternate TCP option format for capability information (bottom).
  • FIG. 11 shows a procedure carried out by two end-points A and B (receiving device and transmitting device) according to embodiments of the present invention.
  • FIG. 12 shows a procedure carried out by two end-points A and B (receiving device and transmitting device) according to embodiments of the present invention.
  • FIG. 13 shows a use case of TCP vs. RoCE.
  • FIG. 14 shows TCP vs. RoCE connection context lookup latency.
  • FIG. 15 shows a TCP option format.
  • FIG. 16 shows an optimized dataflow within a Hi 1823 adaptor.
  • FIG. 17 shows a method for a receiving device according to an embodiment of the present invention.
  • FIG. 18 shows a method for a transmitting device according to an embodiment of the present invention.
  • FIG. 19 shows a method for an automatic management and configuration of an identifier between a transmitting device and a receiving device according to embodiments of the present invention, namely by exchanging control information.
  • FIG. 1 shows a receiving device 100 according to an embodiment of the present invention.
  • the receiving device is configured to communicate via TCP/TOE.
  • the receiving device 100 may be provided in a computer or other terminal device.
  • the receiving device 100 may comprise at least one processor or other processing circuitry configured to carry out functions as described in the following.
  • the receiving device 100 is particularly configured to receive a TCP segment 101 carrying an identifier 102 from a transmitting device 110. For instance, the identifier 102 was inserted into the TCP segment 101 by the transmitting device 110. Particularly, the identifier 102 may be inserted into a determined (i.e. dedicated or specific) TCP option of the received TCP segment 101, specifically in a TCP header of the TCP segment 101.
  • the receiving device 100 is further configured to obtain the identifier 102 from the received TCP segment 101.
  • the receiving device 100 may be configured to obtain the identifier 102 from a determined TCP option of the TCP segment 101.
  • the receiving device 100 is then configured to directly access a connection context 103 at the receiving device 100 based on the identifier 102.
  • the connection context 103 relates to a connection between the receiving device 100 and the transmitting device 110, i.e. it includes context about this connection.
  • the receiving device 100 may extract the identifier 102 from the TCP segment 101, and may then use it to obtain the connection context 103 in a direct manner.
  • FIG. 2 shows a transmitting device 110 according to an embodiment of the present invention.
  • the transmitting device 110 is configured to communicate via TCP/TOE.
  • the transmitting device 110 may be provided in a computer or other terminal device.
  • the transmitting device 110 may comprise at least one processor or other processing circuitry configured to carry out functions as described in the following.
  • the transmitting device 110 of FIG. 2 may specifically be the transmitting device 110 shown in FIG. 1.
  • the transmitting device 110 is configured to insert an identifier 102 into a TCP segment 101, and to transmit the TCP segment 101 carrying the identifier 102 to a receiving device 100 (specifically the receiving device 100 shown in FIG. 1).
  • the identifier 102 inserted by the transmitting device 110 is particularly an identifier according to a preconfigured (manually or automatically) value (e.g. variable 410) of its connection context 113 related to the connection to the receiving device 100.
  • the transmitting device 110 may include the identifier 102 into a determined TCP option in the TCP segment 101.
  • the identifier 102 includes information that enables the receiving device 100 to directly access a connection context 103 at the receiving device 100.
  • the structure of the identifier 102 (e.g. whether it is a key, address or index) is opaque to the transmitting device 110.
  • the transmitting device 110 does not know that information in the identifier 102, or what kind of information, enables the receiving device 100 to directly access the connection context 103.
  • the connection context 103 again relates to the connection between the transmitting device 110 and the receiving device 100.
  • the possibility to directly access the connection context 103 by the receiving device 100 equalizes the latency of processing the received TCP segment 101 and a latency for processing a received RoCE packet.
  • the present invention - as implemented by the receiving device 100 and/or the transmitting device 110 - allows having all advantages of the TCP protocol (for stream-semantics oriented applications), while also achieving the advantages of the RoCE protocol. This is illustrated in FIG. 3.
  • FIG. 3 shows on the left side the conventional TCP protocol, using multi-stage lookup procedure to access the connection context, and on the right side the conventional RoCE protocol, the latter allowing a direct access of the connection context.
  • FIG. 3 shows in the middle the solution of the present invention, wherein the TCP option carrying the identifier 102 is referred to as a “cookie” or “Connection Cookie (CoCo)”.
  • the insertion of the identifier 102 into sent TCP segments 101 allows direct access to the connection context 103 also for the TCP protocol.
  • FIG. 4 shows a procedure carried out between two end-points A and B (e.g. terminal devices). Both end-points A and B may be the same.
  • end-point A is indicated as being a receiving device 100 according to an embodiment of the present invention
  • end-point B is indicated as being a transmitting device 110 according to an embodiment of the present invention.
  • both end-points A and B may act as “receiving device 100” and “transmitting device 110” according to the invention, depending on the communication direction between the end-points.
  • when end-point B (the transmitting device 110) sends a TCP segment 101 to end-point A (the receiving device 100), it inserts the identifier 102 into the TCP segment 101.
  • when end-point A (now acting as “transmitting device 110”) sends a TCP segment 111 to end-point B (now acting as “receiving device 100”), it may include an identifier 112 into the TCP segment 111, wherein the identifier 112 allows end-point B to directly access a connection context at end-point B.
  • the identifier 102 is only for unidirectional traffic from a transmitting device 110 to a receiving device 100.
  • for both ends of the TCP connection (i.e. transmitting device 110 and receiving device 100), the scheme does not mean that both of them (or even one of them) always have to utilize this capability (i.e. for either the whole life or a part of the life of this connection).
  • a TCP stack at end-point B, the transmitting device 110, is configured to maintain a variable 410 (also referred to as “Peer CoCo” variable) in the connection context of the transmitting device 110.
  • This variable 410 maintains the identifier 102 of end-point A, the receiving device 100 (also referred to as “Peer CoCo value”), including a size and validity state of the identifier 102.
  • a TCP stack of end-point A may maintain a variable 400 to store identifier 112.
  • the identifier 102 includes information related to the connection context 103, which is maintained at the receiving device 100, and is opaque for the transmitting device 110.
  • the size of identifier 102 is implementation dependent and is not limited in the present invention.
  • the identifier 102 maintained in variable 410 of the transmitting device 110 may comprise a direct address of the connection context 103 at the receiving device 100, e.g. in x86 memory. In this case the identifier 102 may be bigger than or equal to 64 bits.
  • the identifier 102 may comprise an index in a table of the connection context at the receiving device 100. In this case the identifier 102 may be in the order of 32 bits.
  • the variable 410 maintained at the transmitting device 110 may be initialized in an invalid state upon the creation of the connection context at the transmitting device 110.
  • the present invention does not limit the way of initialization and modification of the variable 410.
  • the initialization and modification may be done either “out-of-band” or “in-band”.
  • “Out-of-band” stands for an external agent, which updates the variable 410 via the TCP stack’s Application Programming Interface (API). “In-band” stands for the managing of the variable 410 through an incoming TCP stream, and is described later in more detail.
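The per-connection variable 410 described above can be modelled as a small record holding the identifier value, its size and a validity flag. The following is a minimal sketch under stated assumptions: the class name `PeerCoCo` and its methods are hypothetical, and treating an empty value as invalid follows the size-zero convention the document gives for identifier set information.

```python
from dataclasses import dataclass


@dataclass
class PeerCoCo:
    """Hypothetical model of variable 410 kept in the sender's connection context."""
    value: bytes = b""   # opaque identifier supplied by the peer
    valid: bool = False  # starts invalid when the connection context is created

    @property
    def size(self) -> int:
        """Size of the stored identifier in bytes."""
        return len(self.value)

    def update(self, value: bytes) -> None:
        """Store a new identifier; an empty value invalidates the variable."""
        self.value = value
        self.valid = len(value) > 0

    def invalidate(self) -> None:
        """Return to the invalid state, so no identifier is attached any more."""
        self.update(b"")
```

The same `update` call could be driven either by an external agent through a stack API (out-of-band) or by an option received on the connection (in-band); the sender never interprets `value`.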
  • FIG. 5 shows that the identifier 102 may particularly be inserted into a TCP option 500 of the TCP segment 101, which is described in more detail below.
  • FIG. 6 shows how the TCP option 500 with the identifier 102 may be provided in the TCP segment 101.
  • a TCP stack of the transmitting device 110 may attach a TCP option 500 (e.g. as shown in FIG. 5) to every TCP segment 101 (e.g. as shown in FIG. 6) transmitted to the receiving device 100, whenever the local variable 410 at the transmitting device 110 is in the valid state.
  • the TCP option 500 carries the identifier 102, i.e. the value stored in the variable 410.
  • the format of the TCP option 500 may be defined according to RFC6994 (Experimental TCP Options).
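As an illustration of an RFC 6994 style encoding, the sketch below builds an experimental option consisting of Kind, Length, a 16-bit Experiment ID (ExID) and the identifier bytes. The exact layout of FIG. 5 is not reproduced here; the ExID value `0xC0C0` and the function name `encode_coco_option` are assumptions made for the example, not registered or source-confirmed values.

```python
import struct

EXPERIMENTAL_KIND = 253  # RFC 6994 covers option kinds 253 and 254
COCO_EXID = 0xC0C0       # hypothetical 16-bit ExID, not a registered value


def encode_coco_option(identifier: bytes) -> bytes:
    """Encode Kind, Length, ExID and the opaque identifier as one TCP option."""
    length = 4 + len(identifier)  # Kind + Length + 2-byte ExID + identifier
    if not identifier:
        raise ValueError("identifier must not be empty")
    if length > 255:
        raise ValueError("option length does not fit in the 1-byte Length field")
    header = struct.pack("!BBH", EXPERIMENTAL_KIND, length, COCO_EXID)
    return header + identifier
```

For a 32-bit table index the option occupies 8 bytes, and for a 64-bit address 12 bytes, before any padding the stack may add.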
  • if the TCP option 500 is used together with other popular TCP options (e.g. timestamp, window scale, 3-hole SACK) and cannot fit into the standard TCP option space (40 bytes), it can be used in conjunction with the EDO option, as described e.g. in https://tools.ietf.org/html/draft-ietf-tcpm-tcp-edo-08.
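The space constraint can be checked with simple arithmetic. The sketch below assumes the standard option sizes (10 bytes for timestamps, 2 + 8n bytes for SACK with n blocks), a 4-byte header for the identifier option (Kind, Length and a 16-bit ExID in the RFC 6994 style) and no alignment padding; the function name `fits_without_edo` is hypothetical.

```python
MAX_TCP_HEADER = 15 * 4   # the 4-bit data offset counts 32-bit words
BASE_TCP_HEADER = 20      # fixed part of the TCP header
OPTION_SPACE = MAX_TCP_HEADER - BASE_TCP_HEADER  # 40 bytes

TIMESTAMP_LEN = 10        # timestamps option


def sack_len(blocks: int) -> int:
    """SACK option size: Kind + Length + 8 bytes per block."""
    return 2 + 8 * blocks if blocks else 0


def fits_without_edo(identifier_len: int, sack_blocks: int,
                     timestamps: bool = True, option_header: int = 4) -> bool:
    """True if the identifier option fits in the standard 40-byte option space."""
    used = TIMESTAMP_LEN if timestamps else 0
    used += sack_len(sack_blocks)
    used += option_header + identifier_len
    return used <= OPTION_SPACE
```

With timestamps and three SACK blocks, 36 of the 40 bytes are already used, so even a 4-byte identifier (8 bytes with its header) does not fit and the EDO extension would be needed.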
  • the present invention does not limit the way of the processing of the received TCP segment 101 at the receiving device 100, particularly not the processing of the TCP option 500 carrying the identifier 102.
  • the present invention also does not limit the contents of the identifier 102, but it generally supposes that the identifier 102 serves as a helper for directly accessing the connection context 103 at the receiving device 100.
  • the identifier 102 may comprise a short key, which allows fast lookup of the connection context 103 at the receiving device 100.
  • the identifier 102 may also comprise an index to a direct-access “Connection contexts” table at the receiving device 100, or a fully qualified memory address of that connection context 103 at the receiving device 100.
  • the receiving device 100 may receive sequential TCP segments 101, with or without TCP options 500 (i.e. with or without identifier 102), or even with TCP options 500 carrying different identifiers 102 (e.g. as a result of a variable 410 re-configuration at a transmitting device 110). This may lead to a temporary disordering in a multi-threaded TCP stack implementation, since the processing flows of TCP segments 101 with different identifiers 102 go through different paths (TBD).
  • a header length field within Extended Data Offset (EDO) option may override the data_offset field of TCP header.
  • EDO option and TCP option 500 of FIG. 5 may reside within an option space covered by the data_offset field.
  • a TCP stack at a conventional receiving device which does not support the scheme of the present invention, but which nevertheless receives a TCP segment 101 with a TCP option 500 including an identifier 102, may silently ignore the TCP options 500.
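Both behaviours follow from the ordinary Kind/Length walk over the TCP option space: a stack that knows the option extracts the identifier, while any other stack steps over the unknown Kind using its Length byte. The sketch below is an assumption-laden illustration; it presumes an RFC 6994 style option with a hypothetical 16-bit ExID `0xC0C0`, and `find_identifier` is not a name from the source.

```python
from typing import Optional

KIND_EOL = 0                      # end of option list
KIND_NOP = 1                      # single-byte padding
EXPERIMENTAL_KINDS = (253, 254)   # RFC 6994 experimental kinds
COCO_EXID = 0xC0C0                # hypothetical ExID, not a registered value


def find_identifier(options: bytes) -> Optional[bytes]:
    """Walk the option space and return the identifier, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            return None               # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            return None               # malformed Length field
        body = options[i + 2:i + length]
        if (kind in EXPERIMENTAL_KINDS and len(body) >= 2
                and int.from_bytes(body[:2], "big") == COCO_EXID):
            return body[2:]           # identifier follows the ExID
        i += length                   # unknown or unrelated option: skip it
    return None
```

A receiver without support never reaches the ExID comparison; it simply advances by `length`, which is why the extra option is harmless to it.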
  • in the following, an “in-band” initialization and modification (i.e. in-band control) of the variable 410 at the transmitting device 110 is described.
  • the proposed invention does not limit the way of the implementation of this “in-band” control (i.e. via in-band traffic).
  • the proposed invention only supposes that the in-band control is based on the exchange of the determined TCP options, which are attached to transmitted TCP segments.
  • TCP options may be either in a standard or in an experimental format.
  • a TCP option may be attached to any data segment.
  • a TCP option may be attached to any data or SYN segment.
  • the TCP options shall be reliably transmitted to the connection peer. For achieving this reliability of the TCP option transmission, monitoring and acknowledgment of the TCP segment 101, to which TCP option 500 is attached, may be provided. Two alternatives for the in-band control are now described as examples.
  • a first“in-band” is illustrated in FIG. 7, FIG. 8 and FIG. 9.
  • the first alternative is implemented using two dedicated TCP options 700 and 702 as shown in FIG. 7.
  • the upper TCP option 700 of FIG. 7 includes identifier set information 701, and the lower TCP option 702 of FIG. 7 includes capability information 703.
  • the capability information 703 can declare the capability of the transmitting device 110 to maintain an identifier 102 in the variable 410 of a maximum length, and to insert the identifier 102 when sending TCP segments 101 to the receiving device 100.
  • the capability information 703 may include a maximum supported size of the identifier 102.
  • A ‘0’ size value means no support, i.e. no capability to maintain and use an identifier 102.
  • the identifier set information 701 can pass a value of the identifier 102 from the receiving device 100 to the transmitting device 110.
  • the identifier set information 701 may include the identifier 102 value and its size.
  • A ‘0’ size value means an invalidation of the identifier 102.
  • the TCP options 700 and 702 may be attached to any non-SYN data segment and shall be reliably transmitted to the connection peer.
  • A possible exchange of these TCP options 700 and 702 is illustrated in FIG. 8 (graceful exchange) and FIG. 9 (abortive exchange) and is explained below.
  • the exchange is performed by two TCP end-points A and B.
  • end-point A is indicated as being a receiving device 100 according to an embodiment of the present invention
  • end-point B is indicated as being a transmitting device 110 according to an embodiment of the present invention.
  • both end-points A and B may act as both “receiving device 100” and “transmitting device 110” according to the invention, depending on the communication direction between the end-points.
  • When the state of the transmitting device’s capability of sending and/or receiving identifiers over the connection to the receiving device 100 changes, the transmitting device 110 shall inform the receiving device 100 about its capability.
  • end-point B, the transmitting device 110, may send capability information 703 in a TCP segment 800 to end-point A, the receiving device 100. This can, for instance, be done at connection setup, download or upload of a connection to/from the TOE, configuration, etc.
  • end-point A may transmit identifier set information 701 to end-point B in a TCP segment 810. This set may be redundant but is required, because end-point A may “forget” the state during transition between TCP stacks of different capabilities (Host-software/TOE).
  • Upon reception of a valid, in-order TCP segment 800 with attached TCP option 700 including identifier set information 701, end-point B shall update its local variable 410, including the validity state and size of the identifier 102 obtained from the identifier set information 701.
  • Upon reception of identifier set information 701 which does not fit the local capability of end-point B, end-point B shall invalidate the local variable 410 and re-send its capability information 703 to end-point A.
  • end-point B can insert the identifier 102 into a TCP segment 101 sent to end-point A.
  • end-point B also submits its own identifier set information 711 in the TCP segment 800.
  • end-point B may send capability information 713 in the TCP segment 810 to end-point A.
  • an identifier 112 obtained from identifier set information 711 can be configured in the variable 400 at end-point A.
  • end-point A can also act as “transmitting device 110” and can insert the identifier 112 into a TCP segment 111 sent to end-point B, acting in this case as “receiving device 100”.
  • graceful exchange means that during the exchange there were no conflicts in capabilities, and a bi-directional data flow with identifiers 102 and 112 inserted into TCP segments 101 and 111, respectively, transmitted in both directions is established.
  • abortive exchange means that there were conflicts between the capability of end-point B and the identifier requested by end-point A.
  • end-point B in the end does not insert an identifier 102 into TCP segments 101 sent to end-point A.
  • end-point A requests with the identifier set information 701 an identifier with a size of 8 bytes, but end-point B can only provide an identifier 102 with a size of 4 bytes as indicated with the capability information 703.
  • identifier 112 is used in the direction from end-point A to end-point B, but no identifier is used in the direction from end-point B to end-point A.
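The rules that end-point B applies when identifier set information arrives (update, invalidate on a ‘0’ size, or invalidate and re-send the capability when the requested size does not fit) can be illustrated with a small state holder. This is a sketch under assumptions: the class name `IdentifierState`, its fields, and the convention of returning the capability to re-send are hypothetical and only stand in for the variable 410 and the capability information 703.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentifierState:
    """Hypothetical stand-in for variable 410 at the transmitting device."""
    max_id_size: int                     # capability: largest size supported, 0 = none
    identifier: Optional[bytes] = None   # None means "no valid identifier"

    def capability_info(self) -> int:
        """Value to advertise as capability information."""
        return self.max_id_size

    def on_identifier_set(self, value: bytes) -> Optional[int]:
        """Apply received identifier set information.

        Returns the capability to re-send when the request does not fit,
        otherwise None.
        """
        size = len(value)
        if size == 0:
            # A '0' size invalidates the identifier.
            self.identifier = None
            return None
        if size > self.max_id_size:
            # Request exceeds the local capability: invalidate, re-advertise.
            self.identifier = None
            return self.capability_info()
        self.identifier = value
        return None
```

With `IdentifierState(max_id_size=4)`, an 8-byte request leaves the identifier invalid and returns `4`, which corresponds to the abortive case above; a 4-byte request is stored and later inserted into outgoing segments, which corresponds to the graceful case.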
  • a second “in-band” alternative is explained with respect to FIG. 10, FIG. 11 and FIG. 12.
  • the second alternative is implemented using two TCP options 1000 and 1002 as shown in FIG. 10.
  • the upper TCP option 1000 includes identifier set information 1001
  • the lower TCP option 1002 includes extended capability information 1003.
  • the capability information 1003 can declare a capability of both receiving device 100 and transmitting device 110 of a TCP connection.
  • the capability information 1003 may include a maximum supported size of the identifier 102 for both ends of the connection.
  • A ‘0’ size value means no support, i.e. no capability.
  • the identifier set information 1001 can pass a value of the identifier 102 to the transmitting device 110.
  • the identifier set information 1001 may include the identifier 102 value and its size.
  • A ‘0’ size value means an invalidation of the identifier 102.
  • A possible exchange of these TCP options 1000 and 1002 is illustrated in FIG. 10 (graceful exchange) and FIG. 11 (abortive exchange) and is explained below.
  • the exchange is performed by two TCP end-points A and B.
  • end-point A is indicated as being a receiving device 100 according to an embodiment of the present invention
  • end-point B is indicated as being a transmitting device 110 according to an embodiment of the present invention.
  • both end-points A and B may act as both “receiving device 100” and “transmitting device 110” according to the invention, depending on the communication direction between the end-points.
  • the transmitting device shall inform the receiving device 100 about the known capability.
  • end-point B is the transmitting device 110.
  • This can, for instance, be done at connection setup, download or upload of a connection to/from the TOE, configuration, etc.
  • the capability information 1003 may preferably be attached to a SYN segment.
  • end-point A may send its known capability information 1013 to end-point B in a TCP segment 1110.
  • TCP end-point A may submit identifier set information 1001 to end-point B in a TCP segment 1020, in order to allow end-point B to obtain the identifier 102 and maintain it in variable 410. That is, upon reception of a valid, in-order segment 1020 with attached identifier set information 1011, TCP end-point B shall update the local variable 410 including the validity state and size of the identifier 102. Upon the reception of identifier set information 1011, which does not fit the local capability of end-point B, the end-point B shall invalidate the local variable 410.
  • Upon reception of capability information 1003 with a remote capability which does not correspond to the capability of end-point A, TCP end-point A shall submit its own known capability information. This may be redundant but is preferred, because TCP end-points may “forget” the state during transition between TCP stacks of different capabilities (Host-software/TOE).
  • a TCP end-point which does not support the present invention will silently ignore any received TCP option 1000 or 1002.
  • graceful exchange means that during the exchange there were no conflicts in capabilities, and a bi-directional data flow with identifiers inserted into the TCP segments transmitted in both directions is established.
  • an identifier size is 8 bytes
  • an identifier size is 4 bytes (even though end-point B is capable of supporting identifiers with a size of 8 bytes).
  • abortive exchange means that there were conflicts in the capability of end-point B and the identifier requested by end-point A.
  • an identifier size requested by end-point A (with a size of 8 bytes) is bigger than a size end-point B can provide (namely only 4 bytes).
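In the second alternative the capability information describes both ends of the connection, and an end-point answers with its own known capability when the peer's view of it is wrong. The following sketch shows that rule only. The byte layout (kind, length, sender's maximum size, sender's view of the receiver's maximum size), the option kind `254` and the function names are assumptions for illustration; the real format of TCP option 1002 is given in FIG. 10.

```python
import struct
from typing import Optional, Tuple

# Assumption: an experimental TCP option kind carries the extended capability.
CAPABILITY_KIND = 254
CAPABILITY_LEN = 4


def encode_capability(own_max: int, peer_max: int) -> bytes:
    """Pack the sender's capability and its current view of the peer's."""
    return struct.pack("!BBBB", CAPABILITY_KIND, CAPABILITY_LEN, own_max, peer_max)


def decode_capability(option: bytes) -> Tuple[int, int]:
    """Return (sender_max, sender_view_of_receiver_max)."""
    kind, length, sender_max, view_of_receiver = struct.unpack("!BBBB", option)
    if kind != CAPABILITY_KIND or length != CAPABILITY_LEN:
        raise ValueError("not an extended capability option")
    return sender_max, view_of_receiver


def reply_if_mismatch(option: bytes, own_max: int) -> Optional[bytes]:
    """Build a corrective capability option, or None if the peer is up to date."""
    sender_max, view_of_us = decode_capability(option)
    if view_of_us != own_max:
        # The peer's picture of our capability is stale: send our known state.
        return encode_capability(own_max, sender_max)
    return None
```

For example, if a peer advertising 8 bytes believes the local end supports 0 bytes while the local end supports 4, `reply_if_mismatch` returns an option announcing 4 bytes locally and 8 bytes for the peer; once both views agree, no further reply is generated.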
  • the present invention may, for instance, be applied to accelerate the processing of received TCP packets by the TOE within a Hi1823 smart network adaptor.
  • the goal of this use case is that the TCP packet should be fully qualified as a TCP offloaded packet by the IPSU RX module.
  • FIG. 13 shows in this respect a current TCP flow (bottom) and a RoCE flow (top) in a receiver of the Hi1823.
  • FIG. 15 shows a TCP option 1500 with opcode 1501 included.
  • the TCP option also includes an X_ID.
  • the present invention implemented in the Hi1823 accelerates the processing of TCP-based traffic. Results are summarized in FIG. 14. Equalization of the receive latency between TCP and RDMA packets is achieved. In particular, the RX latency of an offloaded TCP packet is cut by more than 600 ns. Further, parallel processing of upper layer protocols is possible: iSCSI, iWARP, MPI. Also, a unification of the processing of RoCE and TCP flows is achieved.
  • FIG. 17 shows a method 1700 according to an embodiment of the present invention for a receiving device 100 communicating via TCP.
  • the method 1700 comprises receiving 1701 a TCP segment 101 carrying an identifier 102 from a transmitting device 110, obtaining 1702 the identifier 102 from the received TCP segment 101, and directly accessing a connection context 103 at the receiving device 100 based on the identifier 102.
  • the connection context 103 relates to a connection between the receiving device 100 and the transmitting device 110.
  • FIG. 18 shows a method 1800 according to an embodiment of the present invention for a transmitting device 110 communicating via TCP.
  • the method 1800 comprises inserting 1801 an identifier 102 into a TCP segment 101 and transmitting the TCP segment 101 carrying the identifier 102 to a receiving device 100.
  • the identifier 102 includes information enabling the receiving device 100 to directly access a connection context 103 at the receiving device 100.
  • the connection context 103 relates to a connection between the transmitting device 110 and the receiving device 100.
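On the transmitting side, inserting the identifier amounts to appending a TCP option and adjusting the header's data offset. The sketch below assumes a plain kind/length/value option with the hypothetical kind `253` and pads with No-Operation bytes; it relies only on standard TCP facts (options are padded to 32-bit words, the data offset counts 32-bit words, the base header is 5 words, and at most 40 bytes of options fit without an extension such as EDO).

```python
# Assumption: the identifier is carried in an experimental TCP option kind.
IDENTIFIER_KIND = 253
NOP = 1
MAX_OPTION_BYTES = 40   # 15 words maximum header minus the 5-word base header


def build_identifier_option(identifier: bytes) -> bytes:
    """Encode the identifier as kind/length/value, padded to a 4-byte boundary."""
    if not 1 <= len(identifier) <= MAX_OPTION_BYTES - 2:
        raise ValueError("identifier does not fit into the TCP option space")
    option = bytes([IDENTIFIER_KIND, 2 + len(identifier)]) + identifier
    padding = (-len(option)) % 4
    return option + bytes([NOP]) * padding


def data_offset_words(options: bytes) -> int:
    """Data offset field value for a header carrying the given options."""
    if len(options) % 4 != 0 or len(options) > MAX_OPTION_BYTES:
        raise ValueError("options must be word-aligned and at most 40 bytes")
    return 5 + len(options) // 4
```

A 4-byte identifier yields an 8-byte option area and a data offset of 7 words, while an 8-byte identifier yields 12 bytes and a data offset of 8 words.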
  • FIG. 19 shows a method 1900 for automatic management and configuration of an identifier 102 between a transmitting device 110 and a receiving device 100 by exchanging control information.
  • the method 1900 comprises inserting 1901, by the transmitting device 110, identifier capability information 713 into a first TCP segment 810, particularly into a determined TCP option in the first TCP segment 810, and transmitting said first TCP segment 810 to the receiving device 100, inserting 1902, by the receiving device 100, identifier set information 701 into a second TCP segment 800, particularly into a determined TCP option 700 in the second TCP segment 800, and transmitting said second TCP segment 800 to the transmitting device 110, and receiving 1903, by the transmitting device 110, the second TCP segment 800 carrying the identifier set information 701 and obtaining the identifier 102 based on the received identifier set information 701.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Communication Control (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates generally to the Transmission Control Protocol (TCP) and to TCP Offload Engines (TOE). In particular, the invention relates to a receiving device and a transmitting device for TCP communication. The transmitting device is configured to insert an identifier into a TCP segment that it sends to the receiving device. The receiving device is configured to obtain this identifier from the TCP segment. The identifier includes information that enables the receiving device to directly access a connection context at the receiving device, the connection context relating to a connection between the transmitting device and the receiving device. The present invention also relates to corresponding methods, and in particular to a method for automatic management and configuration of the identifier between the transmitting device and the receiving device by exchanging control information.
PCT/EP2018/062720 2018-05-16 2018-05-16 Receiving device and transmitting device for TCP communication WO2019219184A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2018/062720 WO2019219184A1 (fr) 2018-05-16 2018-05-16 Receiving device and transmitting device for TCP communication
CN201880093503.4A CN112154633B (zh) 2018-05-16 2018-05-16 Receiving device and transmitting device for TCP communication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/062720 WO2019219184A1 (fr) 2018-05-16 2018-05-16 Receiving device and transmitting device for TCP communication

Publications (1)

Publication Number Publication Date
WO2019219184A1 true WO2019219184A1 (fr) 2019-11-21

Family

ID=62235936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/062720 WO2019219184A1 (fr) 2018-05-16 2018-05-16 Receiving device and transmitting device for TCP communication

Country Status (2)

Country Link
CN (1) CN112154633B (fr)
WO (1) WO2019219184A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
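CN114070799A (zh) * 2020-07-27 2022-02-18 中国电信股份有限公司 Processing method and processing device for priority pause frames, and target network device
CN115086397A (zh) * 2022-06-10 2022-09-20 中国银行股份有限公司 Method and system for managing TCP connections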
US11784874B2 (en) * 2019-10-31 2023-10-10 Juniper Networks, Inc. Bulk discovery of devices behind a network address translation device
US11805011B2 (en) 2019-10-31 2023-10-31 Juniper Networks, Inc. Bulk discovery of devices behind a network address translation device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040184459A1 (en) * 2003-03-20 2004-09-23 Uri Elzur Self-describing transport protocol segments
US20070014246A1 (en) * 2005-07-18 2007-01-18 Eliezer Aloni Method and system for transparent TCP offload with per flow estimation of a far end transmit window

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
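CN100484136C (zh) * 2003-10-27 2009-04-29 英特尔公司 Network protocol engine
CN100372339C (zh) * 2004-03-10 2008-02-27 华为技术有限公司 Method for data transmission between a GPRS network terminal and an IP network device
CN101707590B (zh) * 2009-09-25 2015-03-11 曙光信息产业(北京)有限公司 Method and device for sending TCP/IP protocol messages based on zero-copy mode
CN101753452B (zh) * 2009-12-17 2012-07-25 福建星网锐捷网络有限公司 Method and device for allocating connection identifiers, and communication equipment
JP6006313B2 (ja) * 2011-09-12 2016-10-12 TATA Consultancy Services Limited System for dynamic service collaboration through identification and context of multiple heterogeneous devices
CN102413176B (zh) * 2011-11-11 2014-01-01 华为技术有限公司 Connection conversion method and device
CN104618961A (zh) * 2015-01-21 2015-05-13 普天信息技术有限公司 Single-channel TCP/IP header compression method and system for smart grids
CN106034084B (zh) * 2015-03-16 2020-04-28 华为技术有限公司 Data transmission method and device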

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11784874B2 (en) * 2019-10-31 2023-10-10 Juniper Networks, Inc. Bulk discovery of devices behind a network address translation device
US11805011B2 (en) 2019-10-31 2023-10-31 Juniper Networks, Inc. Bulk discovery of devices behind a network address translation device
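CN114070799A (zh) * 2020-07-27 2022-02-18 中国电信股份有限公司 Processing method and processing device for priority pause frames, and target network device
CN114070799B (zh) * 2020-07-27 2024-04-30 中国电信股份有限公司 Processing method and processing device for priority pause frames, and target network device
CN115086397A (zh) * 2022-06-10 2022-09-20 中国银行股份有限公司 Method and system for managing TCP connections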

Also Published As

Publication number Publication date
CN112154633B (zh) 2021-12-17
CN112154633A (zh) 2020-12-29

Similar Documents

Publication Publication Date Title
JP4921569B2 (ja) Data processing for TCP connections using an offload unit
WO2019219184A1 (fr) Receiving device and transmitting device for TCP communication
US8218555B2 (en) Gigabit ethernet adapter
US8427945B2 (en) SoC device with integrated supports for Ethernet, TCP, iSCSI, RDMA and network application acceleration
US9325764B2 (en) Apparatus and method for transparent communication architecture in remote communication
JP4638658B2 (ja) Method for uploading a state object of an offloaded network stack and method for synchronizing it
US8122140B2 (en) Apparatus and method for accelerating streams through use of transparent proxy architecture
US9219683B2 (en) Unified infrastructure over ethernet
US20030097481A1 (en) Method and system for performing packet integrity operations using a data movement engine
US8180928B2 (en) Method and system for supporting read operations with CRC for iSCSI and iSCSI chimney
US8438321B2 (en) Method and system for supporting hardware acceleration for iSCSI read and write operations and iSCSI chimney
US20070253430A1 (en) Gigabit Ethernet Adapter
EP1759317B1 (fr) Method and system for supporting iSCSI read operations and iSCSI chimney
CN114422432A (zh) Reliable overlay based on a reliable transport layer
US6980551B2 (en) Full transmission control protocol off-load
US20050281261A1 (en) Method and system for supporting write operations for iSCSI and iSCSI chimney
US11570257B1 (en) Communication protocol, and a method thereof for accelerating artificial intelligence processing tasks
JP6279970B2 (ja) Processor, communication device, communication system, communication method, and computer program
JP4916482B2 (ja) Gigabit Ethernet adapter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18726759

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18726759

Country of ref document: EP

Kind code of ref document: A1