WO2023244872A2 - A transport protocol for in-network computing in support of rpc-based applications - Google Patents


Info

Publication number
WO2023244872A2
Authority
WO
WIPO (PCT)
Prior art keywords
packet
tinc
header
network device
application
Prior art date
Application number
PCT/US2023/033621
Other languages
French (fr)
Other versions
WO2023244872A3 (en)
Inventor
Haoyu Song
Original Assignee
Futurewei Technologies, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Futurewei Technologies, Inc. filed Critical Futurewei Technologies, Inc.
Priority to PCT/US2023/033621 priority Critical patent/WO2023244872A2/en
Publication of WO2023244872A2 publication Critical patent/WO2023244872A2/en
Publication of WO2023244872A3 publication Critical patent/WO2023244872A3/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/133 Protocols for remote procedure calls [RPC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L 69/163 In-band adaptation of TCP data exchange; In-band control procedures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/22 Parsing or analysis of headers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L 69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L 69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L 69/326 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the transport layer [OSI layer 4]

Definitions

  • the present disclosure is generally related to the field of communication networks and, in particular, to a transport protocol for in-network computing in support of Remote Procedure Call (RPC) based applications.
  • RPC Remote Procedure Call
  • a protocol is a set of rules, conventions, and procedures that define how data is transmitted, received, and processed over a network. Protocols ensure that devices and systems can communicate with each other effectively, reliably, and in a standardized manner. In particular, transport protocols play a critical role in ensuring reliable and efficient communication between devices in a network.
  • the present disclosure provides various embodiments of a transport protocol for supporting in-network computing (INC) including a transport protocol for INC in support of RPC based applications.
  • INC in-network computing
  • a first aspect relates to a method implemented by a source host for supporting INC.
  • the method includes receiving first data at a transport layer of the source host from an application layer of the source host; adding a first Transport protocol layer for In-Network Computing (TINC) header to the first data; generating a first packet comprising the first data and the first TINC header; and sending the first packet to a destination host indicated in the first packet.
  • TINC Transport protocol layer for In-Network Computing
  • the method further includes receiving a second packet in response to the first packet, the second packet comprising a second TINC header and second data; and performing one or more actions based on the second data or the second TINC header.
  • the method further includes determining that the second packet is an acknowledgment (ACK) packet to the first packet based on an ACK bit in the second TINC header being set and based on a first sequence number in the first TINC header matching a second sequence number in the second TINC header.
  • ACK acknowledgment
  • the second data comprises computational results obtained by executing an INC application specified in the first TINC header.
  • the method further includes determining that a congestion control bit is set in the second TINC header; and adjusting a packet transmission window size based on the congestion control bit being set.
  • the method further includes setting a first service bit in the first TINC header to indicate a first request to an intermediate network device on a forwarding path of the first packet to process the first packet using the INC application on the intermediate network device.
  • the forwarding path is confined to a single network domain under a control of a single administrative entity.
  • the single network domain is a data center network.
  • the INC application supports a remote procedure call (RPC) communication process.
  • RPC remote procedure call
  • the INC application on the intermediate network device is limited in processing the first packet to procedures that can be performed on a data plane fast path of the intermediate network device.
  • the first packet includes argument values used as input values to the INC application for processing the first packet.
  • the method further includes setting a port number in the first TINC header to indicate the INC application for processing the first packet.
  • the method further includes setting a second service bit in the first TINC header to indicate a second request to a server to process the first packet using the INC application.
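The first-aspect send path described above can be sketched as follows. The dict-based packet representation and the helper names (`build_request`, `is_ack_for`) are illustrative assumptions for exposition, not a wire format defined by the disclosure; they mirror the described fields (destination port selecting the INC application, sequence number, SRX/SRS service bits, ACK bit).

```python
# Sketch of the source-host transport layer: attach a TINC-style header
# to application data, then later match a returned ACK by sequence number.

def build_request(data: bytes, inc_app_port: int, seq: int,
                  want_switch: bool, want_server: bool) -> dict:
    """Wrap application data with an illustrative TINC header."""
    header = {
        "dst_port": inc_app_port,  # identifies the INC application
        "sn": seq,                 # echoed back in the ACK packet
        "srx": int(want_switch),   # request in-network (switch) processing
        "srs": int(want_server),   # request server processing
        "ack": 0,
    }
    return {"tinc": header, "payload": data}

def is_ack_for(request: dict, reply: dict) -> bool:
    """An ACK matches when the ACK bit is set and sequence numbers agree."""
    return bool(reply["tinc"]["ack"]) and \
        reply["tinc"]["sn"] == request["tinc"]["sn"]
```

A reply whose ACK bit is clear, or whose sequence number differs, is treated as unrelated to the outstanding request.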
  • a second aspect relates to a method implemented by a network device for supporting in-network computing (INC).
  • the method includes receiving a first packet comprising a first Transport protocol layer for In-Network Computing (TINC) header and first data, the first TINC header specifying parameters for processing the first packet using an INC application; processing the first packet, using the INC application, based on the parameters specified in the first TINC header and the first data to obtain computational results; and sending a second packet comprising a second TINC header and the computational results.
  • TINC Transport protocol layer for In-Network Computing
  • the method further includes setting an acknowledgment (ACK) bit and a second sequence number in the second TINC header to indicate that the second packet is an ACK packet to the first packet, wherein the second sequence number is a first sequence number specified in the first TINC header.
  • ACK acknowledgment
  • the method further includes determining that a congestion control bit is set in the first TINC header; and setting the congestion control bit in the second TINC header.
  • the method further includes determining, prior to processing the first packet, that a first service bit in the first TINC header indicates a first request to process the first packet using the INC application on the network device.
  • the method further includes identifying, prior to processing the first packet, the INC application based on a source port value in the first TINC header.
  • the INC application supports a remote procedure call (RPC) communication process.
  • RPC remote procedure call
  • the network device is a server.
  • the network device is on a forwarding path of the first packet.
  • the forwarding path is confined to a single network domain under a control of a single administrative entity.
  • the single network domain is a data center network.
  • the method further includes configuring the INC application to limit processing of the first packet to procedures that can be performed on a data plane fast path of the network device.
  • the method further includes processing the first packet using argument values specified in the first packet as input values to the INC application.
  • the method further includes determining, prior to processing the first packet, that processing the first packet does not exceed a predetermined resource usage threshold.
  • the method further includes determining, prior to sending the second packet, whether a second service bit in the first TINC header is set indicating a second request for a server to process the first packet; and sending the second packet to the server when the second service bit is set, wherein the second packet further comprises the first data for enabling the server to process the first packet.
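The second-aspect device logic above can be sketched as a single dispatch function: process only when the SRX bit requests it and the estimated cost stays under a resource threshold, set the ACK and service-conducted bits, reflect the congestion bit, and hand off to the server when the SRS bit is set. All names, the dict packet shape, and the threshold value are illustrative assumptions; `compute` stands in for the INC application.

```python
# Sketch of an on-path device handling a TINC request packet.

RESOURCE_THRESHOLD = 0.8  # assumed fraction of device capacity

def handle_packet(pkt: dict, compute, cost_estimate: float):
    hdr = pkt["tinc"]
    if not hdr.get("srx") or cost_estimate > RESOURCE_THRESHOLD:
        return ("forward", pkt)            # leave for the server / next hop
    result = compute(pkt["payload"])       # run the INC application
    reply_hdr = {
        "ack": 1, "sn": hdr["sn"],         # ACK echoes the sequence number
        "scx": 1,                          # service conducted by the switch
        "ecn": hdr.get("ecn", 0),          # reflect the congestion signal
        "srs": hdr.get("srs", 0),
    }
    reply = {"tinc": reply_hdr, "payload": result}
    if hdr.get("srs"):                     # DS mode: server finishes the job
        reply["first_data"] = pkt["payload"]
        return ("to_server", reply)
    return ("to_client", reply)            # DO mode: answer the client directly
```

In DS mode the original data rides along in the second packet so the server can complete the computation, matching the last implementation above.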
  • a third aspect relates to a source host comprising a memory storing instructions; and one or more processors coupled to the memory and configured to execute the instructions to cause the source host to perform the method according to the first aspect or any implementation thereof.
  • a fourth aspect relates to a network device comprising a memory storing instructions; and one or more processors coupled to the memory and configured to execute the instructions to cause the network device to perform the method according to the second aspect or any implementation thereof.
  • a fifth aspect relates to a computer program product comprising computer-executable instructions stored on a non-transitory computer-readable storage medium, the computer-executable instructions, when executed by a processor of an apparatus, cause the apparatus to perform a method according to the first aspect, the second aspect, or any implementation thereof.
  • a sixth aspect relates to an apparatus comprising means for performing the method according to the first aspect, the second aspect, or any implementation thereof.
  • FIG. 1 is a schematic diagram of an end-to-end (E2E) model.
  • FIG. 2 is a schematic diagram of an end-to-middle-to-end (E2M2E) model according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a Transport protocol for In-Network Computing (TINC) header according to an embodiment of the present disclosure.
  • TINC Transport protocol for In-Network Computing
  • FIG. 4 is a flowchart of a method implemented by a source host for supporting INC applications according to an embodiment of the present disclosure.
  • FIG. 5 is a flowchart of a method implemented by a network device for supporting INC applications according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a network device according to an embodiment of the disclosure.
  • INC means computational capabilities are integrated directly into the network infrastructure, typically within switches, routers, or other networking devices.
  • One purpose of INC is to offload certain processing tasks from traditional end-host devices (such as servers) and perform the processing tasks within the network itself.
  • INC may be used to improve the application performance (e.g., lower latency and higher throughput), reduce the system cost (e.g., lower power consumption and fewer servers), and provide other benefits (e.g., reduce server load).
  • FIG. 1 is a schematic diagram of an E2E model 100. As shown in FIG. 1, typically, E2E communication occurs between a source host 110 and a destination host 120.
  • the source host 110 may be a client/end-user device and the destination host 120 may be a server that provides a service to the client device.
  • the source host 110 and the destination host 120 communicate by sending messages to each other through a communication network 140.
  • the communication network 140 may include various types of networks including different service provider networks, wired or wireless networks, private or public networks, and the Internet.
  • the messages are communicated in packets or data packets.
  • an application (e.g., a web browser, email client, or other application) at the application layer 112 of the source host 110 generates the data to be sent to the destination host 120.
  • a transport layer 114 of the source host 110 receives the data from the application layer 112, breaks the data down into smaller units such as segments, and adds transport layer header information like source and destination port numbers, sequence numbers, and error checking information to each segment.
  • the transport layer 114 is responsible for establishing and terminating connections, error handling, flow control, and other essential aspects of communication. It should be noted that other layers such as a presentation layer or a session layer may exist between the application layer 112 and the transport layer 114 in practical applications.
  • a network layer 116 receives the segments from the transport layer 114 and encapsulates them into packets. Each packet includes a network layer header containing information such as the source and destination Internet Protocol (IP) addresses, as well as information for routing the packet through the network.
  • IP Internet Protocol
  • a link/physical layer 118 may further encapsulate the packet into frames (e.g., Ethernet frames for Ethernet networks) that include frame/link layer header information such as source and destination media access control (MAC) addresses, as well as control information for managing the physical transmission.
  • the link/physical layer 118 converts the frames into bits (0s and 1s) for transmission over the physical medium.
  • the packets/frames are routed through the network device 130 of the communication network 140. While one network device 130 is shown in FIG. 1, one or more network devices 130 may be included in the communication network 140 in practical applications. Each network device 130 may be referred to herein as a hop.
  • the network device 130 may be a router or switch in the communication network 140. In general, the primary function of the network device 130 is to route packets between the source host 110 and the destination host 120. The information used to route the packet is in the network layer header of the packet.
  • the network device 130 when the network device 130 receives the bits at the link/physical layer 138, the network device 130, at the network layer 136, converts the bits back into packets to obtain the information contained in the network layer header of the packet (e.g., the destination IP address).
  • the network device 130 is configured to make routing decisions (e.g., determine the next hop or path to send the packet) based on the information contained in the network layer header using one or more locally stored routing tables.
  • the network device 130 is not concerned about the information contained in the transport layer header or application layer and therefore does not process the packet beyond the network layer 136.
  • the link/physical layer 138 then converts the packet back into bits for transmission to a next hop/node (the next hop/node may be another network device 130 or may be the destination host 120).
  • When the bits reach the destination host 120, the destination host 120 performs the reverse process to convert the bits back into frames at link/physical layer 128, packets at network layer 126, segments at transport layer 124, and finally, the data 150 is reconstructed at application layer 122. The same process, performed in reverse, applies to data 154 sent from the destination host 120 to the source host 110.
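The layered encapsulation walkthrough above can be sketched as nested wrapping: each layer prepends its own header on the way down, and the receiver strips headers in reverse order on the way up. The dict representation and the placeholder header contents are illustrative, not the actual frame/packet formats.

```python
# Sketch of E2E encapsulation: transport segment -> network packet -> link frame.

def encapsulate(data, src_ip, dst_ip, src_port, dst_port, src_mac, dst_mac):
    """Wrap application data in transport, network, and link headers."""
    segment = {"hdr": {"src_port": src_port, "dst_port": dst_port}, "body": data}
    packet = {"hdr": {"src_ip": src_ip, "dst_ip": dst_ip}, "body": segment}
    frame = {"hdr": {"src_mac": src_mac, "dst_mac": dst_mac}, "body": packet}
    return frame

def decapsulate(frame):
    """Strip link, network, and transport headers to recover the data."""
    packet = frame["body"]
    segment = packet["body"]
    return segment["body"]   # the original application data
```

A conventional router, as described above, only needs the middle (network) layer of this nesting to make its forwarding decision.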
  • FIG. 2 is a schematic diagram of an E2M2E model 200 according to an embodiment of the present disclosure.
  • the E2M2E model 200 is confined to a single network domain under the control of a single administrative entity (e.g., a data center network or an access network).
  • the E2M2E model 200 breaks the E2E model 100 between the source host 110 and the destination host 120 as described in FIG. 1.
  • the source host 110 and the destination host 120 respectively include the application layer 112/122, the transport layer 114/124, the network layer 116/126, and the link/physical layer 118/128 as described in FIG. 1 for communicating packets between the source host 110 and the destination host 120.
  • the packets are routed through a communication network 170.
  • the communication network 170 is similar to the communication network 140, except the communication network 170 supports/includes one or more network devices 160.
  • the network device 160 is a programmable network device that supports INC.
  • the network device 160 may also be referred to as an on-path programmable network device.
  • An on-path programmable network device is a network device (e.g., switches and routers between the source host 110 and the destination host 120) that can be dynamically programmed to perform customized packet processing functions directly on the data path as network traffic flows through the network device.
  • the network device 160 may support INC applications that are configured to perform specific computational procedures, processing tasks, or other functions directly within the network infrastructure using data contained in a packet.
  • the network device 160 may support INC applications that follow the communication paradigm (i.e., framework/exchange/process) of a remote procedure call (RPC), referred to herein as INC RPC-based applications, where each packet is an individual message that can be processed independently (i.e., the packet includes the data needed for the INC RPC-based application to satisfy/perform one or more parts of the procedure or service requested by the source host 110).
  • RPC remote procedure call
  • a client such as the source host 110 sends a message with arguments/parameters to a server such as the destination host 120.
  • the server performs the requested computation/procedure based on the arguments and sends a response back to the client containing the computation result of the procedure.
  • the network device 160 is configured to obtain the data in the packet at the application layer 132, which means that the network device 160 should be configured to support transport layer functions at the transport layer 134.
  • a transport protocol or transport layer protocol (or layer 4 protocol) is a set of rules, conventions/standards, or formats that dictate how data is transmitted, received, and acknowledged between devices over a network.
  • a transport protocol may define the format of data packets, the mechanisms for establishing and terminating connections, error handling, flow control, and other essential aspects of communication.
  • TCP Transmission Control Protocol
  • UDP User Datagram Protocol
  • QUIC Quick UDP Internet Connection
  • TCP is a connection-oriented protocol and provides reliable end-to-end delivery of data packets.
  • TCP is not suitable for INC applications because TCP is too complex for in-network handling.
  • certain INC RPC-based applications of the present disclosure may modify a payload of a packet (i.e., the actual data carried in a packet), which causes a break to an end-to-end TCP delivery stream between a client and a server.
  • any dropped packet in a TCP stream sensed by the receiver/server must be retransmitted, which is inconsistent with certain embodiments of the present disclosure, where an INC RPC-based application of the network device 160 may terminate routing of a packet, perform the requested computational processing, and return the computing result directly to the source host 110.
  • UDP is simpler than TCP because UDP does not establish a connection before sending data.
  • UDP is not reliable, lacks multiplexing capability, and fails to support sufficient transport layer functions for most INC RPC-based applications.
  • QUIC adds multiplexing and secure connection to UDP, but encrypts portions of a packet, which prohibits INC. Even without encryption, QUIC header information does not provide sufficient information to support RPC-based applications.
  • TINC transport protocol for INC applications
  • certain embodiments of TINC support multiplexing, a reliability mechanism, open flow control (i.e., the congestion control mechanism may be user-defined), and are INC aware (i.e., transparent to algorithms for packet loss detection in the network device, the client, and the server).
  • the computational processing performed by a TINC-supported application may be limited to procedures that can be performed on the data plane fast path of the network device.
  • the data plane fast path (or forwarding plane) is a high-speed path through the router/switch.
  • TINC is configured to virtualize the network (e.g., the communication network 170 in FIG. 2) as a single logical middle point. That is, if multiple network devices 160 collaborate on a computing task, the multiple network devices 160 are considered as a single network device 160.
  • packet forwarding among the multiple network devices 160 that perform collaborative computing processing are handled by the network layer 136 using routing techniques such as segment routing (SR) or service function chaining (SFC).
  • SR segment routing
  • SFC service function chaining
  • a source node e.g., the source host 110
  • a packet is forwarded through a defined service function chain (i.e., an ordered sequence of processing tasks) based on specific headers in the packet.
  • embodiments of the present disclosure are intended to support RPC based INC applications.
  • embodiments of the present disclosure may support INC applications that provide services in conjunction with, or independent of, services provided by a server.
  • Table 1 provides three service models supported by various embodiments of the present disclosure.
  • In the synchronous collaboration (SC) service model, a set of clients each sends a piece/portion of data for performing a procedure to a server at roughly the same time (i.e., synchronized).
  • the data pieces are combined to produce a global result that can be distributed back to the set of clients.
  • embodiments of the present disclosure may support INC applications that perform synchronous collaboration on just the network device (i.e., Device Only (DO) mode) or on both the network device and the server (i.e., Device + Server (DS) mode).
  • DO Device Only
  • DS Device + Server
  • an INC application on the network device 160 receives a request 172 from the source host 110 (e.g., client), performs a computing task associated with the request 172, and returns the result of the computing task in a response 174 directly to source host 110.
  • DO mode mainly aims to reduce latency.
  • the INC application on the network device 160 receives a request 172 from the source host 110, partially completes a computing task associated with the request 172, and sends the intermediate result 176 to the destination host 120 (e.g., server) to complete the computing task associated with the request 172.
  • the INC application may perform a portion of the computing and forward the intermediate result 176 to the destination host 120, where the intermediate result 176 is combined with other results obtained by the destination host 120 to determine a final result.
  • the destination host 120 then returns the final result of the computing task in a response 180 to the source host 110 through the one or more network devices 160.
  • the response 180 does not traverse the same path or the same network devices 160 traversed by the request 172.
  • the server may send intermediate results to the network device for combining the intermediate results with results obtained by the INC application to obtain a final result.
  • DS mode mainly aims to reduce the traffic bandwidth and server load.
  • An example of an SC procedure is AllReduce, which collects data from different processing units to combine them into a global result.
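The SC/AllReduce pattern above can be sketched as an in-network aggregator that accumulates one contribution per client and releases the global result only once every expected piece has arrived. A real device would key this state by the JID field and fan the result back to all clients; the class below, including its duplicate handling for resent packets, is an illustrative toy, not the disclosed implementation.

```python
# Toy in-network aggregator for an AllReduce-style (sum) reduction.

class Aggregator:
    def __init__(self, expected_clients: int):
        self.expected = expected_clients
        self.partial = 0
        self.seen = set()

    def add(self, client_id, value):
        """Accumulate one contribution; return the global sum when complete."""
        if client_id in self.seen:
            return None                  # duplicate (e.g., a resent packet)
        self.seen.add(client_id)
        self.partial += value
        if len(self.seen) == self.expected:
            return self.partial          # global result, fanned back to clients
        return None                      # still waiting for other clients
```

Keeping only a running sum and a membership set (rather than buffering whole packets) reflects the resource constraints on programmable devices discussed below.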
  • In the asynchronous collaboration (AC) service model, a set of clients each sends multiple data items to a server at different times. The processing result can be computed only when all the data items are received. Due to the limited resources of network devices, AC processing in general needs resources that exceed the capability of a network device and cannot be performed in DO mode. For example, in certain embodiments, due to the limited resources and common processing constraints in programmable network devices, TINC applications should not maintain a packet buffer or complex inter-packet state. In some embodiments, TINC applications are configured to forward complex tasks to the end host/server for processing. For example, a TINC application may forward any task that needs or exceeds a predefined percentage or amount of the resources of a network device.
  • MapReduce An example of an AC procedure is MapReduce, which divides input data into small chunks that are processed independently by multiple map tasks running in parallel across different nodes in a distributed computing environment.
  • In the individual request (IR) service model, a client sends individual requests to a server and receives a response for each request.
  • embodiments of the present disclosure are capable of handling the IR service model using DO mode, where the INC application performs a computing task associated with a request and returns the final result of the computing task directly to the client.
  • An example of an IR procedure is NetCache, which is an in-network key-value cache that leverages the power and flexibility of programmable switches to cache query results for addressing load imbalance.
  • FIG. 3 is a schematic diagram of a TINC header 300 according to an embodiment of the present disclosure.
  • the TINC header 300 includes information for enabling a transport protocol for INC applications according to one or more embodiments of the present disclosure.
  • the transport layer 114 of the source host 110 may receive data from the application layer 112, break the data down into smaller data units, and add the TINC header 300 to each data unit. The data units are then passed to the network layer 116 for generating packets.
  • the TINC header 300 includes a Source Port field 302, a Destination Port field 304, a Flags field 306, a Job Identifier (JID) field 308, a Payload Length (LEN) field 310, a Version (V) field 312, a Service Requested - Switch (SRX) field 314, a Service Requested - Server (SRS) field 316, a Service Conducted - Switch (SCX) field 318, a Service Conducted - Server (SCS) field 320, an Explicit Congestion Notification (ECN) field 322, a Resent (RE) field 324, an ACK field 326, a Reserved (RESV) field 328, a Sequence Number (SN) field 330, a Window Size (WIN) field 332, and a First Packet (FP) field 334.
  • JID Job Identifier
  • LEN Payload Length
  • V Version
  • SRX Service Requested - Switch
  • SRS Service Requested - Server
  • the arrangement/sequence, naming convention, or size of the fields of the TINC header 300 illustrated in FIG. 3 may be different in other embodiments of the present disclosure. Such modifications to the TINC header 300 are intended to be within the scope of the claims of the present disclosure. Additionally, in some embodiments, the TINC header 300 may include additional fields not shown in FIG. 3 or may exclude one or more fields illustrated in FIG. 3.
  • the Source Port field 302 is a 16-bit field that contains a port number that identifies the source connection (e.g., the port number of the source host 110 that transmitted the packet).
  • the Destination Port field 304 is a 16-bit field that contains a port number that identifies a specific application to perform INC processing on the packet.
  • the Flags field 306 is an 8-bit field that is used for connection establishment/termination and other purposes.
  • the JID field 308 is a 12-bit field that contains an identifier that is used to identify a particular job for an application. In an embodiment, the JID is used to distinguish between concurrent jobs for the same application.
  • the LEN field 310 is a 12-bit field that contains a payload length indicating the length of the payload after the TINC header 300.
  • the payload length can be up to 4 kilobytes (KB). 1 KB is equal to 1024 bytes or 8192 bits.
  • the payload contains the data being carried by the packet.
  • the V field 312 is a 4-bit field that indicates the current version of the TINC protocol specification (e.g., a value of 1 indicates the first version of the TINC protocol specification).
  • the SRX field 314 is a 1-bit field that is set to request the switch or network device to process the packet in network.
  • the SRS field 316 is a 1-bit field that is set to request the server to process the packet.
  • a client/source host can request the network device, the server, or both to process the request packet by setting the bit in the SRX field 314 and the SRS field 316 accordingly.
  • when the corresponding bit in the SRX field 314 or the SRS field 316 is not set, the respective device is configured to not process the packet.
  • the SCX field 318 is a 1-bit field that is set by the switch or network device in a response packet to indicate whether the request packet is processed by the switch/network device.
  • the SCS field 320 is a 1-bit field that is set by the server in a response packet to indicate whether the request packet is processed by the server.
  • the network device/server may not process the request packet for various reasons (e.g., a server may not process a packet because it is short of resources, or a switch may not process a packet because the switch has already performed the entire job).
  • the ECN field 322 is a 1-bit field that is set by a network device on the forwarding path of the request packet to indicate that the network device on the forwarding path is experiencing congestion. The bit in the ECN field 322 can be reflected back (i.e., correspondingly set) in an acknowledgment packet or response packet to inform the client/source host of the congestion on the forwarding path. The client can then perform congestion control (e.g., perform a window adjustment to reduce a packet transmission rate) to ease congestion and avoid packet loss.
  • the RE field 324 is a 1-bit field that indicates that the packet is a resent packet. For example, when a lost packet is detected, the client will resend the lost packet with the bit in the RE field set to 1 to indicate that the packet is a resent packet.
  • the ACK field 326 is a 1-bit field that is set to indicate that the packet is an acknowledgment packet.
  • the RESV field 328 is a 5-bit field that is reserved for future use.
  • the SN field 330 is a 16-bit field that contains a sequence number for the forwarding/request packet. The same sequence number is set in an acknowledgment packet to indicate that the request packet containing the sequence number was received.
  • the WIN field 332 is a 16-bit field that is set by the client in the forwarding/request packet to specify a current window size in the number of packets.
  • the FP field 334 is a 16-bit field that is set by the client in the forwarding/request packet and contains a sequence number of the first packet to be acknowledged. In an embodiment, when the first packet to be acknowledged is processed/advances or the window size increases, the client can send more packets.
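Taken together, the fields described above can be illustrated with a small pack/unpack sketch. The 8-byte layout and bit ordering shown here are assumptions for illustration only (the port fields and any other parts of the TINC header 300 are omitted); this is not the normative wire format.

```python
import struct

def pack_tinc_header(v=1, srx=0, srs=0, scx=0, scs=0, ecn=0, re=0,
                     ack=0, sn=0, win=0, fp=0):
    """Pack the TINC fields described above into 8 bytes.

    Assumed bit layout: V occupies the top 4 bits of the first
    16-bit word, followed by the seven 1-bit flags (SRX, SRS, SCX,
    SCS, ECN, RE, ACK) and the 5-bit RESV field, then the 16-bit
    SN, WIN, and FP fields in network byte order.
    """
    flags = (v << 12) | (srx << 11) | (srs << 10) | (scx << 9) | \
            (scs << 8) | (ecn << 7) | (re << 6) | (ack << 5)
    return struct.pack("!HHHH", flags, sn, win, fp)

def unpack_tinc_header(data):
    """Recover the individual fields from the assumed 8-byte layout."""
    flags, sn, win, fp = struct.unpack("!HHHH", data[:8])
    return {
        "v": (flags >> 12) & 0xF,
        "srx": (flags >> 11) & 1, "srs": (flags >> 10) & 1,
        "scx": (flags >> 9) & 1, "scs": (flags >> 8) & 1,
        "ecn": (flags >> 7) & 1, "re": (flags >> 6) & 1,
        "ack": (flags >> 5) & 1,
        "sn": sn, "win": win, "fp": fp,
    }
```

For example, a client requesting in-network processing of the packet with sequence number 42 would set srx=1 and sn=42 before transmitting.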
  • the TINC header 300 provides signaling functions and an in-network computing load-balancing scheme for INC applications by enabling a client to request in-network and/or server processing, and receive acknowledgment of the processing by setting the various service requested/conducted fields.
  • the TINC header 300 also provides a reliability function that enables packet loss detection (e.g., not receiving an acknowledgment packet corresponding to a particular sequence number of a forwarding packet) and retransmission of a lost packet (e.g., setting the bit in the RE field 324).
  • the TINC header 300 also provides congestion control using the WIN field 332 to adjust a number of outstanding packets in the network (i.e., a window) to limit the rate of packets transmitted by the client to reduce congestion.
  • the TINC header 300 also includes the ECN field 322 that allows network devices and endpoints to signal each other about network congestion before packet loss occurs. This feature is designed to improve the quality of service and overall performance in IP networks, particularly in situations where network congestion is starting to occur. Traditionally, when a network becomes congested, routers might drop packets to alleviate the congestion, which leads to retransmissions and reduced network efficiency. ECN provides an alternative mechanism to handle congestion before experiencing packet loss.
  • the TINC header 300 may include a different type of network congestion control bit.
  • embodiments of the present disclosure are not limited to a particular congestion control mechanism and can be used with any type of congestion control mechanism.
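As one concrete possibility, a client could react to the reflected ECN bit with an additive-increase/multiplicative-decrease (AIMD) adjustment of the WIN value. This sketch is purely illustrative; the disclosure itself does not mandate any particular policy.

```python
def adjust_window(win, ecn_set, min_win=1, max_win=65535):
    """AIMD-style reaction to the reflected ECN bit: halve the
    window on congestion, otherwise grow it by one packet,
    clamped to the 16-bit range of the WIN field."""
    if ecn_set:
        return max(min_win, win // 2)
    return min(max_win, win + 1)
```

A client running this policy shrinks quickly when the path signals congestion and probes for more capacity otherwise, without ever waiting for packet loss.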
  • any complex logic/computational processing is not implemented in-network, but instead is performed in end servers.
  • The disclosed TINC protocol is general, applicable for use with all INC applications that follow the RPC pattern, and extensible for future enhancements. TINC simplifies application design by delegating the transport functions to a standard layer, and promotes the development and adoption of INC solutions.
  • FIG. 4 is a flowchart of a method 400 implemented by a source host for supporting INC applications according to an embodiment of the present disclosure.
  • the method 400 may be implemented by the source host 110 in FIG. 2.
  • the method 400 includes receiving, by the source host, data at a transport layer of the source host from an application layer of the source host.
  • the data may have been generated by an application executed on the source host and is to be sent to a destination host.
  • the source host, at step 404, adds a Transport protocol layer for In-Network Computing (TINC) header to the data. Additionally, the source host may perform other tasks associated with the data at the transport layer such as, but not limited to, breaking the data into smaller data units.
  • the TINC header specifies parameters that indicate whether an in-network computing (INC) application is to process the packet.
  • INC applications are applications implemented by an on-path programmable network device such as a switch or router.
  • the TINC header may include a destination port field indicating a port value that identifies a particular INC application to perform processing on the packet.
  • the port value corresponding to the particular INC application may be assigned by the Internet Assigned Numbers Authority (IANA).
  • the port value corresponding to the particular INC application may be assigned by an administrative entity or administrator of a network domain in which the TINC header is employed.
  • the disclosed embodiments may be implemented in a data center network and an administrator of the data center network may assign port numbers to certain INC applications implemented by network devices in the data center network.
  • the TINC header may also specify whether the client requested the packet processing service to be performed at an intermediate network device or at the server/destination host. Additionally, as described above, the TINC header may also include parameters in one or more fields for enabling congestion control and ensuring reliable transport.
  • the source host, at the network layer, creates a packet that includes the data, the TINC header, and a network layer header (e.g., an IP version 4 (IPv4) or an IP version 6 (IPv6) network layer header).
  • the network layer header contains information for routing packets within the network (e.g., source and destination IP addresses, time-to-live (TTL), type of service (ToS), and other control and addressing information).
  • the source host sends the packet to a destination host indicated in the packet.
  • the source host receives an ACK packet (or response packet) to the packet that was sent at step 408 (i.e., the request or forwarding packet).
  • the ACK packet also includes a TINC header.
  • the TINC header may indicate whether the packet processing service requested by the forwarding packet was performed at an intermediate network device, at the server, or both.
  • the ACK packet also includes the results of the packet processing service. In other embodiments, the ACK packet simply acknowledges that the request packet was received or processed, and the results of the packet processing service may be transmitted separately in one or more additional ACK packets or response packets.
  • the source host performs one or more actions based on the data contained in the ACK packet. For example, if the ACK packet includes results of the packet processing service, the source host may analyze the results and perform one or more actions based on the results. For instance, if the requested packet processing service requests that the service or INC application authenticate a user based on credentials provided in the packet, and the result indicates that the user is authenticated, then the source host may enable the user to access the source host or access other requested resources.
  • the result may be an answer to a mathematical calculation based on parameters provided by the source host in the request packet, and the source node may utilize the result to perform additional calculations or to adjust a particular parameter value.
  • the source host may also resend a packet if the TINC header in the ACK packet indicates that a packet was lost (i.e., not received by the destination host) or adjust a transmission window for easing network congestion if the ECN bit in the TINC header is set.
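The client-side behavior of method 400 (send a TINC-tagged request, match the ACK by sequence number, resend on loss, and shrink the window on ECN) can be sketched as follows. The dictionary packet representation and the send/recv helpers are hypothetical stand-ins for the transport and network layers, not part of the disclosure.

```python
def client_send_and_handle(data, send, recv, sn, win):
    """Sketch of method 400: tag the data with a TINC-style header
    requesting in-network processing, send it, then react to the
    returned TINC header fields."""
    request = {"sn": sn, "srx": 1, "srs": 0, "win": win, "data": data}
    send(request)
    ack = recv()
    # Loss detection: the ACK must echo the request's sequence number.
    if ack is None or ack.get("sn") != sn:
        request["re"] = 1          # mark as a resent packet
        send(request)
        ack = recv()
    if ack and ack.get("ecn"):
        win = max(1, win // 2)     # ease congestion on the path
    return ack, win
```

In the normal case a single send suffices; a mismatched or missing ACK triggers one retransmission with the RE bit set, and a reflected ECN bit halves the window.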
  • FIG. 5 is a flowchart of a method 500 implemented by a network device for supporting INC applications according to an embodiment of the present disclosure.
  • the method 500 may be implemented by a router, switch, or server such as the network device 160 or the destination host 120 in FIG. 2.
  • the method 500, at step 502, includes receiving, by the network device, a first packet that includes a first TINC header and first data.
  • the first TINC header specifies parameters for processing the first packet using an INC application as described above.
  • the network device processes the first packet, using the INC application, based on the parameters specified in the first TINC header and the first data to obtain computational results. For example, in an embodiment, when the TINC header indicates a source port value corresponding to an INC application of the network device, and when the TINC header indicates a service process request to be performed at the network device, then the network device processes, when resources of the network device are available, the first packet using the INC application of the network device.
  • the network device sends a second packet comprising a second TINC header and the computational results.
  • the second packet may be an ACK packet or a response packet back to the source host/client that requested the packet processing (i.e., set the bit in the ACK field 326 and include the same sequence number as the forwarding packet).
  • the network device may also set certain bits or fields in the TINC header to indicate that the service was performed by the network device, or that the packet experienced network congestion on the forwarding path.
  • the network device may send the second packet that includes the computational results to another network device on the forwarding path or to a server for additional processing, or both.
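The device-side handling of method 500 can be sketched similarly. Dispatching the INC application by a port value and gating on available resources follow the description above, but the dispatch table and helper names here are illustrative assumptions.

```python
def handle_tinc_packet(pkt, inc_apps, resources_available):
    """Sketch of method 500 on an on-path device: run the INC
    application selected by the port field when the SRX bit
    requests in-network processing, then build the acknowledgment
    header, reflecting the SN and ECN values back to the client."""
    ack = {"ack": 1, "sn": pkt["sn"], "scx": 0, "ecn": pkt.get("ecn", 0)}
    result = None
    app = inc_apps.get(pkt.get("dport"))
    if pkt.get("srx") and app and resources_available():
        result = app(pkt["data"])
        ack["scx"] = 1     # signal that the device performed the service
    return ack, result
```

If the SRX bit is clear, the application is unknown, or resources are short, the device leaves SCX unset and the packet can continue toward the server unprocessed.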
  • FIG. 6 is a schematic diagram of a network device 600 according to an embodiment of the disclosure.
  • the network apparatus 600 is suitable for implementing the disclosed embodiments as described herein.
  • the network apparatus 600 may be configured to perform the functions, procedures, steps, or methods of the source host 110, the network device 160, or the destination host 120 as described herein.
  • the network device 600 comprises ingress ports/ingress means 610 and receiver units (Rx)/receiving means 620 for receiving data; a processor, logic unit, or central processing unit (CPU)/processing means 630 to process the data; transmitter units (Tx)/transmitting means 640 and egress ports/egress means 650 for transmitting the data; and a memory/memory means 660 for storing the data.
  • the network device 600 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports/ingress means 610, the receiver units/receiving means 620, the transmitter units/transmitting means 640, and the egress ports/egress means 650 for egress or ingress of optical or electrical signals.
  • OE optical-to-electrical
  • EO electrical-to-optical
  • the processor/processing means 630 is implemented by hardware and software.
  • the processor/processing means 630 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), programmable pipeline processors (PPPs), and digital signal processors (DSPs).
  • the processor/processing means 630 is in communication with the ingress ports/ingress means 610, receiver units/receiving means 620, transmitter units/transmitting means 640, egress ports/egress means 650, and memory/memory means 660.
  • the processor/processing means 630 comprises an in-network computing transport protocol module 670.
  • the in-network computing transport protocol module 670 comprises instructions that, when executed by the processor/processing means 630, cause the network device 600 to perform the methods disclosed herein.
  • the instructions for performing the methods disclosed herein are programmed directly on the processor/processing means 630.
  • one or more PPPs may be programmed to perform a processing stage or function according to embodiments of the present disclosure.
  • the inclusion of the in-network computing transport protocol module 670 therefore provides a substantial improvement to the functionality of the network device 600 and effects a transformation of the network device 600 to a different state.
  • the in-network computing transport protocol module 670 may be stored in the memory/memory means 660 and one or more instructions included in the in-network computing transport protocol module 670 for performing one or more aspects of the disclosed embodiments may be executed by the processor/processing means 630.
  • the network device 600 may also include input and/or output (I/O) devices/I/O means 680 for communicating data to and from a user.
  • the I/O devices/I/O means 680 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc.
  • the I/O devices/I/O means 680 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.
  • the memory/memory means 660 comprises one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory/memory means 660 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).

Abstract

A method implemented by a source host for supporting in-network computing (INC). The method includes receiving first data at a transport layer of the source host from an application layer of the source host; adding a first transport protocol layer for in-network computing (TINC) header to the first data; generating a first packet that includes the first data and the first TINC header; and sending the first packet to a destination host indicated in the first packet.

Description

A Transport Protocol for In-Network Computing in Support of RPC-based Applications
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] None.
TECHNICAL FIELD
[0002] The present disclosure is generally related to the field of communication networks and, in particular, to a transport protocol for in-network computing in support of Remote Procedure Call (RPC) based applications.
BACKGROUND
[0003] In network communications, a protocol is a set of rules, conventions, and procedures that define how data is transmitted, received, and processed over a network. Protocols ensure that devices and systems can communicate with each other effectively, reliably, and in a standardized manner. In particular, transport protocols play a critical role in ensuring reliable and efficient communication between devices in a network.
SUMMARY
[0004] The present disclosure provides various embodiments of a transport protocol for supporting in-network computing (INC) including a transport protocol for INC in support of RPC based applications.
[0005] A first aspect relates to a method implemented by a source host for supporting INC. The method includes receiving first data at a transport layer of the source host from an application layer of the source host; adding a first Transport protocol layer for In-Network Computing (TINC) header to the first data; generating a first packet comprising the first data and the first TINC header; and sending the first packet to a destination host indicated in the first packet.
[0006] Optionally, in a first implementation according to the first aspect, the method further includes receiving a second packet in response to the first packet, the second packet comprising a second TINC header and second data; and performing one or more actions based on the second data or the second TINC header.
[0007] Optionally, in a second implementation according to the first aspect or any implementation thereof, the method further includes determining that the second packet is an acknowledgment (ACK) packet to the first packet based on an ACK bit in the second TINC header being set and based on a first sequence number in the first TINC header matching a second sequence number in the second TINC header.
[0008] Optionally, in a third implementation according to the first aspect or any implementation thereof, the second data comprises computational results obtained by executing an INC application specified in the first TINC header.
[0009] Optionally, in a fourth implementation according to the first aspect or any implementation thereof, the method further includes determining that a congestion control bit is set in the second TINC header; and adjusting a packet transmission window size based on the congestion control bit being set.
[0010] Optionally, in a fifth implementation according to the first aspect or any implementation thereof, the method further includes setting a first service bit in the first TINC header to indicate a first request to an intermediate network device on a forwarding path of the first packet to process the first packet using the INC application on the intermediate network device.
[0011] Optionally, in a sixth implementation according to the first aspect or any implementation thereof, the forwarding path is confined to a single network domain under a control of a single administrative entity.
[0012] Optionally, in a seventh implementation according to the first aspect or any implementation thereof, the single network domain is a data center network.
[0013] Optionally, in an eighth implementation according to the first aspect or any implementation thereof, the INC application supports a remote procedure call (RPC) communication process.
[0014] Optionally, in a ninth implementation according to the first aspect or any implementation thereof, the INC application on the intermediate network device is limited in processing the first packet to procedures that can be performed on a data plane fast path of the intermediate network device.
[0015] Optionally, in a tenth implementation according to the first aspect or any implementation thereof, the first packet includes argument values used as input values to the INC application for processing the first packet.
[0016] Optionally, in an eleventh implementation according to the first aspect or any implementation thereof, the method further includes setting a port number in the first TINC header to indicate the INC application for processing the first packet.
[0017] Optionally, in a twelfth implementation according to the first aspect or any implementation thereof, the method further includes setting a second service bit in the first TINC header to indicate a second request to a server to process the first packet using the INC application.
[0018] A second aspect relates to a method implemented by a network device for supporting in-network computing (INC). The method includes receiving a first packet comprising a first Transport protocol layer for In-Network Computing (TINC) header and first data, the first TINC header specifying parameters for processing the first packet using an INC application; processing the first packet, using the INC application, based on the parameters specified in the first TINC header and the first data to obtain computational results; and sending a second packet comprising a second TINC header and the computational results.
[0019] Optionally, in a first implementation according to the second aspect, the method further includes setting an acknowledgment (ACK) bit and a second sequence number in the second TINC header to indicate that the second packet is an ACK packet to the first packet, wherein the second sequence number is a first sequence number specified in the first TINC header.
[0020] Optionally, in a second implementation according to the second aspect or any implementation thereof, the method further includes determining that a congestion control bit is set in the first TINC header; and setting the congestion control bit in the second TINC header.
[0021] Optionally, in a third implementation according to the second aspect or any implementation thereof, the method further includes determining, prior to processing the first packet, that a first service bit in the first TINC header indicates a first request to process the first packet using the INC application on the network device.
[0022] Optionally, in a fourth implementation according to the second aspect or any implementation thereof, the method further includes identifying, prior to processing the first packet, the INC application based on a source port value in the first TINC header.
[0023] Optionally, in a fifth implementation according to the second aspect or any implementation thereof, the INC application supports a remote procedure call (RPC) communication process.
[0024] Optionally, in a sixth implementation according to the second aspect or any implementation thereof, the network device is a server.
[0025] Optionally, in a seventh implementation according to the second aspect or any implementation thereof, the network device is on a forwarding path of the first packet.
[0026] Optionally, in an eighth implementation according to the second aspect or any implementation thereof, the forwarding path is confined to a single network domain under a control of a single administrative entity.
[0027] Optionally, in a ninth implementation according to the second aspect or any implementation thereof, the single network domain is a data center network.
[0028] Optionally, in a tenth implementation according to the second aspect or any implementation thereof, the method further includes configuring the INC application to limit processing of the first packet to procedures that can be performed on a data plane fast path of the network device.
[0029] Optionally, in an eleventh implementation according to the second aspect or any implementation thereof, the method further includes processing the first packet using argument values specified in the first packet as input values to the INC application.
[0030] Optionally, in a twelfth implementation according to the second aspect or any implementation thereof, the method further includes determining, prior to processing the first packet, that processing the first packet does not exceed a predetermined resource usage threshold.
[0031] Optionally, in a thirteenth implementation according to the second aspect or any implementation thereof, the method further includes determining, prior to sending the second packet, whether a second service bit in the first TINC header is set indicating a second request for a server to process the first packet; and sending the second packet to the server when the second service bit is set, wherein the second packet further comprises the first data for enabling the server to process the first packet.
[0032] A third aspect relates to a source host comprising a memory storing instructions; and one or more processors coupled to the memory and configured to execute the instructions to cause the source host to perform the method according to the first aspect or any implementation thereof.
[0033] A fourth aspect relates to a network device comprising a memory storing instructions; and one or more processors coupled to the memory and configured to execute the instructions to cause the network device to perform the method according to the second aspect or any implementation thereof.
[0034] A fifth aspect relates to a computer program product comprising computer-executable instructions stored on a non-transitory computer-readable storage medium, the computer-executable instructions, when executed by a processor of an apparatus, cause the apparatus to perform a method according to the first aspect, the second aspect, or any implementation thereof.
[0035] A sixth aspect relates to an apparatus comprising means for performing the method according to the first aspect, the second aspect, or any implementation thereof.
[0036] For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
[0037] These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
[0039] FIG. 1 is a schematic diagram of an end-to-end (E2E) model.
[0040] FIG. 2 is a schematic diagram of an end-to-middle-to-end (E2M2E) model according to an embodiment of the present disclosure.
[0041] FIG. 3 is a schematic diagram of a Transport protocol for In-Network Computing (TINC) header according to an embodiment of the present disclosure.
[0042] FIG. 4 is a flowchart of a method implemented by a source host for supporting INC applications according to an embodiment of the present disclosure.
[0043] FIG. 5 is a flowchart of a method implemented by a network device for supporting INC applications according to an embodiment of the present disclosure.
[0044] FIG. 6 is a schematic diagram of a network device according to an embodiment of the disclosure.
DETAILED DESCRIPTION
[0045] It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
[0046] The present disclosure provides various embodiments of a transport protocol for supporting INC or INC applications. INC means computational capabilities are integrated directly into the network infrastructure, typically within switches, routers, or other networking devices. One purpose of INC is to offload certain processing tasks from traditional end-host devices (such as servers) and perform the processing tasks within the network itself. INC may be used to improve the application performance (e.g., lower latency and higher throughput), reduce the system cost (e.g., lower power consumption and fewer servers), and provide other benefits (e.g., reduce server load).
[0047] FIG. 1 is a schematic diagram of an E2E model 100. As shown in FIG. 1, typically, E2E communication occurs between a source host 110 and a destination host 120. For example, the source host 110 may be a client/end-user device and the destination host 120 may be a server that provides a service to the client device.
[0048] The source host 110 and the destination host 120 communicate by sending messages to each other through a communication network 140. The communication network 140 may include various types of networks including different service provider networks, wired or wireless networks, private or public networks, and the Internet. In general, the messages are communicated in packets or data packets. For example, an application (e.g., a web browser, email client, or other application) at an application layer 112 of the source host 110 may generate data 150 that is to be sent to the destination host 120. A transport layer 114 of the source host 110 receives the data from the application layer 112, breaks the data down into smaller units such as segments, and adds transport layer header information like source and destination port numbers, sequence numbers, and error checking information to each segment. The transport layer 114 is responsible for establishing and terminating connections, error handling, flow control, and other essential aspects of communication. It should be noted that other layers such as a presentation layer or a session layer may exist between the application layer 112 and the transport layer 114 in practical applications. A network layer 116 receives the segments from the transport layer 114 and encapsulates them into packets. Each packet includes a network layer header containing information such as the source and destination Internet Protocol (IP) addresses, as well as information for routing the packet through the network. A link/physical layer 118 may further encapsulate the packet into frames (e.g., Ethernet frames for Ethernet networks) that include frame/link layer header information such as source and destination media access control (MAC) addresses, as well as control information for managing the physical transmission. The link/physical layer 118 converts the frames into bits (0s and 1s) for transmission over the physical medium.
[0049] The packets/frames are routed through the network device 130 of the communication network 140. While one network device 130 is shown in FIG. 1, one or more network devices 130 may be included in the communication network 140 in practical applications. Each network device 130 may be referred to herein as a hop. The network device 130 may be a router or switch in the communication network 140. In general, the primary function of the network device 130 is to route packets between the source host 110 and the destination host 120. The information used to route that packet is in the network layer header of the packet. Thus, when the network device 130 receives the bits at the link/physical layer 138, the network device 130, at the network layer 136, converts the bits back into packets to obtain the information contained in the network layer header of the packet (e.g., the destination IP address). The network device 130 is configured to make routing decisions (e.g., determine the next hop or path to send the packet) based on the information contained in the network layer header using one or more locally stored routing tables. The network device 130 is not concerned about the information contained in the transport layer header or application layer and therefore does not process the packet beyond the network layer 136. The link/physical layer 138 then converts the packet back into bits for transmission to a next hop/node (the next hop/node may be another network device 130 or may be the destination host 120).
[0050] When the bits reach the destination host 120, the destination host 120 performs the reverse process to convert the bits back into frames at link/physical layer 128, packets at network layer 126, segments at transport layer 124, and finally, the data 150 is reconstructed at application layer 122. The same process, performed in reverse, applies to data 154 sent from the destination host 120 to the source host 110.
[0051] FIG. 2 is a schematic diagram of an E2M2E model 200 according to an embodiment of the present disclosure. In some embodiments, the E2M2E model 200 is confined to a single network domain under the control of a single administrative entity (e.g., a data center network or an access network). The E2M2E model 200 breaks the E2E model 100 between the source host 110 and the destination host 120 as described in FIG. 1. As shown in FIG. 2, the source host 110 and the destination host 120 respectively include the application layer 112/122, the transport layer 114/124, the network layer 116/126, and the link/physical layer 118/128 as described in FIG. 1 for communicating packets between the source host 110 and the destination host 120. However, in the E2M2E model 200, the packets are routed through a communication network 170. The communication network 170 is similar to the communication network 140, except the communication network 170 supports/includes one or more network devices 160.
[0052] The network device 160 is a programmable network device that supports INC. The network device 160 may also be referred to as an on-path programmable network device. An on-path programmable network device is a network device (e.g., switches and routers between the source host 110 and the destination host 120) that can be dynamically programmed to perform customized packet processing functions directly on the data path as network traffic flows through the network device. The network device 160 may support INC applications that are configured to perform specific computational procedures, processing tasks, or other functions directly within the network infrastructure using data contained in a packet. For example, in some embodiments, the network device 160 may support INC applications that follow the communication paradigm (i.e., framework/exchange/process) of a remote procedure call (RPC), referred to herein as INC RPC-based applications, where each packet is an individual message that can be processed independently (i.e., the packet includes the data needed for the INC RPC-based application to satisfy/perform one or more parts of the procedure or service requested by the source host 110). For instance, in an RPC, a client such as the source host 110 sends a message with arguments/parameters to a server such as the destination host 120. The server performs the requested computation/procedure based on the arguments and sends a response back to the client containing the computation result of the procedure.
[0053] As shown in FIG. 2, to support INC RPC-based applications, the network device 160 is configured to obtain the data in the packet at the application layer 132, which means that the network device 160 should be configured to support transport layer functions at the transport layer 134. However, currently, there is no existing transport protocol that provides reliable transport for an RPC request and response/reply for INC RPC-based applications. As described above, a transport protocol or transport layer protocol (or layer 4 protocol) is a set of rules, conventions/standards, or formats that dictate how data is transmitted, received, and acknowledged between devices over a network. For instance, a transport protocol may define the format of data packets, the mechanisms for establishing and terminating connections, error handling, flow control, and other essential aspects of communication. Currently, all transport protocols were designed to support the E2E model 100 in FIG. 1 and not the E2M2E model 200 in FIG. 2. For example, common transport protocols include the Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Quick UDP Internet Connection (QUIC). TCP is a connection-oriented protocol and provides reliable end-to-end delivery of data packets. TCP is not suitable for INC applications because TCP is too complex for in-network handling. Additionally, certain INC RPC-based applications of the present disclosure may modify a payload of a packet (i.e., the actual data carried in a packet), which causes a break to an end-to-end TCP delivery stream between a client and a server.
Further, any dropped packet in a TCP stream sensed by the receiver/server must be retransmitted, which is inconsistent with certain embodiments of the present disclosure, where an INC RPC-based application of the network device 160 may terminate routing of a packet, perform the requested computational processing, and return the computing result directly to the source host 110. UDP is simpler than TCP because UDP does not establish a connection before sending data. However, UDP is not reliable, lacks multiplexing capability, and fails to support sufficient transport layer functions for most INC RPC-based applications. QUIC adds multiplexing and secure connection to UDP, but encrypts portions of a packet, which prohibits INC. Even without encryption, QUIC header information does not provide sufficient information to support RPC-based applications.
[0054] Accordingly, the present disclosure provides a new transport protocol for INC applications (referred to as TINC). As described herein, certain embodiments of TINC support multiplexing, a reliability mechanism, and open flow control (i.e., the congestion control mechanism may be user-defined), and are INC aware (i.e., are transparent to algorithms for packet loss detection in the network device, the client, and the server). In some embodiments, given the limited resources and common processing constraints in current programmable network devices, the computational processing performed by a TINC-supported application may be limited to procedures that can be performed on the data plane fast path of the network device. The data plane fast path (or forwarding plane) is a high-speed path through the router/switch. Packets on the data plane fast path need minimal processing, whereas packets in a slow path of the network device need more complex processing (e.g., control plane tasks). However, advancements in programmable network devices may eliminate such a limitation. In some embodiments, TINC is configured to virtualize the network (e.g., the communication network 170 in FIG. 2) as a single logical middle point. That is, if multiple network devices 160 collaborate on a computing task, the multiple network devices 160 are considered as a single network device 160. In some embodiments, packet forwarding among the multiple network devices 160 that perform collaborative computing processing is handled by the network layer 136 using routing techniques such as segment routing (SR) or service function chaining (SFC). In SR, a source node (e.g., the source host 110) may include a segment list in a packet that specifies a path, or segments, that the packet should traverse. In SFC, a packet is forwarded through a defined service function chain (i.e., an ordered sequence of processing tasks) based on specific headers in the packet.
[0055] As described above, certain embodiments of the present disclosure are intended to support RPC-based INC applications. In addition, embodiments of the present disclosure may support INC applications that provide services in conjunction with, or independent of, services provided by a server. For example, Table 1 provides three service models supported by various embodiments of the present disclosure.
Table 1
[Table 1 is an image in the original document. As described in paragraphs [0056]-[0059], it lists the three service models and the processing modes each supports: synchronous collaboration (SC) in Device Only (DO) or Device + Server (DS) mode, asynchronous collaboration (AC) in DS mode, and individual request (IR) in DO mode.]
[0056] In the synchronous collaboration (SC) service model, a set of clients each sends a piece/portion of data for performing a procedure to a server at roughly the same time (i.e., synchronized). The data pieces are combined to produce a global result that can be distributed back to the set of clients. As shown in Table 1, embodiments of the present disclosure may support INC applications that perform synchronous collaboration on just the network device (i.e., Device Only (DO) mode) or on both the network device and the server (i.e., Device + Server (DS) mode). In DO mode, as shown in FIG. 2, an INC application on the network device 160 receives a request 172 from the source host 110 (e.g., client), performs a computing task associated with the request 172, and returns the result of the computing task in a response 174 directly to the source host 110. DO mode mainly aims to reduce latency. In DS mode, as shown in FIG. 2, the INC application on the network device 160 receives a request 172 from the source host 110, partially completes a computing task associated with the request 172, and sends the intermediate result 176 to the destination host 120 (e.g., server) to complete the computing task associated with the request 172. For example, the INC application may perform a portion of the computing and forward the intermediate result 176 to the destination host 120, where the intermediate result 176 is combined with other results obtained by the destination host 120 to determine a final result. The destination host 120 then returns the final result of the computing task in a response 180 to the source host 110 through the one or more network devices 160. In some embodiments, the response 180 does not traverse the same path or the same network devices 160 traversed by the request 172.
Alternatively, in some embodiments, the server may send intermediate results to the network device for combining the intermediate results with results obtained by the INC application to obtain a final result. DS mode mainly aims to reduce the traffic bandwidth and server load. An example of an SC procedure is AllReduce, which collects data from different processing units to combine them into a global result.
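As a minimal sketch (not the patented implementation), the DO-mode SC exchange described above can be modeled as an aggregation step that withholds the reply until every synchronized client has contributed, then answers all clients directly with an AllReduce-style global sum. The function name and data shapes below are hypothetical.

```python
# Sketch of the SC service model in DO mode: the network device collects
# one data piece per client for a job (JID), and once every expected
# client has contributed, it computes the global result (here, a sum)
# and returns it directly to each client, bypassing the server.

def sc_do_allreduce(contributions: dict, expected_clients: set):
    """contributions maps client id -> that client's data piece."""
    if set(contributions) != set(expected_clients):
        return None  # still waiting for the synchronized pieces
    global_result = sum(contributions.values())
    # In DO mode the device itself answers every client (response 174).
    return {client: global_result for client in expected_clients}
```

In DS mode, the device would instead forward a partial sum (the intermediate result 176) to the server rather than replying itself.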
[0057] In asynchronous collaboration (AC) service model, a set of clients each sends multiple data items to a server at different times. The processing result can be computed when all the data items are received. Due to the limited resources of network devices, in general, AC processing needs resources that exceed a capability of a network device and cannot be performed by DO mode. For example, in certain embodiments, due to the limited resources and common processing constraints in programmable network devices, the TINC applications should not maintain a packet buffer or complex inter-packet states. In some embodiments, TINC applications are configured to forward complex tasks to the end host/server for processing. For example, the TINC application may forward any task that needs or exceeds a predefined percentage or amount of resources of a network device.
[0058] An example of an AC procedure is MapReduce, which divides input data into small chunks that are processed independently by multiple map tasks running in parallel across different nodes in a distributed computing environment.
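A toy word count illustrates the MapReduce pattern referenced above: map tasks process chunks independently, and the final result is only computable after all partial results are reduced, which is why AC workloads typically exceed what a network device can do in DO mode and are forwarded to the server. All names below are illustrative, not part of the disclosed protocol.

```python
# Illustrative MapReduce-style AC procedure (word count): input chunks
# are mapped independently; the reduce step needs all partial results,
# so it runs where sufficient resources exist (e.g., the end server).
from collections import Counter

def map_task(chunk: str) -> Counter:
    # Each map task counts words in its own chunk, independently.
    return Counter(chunk.split())

def reduce_task(partials: list) -> Counter:
    # The result is only available once every partial count arrives.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```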
[0059] In the individual request (IR) service model, a client sends individual requests to a server and receives a response for each request. As shown in Table 1, embodiments of the present disclosure are capable of handling the IR service model using DO mode, where the INC application performs a computing task associated with a request and returns the final result of the computing task directly to the client. An example of an IR procedure is NetCache, which is an in-network key-value cache that leverages the power and flexibility of programmable switches to cache query results for addressing load imbalance.
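The IR flow in DO mode can be sketched as a cache lookup on the device: a hit is answered in-network (with the Service Conducted - Switch bit set), a miss is forwarded toward the server. This is an illustrative sketch under assumed data structures, not NetCache's actual implementation.

```python
# Sketch of the IR service model in DO mode, NetCache-style: the device
# answers a key-value query from its local cache when possible and
# reflects that in the SCX bit; otherwise the request is forwarded.

def handle_query(key, cache: dict) -> dict:
    if key in cache:
        return {"scx": 1, "value": cache[key]}      # answered in-network
    return {"scx": 0, "forward_to_server": True}    # cache miss
```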
[0060] FIG. 3 is a schematic diagram of a TINC header 300 according to an embodiment of the present disclosure. The TINC header 300 includes information for enabling a transport protocol for INC applications according to one or more embodiments of the present disclosure. As described in FIG. 1, the transport layer 114 of the source host 110 may receive data from the application layer 112, break the data down into smaller data units, and add the TINC header 300 to each data unit. The data units are then passed to the network layer 116 for generating packets.
[0061] As shown in FIG. 3, the TINC header 300 includes a Source Port field 302, a Destination Port field 304, a Flags field 306, a Job Identifier (JID) field 308, a Payload Length (LEN) field 310, a Version (V) field 312, a Service Requested - Switch (SRX) field 314, a Service Requested - Server (SRS) field 316, a Service Conducted - Switch (SCX) field 318, a Service Conducted - Server (SCS) field 320, an Explicit Congestion Notification (ECN) field 322, a Resent (RE) field 324, an ACK field 326, a Reserved (RESV) field 328, a Sequence Number (SN) field 330, a Window Size (WIN) field 332, and a First Packet (FP) field 334. It should be noted that the arrangement/sequence, naming convention, or size of the fields of the TINC header 300 illustrated in FIG. 3 may be different in other embodiments of the present disclosure. Such modifications to the TINC header 300 are intended to be within the scope of the claims of the present disclosure. Additionally, in some embodiments, the TINC header 300 may include additional fields not shown in FIG. 3 or may exclude one or more fields illustrated in FIG. 3.
[0062] In the depicted embodiment, the Source Port field 302 is a 16-bit field that contains a port number that identifies the source connection (e.g., the port number of the source host 110 that transmitted the packet). The Destination Port field 304 is a 16-bit field that contains a port number that identifies a specific application to perform INC processing on the packet.
[0063] In an embodiment, the Flags field 306 is an 8-bit field that is used for connection establishment/termination and other purposes.
[0064] The JID field 308 is a 12-bit field that contains an identifier that is used to identify a particular job for an application. In an embodiment, the JID is used to distinguish between concurrent jobs for the same application.
[0065] The LEN field 310 is a 12-bit field that contains a payload length indicating the length of the payload after the TINC header 300. In some embodiments, the payload length can be up to 4 kilobytes (KB). 1 KB is equal to 1024 bytes or 8192 bits. The payload contains the data being carried by the packet.
[0066] The V field 312 is a 4-bit field that indicates the current version of the TINC protocol specification (e.g., a value of 1 indicates the first version of the TINC protocol specification).
[0067] The SRX field 314 is a 1-bit field that is set to request the switch or network device to process the packet in network. The SRS field 316 is a 1-bit field that is set to request the server to process the packet. As described above, a client/source host can request the network device, the server, or both to process the request packet by setting the bit in the SRX field 314 and the SRS field 316 accordingly. In an embodiment, if the SRX field 314 or SRS field 316 is not set, the respective device is configured to not process the packet.
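The SRX/SRS request bits map directly onto the service modes of Table 1; a hypothetical helper makes the mapping concrete (DO asks only the switch, DS asks both the switch and the server). The mode names and dictionary shape are illustrative.

```python
# Illustrative mapping from the Table 1 service modes to the SRX/SRS
# request bits a client would set in the TINC header.

def request_bits(mode: str) -> dict:
    if mode == "DO":                      # Device Only: switch processes
        return {"srx": 1, "srs": 0}
    if mode == "DS":                      # Device + Server: both process
        return {"srx": 1, "srs": 1}
    return {"srx": 0, "srs": 1}           # plain server-side processing
```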
[0068] The SCX field 318 is a 1-bit field that is set by the switch or network device in a response packet to indicate whether the request packet is processed by the switch/network device. The SCS field 320 is a 1-bit field that is set by the server in a response packet to indicate whether the request packet is processed by the server. As stated above, in some circumstances, even when the SRX field 314 and/or the SRS field 316 is set by the client, the network device/server may not process the request packet due to various reasons (e.g., a server may not process a packet because it is short of resources, or a switch may not process a packet because the switch has already performed the entire job).
[0069] The ECN field 322 is a 1-bit field that is set by a network device on the forwarding path of the request packet to indicate that the packet (i.e., the network device on the forwarding path) is experiencing congestion. This bit in the ECN field 322 can be reflected back (i.e., correspondingly set) in an acknowledgment packet or response packet to inform the client/source host of the congestion on the forwarding path. The client can then perform congestion control (e.g., perform a window adjustment to reduce a packet transmission rate) to ease congestion and avoid packet loss.
[0070] The RE field 324 is a 1-bit field that indicates that the packet is a resent packet. For example, when a lost packet is detected, the client will resend the lost packet with the bit in the RE field 324 set to 1 to indicate that the packet is a resent packet.
[0071] The ACK field 326 is a 1-bit field that is set to indicate that the packet is an acknowledgment packet.
[0072] The RESV field 328 is a 5-bit field that is reserved for future use.
[0073] The SN field 330 is a 16-bit field that contains a sequence number for the forwarding/request packet. The same sequence number is set in an acknowledgment packet to indicate that the request packet containing the sequence number was received.
[0074] The WIN field 332 is a 16-bit field that is set by the client in the forwarding/request packet to specify a current window size in the number of packets. The FP field 334 is a 16-bit field that is set by the client in the forwarding/request packet and contains a sequence number of the first packet to be acknowledged. In an embodiment, when the first packet to be acknowledged is processed (advancing the window) or the window size increases, the client can send more packets.
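The field widths listed above (16 + 16 + 8 + 12 + 12 + 4 + 7 single-bit flags + 5 + 16 + 16 + 16) total 128 bits, i.e., a 16-byte header. The layout can be sketched as a pack/unpack routine; since FIG. 3 is not reproduced in this text, the exact bit ordering below is an assumption based on the order in which the fields are described, not the authoritative wire format.

```python
# Sketch of packing/unpacking a 16-byte TINC header. Field widths follow
# the descriptions above; the field ORDER is an assumption (FIG. 3 is
# not reproduced here). Most-significant field first, big-endian.

FIELDS = [
    ("src_port", 16), ("dst_port", 16), ("flags", 8), ("jid", 12),
    ("length", 12), ("version", 4), ("srx", 1), ("srs", 1), ("scx", 1),
    ("scs", 1), ("ecn", 1), ("re", 1), ("ack", 1), ("resv", 5),
    ("sn", 16), ("win", 16), ("fp", 16),
]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 128 bits = 16 bytes

def pack_tinc(**values) -> bytes:
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        assert 0 <= v < (1 << width), f"{name} out of range"
        word = (word << width) | v
    return word.to_bytes(TOTAL_BITS // 8, "big")

def unpack_tinc(data: bytes) -> dict:
    word = int.from_bytes(data, "big")
    fields, shift = {}, TOTAL_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

A round trip (pack then unpack) should reproduce the original field values, with unset fields defaulting to zero.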
[0075] As shown above, the TINC header 300 provides signaling functions and an in-network computing load-balancing scheme for INC applications by enabling a client to request in-network and/or server processing, and receive acknowledgment of the processing by setting the various service requested/conducted fields. The TINC header 300 also provides a reliability function that enables packet loss detection (e.g., not receiving an acknowledgment packet corresponding to a particular sequence number of a forwarding packet) and retransmission of lost packets (e.g., setting the bit in the RE field 324). The TINC header 300 also provides congestion control using the WIN field 332 to adjust a number of outstanding packets in the network (i.e., a window) to limit the rate of packets transmitted by the client to reduce congestion. In the depicted embodiment, the TINC header 300 also includes the ECN field 322 that allows network devices and endpoints to signal each other about network congestion before packet loss occurs. This feature is designed to improve the quality of service and overall performance in IP networks, particularly in situations where network congestion is starting to occur. Traditionally, when a network becomes congested, routers might drop packets to alleviate the congestion, which leads to retransmissions and reduced network efficiency. ECN provides an alternative mechanism to handle congestion before experiencing packet loss. In other embodiments, the TINC header 300 may include a different type of network congestion control bit. In particular, embodiments of the present disclosure are not limited to a particular congestion control mechanism and are open to (i.e., can be used with) any type of congestion control mechanism.
[0076] In certain embodiments, due to network device resource limitations, any complex logic/computational processing is not implemented in-network, but instead is performed in end servers.
Embodiments of the disclosed TINC protocol are general and applicable for use with all INC applications with the RPC pattern and are extensible for future enhancements. TINC simplifies application design by delegating the transport functions to a standard layer, and promotes the development and adoption of INC solutions.
[0077] FIG. 4 is a flowchart of a method 400 implemented by a source host for supporting INC applications according to an embodiment of the present disclosure. For example, the method 400 may be implemented by the source host 110 in FIG. 2. The method 400, at step 402, includes receiving, by the source host, data at a transport layer of the source host from an application layer of the source host. The data may have been generated by an application executed on the source host and is to be sent to a destination host. The source host, at step 404, adds a Transport protocol layer for In-Network Computing (TINC) header to the data. Additionally, the source host may perform other tasks associated with the data at the transport layer such as, but not limited to, breaking the data into smaller data units. The TINC header specifies parameters that indicate whether an in-network computing (INC) application is to process the packet. As described above, INC applications are applications implemented by an on-path programmable network device such as a switch or router. For example, the TINC header may include a destination port field indicating a port value that identifies a particular INC application to perform processing on the packet. In some embodiments, the port value corresponding to the particular INC application may be assigned by the Internet Assigned Numbers Authority (IANA). In other embodiments, the port value corresponding to the particular INC application may be assigned by an administrative entity or administrator of a network domain in which the TINC header is employed. For example, the disclosed embodiments may be implemented in a data center network and an administrator of the data center network may assign port numbers to certain INC applications implemented by network devices in the data center network.
The TINC header may also specify whether the client requested the packet processing service to be performed at an intermediate network device or at the server/destination host. Additionally, as described above, the TINC header may also include parameters in one or more fields for enabling congestion control and ensuring reliable transport.
[0078] At step 406, the source host, at the network layer, creates a packet that includes the data, the TINC header, and a network layer header (e.g., an IP version 4 (IPv4) or an IP version 6 (IPv6) network layer header). Embodiments of the disclosed TINC header may be used in conjunction with any network layer protocol header. The network layer header contains information for routing packets within the network (e.g., source and destination IP addresses, time-to-live (TTL), type of service (ToS), and other control and addressing information).
[0079] The source host, at step 408, sends the packet to a destination host indicated in the packet. At step 410, the source host receives an ACK packet (or response packet) to the packet that was sent at step 408 (i.e., the request or forwarding packet). The ACK packet also includes a TINC header. The TINC header may indicate whether the packet processing service requested by the forwarding packet was performed at an intermediate network device, at the server, or both. In some embodiments, the ACK packet also includes the results of the packet processing service. In other embodiments, the ACK packet simply acknowledges that the request packet was received or processed, and the results of the packet processing service may be transmitted separately in one or more additional ACK packets or response packets.
[0080] At step 412, the source host performs one or more actions based on the data contained in the ACK packet. For example, if the ACK packet includes results of the packet processing service, the source host may analyze the results and perform one or more actions based on the results. For instance, if the requested packet processing service requests that the service or INC application authenticate a user based on credentials provided in the packet, and the result indicates that the user is authenticated, then the source host may enable the user to access the source host or access other requested resources. In another non-limiting example, the result may be an answer to a mathematical calculation based on parameters provided by the source host in the request packet, and the source node may utilize the result to perform additional calculations or to adjust a particular parameter value. The source host may also resend a packet if the TINC header in the ACK packet indicates that a packet was lost (i.e., not received by the destination host) or adjust a transmission window for easing network congestion if the ECN bit in the TINC header is set.
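The client-side reactions described in this step can be sketched as follows. Field names mirror the TINC header fields above, while the state dictionary and the window-halving policy are illustrative assumptions: the disclosure deliberately leaves the congestion control algorithm open, so any window adjustment rule could be substituted.

```python
# Sketch of client-side ACK handling: confirm the sequence number,
# and shrink the transmission window when the reflected ECN bit signals
# congestion on the forwarding path. Halving is one possible policy;
# TINC does not mandate a specific congestion control algorithm.

def on_ack(ack: dict, state: dict) -> dict:
    actions = {}
    if ack.get("ack") == 1 and ack.get("sn") in state["outstanding"]:
        state["outstanding"].discard(ack["sn"])   # packet confirmed
        actions["acknowledged"] = ack["sn"]
    if ack.get("ecn") == 1:
        state["win"] = max(1, state["win"] // 2)  # ease congestion
        actions["window"] = state["win"]
    return actions
```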
[0081] FIG. 5 is a flowchart of a method 500 implemented by a network device for supporting INC applications according to an embodiment of the present disclosure. For example, the method 500 may be implemented by a router, switch, or server such as the network device 160 or the destination host 120 in FIG. 2. The method 500, at step 502, includes receiving, by the network device, a first packet that includes a first TINC header and first data. The first TINC header specifies parameters for processing the first packet using an INC application as described above.
[0082] At step 504, the network device processes the first packet, using the INC application, based on the parameters specified in the first TINC header and the first data to obtain computational results. For example, in an embodiment, when the TINC header indicates a destination port value corresponding to an INC application of the network device, and when the TINC header indicates a service process request to be performed at the network device, then the network device processes, when resources of the network device are available, the first packet using the INC application of the network device.
[0083] The network device, at step 506, sends a second packet comprising a second TINC header and the computational results. Depending on the parameters in the second TINC header, the second packet may be an ACK packet or a response packet back to the source host/client that requested the packet processing (i.e., the bit in the ACK field 326 is set and the packet includes the same sequence number as the forwarding packet). As described above, the network device may also set certain bits or fields in the TINC header to indicate that the service was performed by the network device, or that the packet experienced network congestion on the forwarding path. Alternatively, at step 506, the network device may send the second packet that includes the computational results to another network device on the forwarding path or to a server for additional processing, or both.
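Steps 504 and 506 can be sketched together as a single decision: process in-network only when the destination port names a local INC application, the switch service is requested, and resources allow, then reflect the outcome via the SCX bit of the reply. The dictionary shapes and the application registry below are assumptions for illustration.

```python
# Sketch of the device-side decision at steps 504/506: match the
# destination port to a local INC application, honor the SRX request
# bit, check resource availability, and set SCX in the reply to report
# whether in-network processing actually occurred.

def device_process(hdr: dict, apps: dict, resources_available: bool) -> dict:
    app = apps.get(hdr.get("dst_port"))
    if app is not None and hdr.get("srx") == 1 and resources_available:
        result = app(hdr.get("payload"))
        # Reply directly (ACK set, same sequence number, SCX = 1).
        return {"scx": 1, "ack": 1, "sn": hdr["sn"], "result": result}
    # Not processed in-network: forward unchanged, SCX stays 0.
    return {"scx": 0, "forward": True}
```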
[0084] FIG. 6 is a schematic diagram of a network device 600 according to an embodiment of the disclosure. The network device 600 is suitable for implementing the disclosed embodiments as described herein. For example, the network device 600 may be configured to perform the functions, procedures, steps, or methods of the source host 110, the network device 160, or the destination host 120 as described herein.
[0085] In the depicted embodiment, the network device 600 comprises ingress ports/ingress means 610 and receiver units (Rx)/receiving means 620 for receiving data; a processor, logic unit, or central processing unit (CPU)/processing means 630 to process the data; transmitter units (Tx)/transmitting means 640 and egress ports/egress means 650 for transmitting the data; and a memory/memory means 660 for storing the data. The network device 600 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports/ingress means 610, the receiver units/receiving means 620, the transmitter units/transmitting means 640, and the egress ports/egress means 650 for egress or ingress of optical or electrical signals.
[0086] The processor/processing means 630 is implemented by hardware and software. The processor/processing means 630 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), programmable pipeline processors (PPPs), and digital signal processors (DSPs). The processor/processing means 630 is in communication with the ingress ports/ingress means 610, receiver units/receiving means 620, transmitter units/transmitting means 640, egress ports/egress means 650, and memory/memory means 660. The processor/processing means 630 comprises an in-network computing transport protocol module 670. The in-network computing transport protocol module 670 comprises instructions that when executed by the processor/processing means 630 cause the network device 600 to perform the methods disclosed herein. In some embodiments, the instructions for performing the methods disclosed herein are programmed directly on the processor/processing means 630. For example, one or more PPPs may be programmed to perform a processing stage or function according to embodiments of the present disclosure. Thus, the inclusion of the in-network computing transport protocol module 670 therefore provides a substantial improvement to the functionality of the network device 600 and effects a transformation of the network device 600 to a different state. Alternatively, the in-network computing transport protocol module 670 may be stored in the memory/memory means 660 and one or more instructions included in the in-network computing transport protocol module 670 for performing one or more aspects of the disclosed embodiments may be executed by the processor/processing means 630.
[0087] The network device 600 may also include input and/or output (I/O) devices/I/O means 680 for communicating data to and from a user. The I/O devices/I/O means 680 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc. The I/O devices/I/O means 680 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.
[0088] The memory/memory means 660 comprises one or more disks, tape drives, and solid-state drives and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory/memory means 660 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
[0089] While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
[0090] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

CLAIMS

What is claimed is:
1. A method implemented by a source host for supporting in-network computing (INC), the method comprising:
receiving first data at a transport layer of the source host from an application layer of the source host;
adding a first Transport protocol layer for In-Network Computing (TINC) header to the first data;
generating a first packet comprising the first data and the first TINC header; and
sending the first packet to a destination host indicated in the first packet.
2. The method according to claim 1, further comprising:
receiving a second packet in response to the first packet, the second packet comprising a second TINC header and second data; and
performing one or more actions based on the second data or the second TINC header.
3. The method according to any of claims 1-2, further comprising determining that the second packet is an acknowledgment (ACK) packet to the first packet based on an ACK bit in the second TINC header being set and based on a first sequence number in the first TINC header matching a second sequence number in the second TINC header.
4. The method according to any of claims 1-3, wherein the second data comprises computational results obtained by executing an INC application specified in the first TINC header.
5. The method according to any of claims 1-4, further comprising: determining that a congestion control bit is set in the second TINC header; and adjusting a packet transmission window size based on the congestion control bit being set.
6. The method according to any of claims 4-5, further comprising setting a first service bit in the first TINC header to indicate a first request to an intermediate network device on a forwarding path of the first packet to process the first packet using the INC application on the intermediate network device.
7. The method according to claim 6, wherein the forwarding path is confined to a single network domain under the control of a single administrative entity.
8. The method according to claim 7, wherein the single network domain is a data center network.
9. The method according to any of claims 4-8, wherein the INC application supports a remote procedure call (RPC) communication process.
10. The method according to any of claims 4-9, wherein the INC application on the intermediate network device is limited in processing the first packet to procedures that can be performed on a data plane fast path of the intermediate network device.
11. The method according to any of claims 4-10, wherein the first packet includes argument values used as input values to the INC application for processing the first packet.
12. The method according to any of claims 4-11, further comprising setting a port number in the first TINC header to indicate the INC application for processing the first packet.
13. The method according to any of claims 4-12, further comprising setting a second service bit in the first TINC header to indicate a second request to a server to process the first packet using the INC application.
14. A method implemented by a network device for supporting in-network computing (INC), the method comprising: receiving a first packet comprising a first Transport protocol layer for In-Network Computing (TINC) header and first data, the first TINC header specifying parameters for processing the first packet using an INC application; processing the first packet, using the INC application, based on the parameters specified in the first TINC header and the first data to obtain computational results; and sending a second packet comprising a second TINC header and the computational results.
15. The method according to claim 14, further comprising setting an acknowledgment (ACK) bit and a second sequence number in the second TINC header to indicate that the second packet is an ACK packet to the first packet, wherein the second sequence number is a first sequence number specified in the first TINC header.
16. The method according to any of claims 14-15, further comprising: determining that a congestion control bit is set in the first TINC header; and setting the congestion control bit in the second TINC header.
17. The method according to any of claims 14-16, further comprising determining, prior to processing the first packet, that a first service bit in the first TINC header indicates a first request to process the first packet using the INC application on the network device.
18. The method according to any of claims 14-17, further comprising identifying, prior to processing the first packet, the INC application based on a source port value in the first TINC header.
19. The method according to any of claims 14-18, wherein the INC application supports a remote procedure call (RPC) communication process.
20. The method according to any of claims 14-19, wherein the network device is a server.
21. The method according to any of claims 14-19, wherein the network device is on a forwarding path of the first packet.
22. The method according to claim 21, wherein the forwarding path is confined to a single network domain under the control of a single administrative entity.
23. The method according to claim 22, wherein the single network domain is a data center network.
24. The method according to any of claims 21-23, further comprising configuring the INC application to limit processing of the first packet to procedures that can be performed on a data plane fast path of the network device.
25. The method according to any of claims 14-24, further comprising processing the first packet using argument values specified in the first packet as input values to the INC application.
26. The method according to any of claims 21-25, further comprising determining, prior to processing the first packet, that processing the first packet does not exceed a predetermined resource usage threshold.
27. The method according to any of claims 21-26, further comprising: determining, prior to sending the second packet, whether a second service bit in the first TINC header is set indicating a second request for a server to process the first packet; and sending the second packet to the server when the second service bit is set, wherein the second packet further comprises the first data for enabling the server to process the first packet.
28. A source host, comprising: a memory storing instructions; and one or more processors in communication with the memory and configured to execute the instructions to cause the source host to: receive first data at a transport layer of the source host from an application layer of the source host; add a first transport protocol layer for in-network computing (TINC) header to the first data; generate a first packet comprising the first data and the first TINC header; and send the first packet to a destination host indicated in the first packet.
29. The source host according to claim 28, wherein the one or more processors are further configured to execute the instructions to cause the source host to: receive a second packet in response to the first packet, the second packet comprising a second TINC header and second data; and perform one or more actions based on the second data or the second TINC header.
30. The source host according to any of claims 28-29, wherein the one or more processors are further configured to execute the instructions to cause the source host to determine that the second packet is an acknowledgment (ACK) packet to the first packet based on an ACK bit in the second TINC header being set and based on a first sequence number in the first TINC header matching a second sequence number in the second TINC header.
31. The source host according to any of claims 28-30, wherein the second data comprises computational results obtained by executing an INC application specified in the first TINC header.
32. The source host according to any of claims 28-31, wherein the one or more processors are further configured to execute the instructions to cause the source host to: determine that a congestion control bit is set in the second TINC header; and adjust a packet transmission window size based on the congestion control bit being set.
33. The source host according to any of claims 31-32, wherein the one or more processors are further configured to execute the instructions to cause the source host to set a first service bit in the first TINC header to indicate a first request to a network device on a forwarding path of the first packet to process the first packet using the INC application on the network device.
34. The source host according to claim 33, wherein the forwarding path is confined to a single network domain under the control of a single administrative entity.
35. The source host according to claim 34, wherein the single network domain is a data center network.
36. The source host according to any of claims 31-35, wherein the INC application supports a remote procedure call (RPC) communication process.
37. The source host according to any of claims 31-36, wherein the INC application on the network device is limited in processing the first packet to procedures that can be performed on a data plane fast path of the network device.
38. The source host according to any of claims 31-37, wherein the first packet includes argument values used as input values to the INC application for processing the first packet.
39. The source host according to any of claims 31-38, wherein the one or more processors are further configured to execute the instructions to cause the source host to set a port number in the first TINC header to indicate the INC application for processing the first packet.
40. The source host according to any of claims 31-39, wherein the one or more processors are further configured to execute the instructions to cause the source host to set a second service bit in the first TINC header to indicate a second request to a server to process the first packet using the INC application.
41. A network device, comprising: a memory storing instructions; and one or more processors in communication with the memory and configured to execute the instructions to cause the network device to: receive a first packet comprising a first transport protocol layer for in-network computing (TINC) header and first data, the first TINC header specifying parameters for processing the first packet using an INC application; process the first packet, using the INC application, based on the parameters specified in the first TINC header and the first data to obtain computational results; and send a second packet comprising a second TINC header and the computational results.
42. The network device according to claim 41, wherein the one or more processors are further configured to execute the instructions to cause the network device to set an acknowledgment (ACK) bit and a second sequence number in the second TINC header to indicate that the second packet is an ACK packet to the first packet, wherein the second sequence number is a first sequence number specified in the first TINC header.
43. The network device according to any of claims 41-42, wherein the one or more processors are further configured to execute the instructions to cause the network device to: determine that a congestion control bit is set in the first TINC header; and set the congestion control bit in the second TINC header.
44. The network device according to any of claims 41-43, wherein the one or more processors are further configured to execute the instructions to cause the network device to determine, prior to processing the first packet, that a first service bit in the first TINC header indicates a first request to process the first packet using the INC application on the network device.
45. The network device according to any of claims 41-44, wherein the one or more processors are further configured to execute the instructions to cause the network device to identify, prior to processing the first packet, the INC application based on a source port value in the first TINC header.
46. The network device according to any of claims 41-45, wherein the INC application is configured to support a remote procedure call (RPC) communication process.
47. The network device according to any of claims 41-46, wherein the network device is a server.
48. The network device according to any of claims 41-46, wherein the network device is on a forwarding path of the first packet.
49. The network device according to claim 48, wherein the forwarding path is confined to a single network domain under the control of a single administrative entity.
50. The network device according to claim 49, wherein the single network domain is a data center network.
51. The network device according to any of claims 48-50, wherein the one or more processors are further configured to execute the instructions to cause the network device to configure the INC application to limit processing of the first packet to procedures that can be performed on a data plane fast path of the network device.
52. The network device according to any of claims 41-51, wherein the one or more processors are further configured to execute the instructions to cause the network device to process the first packet using argument values specified in the first packet as input values to the INC application.
53. The network device according to any of claims 48-52, wherein the one or more processors are further configured to execute the instructions to cause the network device to determine, prior to processing the first packet, that processing the first packet does not exceed a predetermined resource usage threshold.
54. The network device according to any of claims 48-53, wherein the one or more processors are further configured to execute the instructions to cause the network device to: determine, prior to sending the second packet, whether a second service bit in the first TINC header is set indicating a second request for a server to process the first packet; and send the second packet to the server when the second service bit is set, wherein the second packet further comprises the first data for enabling the server to process the first packet.
55. A computer program product comprising computer-executable instructions stored on a non-transitory computer-readable storage medium, the computer-executable instructions, when executed by a processor of an apparatus, causing the apparatus to perform a method according to any of claims 1-27.
56. A source host comprising means for performing the method of any of claims 1-13.
57. A network device comprising means for performing the method of any of claims 14-27.
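The TINC header recited in the claims above carries source and destination ports, a sequence number, an ACK bit, a congestion control bit, and two service bits. The following sketch models those fields and the ACK construction of claims 3 and 15-16. It is illustrative only: the claims fix neither field widths nor wire encoding, and the names `TincHeader`, `make_ack`, and `is_ack_for` are hypothetical, not part of the disclosure.

```python
from dataclasses import dataclass

# Hypothetical model of the TINC header fields named in the claims.
@dataclass
class TincHeader:
    src_port: int = 0          # identifies the INC application (claims 12, 18)
    dst_port: int = 0
    seq_num: int = 0           # echoed in the ACK (claims 3, 15)
    ack: bool = False          # marks the response as an ACK (claim 15)
    cc: bool = False           # congestion control bit (claims 5, 16)
    svc_network: bool = False  # first service bit: on-path processing (claim 6)
    svc_server: bool = False   # second service bit: server processing (claim 13)

def make_ack(req: TincHeader) -> TincHeader:
    """Build the response header per claims 15-16: set the ACK bit,
    echo the request's sequence number, and mirror its congestion
    control bit so the source host can shrink its window."""
    return TincHeader(src_port=req.dst_port, dst_port=req.src_port,
                      seq_num=req.seq_num, ack=True, cc=req.cc)

def is_ack_for(resp: TincHeader, req: TincHeader) -> bool:
    """Per claims 3 and 30: the response is the ACK for the request
    when its ACK bit is set and the sequence numbers match."""
    return resp.ack and resp.seq_num == req.seq_num
```

For example, a request with `svc_network=True` asks an on-path device to run the INC application identified by the port; the device's reply built with `make_ack` then satisfies the `is_ack_for` check at the source host.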
PCT/US2023/033621 2023-09-25 2023-09-25 A transport protocol for in-network computing in support of rpc-based applications WO2023244872A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2023/033621 WO2023244872A2 (en) 2023-09-25 2023-09-25 A transport protocol for in-network computing in support of rpc-based applications


Publications (2)

Publication Number Publication Date
WO2023244872A2 2023-12-21
WO2023244872A3 2024-06-06

