CN116325708A - Data processing and transmitting method and related equipment - Google Patents


Info

Publication number
CN116325708A
Authority
CN
China
Prior art keywords
flowlet
message segment
target
segment
packet
Prior art date
Legal status
Pending
Application number
CN202080105542.9A
Other languages
Chinese (zh)
Inventor
顾华玺
刁兴龙
余晓杉
唐德智
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN116325708A publication Critical patent/CN116325708A/en

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 — Traffic control in data switching networks
    • H04L 47/10 — Flow control; Congestion control
    • H04L 47/28 — Flow control; Congestion control in relation to timing considerations
    • H04L 43/00 — Arrangements for monitoring or testing data switching networks
    • H04L 43/02 — Capturing of monitoring data
    • H04L 43/026 — Capturing of monitoring data using flow identification
    • H04L 43/08 — Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0852 — Delays
    • H04L 43/0858 — One way delays
    • H04L 43/10 — Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L 43/106 — Active monitoring, e.g. heartbeat, ping or trace-route, using time related information in packets, e.g. by adding timestamps
    • H04L 43/16 — Threshold monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A data processing method, a data transmission method and related equipment. The data processing method is applied to a host and includes the following steps: generating a first message segment, and determining a target TCP flow to which the first message segment belongs; acquiring target flow information matched with the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and a time stamp of a second message segment in the target TCP flow; comparing the difference between the time stamp of the first message segment and the time stamp of the second message segment with the time threshold; and determining, according to the comparison result, whether the first message segment and the second message segment are divided into the same Flowlet. By adopting the method, data transmission efficiency and accuracy can be improved.

Description

Data processing and transmitting method and related equipment
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a data processing and transmitting method and related devices.
Background
With the demand for high-speed and real-time data transmission services, data transmission equipment is required to perform traffic load balancing quickly and accurately, so as to improve forwarding performance and enhance network reliability, thereby better serving users.
Currently, equal-cost multi-path routing (ECMP) is a relatively common load balancing technique. ECMP includes Packet-based and Flow-based path selection approaches. Packet-based path selection can achieve load balancing, but because the delays of the multiple paths differ, packets may arrive at the receiving end out of order and need to be reordered. In Flow-based path selection, the output interface (i.e., the message forwarding path) is determined by a hash algorithm, and the receiving end does not need to reorder messages. However, the rates of different flows may differ (e.g., elephant flows occupying a large bandwidth versus mice flows occupying a small bandwidth), so the traffic carried on different paths may also differ.
In order to achieve better load balancing, a load balancing method based on a Flowlet (flow slice / micro flow / small flow) mechanism has been proposed. In this method, as shown in fig. 1 (a schematic diagram of a TCP flow in the prior art), after the packets of a TCP flow arrive at a switch, they are detected as containing, for example, 5 Flowlets: the inter-arrival time between packets within the same Flowlet is generally small, while the inter-arrival time between packets of different Flowlets is clearly larger. For instance, the time gap between the first data packet and the second data packet of Flowlet 1 is smaller than a predetermined threshold (timeout), so the switch regards the two data packets as belonging to the same Flowlet; the time gap between the last packet of Flowlet 1 and the first packet of Flowlet 2 is greater than the threshold, so the switch treats the first packet of Flowlet 2 as the start of a new Flowlet. That is, if the inter-arrival time at the switch of two adjacent packets of the same TCP flow is smaller than the predetermined time interval (timeout), the switch treats the two packets as belonging to the same Flowlet.
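For illustration only, the following sketch shows this prior-art, switch-side detection behavior in Python-like form; the class and function names and the fixed 500 µs value are assumptions chosen for the example, not taken from any real switch implementation.

```python
# Minimal sketch of prior-art, switch-side Flowlet detection with a FIXED timeout.
# All names and the 500 us value are illustrative assumptions.

FIXED_TIMEOUT_US = 500  # fixed inter-arrival gap used to split Flowlets

class FixedTimeoutDetector:
    def __init__(self):
        # per-flow state: last packet arrival time and current Flowlet number
        self.last_arrival_us = {}
        self.flowlet_id = {}

    def on_packet(self, flow_key, arrival_us):
        """Return the Flowlet number this packet is assigned to."""
        last = self.last_arrival_us.get(flow_key)
        if last is None or arrival_us - last > FIXED_TIMEOUT_US:
            # gap exceeds the fixed threshold: start a new Flowlet
            self.flowlet_id[flow_key] = self.flowlet_id.get(flow_key, 0) + 1
        self.last_arrival_us[flow_key] = arrival_us
        return self.flowlet_id[flow_key]
```

Because the timeout never changes, such a detector cannot track the dynamically changing load conditions summarized next.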
To sum up, existing Flowlet-granularity load balancing schemes detect Flowlets at switches based on a fixed time interval, but the load within a data transmission network (e.g., a data center network) changes dynamically and unpredictably, and a fixed time interval is difficult to adapt to such dynamically changing loads. When the time interval is too small, more Flowlets are detected and the load balancing granularity becomes finer, which easily increases the risk of out-of-order data packets; when the time interval is too large, fewer Flowlets are detected, the load balancing granularity becomes too coarse, and the load balancing effect is not obvious.
Disclosure of Invention
The embodiment of the invention provides a data processing and transmitting method and related equipment, so as to improve the efficiency and accuracy in the data transmission process.
In a first aspect, an embodiment of the present invention provides a data processing method, applied to a host, which may include:
generating a first message segment, and determining a target TCP flow to which the first message segment belongs; acquiring a time stamp of the first message segment, and acquiring target flow information matched with the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and a time stamp of a second message segment in the target TCP flow; the second message segment is the last message segment adjacent to the first message segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow; comparing the difference between the time stamp of the first message segment and the time stamp of the second message segment with the time threshold; and determining, according to the comparison result, whether the first message segment and the second message segment are divided into the same Flowlet.
In the embodiment of the invention, for a first message segment that is currently generated and to be sent, the host first determines which TCP flow the first message segment belongs to (for example, by its source port), and then obtains the target flow information matched with that TCP flow; the target flow information includes information about the target TCP flow such as the time stamp of the adjacent previous message segment (i.e., the second message segment) and the time threshold used for dividing Flowlets in the target TCP flow. The host side then compares the difference between the time stamps of the first message segment and of the adjacent second message segment with the time threshold, so as to determine whether to divide the first message segment and the adjacent previous message segment into the same Flowlet. The time threshold is dynamically calculated from the delays of the paths in the multipath set of the target TCP flow; for example, it is calculated as the difference between the maximum delay and the minimum delay, updated in real time from historical message segments received by the host (ACK packets carrying the same triplet or five-tuple information as the target TCP flow). In other words, the time threshold is a value that is dynamically adjusted in real time according to the network transmission load. That is, in the embodiment of the present invention, the time threshold for dividing Flowlets changes dynamically for different TCP flows, or for data packets of the same TCP flow in different states, and is adjusted according to the real-time transmission delay of the data in the corresponding TCP flow, so it can always adapt to dynamic network load changes; this avoids the problem in the prior art that detecting Flowlets at the switch side based on a fixed time interval is difficult to adapt to dynamic network loads. In summary, the embodiment of the invention uses the delay feedback information of the network paths at the host side to dynamically configure the time interval for detecting and dividing Flowlets, so that the Flowlet granularity matches the state of the network paths, the risk of out-of-order data packets is reduced, and the load balancing effect is ensured.
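As a minimal illustration of this comparison step, the following sketch is given in Python for readability only; the microsecond units are an assumption, since the embodiment does not fix a unit.

```python
def same_flowlet(first_segment_ts_us: int,
                 second_segment_ts_us: int,
                 time_threshold_us: int) -> bool:
    """Return True if the first message segment joins the Flowlet of the
    adjacent previous (second) message segment: per the first aspect, the
    segments share a Flowlet when the timestamp gap does not exceed the
    dynamically maintained time threshold (delay of the slowest uplink path
    minus delay of the fastest uplink path in the flow's multipath set)."""
    return (first_segment_ts_us - second_segment_ts_us) <= time_threshold_us
```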
In one possible implementation manner, the determining the target TCP flow to which the first packet segment belongs includes: and determining the target TCP stream to which the first message segment belongs according to the source port number of the first message segment.
In the embodiment of the invention, the source port number in the five-tuple information of the message segment is used to identify which TCP flow the message segment currently to be sent (i.e., the first message segment) belongs to, so that the flow information of that TCP flow (including the time stamp of the adjacent previous message segment and the time threshold for dividing Flowlets) can be acquired; based on this flow information, it is then determined whether the data message segment currently to be sent and the previous data message segment belong to the same Flowlet, or whether a new Flowlet is divided.
In one possible implementation manner, the host maintains a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, and the flow information of each TCP flow includes a flow index of a corresponding TCP flow; the obtaining the target flow information matched with the target TCP flow includes: and searching the target flow information matched with the target TCP flow from the flow information table according to the flow index of the target TCP flow.
In the embodiment of the present invention, the host side maintains a flow information table of one or more TCP flows (for example, a TCP flow currently in an active state), where the flow information table includes flow information of one or more TCP flows, and each TCP flow information may further include an index of the TCP flow, a time threshold referred to in the first aspect, a timestamp of a last packet segment adjacent to the current data packet segment to be sent, and so on. The host can maintain the flow information of all the TCP flows currently in activity, so that when a message segment needs to be sent, the target flow information matched with the flow index in the flow information table can be found according to the flow index of the TCP flow to which the message segment belongs, and the subsequent division of the Flowlet is performed.
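The flow information table described here could be organized as in the following sketch; the field names, the derivation of the flow index from the source port, and the dict layout are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class FlowInfo:
    # one entry per active TCP flow in the host-side flow information table
    time_threshold_us: int   # dynamic Flowlet-division threshold for this flow
    last_segment_ts_us: int  # timestamp of the adjacent previous message segment
    ref_flowlet_id: int      # reference Flowlet identifier (e.g. 0 or 1)

# flow information table: flow index -> FlowInfo
flow_table: dict[int, FlowInfo] = {}

def lookup_flow_info(src_port: int) -> FlowInfo:
    # Assumption: the flow index is derived from the source port, which the
    # embodiment uses to tell which TCP flow a message segment belongs to.
    flow_index = src_port
    return flow_table[flow_index]  # entry created when the flow became active
```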
In one possible implementation manner, the determining whether to divide the first segment and the second segment into the same Flowlet according to the comparison result includes: if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, dividing the first message segment and the second message segment into the same Flowlet; and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, dividing the first message segment into a new Flowlet.
In the embodiment of the invention, if the difference between the time stamps of the first message segment to be sent and the adjacent previous second message segment in the target TCP flow is smaller than or equal to the time threshold corresponding to the target TCP flow (a threshold that changes dynamically), the first message segment and the adjacent second message segment are considered to satisfy the condition for being sent in the same Flowlet, i.e., it can be determined that the first message segment and the adjacent second message segment are divided into the same Flowlet. Similarly, if the difference between the time stamps of the first message segment to be sent and the adjacent second message segment in the target TCP flow is greater than the time threshold corresponding to the target TCP flow, the condition for sending the first message segment and the adjacent second message segment in the same Flowlet is not satisfied, i.e., the first message segment is divided into a new Flowlet.
In a possible implementation manner, the target flow information further includes a reference Flowlet identifier of the target TCP flow, where the reference Flowlet identifier is currently a first Flowlet identifier corresponding to the second message segment; the method further includes: generating a first data packet, where the first data packet includes the first message segment and a Flowlet identifier of the first message segment; if the difference between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold, the Flowlet identifier of the first message segment is the first Flowlet identifier; and if the difference between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold, the Flowlet identifier of the first message segment is a second Flowlet identifier (a Flowlet identifier different from the first Flowlet identifier).
In the embodiment of the invention, when the first message segment is further encapsulated so that the data can be transmitted over the network, the Flowlet identifier corresponding to the message segment can be set during encapsulation, so that after the message segment is encapsulated into a data packet, the switch side can identify which Flowlet the data packet belongs to through the Flowlet identifier and then determine which path to transmit it on. For example, when the first message segment is about to enter the data link layer where the switch is located, it needs to be further encapsulated; at this time, a flag bit is set in the encapsulated packet for the switch to identify which Flowlet the message segment belongs to. When the Flowlet identifiers of the first message segment and the second message segment are identical, the first data packet corresponding to the first message segment and the second data packet corresponding to the second message segment are forwarded on the switch side through the same path. In summary, the embodiment of the invention divides Flowlets on the host side and marks the Flowlet using a bit (for example, 1 bit) in a reserved field of the transport layer header; the switch can identify the Flowlet from the header field alone, which is efficient and has low hardware cost, and at the same time it is guaranteed that the same Flowlet is not split again no matter how many hops of switches it passes through in the network, reducing the risk of out-of-order data packets.
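The kind of marking step described here could look like the sketch below. The exact bit position is an assumption for illustration: the embodiment only states that a bit (e.g. 1 bit) of a reserved transport-layer header field is used, so the choice of a reserved bit in TCP header byte 12 is the sketch's own.

```python
def set_flowlet_bit(tcp_header: bytearray, flowlet_id: int) -> None:
    """Write a 1-bit Flowlet identifier into the TCP header during encapsulation.

    Assumption: the bit is carried in one of the reserved bits of header
    byte 12 (the byte holding the data offset and reserved bits).
    """
    if flowlet_id & 0x1:
        tcp_header[12] |= 0x02   # set the chosen reserved bit
    else:
        tcp_header[12] &= 0xFD   # clear it

def read_flowlet_bit(tcp_header: bytes) -> int:
    # What a switch later extracts to tell adjacent Flowlets apart.
    return (tcp_header[12] >> 1) & 0x1
```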
In one possible implementation, the method further includes: and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, updating the reference Flowlet identifier to the second Flowlet identifier.
In the embodiment of the invention, the flow information of each TCP flow in the flow information table maintained on the host side further includes a reference Flowlet identifier of that TCP flow; that is, the current Flowlet identifier of each TCP flow is maintained in the flow information table, which makes it convenient to set the corresponding Flowlet identifier for the message segment to be sent. For example, assume the reference Flowlet identifier is the first Flowlet identifier (i.e., the Flowlet identifier corresponding to the second message segment): when the first message segment and the second message segment are divided into the same Flowlet, the Flowlet identifier of the first message segment is also marked as the first Flowlet identifier, i.e., the reference Flowlet identifier remains unchanged; when the first message segment and the second message segment are divided into different Flowlets (i.e., the first message segment is divided into a new Flowlet), the Flowlet identifier of the first message segment is marked as the second Flowlet identifier, and the reference Flowlet identifier then needs to be updated to the second Flowlet identifier. Optionally, the reference Flowlet identifier may toggle between 0 and 1, i.e., the Flowlet identifiers of two adjacent Flowlets take the values 0 and 1 alternately, so that only 1 bit is needed to accurately indicate whether different data packets belong to the same Flowlet.
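This bookkeeping might look like the following sketch, reusing the FlowInfo entry assumed earlier; the 0/1 toggle follows the optional implementation described above and is otherwise an assumption of the sketch.

```python
def assign_flowlet_id(info: "FlowInfo", is_same_flowlet: bool) -> int:
    """Return the Flowlet identifier to mark on the first message segment and
    keep the flow's reference Flowlet identifier up to date."""
    if is_same_flowlet:
        return info.ref_flowlet_id        # first Flowlet identifier kept as-is
    new_id = info.ref_flowlet_id ^ 1      # new Flowlet: toggle between 0 and 1
    info.ref_flowlet_id = new_id          # update the reference identifier
    return new_id
```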
In one possible implementation, the method further includes: receiving a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow; determining the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value of the target ACK packet and the timestamp loopback reply value; comparing the uplink path delay of the target ACK packet with the delays of the uplink paths in the multipath set of the target TCP flow; if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the currently largest delay in the multipath set, updating the first path delay to the uplink path delay of the target ACK packet; and if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path with the currently smallest delay in the multipath set, updating the second path delay to the uplink path delay of the target ACK packet.
In the embodiment of the present invention, the time threshold for dividing Flowlets may be calculated from the difference between the maximum and minimum uplink path delays of historical packets received in the target TCP flow (or in a TCP flow belonging to the same network session as the target TCP flow); that is, the time threshold is dynamically adjusted as the network transmission load changes in real time. Specifically, when the host side receives an ACK packet belonging to the target TCP flow (i.e., with the same destination port number), or an ACK packet of a TCP flow in the same network session as the target TCP flow (i.e., with the same destination address or the same destination network segment), it calculates the transmission delay of the uplink path of the ACK packet as the difference between the timestamp value of the target ACK packet and the timestamp loopback reply value; from the history of the uplink path delays of all received ACK packets, it determines the current maximum uplink path delay and uses it as the first path delay, and determines the current minimum uplink path delay and uses it as the second path delay; finally, the time threshold for dividing Flowlets in the target TCP flow is calculated as the difference between the first path delay and the second path delay. In this way, the time threshold for dividing Flowlets is dynamically changed for different TCP flows, or for data of the same TCP flow in different states, and is adjusted according to the real-time transmission delay of the data in the corresponding TCP flow, so it can always adapt to dynamic network load changes.
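A sketch of this per-ACK update follows. It is illustrative only: the field names, the microsecond units, and the mapping of the "timestamp value" and "timestamp loopback reply value" onto the TCP timestamp option fields (TSval/TSecr) are assumptions; in practice this state could also be merged into the flow information entry sketched earlier.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PathDelayState:
    first_path_delay: Optional[int] = None   # largest uplink delay observed so far
    second_path_delay: Optional[int] = None  # smallest uplink delay observed so far
    time_threshold_us: int = 0               # Flowlet-division threshold

def on_target_ack(state: PathDelayState, ack_ts_value: int, ack_ts_echo_reply: int) -> None:
    # Uplink path delay of this ACK, per the embodiment: timestamp value minus
    # timestamp loopback reply value (TSval/TSecr mapping assumed).
    uplink_delay = ack_ts_value - ack_ts_echo_reply
    if state.first_path_delay is None or uplink_delay > state.first_path_delay:
        state.first_path_delay = uplink_delay
    if state.second_path_delay is None or uplink_delay < state.second_path_delay:
        state.second_path_delay = uplink_delay
    # Time threshold = delay of the slowest uplink path - delay of the fastest one.
    state.time_threshold_us = state.first_path_delay - state.second_path_delay
```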
In one possible implementation, the multi-path set includes a plurality of equivalent transmission paths of the target TCP stream; alternatively, the multi-path set includes a plurality of equivalent transmission paths and a non-equivalent transmission path of the target TCP stream; alternatively, the multi-path set includes a plurality of non-equivalent transmission paths of the target TCP stream.
In the embodiment of the invention, when the network accessed by the host is an equivalent multipath model, a plurality of transmission paths in a multipath set of a target TCP stream are equivalent paths, at the moment, the first path delay is the delay of the uplink path with the largest delay in the equivalent paths, and the second path delay is the delay of the uplink path with the smallest delay in the equivalent paths; when the network accessed by the host is a conventional multipath model, a plurality of transmission paths in the multipath set of the target TCP stream can comprise equivalent paths or non-equivalent paths, and at the moment, the first path delay is the delay of the uplink path with the largest delay in the equivalent or non-equivalent paths, and the second path delay is the delay of the uplink path with the smallest delay in the equivalent or non-equivalent paths. In summary, whether or not the multiple paths in the multipath set are equivalent depends on the type of network topology of the network accessed by the host, and the embodiment of the present invention may be applicable to all network types where multipath transmission exists.
In a second aspect, an embodiment of the present invention provides a data transmission method, applied to a switch, which may include:
receiving a first data packet, wherein the first data packet comprises a first message segment and a Flowlet identifier of the first message segment; determining a target TCP stream to which the first data packet belongs, and acquiring forwarding information matched with the target TCP stream; the forwarding information comprises a reference Flowlet identification and a reference forwarding path of the target TCP flow; the reference Flowlet identifier is a first Flowlet identifier corresponding to a second message segment currently, and the second message segment is the last message segment adjacent to the first message segment in the target TCP stream; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second message segment and the first Flowlet identifier; comparing the Flowlet identification of the first message segment with the first Flowlet identification; and determining whether to forward the first message segment through the first forwarding path according to the comparison result.
In the embodiment of the invention, after receiving a data packet, the switch side identifies the Flowlet identifier in the data packet, judges whether the Flowlet identifier of the first data packet is the same as that of the adjacent previous data packet in the target TCP flow, and determines, based on this, whether the first data packet should be forwarded through the forwarding path corresponding to the second data packet. That is, the switch side does not need to divide received data packets into Flowlets according to their arrival intervals; instead, it directly determines, from the Flowlet identification bit carried in the received data packet, whether the data packet currently to be forwarded belongs to the same Flowlet as the adjacent previous data packet of the same TCP flow, and thereby decides whether to continue forwarding it along the forwarding path of the adjacent data packet, or to treat it as a new Flowlet and decide a new forwarding path for it.
In one possible implementation manner, the switch maintains a forwarding information table, where the forwarding information table includes forwarding information of M TCP flows, M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes a five-tuple hash value of a corresponding TCP flow; the determining the target TCP flow to which the first data packet belongs, and obtaining forwarding information matched with the target TCP flow, includes: calculating a five-tuple hash value of the first data packet according to the five-tuple information of the first data packet; and searching forwarding information matched with the target TCP flow from the forwarding information table according to the five-tuple hash value of the first data packet.
In the embodiment of the invention, the switch side maintains a forwarding information table that includes forwarding information of one or more TCP flows (for example, TCP flows in an active state) on the hosts connected to it, and the forwarding information of each TCP flow may further include the five-tuple hash value of that TCP flow. That is, the switch may maintain the forwarding information of all currently active TCP flows, so that when a data packet is to be forwarded, the forwarding information (including the reference Flowlet identifier, the forwarding path, etc.) matching the five-tuple hash value of the data packet can be looked up in the forwarding information table, and the data packet can then be forwarded accordingly.
In a possible implementation manner, the determining whether to forward the first packet segment through the forwarding path according to the comparison result includes: if the Flowlet identification of the first message segment is the same as the first Flowlet identification, forwarding the first data packet through the first forwarding path; and if the Flowlet identification of the first message segment is different from the first Flowlet identification, determining a second forwarding path for the first data packet, and forwarding through the second forwarding path.
In the embodiment of the invention, when the switch recognizes that the Flowlet identification of the first data packet is the same as that of the adjacent data packet in the target TCP stream, the first data packet and the second data packet are forwarded on the same path; when the switch recognizes that the Flowlet identification of the first data packet is different from that of the adjacent data packet in the target TCP stream, a new forwarding path is determined for the first data packet, and forwarding is performed through the new forwarding path. It should be noted that the second forwarding path may be the same as or different from the first forwarding path, depending on the decision of the switch.
In one possible implementation, the method further includes: if the Flowlet identifier of the first message segment is different from the first Flowlet identifier and is a second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path.
In the embodiment of the invention, when the Flowlet identifiers of the first data packet and the second data packet are different, the first data packet and the adjacent previous second data packet of the TCP flow it belongs to are not in the same Flowlet, so the switch needs to divide the first data packet into a new Flowlet, and the reference Flowlet identifier of that TCP flow needs to be updated to the Flowlet identifier corresponding to the latest data packet, namely the second Flowlet identifier.
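Putting the second-aspect steps together, a switch-side sketch could look as follows. Python is used for readability only; the hash choice, the table layout, the packet attributes (five_tuple, flowlet_id) and the select_new_path and send_on callables are all assumptions standing in for the switch's own implementation.

```python
import hashlib

# forwarding information table: five-tuple hash -> {"ref_flowlet_id": int, "ref_path": object}
forwarding_table: dict[str, dict] = {}

def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto) -> str:
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return hashlib.sha1(key).hexdigest()  # any stable hash of the five-tuple works

def forward(packet, select_new_path, send_on):
    """Forward one data packet based on the 1-bit Flowlet id carried in its header."""
    h = five_tuple_hash(*packet.five_tuple)
    entry = forwarding_table.get(h)
    if entry is not None and packet.flowlet_id == entry["ref_flowlet_id"]:
        # Same Flowlet as the adjacent previous packet: reuse the first forwarding path.
        path = entry["ref_path"]
    else:
        # New Flowlet (or unknown flow): decide a (possibly different) second forwarding
        # path and update the reference Flowlet identifier and reference forwarding path.
        path = select_new_path(packet)
        forwarding_table[h] = {"ref_flowlet_id": packet.flowlet_id, "ref_path": path}
    send_on(path, packet)
```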
In a third aspect, an embodiment of the present invention provides a data processing apparatus, which may include:
the first generation unit is used for generating a first message segment;
a first determining unit, configured to determine a target TCP flow to which the first packet segment belongs;
the acquisition unit is used for acquiring the time stamp of the first message segment and acquiring target flow information matched with the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and the time stamp of a second message segment in the target TCP flow; the second message segment is the last message segment adjacent to the first message segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
A first comparing unit, configured to compare a difference value between the timestamp of the first packet segment and the timestamp of the second packet segment with the time threshold;
and the Flowlet dividing unit is used for determining whether the first message segment and the second message segment are divided into the same Flowlet according to the comparison result.
In a possible implementation manner, the first determining unit is specifically configured to:
and determining the target TCP stream to which the first message segment belongs according to the source port number of the first message segment.
In one possible implementation, the apparatus further includes:
a maintenance unit, configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, where the flow information of each TCP flow includes a flow index of a corresponding TCP flow;
the acquisition unit is specifically configured to: and searching the target flow information matched with the target TCP flow from the flow information table according to the flow index of the target TCP flow.
In a possible implementation manner, the Flowlet dividing unit is specifically configured to:
if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, dividing the first message segment and the second message segment into the same Flowlet;
And if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, dividing the first message segment into a new Flowlet.
In a possible implementation manner, the target flow information further includes a reference Flowlet identifier of the target TCP flow, where the reference Flowlet identifier is currently a first Flowlet identifier corresponding to the second packet segment; the apparatus further comprises:
a second generating unit, configured to generate a first data packet, where the first data packet includes the first packet segment and a Flowlet identifier of the first packet segment; wherein,
if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, the Flowlet identifier of the first message segment is the first Flowlet identifier;
and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, the Flowlet identifier of the first message segment is the second Flowlet identifier.
In one possible implementation, the apparatus further includes:
and the first updating unit is used for updating the reference Flowlet identifier into the second Flowlet identifier if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value.
In one possible implementation, the apparatus further includes:
a receiving unit, configured to receive a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow;
a second determining unit, configured to determine an uplink path delay of the target ACK packet, where the uplink path delay is a difference between a timestamp value of the target ACK packet and a timestamp loopback reply value;
a second comparing unit, configured to compare the uplink path delay of the target ACK packet with the delay of the uplink path in the multipath set of the target TCP flow;
a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the maximum current delay in the multipath set;
and a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path with the minimum current delay in the multipath set.
In one possible implementation, the multi-path set includes a plurality of equivalent transmission paths of the target TCP stream; alternatively, the multi-path set includes a plurality of equivalent transmission paths and a non-equivalent transmission path of the target TCP stream; alternatively, the multi-path set includes a plurality of non-equivalent transmission paths of the target TCP stream.
In a fourth aspect, an embodiment of the present invention provides a data processing apparatus, which may include:
a receiving unit, configured to receive a first data packet, where the first data packet includes a first packet segment and a Flowlet identifier of the first packet segment, and the first data packet belongs to a target TCP flow;
the determining unit is used for determining a target TCP stream to which the first data packet belongs and acquiring forwarding information matched with the target TCP stream; the forwarding information comprises a reference Flowlet identification and a reference forwarding path of the target TCP flow; the reference Flowlet identifier is a first Flowlet identifier corresponding to a second message segment currently, and the second message segment is the last message segment adjacent to the first message segment in the target TCP stream; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second message segment and the first Flowlet identifier;
a comparison unit, configured to compare the Flowlet identifier of the first packet segment with the first Flowlet identifier;
and the forwarding unit is used for determining whether to forward the first message segment through the first forwarding path according to the comparison result.
In one possible implementation manner, the switch maintains a forwarding information table, where the forwarding information table includes forwarding information of M TCP flows, M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes a five-tuple hash value of a corresponding TCP flow; the determining unit is specifically configured to:
calculating a five-tuple hash value of the first data packet according to the five-tuple information of the first data packet;
and searching forwarding information matched with the target TCP flow from the forwarding information table according to the five-tuple hash value of the first data packet.
In a possible implementation manner, the forwarding unit is specifically configured to:
if the Flowlet identification of the first message segment is the same as the first Flowlet identification, forwarding the first data packet through the first forwarding path;
and if the Flowlet identification of the first message segment is different from the first Flowlet identification, determining a second forwarding path for the first data packet, and forwarding through the second forwarding path.
In one possible implementation, the apparatus further includes:
and the updating unit is used for updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier and updating the reference forwarding path to the second forwarding path if the Flowlet identifier of the first message segment is different from the first Flowlet identifier and is the second Flowlet identifier.
In a fifth aspect, the present application provides a semiconductor chip, which may include the data processing apparatus provided in any implementation manner of the third aspect.
In a sixth aspect, the present application provides a semiconductor chip, which may include the data processing apparatus provided in any implementation manner of the fourth aspect.
In a seventh aspect, the present application provides a semiconductor chip, which may include: the data processing apparatus provided in any implementation manner of the third aspect, an internal memory coupled to the data processing apparatus, and an external memory.
In an eighth aspect, the present application provides a semiconductor chip, which may include: the data transmission device provided in any implementation manner of the fourth aspect, an internal memory coupled to the data transmission device, and an external memory.
In a ninth aspect, the present application provides a system-on-chip SoC chip, which includes the data processing apparatus provided in any implementation manner of the third aspect, an internal memory and an external memory coupled to the data processing apparatus. The SoC chip may be formed by a chip, or may include a chip and other discrete devices.
In a tenth aspect, the present application provides a system-on-chip SoC chip, which includes the data transmission device provided in any implementation manner of the fourth aspect, an internal memory and an external memory coupled to the data transmission device. The SoC chip may be formed by a chip, or may include a chip and other discrete devices.
In an eleventh aspect, the present application provides a chip system, which includes the data processing apparatus provided in any implementation manner of the third aspect. In one possible design, the chip system further includes a memory for storing program instructions and data necessary or relevant for the operation of the data processing apparatus. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
In a twelfth aspect, the present application provides a chip system, which includes the data transmission device provided in any implementation manner of the fourth aspect. In one possible design, the chip system further comprises a memory for storing program instructions and data necessary or relevant for the operation of the data transmission device. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
In a thirteenth aspect, the present application provides a data processing apparatus having a function of implementing any one of the data processing methods of the first aspect. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a fourteenth aspect, the present application provides a data transmission apparatus having a function of implementing any one of the data transmission methods of the second aspect. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a fifteenth aspect, the present application provides a host comprising a processor for performing the data processing method provided by any one of the implementations of the first aspect. The host may also include a memory for coupling with the processor, which holds the program instructions and data necessary for the host. The host may also include a communication interface for the host to communicate with other devices or communication networks.
In a sixteenth aspect, the present application provides a switch, the switch comprising a processor configured to perform the data transmission method provided by any one of the implementations of the second aspect. The switch may also include a memory for coupling with the processor, which holds the program instructions and data necessary for the switch. The switch may also include a communication interface for the switch to communicate with other devices or communication networks.
In a seventeenth aspect, the present application provides a computer readable storage medium storing a computer program that, when executed by a host, implements the data processing method flow of any one of the implementations of the first aspect above.
In an eighteenth aspect, the present application provides a computer-readable storage medium storing a computer program that, when executed by a switch, implements the data transmission method flow of any one of the implementations of the second aspect above.
In a nineteenth aspect, an embodiment of the present invention provides a computer program, where the computer program includes instructions that, when executed by a processor, enable a host to execute the data processing method flow of any one of the implementations of the first aspect above.
In a twentieth aspect, an embodiment of the present invention provides a computer program, including instructions that, when executed by a processor, enable a switch to perform the data transmission method flow of any one of the implementations of the second aspect above.
Drawings
Fig. 1 is a schematic diagram of a TCP flow divided into Flowlets in the prior art.
Fig. 2 is a schematic diagram of a network transmission system architecture according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a network topology of a data center according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a computer network OSI model and a TCP/IP model provided in an embodiment of the present application.
Fig. 5 is a flow chart of a data transmission method according to an embodiment of the present invention.
Fig. 6A is a schematic diagram of a first packet and a second packet in the same Flowlet according to an embodiment of the present invention.
Fig. 6B is a schematic diagram of a first packet and a second packet in different flowlets according to an embodiment of the present invention.
Fig. 6C is a flow chart illustrating the partitioning and marking of a Flowlet by an additional layer protocol according to an embodiment of the present invention.
Fig. 6D is a flowchart illustrating a method for dynamically updating a segmentation threshold of a Flowlet according to an embodiment of the present invention.
Fig. 7 is a flow chart of a data transmission method according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Fig. 9 is a schematic structural diagram of a data transmission device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
First, some terms in this application are explained for easy understanding by those skilled in the art.
(1) Equal-cost multipath (ECMP): there are multiple paths with the same cost to the same destination address, where the same cost means the same number of hops through switches. For example, in a network topology based on an equal-cost multipath model (e.g., a data center network), all possible transmission paths between the same pair of source and destination hosts are equal-cost paths. When a device supports equal-cost routing, the traffic sent to a destination IP or destination network segment can be shared across different paths to achieve network load balancing, and when some paths fail, the remaining paths take over the forwarding, providing route redundancy backup. With traditional routing techniques, packets sent to a destination address can use only one link, the other links are in a backup or inactive state, and switching between them in a dynamic routing environment takes a certain amount of time; an equal-cost multipath routing protocol, by contrast, can use multiple links simultaneously in such a network environment, which increases transmission bandwidth and allows the traffic of a failed link to be taken over without delay or packet loss.
(2) The transmission control protocol (Transmission Control Protocol, TCP) is a connection-oriented, reliable, byte stream based transport layer communication protocol. TCP is intended to accommodate a layered protocol hierarchy that supports multiple network applications. Reliable communication services are provided by means of TCP between pairs of processes in host computers connected to different but interconnected computer communication networks. TCP assumes that it can obtain simple, possibly unreliable datagram services from lower level protocols. In principle, TCP should be able to operate over a variety of communication systems from hardwired to packet-switched or circuit-switched networks.
(3) A network Flow, which may also be referred to simply as a flow, is a set of packets having the same five-tuple within a period of time, where the five-tuple includes the source IP address, source port number, destination IP address, destination port number, and transport layer protocol of the two communicating parties.
(4) A network session is a collection of network flows that have the same triplet (source address, destination address, transport layer protocol).
(5) A Flow slice/micro Flow/small Flow (Flowlet) is understood to be a packet group consisting of a plurality of packets sent consecutively in one Flow, each Flow including a plurality of flowlets. When the message forwarding is performed based on the Flowlet mechanism, the forwarding of a plurality of messages included in the Flowlet can be realized based on the Flowlet flow table entry. Different flowlets correspond to different Flowlet flow table entries. The Flowlet flow table entry is used to indicate a message forwarding path of a plurality of messages included in each Flowlet.
(6) The transmission control protocol/internet protocol (Transmission Control Protocol/Internet Protocol, TCP/IP) refers to a protocol cluster that enables information transfer between a plurality of different networks. The TCP/IP protocol refers not only to two protocols of TCP and IP but also to a protocol cluster composed of protocols of FTP, SMTP, TCP, UDP, IP and the like, and is called a TCP/IP protocol because the TCP protocol and the IP protocol are the most representative among the TCP/IP protocols. TCP is a connection-oriented, reliable, byte stream based transport layer communication protocol, among other things.
(7) An internet service provider (Internet Service Provider, ISP) network, a telecommunications carrier that provides a broad array of users with a combination of internet access services, information services, and value added services.
In order to facilitate understanding of the embodiments of the present application, the network transmission system architecture on which the embodiments are based is described below. Fig. 2 is a schematic diagram of a network transmission system architecture provided in an embodiment of the present application. Referring to fig. 2, the network transmission system architecture mainly includes: a host 10, a switch (SWITCH) 20, and the Internet. The host 10 may act as a source host or a destination host depending on whether it is the transmitting end or the receiving end, and the source host may be connected to the Internet through the switch 20 so as to communicate with the destination host.
Host 10 may be any computing device that generates data and has network access functionality. For example, any computer connected to the Internet may be referred to as a host, each having a unique IP address. The host 10 may be a server, a personal computer, a tablet computer, a mobile phone, a personal digital assistant, an intelligent wearable device, an unmanned terminal, and other devices. When two hosts (such as a source host and a destination host) need to communicate and transmit data, the source host is required to package application data into data packets (such as TCP/IP packets), and then the data packets are delivered to a next data link layer (such as a switch) to be packaged into frames; and then the exchanger and the like accurately transmit the data from the source host to the destination host according to the MAC address. In the embodiment of the present invention, the host 10 further has the functions of performing the division of the Flowlet on the message segment, the Flowlet identification, and the dynamic configuration of the time threshold for dividing the Flowlet, which are specifically referred to in the following description of the related embodiments, and are not repeated herein.
The switch 20 is a network device that encapsulates and forwards data packets based on MAC (network card hardware) addresses. It can "learn" MAC addresses and store them in an internal address table; by setting up a temporary switching path between the sender and receiver of a data frame, the data frame is sent directly from the source address to the destination address. The functions of the switch 20 may include physical addressing, network topology, error checking, frame sequencing, flow control, and the like. In the embodiment of the present invention, the switch 20 further forwards the same Flowlet or different Flowlets in a data flow based on the Flowlet division and Flowlet identification already performed on the message segments at the host 10 side, as described in the related embodiments below and not repeated here.
For example, the source host 10 processes data by using a transmission control protocol (Transmission Control Protocol, TCP) and a data processing method in the application, then sends a message to a message forwarding device of the routing switch network, and the message forwarding device (such as a switch and a router) in the routing switch network forwards the message by using an ECMP technology and a data transmission method in the application, and finally forwards the message to the destination host 10, thereby achieving the effect of load balancing processing.
The data processing method or the data transmission method in the embodiments of the invention can be applied to a TCP/IP-based transmission mechanism. The application range of the data transmission method in the invention is not limited to data center networks; it is also applicable to any network with multiple paths, such as an ISP (Internet Service Provider) network. As long as the network topology provides multiple network paths between any pair of source and destination communication nodes (i.e., a source host and a destination host), the technical solution in this application can be applied to perform Flowlet-granularity dynamic load balancing.
It should be noted that, for a data center network, the characteristics of the network topology determine that in the same network session of the data center network, that is, for TCP flows with the same triplet information (source address, destination address, transport layer protocol), the one or more paths contained in the corresponding multipath set are all equal-cost paths; for other types of networks, for example in the same network session of an ISP network, the one or more paths included in the corresponding multipath set may or may not be equal-cost. Therefore, depending on the topology type of the network accessed by the host, the one or more paths included in the multipath set of the target TCP flow described in the present application may or may not be equal-cost.
It will be appreciated that the network architecture of fig. 2 above is merely an exemplary implementation of an embodiment of the present application, and the network architecture of an embodiment of the present invention includes, but is not limited to, the above network architecture.
Referring to fig. 3, fig. 3 is a schematic diagram of a data center network topology according to an embodiment of the present application. The data center network mainly includes: a core layer, an aggregation (convergence) layer, an access layer, and a point of delivery (POD) layer. The source host may communicate with the destination host via the switches and the core network using the TCP protocol. Wherein,
the convergence zone (Point of Delivery, POD) layer consists of multiple PODs, and each POD may include server, storage and network devices. Top of Rack (ToR) is one cabling mode for the server cabinets of a data center; when ToR cabling is adopted, 1-2 access switches are deployed at the top of each server cabinet.
Access layer: the access switches physically connect the servers. They are typically placed on top of the racks, so they are also called ToR (Top of Rack) switches; this layer is also known as the edge layer (Edge Layer).
Aggregation (convergence) layer: the aggregation switches aggregate the connections from the access switches and provide other services such as firewall (FW), load balancing (Server Load Balancer, SLB), secure sockets layer offload (Secure Sockets Layer offload, SSL offload), intrusion detection, network analysis, and the like.
Core layer: and the core switch provides high-speed forwarding and connectivity for a plurality of convergence layers. The core switches provide high speed forwarding of packets into and out of the data center, connectivity for multiple convergence layers, and the core switches provide a flexible L3 routing network for the overall network.
For example, in fig. 3, for ToR1 in Pod1 there are at least 4 equal-cost multipaths (Equal Cost Multi-Path, ECMP) for access to the Internet; as shown in fig. 3, ToR1 in Pod1 may access the internet at least through the equal-cost paths: path 1, path 2, path 3 and path 4.
It should be noted that, in the embodiment of the present invention, the data transmission method applied to the switch side may be applied to the switches of any of the above layers (Access layer, Aggregation layer or Core layer); that is, in the entire forwarding path of a data packet from the source host to the destination host, all the switches involved in forwarding may implement any one of the data transmission methods provided in the present application.
It will be appreciated that the data center network topology of fig. 3 above is merely one exemplary implementation of an embodiment of the present application, and that the data center network topology of an embodiment of the present invention includes, but is not limited to, the above network architecture.
Referring to fig. 4, fig. 4 is a schematic diagram of a computer network OSI model and a TCP/IP model provided in an embodiment of the present application. An additional layer is added between the transport layer and the network layer of the existing computer network OSI model or TCP/IP model, and the additional layer is mainly used for dividing and marking the Flowlets in a TCP flow and setting related parameters. Specifically, the OSI eight-layer network model provided in the embodiment of the present invention comprises, from bottom to top, layers 1 to 8: a physical layer (Physical layer), a data link layer (Data link layer), a network layer (Network layer), the additional layer, a transport layer (Transport layer), a session layer (Session layer), a presentation layer (Presentation layer), and an application layer (Application layer); the TCP/IP model provided by the embodiment of the invention can be simplified into layers 1 to 5 from bottom to top, mainly comprising a network interface layer, a network layer, the additional layer, a transport layer and an application layer. Wherein,
(1) Application layer
The layer closest to the user in the OSI reference model; it provides an application interface for computer users and directly provides various network services for users, offering rich system application interfaces to user application software. Common application layer network service protocols are: hypertext transfer protocol (Hyper Text Transfer Protocol, HTTP), hypertext transfer protocol over secure socket layer (Hyper Text Transfer Protocol over Secure Socket Layer, HTTPS), file transfer protocol (File Transfer Protocol, FTP), post office protocol version 3 (Post Office Protocol-Version 3, POP3), simple mail transfer protocol (Simple Mail Transfer Protocol, SMTP), and the like.
(2) Presentation layer
Responsible for the encoding and conversion of data, ensuring the normal operation of the application layer. It performs data format conversion so that application layer data generated by one system can be recognized and understood by the application layer of another system. Computers on a network may employ different data representations, so data format conversion is required during data transmission. To enable computers employing different data representations to communicate with each other and exchange data, abstract data structures are used to represent the data transferred during communication, while each machine still employs its own standard encoding internally. The tasks of managing these abstract data structures, converting the internal code of the machine into a transfer syntax suitable for transmission over the network at the sender, and performing the reverse conversion at the receiver, are all done by the presentation layer.
(3) Session layer
Responsible for setting up, maintaining and controlling sessions, distinguishing between different sessions, and providing services in three communication modes: simplex (Simplex), half duplex (Half duplex) and full duplex (Full duplex). For example, it establishes, manages and terminates a session between two communicating parties, and determines which party initiates the communication, and so on.
(4) Transport layer
Responsible for segmenting and reassembling data and realizing an end-to-end logical connection. The transport layer establishes a host end-to-end link; its function is to provide end-to-end, reliable and transparent data transmission services for upper-layer protocols, including handling error control, flow control and similar issues. This layer shields the details of the underlying data communication from the higher layers, so that a higher-layer user sees only a reliable, host-to-host, user-controllable and user-configurable data path between the two transport entities. TCP/UDP is at this level.
(5) Additional layer
The additional layer in the embodiment of the invention is used for dividing and marking the TCP stream and setting related parameters. The additional layer protocol dynamically configures the segmentation threshold of the Flowlet according to the delay feedback of the network paths, and divides the message segments of the TCP stream into Flowlets according to this dynamic segmentation threshold. Since the division of the Flowlets is done at the host side, the invention can use a 1-bit reserved field of the transport layer header to mark adjacent Flowlets of the same TCP stream (this 1-bit field is named FL_tag); the division result of the Flowlets is thus carried to the switches in the network, and a switch identifies a Flowlet according to this header flag bit of the data packet. In the embodiment of the present invention, the functions of the host side mainly relate to the application layer, the presentation layer, the session layer, the transport layer and the additional layer.
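Purely for illustration, the following Python sketch (all names are assumptions of this sketch) shows the 1-bit marking idea: the FL_tag value is simply inverted whenever a new Flowlet starts, so consecutive Flowlets of the same stream carry alternating 0/1 values.

```python
# Minimal sketch (assumption: illustrative only, not the patented implementation).
# Adjacent Flowlets of one TCP stream are told apart by a single bit that is
# inverted each time a new Flowlet begins, so consecutive Flowlets carry 0, 1, 0, 1, ...

def next_fl_tag(last_fl_tag: int, starts_new_flowlet: bool) -> int:
    """Return the FL_tag value for the current segment.

    last_fl_tag:        FL_tag of the previous segment of the same stream (0 or 1).
    starts_new_flowlet: True if the current segment opens a new Flowlet.
    """
    return last_fl_tag ^ 1 if starts_new_flowlet else last_fl_tag

# Example: segments 1-3 share one Flowlet (tag 0); segment 4 opens a new Flowlet
# (tag 1) and segment 5 stays in it.
tags, tag = [], 0
for is_new in [False, False, False, True, False]:
    tag = next_fl_tag(tag, is_new)
    tags.append(tag)
print(tags)  # [0, 0, 0, 1, 1]
```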
It should be noted that the additional layer described in the present application may be deployed as a separate layer, or may be integrated into the existing transport layer, that is, the functions implemented by the additional layer may be incorporated into the transport layer; the embodiments of the invention are not particularly limited in this regard.
(6) Network layer
Responsible for managing network addresses, locating devices, and determining routes. This layer establishes a connection between two nodes through IP addressing, selects an appropriate route and switching node for a packet sent by the transport layer of the source end, and delivers the packet correctly, according to its address, to the transport layer of the destination end; that is, it is what is commonly referred to as the IP protocol layer. Specifically, the network layer carries out the entire transmission of data from one node to another according to the network layer address information contained in the data; its main function is to complete message transmission between hosts in the network, each message being transmitted from the source end to the destination end using the services of the data link layer. The functions related to the switch in the embodiment of the invention correspond to the network layer.
(7) Data link layer
Responsible for preparing physical transmission, cyclic redundancy check (Cyclic Redundancy Check, CRC), error notification, network topology, flow control, and the like. Bits are combined into bytes and bytes into frames, the link layer address (Ethernet uses the MAC address) is used to access the medium, and error detection is performed. Between adjacent nodes connected by physical links, data links in a logical sense are established, and point-to-point or point-to-multipoint direct communication of data is achieved on these data links. In a wide area network, the data link layer is responsible for the reliable transfer of data between the host and its interface message processor (Interface Message Processor, IMP) and between IMPs. In a local area network, the data link layer is responsible for the reliable transmission of data between systems.
(8) Physical layer
Completes the conversion of logical "0"s and "1"s into physical (optical/electrical) signals suitable for carriage over the transmission medium, and realizes the transmission and reception of physical signals and their propagation over the medium. The main function of the physical layer is to complete the raw bit stream transmission between adjacent nodes, i.e. it is responsible for transmitting and receiving data as a bit stream; in practice the transmission of the final signal is achieved by the physical layer. Common physical layer transmission media and devices are hubs, repeaters, modems, network cables, twisted pair, coaxial cable, etc.
It should be noted that, in the simplified TCP/IP model, when application layer data is sent to the network through the protocol stack, each protocol layer adds a header, which is called encapsulation (Encapsulation). Different protocol layers refer to the data unit differently: for example, in the application layer it is called a message, in the transport layer a segment, in the network layer a datagram (datagram) or packet (Packet), in the link layer a frame (frame), and so on.
It will be appreciated that the relevant network models and functions in fig. 4 above are merely one exemplary implementation in the embodiments of the present application, and the network models and functions referred to in the embodiments of the present invention include, but are not limited to, the above models and functions.
First, for a better understanding of the embodiments of the present invention, the Flow and Flowlet referred to in the present application are further described. As shown in fig. 1 above, a Flowlet is in effect a micro-flow (micro-flow). One Flow can be divided into many Flowlets. Packets of the same Flowlet have the same five-tuple information, namely the same source IP, destination IP, source port, destination port and transport layer protocol. A plurality of packets transmitted consecutively in a certain flow are taken as one Flowlet, and the Flowlet mechanism is applied to perform path selection, so that the plurality of packets included in the Flowlet are forwarded on the selected path. In the present application, different data packets in the same Flowlet are forwarded over the identical forwarding path (not merely equivalent paths), whereas different Flowlets in the same TCP flow may be forwarded over different but equal-cost paths (i.e. equal-cost multipaths) or over non-equal-cost paths, depending on the type of network topology of the network accessed by the host. An equal-cost multipath consists of forwarding paths whose switch hop counts are equal, whereas a non-equal-cost multipath consists of forwarding paths whose switch hop counts are not equal.
In other words, a Flow can be regarded as being composed of a plurality of Flowlets, and load balancing is based on the introduction of an intermediate granularity that is neither a packet (Packet) nor a flow (Flow), but a Flowlet, which is larger than a packet and smaller than a flow; that is, a Flowlet can be regarded as a micro-flow composed of one or more packets of the same Flow.
Based on the network architecture provided in fig. 2 or fig. 3 and the computer network model provided in fig. 4, the technical problems presented in the present application are specifically analyzed and solved in combination with the data transmission method provided in the present application.
Referring to fig. 5, fig. 5 is a flowchart of a data transmission method according to an embodiment of the present invention, where the method may be applied to the network architecture described in fig. 2 or fig. 3, and the host 10 may be configured to support and execute the method steps S501 to S504 shown in fig. 5. The description will be made below from the host 10 (source host) side with reference to fig. 3. The method may include the following step S501-step S504, and optionally, step S505-step S506.
Step S501: and generating a first message segment, and determining a target TCP stream to which the first message segment belongs.
Specifically, at the transmitting end, when a host (which may be called the source host) needs to send a message to another host (which may be called the destination host), the source host locally generates a data packet conforming to the relevant protocol standards, and then sends the data packet to the destination host through a switch or the like. On the host side, the process of generating the data packet mainly involves the application layer (including the application layer, the presentation layer and the session layer), the transport layer and the network layer. For example, when an application in the source host needs to send a message to the destination host, the message enters the transport layer after being encapsulated by the application layer on the source host side, and a message segment (i.e. the first message segment) conforming to the transport layer protocol is generated, for example a TCP segment (segment) conforming to the TCP protocol. That is, on the host side, when the TCP segment completes the encapsulation of the transport layer header fields (i.e. the first message segment in the embodiment of the present invention is generated), the function of the additional layer in the present application (as described with respect to fig. 4, not repeated here) is triggered, the subsequent division and marking of the Flowlet are performed on the first message segment, and the time threshold for dividing the Flowlet is dynamically configured. Specifically, the source host determines, through the additional layer, the TCP flow to which the first message segment belongs, and then obtains, according to the TCP flow to which the first message segment belongs, the related information for dividing and marking the Flowlet corresponding to the first message segment. In the present application, a TCP flow (Flow) refers to one data transmission process in a certain service process, that is, from the TCP three-way handshake to the end of data transmission and connection release; the five-tuple information of the same TCP stream is identical, the five-tuple information comprising the source IP, the destination IP, the source port, the destination port and the transport layer protocol.
Optionally, the host determines, according to the source port number in the first packet segment, a target TCP flow to which the first packet segment belongs. That is, the source port numbers corresponding to different TCP flows are necessarily different, so it can be determined whether different packet segments belong to the same TCP flow through the source port in the packet segment. For example, the source host may determine to which target TCP flow the first segment belongs based on source port information in the first segment.
Step S502: and acquiring the time stamp of the first message segment and acquiring target flow information matched with the target TCP flow.
Specifically, since in the present application the division and marking of the Flowlet are performed on the first message segment depending on the relationship between the timestamp difference between the first message segment and the adjacent previous message segment and the corresponding time threshold, the matching related flow information needs to be obtained. After the source host determines the TCP stream to which the first message segment to be sent belongs, it further acquires the timestamp of the first message segment and acquires the target stream information matched with the target TCP stream, so as to subsequently divide and mark the Flowlet. It should be noted that, in the embodiment of the present invention, the timestamp of a message segment is usually time information added when the message segment is encapsulated at the transport layer, the time information representing the time when the message segment was generated (for the destination host, it may also be understood as the time when the source host sent the message segment). For example, when a host needs to send data to the destination host, the sending time is encapsulated into a timestamp field of the data, and both the source host and the destination host can learn from the timestamp at what time the data was sent, so as to calculate (or measure) the network delay, the processing time of a computing service, and the like. Optionally, the host may obtain the timestamp of the first message segment from the timestamp value carried in the first message segment, or from the current system time; this acquisition step may be completed at any point after the first message segment is generated and before step S503, and the specific execution time is not limited. The target flow information comprises the time threshold corresponding to the target TCP flow and the timestamp of the second message segment in the target TCP flow. Wherein,
The second segment is the last segment adjacent to the first segment in the target TCP flow (i.e., the segment in the target TCP flow having a timestamp value earlier than the first segment and adjacent to the first segment); the time stamp of the second message segment can be obtained by the host according to the time stamp value carried in the second message segment, or can be obtained by the host according to the time recorded by the system at the time, that is, the time stamp of the first message segment and the time stamp of the second message segment are obtained by adopting the same standard.
The time threshold is the difference between a first path delay and a second path delay, the first path delay being the delay of the uplink path with the largest delay in the multipath set of the target TCP stream, and the second path delay being the delay of the uplink path with the smallest delay in the multipath set of the target TCP stream. The multipath set of the target TCP flow may include the multiple transmission paths corresponding to the target TCP flow, that is, the multiple possible uplink transmission paths between the source IP and source port of the target TCP flow and the destination IP and destination port of the target TCP flow. Optionally, the multipath set of the target TCP flow may further include the transmission paths corresponding to TCP flows belonging to the same network session as the target TCP flow, that is, the multiple possible uplink transmission paths between the source IP and the destination IP. In other words, the multipath set of the target TCP flow may include the transmission paths corresponding to the target TCP flow itself, or may further include the transmission paths corresponding to TCP flows having the same triplet information (source address, destination address, transport layer protocol) as the target TCP flow. That is, one TCP flow may correspond to one multipath set, or multiple TCP flows in the same network session may correspond to the same multipath set; accordingly, each TCP flow may maintain its own time threshold for dividing Flowlets, or multiple TCP flows may jointly maintain one time threshold for dividing Flowlets. The uplink path refers to the path from the transmitting end (i.e. the source host) to the receiving end (the destination host); the uplink path delay refers to the delay from the time when a message segment is sent by the source host to the time when it reaches the destination host (the destination host sends the acknowledgement immediately after receiving the data, so this delay can be derived from the timestamps carried in the acknowledgement).
Further optionally, when the network accessed by the host uses an equal-cost multipath model, all of the transmission paths in the multipath set of the target TCP flow are equal-cost paths; in this case the first path delay is the delay of the uplink path with the largest delay among the equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among the equal-cost paths. When the network accessed by the host uses a conventional multipath model, the transmission paths in the multipath set of the target TCP stream may include equal-cost paths or non-equal-cost paths; in this case the first path delay is the delay of the uplink path with the largest delay among the equal-cost or non-equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among them. It should be noted that, since the source addresses of the data packets sent from the source host are necessarily the same, if the addresses of the destination hosts are also the same, that is, the destination addresses are the same or the destination network segments are the same, the multipath sets of different TCP flows between the sending end and the receiving end in a network using the equal-cost multipath model (such as a data center network) are actually the same, so the time threshold can be calculated based on the uplink path delays of historical message segments having the same five-tuple or triplet as the target TCP flow.
For example, as shown in fig. 3, since the network topology in fig. 3 is a data center network and belongs to the equal-cost multipath model, all paths in the multipath set of the target TCP flow under this network are equal-cost paths, such as path 1, path 2, path 3 and path 4 shown in fig. 3.
In one possible implementation manner, the host maintains a flow information table, where the flow information table includes flow information of N TCP flows, N being an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of the corresponding TCP flow; the obtaining of the target flow information matched with the target TCP flow includes: searching the flow information table for the target flow information matched with the target TCP flow according to the flow index of the target TCP flow. For example, to implement the above-described functions of the embodiments of the present invention, the additional layer protocol needs to maintain a flow information table, FlowInfoTable, for recording the flow information used when dividing the Flowlets of each TCP flow. Each TCP flow occupies one entry in the FlowInfoTable, as shown in table 1,
TABLE 1
(Table 1: FlowInfoTable entry format, one row per TCP flow, with columns SrcPort, LstFLTag, LstTS, TTDiff, TripTime_max and TripTime_min.)
In table 1 above, each entry may contain six items: SrcPort, LstFLTag, LstTS, TTDiff, TripTime_max and TripTime_min (a brief data-structure sketch is given after the item descriptions below). Wherein,
(1) The TCP stream index (SrcPort) item is used to index each TCP stream, that is, it is the label of the TCP flow. This item corresponds to the flow index of the TCP flow described in this application.
(2) The last-packet flag value (LstFLTag) item is the FL_Tag field value of the last message segment of the TCP stream, that is, whether the Flowlet identifier corresponding to the last transmitted packet is 0 or 1. This item corresponds to the reference Flowlet identifier described in the present application.
(3) The LstTS item is the timestamp value of the previous message segment of the TCP stream, that is, the timestamp value of the last sent message segment (the timestamp just added by the transport layer), typically on the order of microseconds (us). This item corresponds to the timestamp value of the second message segment described in the present application.
(4) The TTDiff item is the time threshold that the TCP flow uses to divide Flowlets; that is, each stream separately maintains a time threshold, typically on the order of microseconds (us). Its value is the difference between the TripTime_max item and the TripTime_min item: for example, if TripTime_max is 58 and TripTime_min is 31, TTDiff is 27; if TripTime_max is 49 and TripTime_min is 36, TTDiff is 13. This item corresponds to the time threshold described in the present application.
(5) The TripTime_max item is the delay of the uplink path with the largest delay in the multipath set corresponding to the TCP flow (the uplink path being the path from the transmitting end to the receiving end), typically on the order of microseconds (us). This item corresponds to the first path delay described in this application.
(6) The TripTime_min item is the delay of the uplink path with the smallest delay in the multipath set corresponding to the TCP flow, typically on the order of microseconds (us). This item corresponds to the second path delay described in this application.
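Purely for illustration, a possible in-memory shape of one such FlowInfoTable entry is sketched below in Python; the field names follow table 1, while the class name, the helper function and the example port number are assumptions of this sketch rather than part of the described protocol.

```python
# Minimal sketch (assumption: illustrative only), modelling one FlowInfoTable entry
# per TCP flow as described for table 1. All times are in microseconds.
from dataclasses import dataclass

@dataclass
class FlowInfoEntry:
    src_port: int           # SrcPort: index/label of the TCP flow
    lst_fl_tag: int = 0     # LstFLTag: FL_Tag of the last sent segment (0 or 1)
    lst_ts: int = 0         # LstTS: timestamp of the last sent segment (us)
    tt_diff: int = 0        # TTDiff: time threshold for dividing Flowlets (us)
    trip_time_max: int = 0  # TripTime_max: largest uplink path delay seen (us)
    trip_time_min: int = 2**32 - 1  # TripTime_min: smallest uplink path delay seen (us)

# FlowInfoTable: one entry per TCP flow, indexed by source port.
flow_info_table: dict[int, FlowInfoEntry] = {}

def lookup_entry(src_port: int) -> FlowInfoEntry:
    """Index the entry of a TCP flow by its source port, creating it if absent."""
    return flow_info_table.setdefault(src_port, FlowInfoEntry(src_port))

# Example values mirroring table 1: TripTime_max 58, TripTime_min 31 -> TTDiff 27.
entry = lookup_entry(40001)  # 40001 is a hypothetical source port
entry.trip_time_max, entry.trip_time_min = 58, 31
entry.tt_diff = entry.trip_time_max - entry.trip_time_min
print(entry.tt_diff)  # 27
```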
Step S503: and comparing the difference value between the time stamp of the first message segment and the time stamp of the second message segment with the time threshold.
Specifically, after the target TCP flow to which the first message segment belongs is determined and the target flow information (for example, the content of one entry in table 1) is found, the timestamp of the second message segment is determined from the target flow information, and the difference between the timestamp of the first message segment (determined from the first message segment) and the timestamp of the second message segment is compared with the time threshold in the target flow information; the comparison shows whether the time interval between the first message segment and the second message segment exceeds the maximum time interval corresponding to the target TCP stream to which they belong, namely the time threshold.
Step S504: and determining whether the first message segment and the second message segment are divided into the same Flowlet according to the comparison result.
Specifically, according to the comparison result in step S503, it is determined whether to divide the first message segment and the second message segment into the same Flowlet; message segments of the same Flowlet in one TCP flow are forwarded through the same path, while a forwarding path needs to be re-determined for a different Flowlet.
In one possible implementation manner, if the difference between the timestamp of the first message segment and the timestamp of the second message segment is less than or equal to the time threshold, the first message segment and the second message segment are divided into the same Flowlet; if the difference between the timestamp of the first message segment and the timestamp of the second message segment is greater than the time threshold, the first message segment is divided into a new Flowlet. In the embodiment of the invention, if the difference between the timestamps of the first message segment to be sent and the adjacent previous second message segment in the target TCP stream is smaller than or equal to the time threshold corresponding to the target TCP stream (the time threshold changes dynamically), the first message segment and the adjacent previous second message segment are considered to satisfy the condition of being sent in the same Flowlet, that is, it can be decided that the first message segment and the adjacent previous second message segment are divided into the same Flowlet; similarly, if the difference between the timestamps of the first message segment to be sent and the adjacent previous second message segment in the target TCP flow is greater than the time threshold corresponding to the target TCP flow, the condition of being sent in the same Flowlet is not satisfied, that is, the first message segment is divided into a new Flowlet.
As shown in fig. 6A, fig. 6A is a schematic diagram of a Flowlet in which a first data packet and a second data packet are provided in an embodiment of the present invention; in fig. 6A, the first packet is a packet in which the first segment is encapsulated by the additional layer, and the second packet is a packet in which the second segment is encapsulated by the additional layer. In fig. 6A, assuming that the difference between the timestamp values of the first packet segment and the second packet segment is less than or equal to the time threshold, the host side divides the first packet segment and the second packet segment in the same Flowlet (Flowlet 5 in the figure), that is, the corresponding first packet and second packet are divided in the same Flowlet 5.
As shown in fig. 6B, fig. 6B is a schematic diagram of a first packet and a second packet in different flowlets according to an embodiment of the present invention, in fig. 6B, assuming that a difference between timestamp values of a first packet segment and a second packet segment is greater than the time threshold, a host side divides the first packet segment and the second packet segment into different flowlets (respectively, flowlet5 and Flowlet4 in the figure), that is, corresponding first packet and second packet are divided into Flowlet5 and Flowlet 4. It will be appreciated that this corresponds to the first packet being the first packet in the new Flowlet.
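As a minimal sketch of the decision of steps S503-S504 (Python; the function name and the example timestamps are assumptions of this sketch), the division comes down to a single comparison:

```python
# Minimal sketch (illustrative names): decide whether the segment to be sent starts
# a new Flowlet, per steps S503-S504. All times are in microseconds.

def starts_new_flowlet(cur_ts: int, lst_ts: int, tt_diff: int) -> bool:
    """True if the gap to the previous segment of the same flow exceeds the threshold.

    cur_ts:  timestamp of the first message segment (the one about to be sent)
    lst_ts:  timestamp of the second message segment (the previous one of the flow)
    tt_diff: time threshold currently configured for this TCP flow
    """
    return (cur_ts - lst_ts) > tt_diff

# With TTDiff = 27 us: a 10 us gap keeps the segment in the same Flowlet (as in fig. 6A),
# while an 80 us gap opens a new Flowlet (as in fig. 6B).
print(starts_new_flowlet(1010, 1000, 27))  # False -> same Flowlet
print(starts_new_flowlet(1080, 1000, 27))  # True  -> new Flowlet
```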
Aiming at the problem that the fixed detection interval in the existing Flowlet granularity load balancing scheme is difficult to adapt to dynamic network load, the embodiment of the invention combines the time delay feedback information of a network path to dynamically configure the time interval for detecting the Flowlet so as to ensure that the Flowlet granularity is matched with the network path state.
Optionally, embodiments of the present invention may further include the following method steps S505-S506.
Step S505: and generating a first data packet, wherein the first data packet comprises the first message segment and a Flowlet identifier of the first message segment.
Specifically, after the source host side encapsulates the first message segment through the additional layer and the network layer, a first data packet is further generated, where the first data packet includes the first message segment and the Flowlet identifier of the first message segment. That is, in the process of generating the first data packet, the Flowlet identifier of the message segment needs to be encapsulated in addition to the headers of the relevant protocols. Optionally, the Flowlet identifier of the first message segment may be carried on a Flowlet identification bit in the header.
The target flow information further includes a reference Flowlet identifier of the target TCP flow (i.e. corresponding to the LstFLTag field in table 1). The reference Flowlet identifier is currently the first Flowlet identifier corresponding to the second message segment; that is, since the most recently transmitted message segment is the second message segment, the reference Flowlet identifier actually refers to the first Flowlet identifier corresponding to the second message segment. If the difference between the timestamp of the first message segment and the timestamp of the second message segment is smaller than or equal to the time threshold, the Flowlet identifier of the first message segment is the first Flowlet identifier; if the difference between the timestamp of the first message segment and the timestamp of the second message segment is larger than the time threshold, the Flowlet identifier of the first message segment is the second Flowlet identifier.
In the embodiment of the invention, when the first message segment is further encapsulated so that the data can be transmitted through the network, the Flowlet identifier corresponding to the message segment can be set in the encapsulation process, so that after the message segment is encapsulated into a data packet, the switch side can identify, through the Flowlet identifier, to which Flowlet the data packet belongs, and then determine through which path to transmit it. For example, when the first message segment is to enter the data link layer where the switch is located, the first message segment needs to be further encapsulated; at this time, a flag bit is set in the encapsulated data packet for the switch to identify to which Flowlet the message segment belongs, and when the Flowlet identifiers of the first message segment and the second message segment are identical, the data packets corresponding to the first message segment and the second message segment are forwarded on the switch side through the same path. In summary, the embodiment of the invention divides the Flowlet on the host side and marks the Flowlet by using a bit (for example, 1 bit) in a reserved field of the transport layer header; the switch can identify the Flowlet only by means of the header field, which is efficient and has low hardware cost, and at the same time it is ensured that the same Flowlet is not cut again no matter how many hops of switches it passes through in the network, reducing the risk of out-of-order data packets.
Step S506: and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, updating the reference Flowlet identifier to the second Flowlet identifier.
Specifically, the flow information of each TCP flow in the flow information table maintained at the host side further includes the reference Flowlet identifier of that TCP flow, that is, the identifier of the current Flowlet of each TCP flow is maintained in the flow information table, so that the corresponding Flowlet identifier can be set for the message segment to be sent. For example, assuming that the reference Flowlet identifier is the first Flowlet identifier (i.e. the Flowlet identifier corresponding to the second message segment), when the first message segment and the second message segment are divided into the same Flowlet, the Flowlet identifier of the first message segment is also marked as the first Flowlet identifier, i.e. the reference Flowlet identifier remains unchanged as the first Flowlet identifier; if the reference Flowlet identifier is the first Flowlet identifier and the first message segment and the second message segment are divided into different Flowlets (i.e. the first message segment is divided into a new Flowlet), the Flowlet identifier of the first message segment is marked as the second Flowlet identifier, and the reference Flowlet identifier then needs to be updated to the second Flowlet identifier. Optionally, the reference Flowlet identifier switches between 0 and 1; that is, the Flowlet identifiers of two adjacent Flowlets take the values 0 and 1 respectively, so that whether different data packets belong to the same Flowlet can be indicated accurately with only 1 bit.
In one possible implementation, the host also updates the first path delay or the second path delay according to a received ACK packet, so as to update the time threshold of the target TCP flow in real time. Specifically, the host receives a target ACK packet, where the target ACK packet is an ACK packet having the same destination port number or the same destination address as the target TCP flow; determines the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value of the target ACK packet and its timestamp echo reply value; and compares the uplink path delay of the target ACK packet with the uplink path delays of the historical ACK packets in the target TCP stream. If the uplink path delay of the target ACK packet is greater than the maximum of the uplink path delays of the historical ACK packets, the first path delay is updated to the uplink path delay of the target ACK packet; if the uplink path delay of the target ACK packet is smaller than the minimum of the uplink path delays of the historical ACK packets, the second path delay is updated to the uplink path delay of the target ACK packet.
In the embodiment of the present invention, the time threshold for dividing the Flowlet may be calculated from the difference between the maximum uplink path delay and the minimum uplink path delay of the historical packets received in the target TCP flow (or in TCP flows belonging to the same network session as the target TCP flow); that is, the time threshold is a value that is dynamically adjusted as the network transmission load changes in real time. Specifically, when the host side receives an ACK packet belonging to the target TCP flow (i.e. with the same destination port number), or receives an ACK packet of a TCP flow in the same network session as the target TCP flow (i.e. with the same destination address or the same destination network segment), the host side calculates the transmission delay of the uplink path of the ACK packet from the difference between the timestamp value of the target ACK packet and its timestamp echo reply value, determines, from the historical values of the uplink path transmission delays of all received ACK packets, the current maximum uplink path delay as the first path delay and the current minimum uplink path delay as the second path delay, and finally calculates the time threshold for dividing Flowlets in the target TCP stream as the difference between the first path delay and the second path delay. That is, each time a target ACK packet is received, it must be checked whether the first path delay or the second path delay needs to be updated, so that for different TCP flows, or for the same TCP flow in different states, the time threshold for dividing Flowlets changes dynamically and is adjusted according to the real-time transmission delay of the data in the corresponding TCP flow, and can therefore always adapt to dynamic changes in the network load.
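A minimal sketch of this ACK-driven update is given below (Python; all names are assumptions of this sketch and the FlowInfoTable row is modelled as a dictionary). The described scheme recomputes TTDiff periodically (see the description of fig. 6D below); it is recomputed inline here purely for brevity.

```python
# Minimal sketch (assumption: illustrative only): update TripTime_max / TripTime_min
# from the timestamps carried in an ACK, then derive the Flowlet time threshold.

def uplink_delay(ack_timestamp: int, ack_timestamp_echo: int) -> int:
    """Uplink path delay measured as timestamp value minus timestamp echo reply (us)."""
    return ack_timestamp - ack_timestamp_echo

def on_ack(entry: dict, ack_timestamp: int, ack_timestamp_echo: int) -> None:
    """entry is one FlowInfoTable row: {'TripTime_max': .., 'TripTime_min': .., 'TTDiff': ..}."""
    delay = uplink_delay(ack_timestamp, ack_timestamp_echo)
    if delay > entry['TripTime_max']:   # larger than any delay seen so far
        entry['TripTime_max'] = delay   # first path delay
    if delay < entry['TripTime_min']:   # smaller than any delay seen so far
        entry['TripTime_min'] = delay   # second path delay
    # In the described scheme TTDiff is reconfigured periodically; done inline here.
    entry['TTDiff'] = entry['TripTime_max'] - entry['TripTime_min']

entry = {'TripTime_max': 0, 'TripTime_min': 2**32 - 1, 'TTDiff': 0}
on_ack(entry, ack_timestamp=1031, ack_timestamp_echo=1000)  # measured delay 31 us
on_ack(entry, ack_timestamp=2058, ack_timestamp_echo=2000)  # measured delay 58 us
print(entry['TTDiff'])  # 27
```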
Referring to fig. 6C, fig. 6C is a schematic flowchart of dividing and marking a Flowlet by the additional layer protocol according to an embodiment of the present invention. Based on the flow information table maintained by the host as in table 1 above, the following exemplarily describes the process of triggering the function of the additional layer and performing the division and marking of the Flowlet after a TCP segment completes the encapsulation of the transport layer header fields (a brief code sketch of these steps is given after the list), which specifically includes the following steps:
1. after a TCP segment (e.g., a first segment) enters a transport layer, an entry (denoted as SrcPort) corresponding to the TCP flow is first indexed in a FlowInfoTable information table (a flow information table described in table 1) according to source port information carried by the segment.
2. The timestamp value (denoted as CurTS) carried by the message segment (such as the first message segment) is further obtained.
3. Next, it is judged whether the message segment (such as the first message segment) satisfies the segmentation condition of the Flowlet, that is, whether the difference CurTS - [SrcPort].LstTS is greater than the threshold [SrcPort].TTDiff.
4. If the judgment condition is satisfied, the current message segment (such as the first message segment) is regarded as the first message segment of a new Flowlet and is marked on the FL_Tag bit of the header; the marking method is to set the FL_Tag bit of the message segment header to the opposite of the value of the LstFLTag item in the entry, and then the value of the LstFLTag item in the entry is updated accordingly;
5. If the judgment condition is not satisfied, the current message segment (such as the first message segment) is regarded as a subsequent message segment of the previous Flowlet and is marked on the FL_Tag bit of the header; the marking method is to set the FL_Tag bit of the message segment header to the same value as the LstFLTag item in the entry. After the additional layer completes the division and marking of the Flowlet, the TCP segment is delivered to the network layer.
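Purely as an illustration of the steps above, the per-segment handling of fig. 6C can be sketched as follows (Python; the entry is modelled as a plain dictionary and all names are assumptions of this sketch). Updating LstTS to the current timestamp is shown explicitly, since LstTS is defined as the timestamp of the last sent message segment.

```python
# Minimal sketch (assumption: illustrative only) of the additional-layer handling in
# fig. 6C: index the flow entry by source port, test the split condition, set the
# FL_Tag bit of the segment header, and update the entry.

def handle_segment(flow_info_table: dict, src_port: int, cur_ts: int) -> int:
    """Return the FL_Tag value (0 or 1) to write into the segment header."""
    # 1. Index the entry of this TCP flow by its source port.
    entry = flow_info_table.setdefault(
        src_port, {'LstFLTag': 0, 'LstTS': cur_ts, 'TTDiff': 0})
    # 2./3. Compare the timestamp gap with the flow's threshold.
    if cur_ts - entry['LstTS'] > entry['TTDiff']:
        # 4. New Flowlet: FL_Tag is the opposite of LstFLTag, then LstFLTag is updated.
        fl_tag = entry['LstFLTag'] ^ 1
        entry['LstFLTag'] = fl_tag
    else:
        # 5. Same Flowlet: FL_Tag keeps the value of LstFLTag.
        fl_tag = entry['LstFLTag']
    # LstTS always tracks the timestamp of the last sent segment.
    entry['LstTS'] = cur_ts
    return fl_tag

# Example: with TTDiff = 27 us, a 10 us gap keeps the tag, an 80 us gap flips it.
table = {40001: {'LstFLTag': 0, 'LstTS': 1000, 'TTDiff': 27}}
print(handle_segment(table, 40001, 1010))  # 0 -> same Flowlet
print(handle_segment(table, 40001, 1090))  # 1 -> new Flowlet
```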
Referring to fig. 6D, fig. 6D is a schematic flowchart of dynamically updating the segmentation threshold of the Flowlet according to an embodiment of the present invention, based on the flow information table maintained by the host as in table 1 above. It should first be noted that, because the data center network topology provides multiple equal-cost paths for the same pair of source and destination hosts, in the embodiment of the present invention the uplink path delays of the equal-cost paths (including equal-cost paths in the same TCP flow or in the same network session) are continuously obtained according to the timestamps carried by ACK packets, the maximum and minimum values of these uplink path delays are recorded, the maximum delay difference between the equal-cost paths is represented by the difference between the maximum and minimum uplink path delays, and the TTDiff parameter is then periodically configured based on this maximum delay difference. The implementation of the additional layer protocol dynamically configuring the TTDiff parameter used for dividing the Flowlet is exemplarily described below (a brief sketch is given after these steps) and may specifically include the following steps:
1. As shown in fig. 6D, after an ACK packet (e.g., a destination ACK packet) returned from the destination host side enters the additional layer protocol of the source host, the host side indexes an entry (denoted as SrcPort) corresponding to the ACK packet in the FlowInfoTable information table according to the carried destination port information, where the entry is the same as an entry corresponding to the TCP flow associated with the ACK packet.
2. The timestamp values carried by the ACK packet are read, including the timestamp value field (denoted as Timestamp) and the timestamp echo reply field (denoted as TimestampEcho), and the difference between the two is used to characterize the delay of the uplink path (denoted as TripTime).
3. The calculated uplink path delay is compared with the maximum uplink path delay (namely [SrcPort].TripTime_max) and the minimum uplink path delay (namely [SrcPort].TripTime_min) recorded in the entry.
4. If the uplink path delay is greater than the maximum uplink path delay recorded in the information table entry, the [SrcPort].TripTime_max item in the entry is updated to the value of this uplink path delay;
5. If the uplink path delay is smaller than the minimum uplink path delay recorded in the information table entry, the [SrcPort].TripTime_min item in the entry is updated to the value of this uplink path delay.
Optionally, the additional layer protocol may also periodically update the TTDiff items of all entries in the information table. In the embodiment of the present invention, the value of the TTDiff item is configured as the difference between the TripTime_max item and the TripTime_min item of each entry, and at the same time the values of the TripTime_max item and the TripTime_min item are reset, so as to avoid a stale maximum or minimum value persisting indefinitely.
The embodiment of the invention can also set the update period to be on the order of the network round-trip delay (about 100-200 microseconds), set the reset value of the TripTime_max item to zero, and set the reset value of the TripTime_min item to a large value (such as the maximum value representable in 4 bytes).
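A minimal sketch of this periodic reconfiguration follows (Python; the names, the reset constants and the absence of an explicit timer are assumptions of this sketch):

```python
# Minimal sketch (assumption: illustrative only): periodically recompute TTDiff for
# every FlowInfoTable entry from the recorded max/min uplink delays, then reset the
# max/min so that stale extremes do not persist.

RESET_MAX = 0            # TripTime_max reset value
RESET_MIN = 2**32 - 1    # TripTime_min reset value (largest value in 4 bytes)

def periodic_ttdiff_update(flow_info_table: dict) -> None:
    for entry in flow_info_table.values():
        if entry['TripTime_max'] >= entry['TripTime_min']:  # both were measured
            entry['TTDiff'] = entry['TripTime_max'] - entry['TripTime_min']
        entry['TripTime_max'] = RESET_MAX
        entry['TripTime_min'] = RESET_MIN

# Called every update period, e.g. every 100-200 microseconds (timer not shown).
table = {40001: {'TripTime_max': 58, 'TripTime_min': 31, 'TTDiff': 0}}
periodic_ttdiff_update(table)
print(table[40001]['TTDiff'])  # 27
```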
The Flowlet technique can effectively address the problems of hash collisions, mice-flow blocking, asymmetry and the like faced by data center network load balancing. Existing Flowlet-level schemes detect and forward Flowlets at the switch based on a fixed time interval, but a fixed time interval cannot match the constantly and dynamically changing traffic load in the data center network, resulting in uneven load distribution in the network. The present application proposes to segment traffic in advance at the end host based on a time threshold that adapts to the path load, and then to scatter fine-grained Flowlets into the network; after a switch identifies the Flowlets, any routing algorithm can be executed to further balance the load.
Referring to fig. 7, fig. 7 is a flowchart of a data transmission method according to an embodiment of the present invention, where the method may be applied to a switch in the network architecture described in fig. 2 or fig. 3, and the switch 20 may be used to support and execute the method steps S701 to S704 shown in fig. 7. The description below is given from the switch side with reference to fig. 3. The method may include the following steps S701-S704.
Step S701: a first data packet is received.
Specifically, the first data packet includes a first packet segment and a Flowlet identifier of the first packet segment.
Step S702: and determining a target TCP stream to which the first data packet belongs, and acquiring forwarding information matched with the target TCP stream.
Specifically, the forwarding information includes a reference Flowlet identification of the target TCP flow and a reference forwarding path; the reference Flowlet identifier is a first Flowlet identifier corresponding to a second message segment currently, and the second message segment is the last message segment adjacent to the first message segment in the target TCP stream; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second message segment and the first Flowlet identifier;
In one possible implementation manner, the switch maintains a forwarding information table, where the forwarding information table includes forwarding information of M TCP flows, M being an integer greater than or equal to 1, and the forwarding information of each TCP flow includes the five-tuple hash value of the corresponding TCP flow. The determining of the target TCP flow to which the first data packet belongs and the obtaining of the forwarding information matched with the target TCP flow include: calculating the five-tuple hash value of the first data packet according to the five-tuple information of the first data packet; and searching the forwarding information table for the forwarding information matched with the target TCP flow according to the five-tuple hash value of the first data packet. In the embodiment of the invention, the switch side maintains a forwarding information table comprising the forwarding information of one or more TCP streams (such as TCP streams in an active state) on the hosts connected to it, and the forwarding information of each TCP stream may further comprise the five-tuple hash value of that TCP stream. That is, the switch may maintain the forwarding information of all currently active TCP flows, so that when there is a data packet to be sent, the forwarding information (including the reference Flowlet identifier, the forwarding path, etc.) matching the five-tuple hash value can be found in the forwarding information table according to the five-tuple hash value of the data packet, so as to forward the data packet to be sent.
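Purely for illustration, the lookup of step S702 can be sketched as follows (Python; the hash function and the field names are assumptions of this sketch rather than the switch's actual implementation):

```python
# Minimal sketch (assumption: illustrative only): index the Flowlet forwarding table
# by a hash of the packet five-tuple (source IP, destination IP, source port,
# destination port, protocol).

def five_tuple_hash(src_ip: str, dst_ip: str, src_port: int,
                    dst_port: int, proto: int) -> int:
    # Any hash of the five-tuple suffices for this sketch.
    return hash((src_ip, dst_ip, src_port, dst_port, proto))

def lookup_forwarding_entry(forwarding_table: dict, packet: dict):
    """Return the flow's entry {'FLTag': .., 'Port': ..}, or None if the flow is new."""
    key = five_tuple_hash(packet['src_ip'], packet['dst_ip'],
                          packet['src_port'], packet['dst_port'], packet['proto'])
    return forwarding_table.get(key)
```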
Step S703: comparing the Flowlet identification of the first message segment with the first Flowlet identification;
step S704: and determining whether to forward the first message segment through the first forwarding path according to the comparison result.
Specifically, after receiving the data packet, the switch side reads the Flowlet identifier in the data packet, determines from it whether the Flowlet identifier of the first data packet is the same as that of the adjacent previous data packet in the target TCP stream, and on that basis decides whether the first data packet needs to be forwarded through the forwarding path corresponding to the second data packet. That is, the switch side does not need to divide Flowlets for received data packets according to the time interval between received data packets; instead, it directly identifies, from the Flowlet identification bit contained in the received data packet, whether the data packet to be forwarded belongs to the same Flowlet as the adjacent previous data packet of the same TCP flow, so as to decide whether to continue forwarding through the forwarding path of the adjacent data packet or to treat the data packet as a new Flowlet and decide a new forwarding path for it.
It should be noted that the forwarding path in the embodiment of the present invention refers to the forwarding port that each switch can currently determine; that is, the complete forwarding path of a data packet may in fact be decided jointly by the forwarding ports decided separately by the switches along the multi-hop path. Therefore, in the embodiment of the present invention, on the switch side, each switch along the multi-hop path actually performs the above data transmission method, so as to finally determine the complete forwarding path of the first data packet.
In one possible implementation manner, if the Flowlet identifier of the first packet segment is the same as the first Flowlet identifier, forwarding the first data packet through the first forwarding path; and if the Flowlet identification of the first message segment is different from the first Flowlet identification, determining a second forwarding path for the first data packet, and forwarding through the second forwarding path. In the embodiment of the invention, when the switch recognizes that the Flowlet identification of the first data packet is the same as that of the adjacent data packet in the target TCP stream, the first data packet and the second data packet are forwarded on the same path; when the switch recognizes that the Flowlet identification of the first data packet is different from that of the adjacent data packet in the target TCP stream, a new forwarding path is determined for the first data packet, and forwarding is performed through the new forwarding path. It should be noted that the second forwarding path may be the same as or different from the first forwarding path, depending on the decision result of the switch.
In one possible implementation, the method further includes: if the Flowlet identifier of the first message segment is different from the first Flowlet identifier and is a second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path. In the embodiment of the invention, when the Flowlet identifiers of the first data packet and the second data packet are different, the first data packet and the adjacent previous second data packet in the TCP stream to which they belong do not belong to the same Flowlet, so the switch needs to place the first data packet into a new Flowlet, and the reference Flowlet identifier of the TCP stream needs to be updated to the Flowlet identifier corresponding to the current latest data packet, namely the second Flowlet identifier.
In summary, after the Flowlet is divided and marked on the host side, embodiments of the present invention identify the Flowlet at the switch based on the identification bit of the data packet. The switch implements the above function through a Flowlet forwarding information table (Flowlet Table), whose format is shown in table 2:
TABLE 2
(Table 2: Flowlet forwarding table entry format, one row per TCP flow, with columns Entry (five-tuple hash value, e.g. key1, key2), FLTag and Port.)
In table 2 above, each entry of the Flowlet forwarding information table may contain three items: Entry, FLTag and Port. Wherein,
(1) The Entry records the hash value of the packet five-tuple (source IP, destination IP, source port, destination port, protocol number) and indexes the corresponding Entry of the TCP flow in the forwarding table according to the hash value. Wherein the term corresponds to the five-tuple hash value described in the present application.
(2) The FLTag entry records Flowlet flag information for identification of adjacent flowlets, where the entry corresponds to a reference Flowlet identification as described in this application.
(3) The Port entry records information of the forwarding Port. Wherein this item corresponds to the forwarding information described in the present application.
Based on the forwarding information table maintained by the switch as in table 2, the following exemplarily describes the implementation process of identifying the Flowlet on the switch side according to the identification bit of the data packet (a brief code sketch is given after the examples below), which may include the following main steps:
1. For each arriving packet, the switch first identifies which TCP flow the packet belongs to, and then identifies which Flowlet the packet belongs to.
2. The switch first performs a hash operation on the five-tuple of the data packet to obtain a hash value, and then looks up, in the Flowlet forwarding table, the forwarding table entry whose Entry item equals this hash value, so as to determine to which TCP stream the arriving data packet belongs. For example, the hash value is key1, key2, or the like in table 2.
3. The value of the FL_Tag bit of the data packet header is then read and compared with the current value of the FLTag item of the forwarding table entry, i.e. the Flowlet identifier carried in the data packet is compared with the reference Flowlet identifier recorded in the entry for the corresponding TCP flow.
4. If the two values are equal, the data packet belongs to the current Flowlet (flow burst) and is forwarded to the output port indicated by the Port item of the entry.
5. If not, the data packet is a new flow burst of the data flow, the value of the FLTag item of the forwarding table entry is updated to be the value of the FL_Tag bit of the data packet header, then a load balancing decision is carried out on the new Flowlet, and the decision result is saved to the Port item of the forwarding table entry.
6. When judging which TCP stream an arriving data packet belongs to, if no corresponding forwarding table entry is found according to the hash value, the data packet belongs to a new TCP stream and is the first data packet of the new stream. At this time, a forwarding table entry needs to be newly created, the value of the FLTag entry is the value of the Flowlet flag bit of the data packet, then load balancing decision is carried out, and the decision result is stored in the Port entry of the forwarding table.
For example, assume that a newly arrived data packet A reaches the switch; the switch hashes the five-tuple of packet A to obtain the hash value key1, and the Flowlet identifier value of packet A is assumed to be 0. The switch indexes the corresponding table entry in the forwarding information table according to the hash value key1 and compares the FLTag value in that entry with the Flowlet identifier value of packet A. Because 0 = 0, packet A is forwarded to the output port indicated by the Port item of the entry (i.e., the first forwarding path described in this application).
Assume further that a newly arrived data packet B reaches the switch; the switch hashes the five-tuple of packet B to obtain the hash value key2, and the Flowlet identifier value of packet B is assumed to be 0. The switch indexes the corresponding table entry in the forwarding information table according to the hash value key2 and compares the FLTag value in that entry (here, 1) with the Flowlet identifier value of packet B. Because 1 ≠ 0, packet B is the first data packet of a new Flowlet of its TCP flow, and it cannot simply be forwarded to the output port indicated by the Port item of the entry; instead, a routing decision is made again for packet B, a new output port is selected, and the value of the Port item is updated (i.e., the second forwarding path described in this application).
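For readability, the switch-side procedure of steps 1 to 6 can be summarized in code form. The following Python sketch is illustrative only: the names (FlowletEntry, flowlet_table, five_tuple_hash, load_balance), the use of SHA-1 for the five-tuple hash, and the dictionary-based table are assumptions made for this sketch rather than the claimed implementation.

import hashlib
import random
from dataclasses import dataclass

@dataclass
class FlowletEntry:
    fl_tag: int   # FLTag item: reference Flowlet identifier of the flow
    port: int     # Port item: currently recorded output port

# Flowlet forwarding information table, keyed by the Entry item (five-tuple hash).
flowlet_table = {}

def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Hash of the packet five-tuple used to index the forwarding table."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

def load_balance(num_ports=4):
    """Placeholder for any routing / load-balancing decision on a new Flowlet."""
    return random.randrange(num_ports)

def forward(five_tuple, fl_tag_bit):
    """Return the output port for a packet carrying the given FL_Tag bit (steps 1-6)."""
    key = five_tuple_hash(*five_tuple)
    entry = flowlet_table.get(key)

    if entry is None:
        # Step 6: first packet of a new TCP flow -> create a new forwarding table entry.
        entry = FlowletEntry(fl_tag=fl_tag_bit, port=load_balance())
        flowlet_table[key] = entry
        return entry.port

    if fl_tag_bit == entry.fl_tag:
        # Step 4: same Flowlet as the previous packet -> reuse the recorded port.
        return entry.port

    # Step 5: first packet of a new Flowlet -> update FLTag and re-run load balancing.
    entry.fl_tag = fl_tag_bit
    entry.port = load_balance()
    return entry.port

Under these assumptions, packet A above (hash key1, FL_Tag 0, entry FLTag 0) takes the step-4 branch and keeps the recorded port, while packet B (hash key2, FL_Tag 0, entry FLTag 1) takes the step-5 branch and triggers a new load-balancing decision.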
In summary, the main protection points of the present application may include the following:
1. An additional-layer protocol is added at the transport layer or at a newly added additional layer, and an information table is maintained in which each flow exclusively occupies one entry; the entry contains the relevant information used when splitting Flowlets.
2. According to the timestamp option in ACK data packets, one-way delay information of the paths is continuously acquired at the host, and the delay of the uplink path with the largest delay in the multipath set (which may include equivalent paths or non-equivalent paths) is further calculated; the maximum delay difference (between the largest and smallest uplink path delays) is periodically set as the time threshold for splitting Flowlets, so that the time threshold dynamically adapts to the path load, and the information table is updated accordingly.
3. When a flow is split, once the difference between the timestamp of the current message segment and the timestamp of the previous message segment of the flow exceeds the set time threshold, the current message segment is regarded as the first message segment of a new Flowlet. One bit of a reserved field in the TCP header is used as a flag bit to distinguish different Flowlets of the same flow: all message segments in the same Flowlet carry the same flag-bit value, and adjacent Flowlets carry opposite values (a sketch of this splitting and marking logic follows this list).
4. Each Flowlet is identified at the switch according to the five-tuple hash and the one-bit flag bit in the transport-layer or additional-layer header, and forwarding of the data packet is completed using any routing algorithm.
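As a reading aid for points 1 and 3 above, the following Python sketch shows one possible host-side realization of the splitting and marking logic: a per-flow state entry holds the time threshold, the timestamp of the previous message segment, and the current flag-bit value, and the flag bit is inverted whenever the inter-segment gap exceeds the threshold. The FlowState structure, the indexing by source port, and the monotonic-clock timestamps are assumptions of this sketch, not the claimed implementation.

import time
from dataclasses import dataclass

@dataclass
class FlowState:
    """One entry of the host-side information table (one entry per TCP flow)."""
    time_threshold: float   # max uplink delay - min uplink delay of the multipath set
    last_timestamp: float   # timestamp of the previous (second) message segment
    fl_tag: int             # flag-bit value of the current Flowlet (0 or 1)

# Host-side flow information table, assumed here to be indexed by source port number.
flow_table = {}

def mark_segment(src_port, now=None):
    """Return the FL_Tag bit to be written into the reserved bit of the TCP header."""
    now = time.monotonic() if now is None else now
    state = flow_table[src_port]

    if now - state.last_timestamp > state.time_threshold:
        # Gap exceeds the time threshold: this segment starts a new Flowlet,
        # so the flag bit is inverted relative to the previous Flowlet.
        state.fl_tag ^= 1
    # Otherwise the segment belongs to the current Flowlet and keeps the same bit.

    state.last_timestamp = now
    return state.fl_tag

# Example: a flow on source port 5001 with an assumed 2 ms threshold.
flow_table[5001] = FlowState(time_threshold=0.002,
                             last_timestamp=time.monotonic(),
                             fl_tag=0)
bit = mark_segment(5001)   # 0 if sent within 2 ms of the previous segment, 1 otherwise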
The foregoing details the method according to the embodiments of the present invention, and the following provides relevant apparatuses according to the embodiments of the present invention.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, and the data processing apparatus 80 may include a first generating unit 801, a first determining unit 802, an acquiring unit 803, a first comparing unit 804, and a Flowlet dividing unit 805, wherein the respective units are described in detail below.
A first generating unit 801, configured to generate a first message segment;
a first determining unit 802, configured to determine a target TCP flow to which the first packet segment belongs;
an obtaining unit 803, configured to obtain a timestamp of the first packet segment, and obtain target flow information matched with the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and a timestamp of a second packet segment in the target TCP flow; the second packet segment is the last packet segment adjacent to the first packet segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
a first comparing unit 804, configured to compare a difference value between the timestamp of the first packet segment and the timestamp of the second packet segment with the time threshold;
and a Flowlet dividing unit 805 configured to determine whether to divide the first message segment and the second message segment into the same Flowlet according to the comparison result.
In a possible implementation manner, the first determining unit is specifically configured to:
and determining the target TCP stream to which the first message segment belongs according to the source port number of the first message segment.
In one possible implementation, the apparatus further includes:
a maintenance unit, configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, where the flow information of each TCP flow includes a flow index of a corresponding TCP flow;
the acquisition unit is specifically configured to: and searching the target flow information matched with the target TCP flow from the flow information table according to the flow index of the target TCP flow.
In a possible implementation manner, the Flowlet dividing unit is specifically configured to:
if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, dividing the first message segment and the second message segment into the same Flowlet;
And if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, dividing the first message segment into a new Flowlet.
In a possible implementation manner, the target flow information further includes a reference Flowlet identifier of the target TCP flow, where the reference Flowlet identifier is currently a first Flowlet identifier corresponding to the second packet segment; the apparatus further comprises:
a second generating unit, configured to generate a first data packet, where the first data packet includes the first packet segment and a Flowlet identifier of the first packet segment; wherein,
if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, the Flowlet identifier of the first message segment is the first Flowlet identifier;
and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, the Flowlet identifier of the first message segment is the second Flowlet identifier.
In one possible implementation, the apparatus further includes:
and the first updating unit is used for updating the reference Flowlet identifier into the second Flowlet identifier if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value.
In one possible implementation, the apparatus further includes:
a receiving unit, configured to receive a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address of the target TCP flow;
a second determining unit, configured to determine an uplink path delay of the target ACK packet, where the uplink path delay is a difference between a timestamp value of the target ACK packet and a timestamp loopback reply value;
a second comparing unit, configured to compare the uplink path delay of the target ACK packet with the delay of the uplink path in the multipath set of the target TCP flow;
a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the maximum current delay in the multipath set;
and a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path with the minimum current delay in the multipath set.
In one possible implementation, the multi-path set includes a plurality of equivalent transmission paths of the target TCP stream; alternatively, the multi-path set includes a plurality of equivalent transmission paths and a non-equivalent transmission path of the target TCP stream; alternatively, the multi-path set includes a plurality of non-equivalent transmission paths of the target TCP stream.
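To make the cooperation of the receiving, determining, comparing, and updating units above more concrete, the following Python sketch tracks the largest and smallest uplink path delays of the multipath set from the timestamp option of incoming ACK packets and derives the time threshold as their difference. The PathDelays structure, the on_ack and time_threshold names, and the assumption that one timestamp unit equals 1 ms are illustrative only and are not part of the claimed apparatus.

from dataclasses import dataclass

@dataclass
class PathDelays:
    """Uplink delay bounds observed over the multipath set of one TCP flow."""
    max_delay: float = 0.0            # first path delay: largest uplink delay seen
    min_delay: float = float("inf")   # second path delay: smallest uplink delay seen

    def on_ack(self, ts_val, ts_ecr, tick=0.001):
        """Update the bounds from a target ACK packet.

        ts_val -- timestamp value carried in the ACK
        ts_ecr -- timestamp loopback (echo) reply value in the ACK
        tick   -- assumed duration of one timestamp unit (here 1 ms)
        """
        uplink_delay = (ts_val - ts_ecr) * tick
        if uplink_delay > self.max_delay:
            self.max_delay = uplink_delay      # case handled by the second updating unit
        if uplink_delay < self.min_delay:
            self.min_delay = uplink_delay      # case handled by the third updating unit

    def time_threshold(self):
        """Flowlet splitting threshold = max uplink delay - min uplink delay."""
        if self.min_delay == float("inf"):
            return 0.0
        return self.max_delay - self.min_delay

Under these assumptions, the threshold computed here would be written back periodically into the flow information table used by the splitting sketch above, so that the splitting decision adapts to the current load of the equivalent or non-equivalent paths in the multipath set.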
It should be noted that, for the functions of each functional unit in the data processing apparatus 80 described in this embodiment of the present invention, reference may be made to the related descriptions of step S501 to step S506 in the method embodiment described in fig. 5, and details are not repeated here.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a data transmission device provided in an embodiment of the present invention, and the data transmission device 90 may include a receiving unit 901, a determining unit 902, a comparing unit 903, and a forwarding unit 904, where the detailed descriptions of the respective units are as follows.
A receiving unit 901, configured to receive a first data packet, where the first data packet includes a first packet segment and a Flowlet identifier of the first packet segment, and the first data packet belongs to a target TCP flow;
a determining unit 902, configured to determine a target TCP flow to which the first data packet belongs, and obtain forwarding information that matches the target TCP flow; the forwarding information comprises a reference Flowlet identification and a reference forwarding path of the target TCP flow; the reference Flowlet identifier is a first Flowlet identifier corresponding to a second message segment currently, and the second message segment is the last message segment adjacent to the first message segment in the target TCP stream; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second message segment and the first Flowlet identifier;
a comparing unit 903, configured to compare the Flowlet identifier of the first packet segment with the first Flowlet identifier;
and a forwarding unit 904, configured to determine whether to forward the first packet segment through the first forwarding path according to the comparison result.
In one possible implementation manner, the switch maintains a forwarding information table, where the forwarding information table includes forwarding information of M TCP flows, M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes a five-tuple hash value of a corresponding TCP flow; the determining unit is specifically configured to:
calculating a five-tuple hash value of the first data packet according to the five-tuple information of the first data packet;
and searching forwarding information matched with the target TCP flow from the forwarding information table according to the five-tuple hash value of the first data packet.
In a possible implementation manner, the forwarding unit is specifically configured to:
if the Flowlet identification of the first message segment is the same as the first Flowlet identification, forwarding the first data packet through the first forwarding path;
and if the Flowlet identification of the first message segment is different from the first Flowlet identification, determining a second forwarding path for the first data packet, and forwarding through the second forwarding path.
In one possible implementation, the apparatus further includes:
and the updating unit is used for updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier and updating the reference forwarding path to the second forwarding path if the Flowlet identifier of the first message segment is different from the first Flowlet identifier and is the second Flowlet identifier.
It should be noted that, for the functions of each functional unit in the data transmission device 90 described in this embodiment of the present invention, reference may be made to the related descriptions of step S701 to step S704 in the method embodiment described in fig. 7, and details are not repeated here.
An embodiment of the present invention further provides a host. The host includes a processor, a memory, and a communication interface, where the memory is configured to store data processing program code, and the processor is configured to invoke the data processing program code to perform some or all of the steps of any one of the data processing methods described in the foregoing method embodiments.
An embodiment of the present invention further provides a switch. The switch includes a processor, a memory, and a communication interface, where the memory is configured to store data transmission program code, and the processor is configured to invoke the data transmission program code to perform some or all of the steps of any one of the data transmission methods described in the foregoing method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may store a program, and the program, when executed by a host, performs some or all of the steps of any one of the data processing methods described in the foregoing method embodiments.
The embodiments of the present invention also provide a computer program comprising instructions which, when executed by a switch, cause the switch to perform part or all of the steps of any one of the data transmission methods.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of the actions described, because some steps may be performed in another order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units described above is merely a division of logical functions, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. The coupling or direct coupling or communication connection shown or discussed between components may alternatively be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular may be a processor in the computer device) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (30)

  1. A data processing method, applied to a host, comprising:
    generating a first message segment, and determining a target TCP stream to which the first message segment belongs;
    acquiring a time stamp of the first message segment, and acquiring target flow information matched with the target TCP flow, wherein the target flow information comprises a time threshold corresponding to the target TCP flow and a time stamp of a second message segment in the target TCP flow; the second packet segment is the last packet segment adjacent to the first packet segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
    comparing the difference between the time stamp of the first message segment and the time stamp of the second message segment with the time threshold;
    and determining whether the first message segment and the second message segment are divided into the same Flowlet according to the comparison result.
  2. The method of claim 1, wherein the determining the target TCP flow to which the first segment belongs comprises:
    And determining the target TCP stream to which the first message segment belongs according to the source port number of the first message segment.
  3. The method of claim 1 or 2, wherein the host maintains a flow information table comprising flow information for N TCP flows, N being an integer greater than or equal to 1, wherein the flow information for each TCP flow comprises a flow index for the corresponding TCP flow; the obtaining the target flow information matched with the target TCP flow includes:
    and searching the target flow information matched with the target TCP flow from the flow information table according to the flow index of the target TCP flow.
  4. A method according to any one of claims 1-3, wherein said determining whether to divide said first segment and said second segment into the same Flowlet based on the comparison result comprises:
    if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, dividing the first message segment and the second message segment into the same Flowlet;
    and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, dividing the first message segment into a new Flowlet.
  5. The method of any of claims 1-4, wherein the target flow information further includes a reference Flowlet identification of the target TCP flow, the reference Flowlet identification being currently the first Flowlet identification corresponding to the second segment; the method further comprises the steps of:
    generating a first data packet, wherein the first data packet comprises the first message segment and a Flowlet identifier of the first message segment; wherein,
    if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, the Flowlet identifier of the first message segment is the first Flowlet identifier;
    and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, the Flowlet identifier of the first message segment is the second Flowlet identifier.
  6. The method of claim 5, wherein the method further comprises:
    and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, updating the reference Flowlet identifier to the second Flowlet identifier.
  7. The method of any one of claims 1-6, wherein the method further comprises:
    Receiving a target ACK packet, wherein the target ACK packet is an ACK packet with the same destination port number or the same destination address as the destination TCP stream;
    determining the uplink path delay of the target ACK packet, wherein the uplink path delay is the difference value between the timestamp value of the target ACK packet and the timestamp loopback answer value;
    comparing the uplink path delay of the target ACK packet with the delay of the uplink path in the multipath set of the target TCP stream;
    if the uplink path delay of the target ACK packet is larger than the delay of the uplink path with the maximum current delay in the multipath set, updating the first path delay into the uplink path delay of the target ACK packet;
    and if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path with the minimum current delay in the multipath set, updating the second path delay to be the uplink path delay of the target ACK packet.
  8. The method of any of claims 1-7, wherein the multi-path set comprises a plurality of equivalent transmission paths of the target TCP stream; alternatively, the multi-path set includes a plurality of equivalent transmission paths and a non-equivalent transmission path of the target TCP stream; alternatively, the multi-path set includes a plurality of non-equivalent transmission paths of the target TCP stream.
  9. A data transmission method, applied to a switch, comprising:
    receiving a first data packet, wherein the first data packet comprises a first message segment and a Flowlet identifier of the first message segment;
    determining a target TCP stream to which the first data packet belongs, and acquiring forwarding information matched with the target TCP stream; the forwarding information comprises a reference Flowlet identification and a reference forwarding path of the target TCP flow; the reference Flowlet identifier is a first Flowlet identifier corresponding to a second message segment currently, and the second message segment is the last message segment adjacent to the first message segment in the target TCP stream; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second message segment and the first Flowlet identifier;
    comparing the Flowlet identification of the first message segment with the first Flowlet identification;
    and determining whether to forward the first message segment through the first forwarding path according to the comparison result.
  10. The method of claim 9, wherein the switch maintains a forwarding information table comprising forwarding information for M TCP flows, M being an integer greater than or equal to 1, wherein the forwarding information for each TCP flow comprises five-tuple hash values for the corresponding TCP flow; the determining the target TCP flow to which the first data packet belongs, and obtaining forwarding information matched with the target TCP flow, includes:
    Calculating a five-tuple hash value of the first data packet according to the five-tuple information of the first data packet;
    and searching forwarding information matched with the target TCP flow from the forwarding information table according to the five-tuple hash value of the first data packet.
  11. The method according to claim 9 or 10, wherein the determining whether to forward the first message segment through the first forwarding path according to the comparison result comprises:
    if the Flowlet identification of the first message segment is the same as the first Flowlet identification, forwarding the first data packet through the first forwarding path;
    and if the Flowlet identification of the first message segment is different from the first Flowlet identification, determining a second forwarding path for the first data packet, and forwarding through the second forwarding path.
  12. The method of claim 11, wherein the method further comprises:
    if the Flowlet identifier of the first message segment is different from the first Flowlet identifier and is a second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path.
  13. A data processing apparatus, comprising:
    the first generation unit is used for generating a first message segment;
    a first determining unit, configured to determine a target TCP flow to which the first packet segment belongs;
    the acquisition unit is used for acquiring the time stamp of the first message segment and acquiring target flow information matched with the target TCP flow, wherein the target flow information comprises a time threshold value corresponding to the target TCP flow and the time stamp of a second message segment in the target TCP flow; the second packet segment is the last packet segment adjacent to the first packet segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
    a first comparing unit, configured to compare a difference value between the timestamp of the first packet segment and the timestamp of the second packet segment with the time threshold;
    and the Flowlet dividing unit is used for determining whether the first message segment and the second message segment are divided into the same Flowlet according to the comparison result.
  14. The apparatus according to claim 13, wherein the first determining unit is specifically configured to:
    and determining the target TCP stream to which the first message segment belongs according to the source port number of the first message segment.
  15. The apparatus of claim 13 or 14, wherein the apparatus further comprises:
    a maintenance unit, configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, where the flow information of each TCP flow includes a flow index of a corresponding TCP flow;
    the acquisition unit is specifically configured to: and searching the target flow information matched with the target TCP flow from the flow information table according to the flow index of the target TCP flow.
  16. The apparatus according to any of the claims 13-15, wherein the Flowlet partitioning unit is specifically configured to:
    if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, dividing the first message segment and the second message segment into the same Flowlet;
    and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, dividing the first message segment into a new Flowlet.
  17. The apparatus of any one of claims 13-16, wherein the target flow information further includes a reference Flowlet identification of the target TCP flow, the reference Flowlet identification being currently the first Flowlet identification corresponding to the second segment; the apparatus further comprises:
    a second generating unit, configured to generate a first data packet, where the first data packet includes the first packet segment and a Flowlet identifier of the first packet segment; wherein,
    if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is smaller than or equal to the time threshold value, the Flowlet identifier of the first message segment is the first Flowlet identifier;
    and if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value, the Flowlet identifier of the first message segment is the second Flowlet identifier.
  18. The apparatus of claim 17, wherein the apparatus further comprises:
    and the first updating unit is used for updating the reference Flowlet identifier into the second Flowlet identifier if the difference value between the time stamp of the first message segment and the time stamp of the second message segment is larger than the time threshold value.
  19. The apparatus according to any one of claims 13-18, wherein the apparatus further comprises:
    a receiving unit, configured to receive a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address of the target TCP flow;
    a second determining unit, configured to determine an uplink path delay of the target ACK packet, where the uplink path delay is a difference between a timestamp value of the target ACK packet and a timestamp loopback reply value;
    a second comparing unit, configured to compare the uplink path delay of the target ACK packet with the delay of the uplink path in the multipath set of the target TCP flow;
    a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the maximum current delay in the multipath set;
    and a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path with the minimum current delay in the multipath set.
  20. The apparatus of any of claims 13-19, wherein the multi-path set comprises a plurality of equivalent transmission paths of the target TCP stream; alternatively, the multi-path set includes a plurality of equivalent transmission paths and a non-equivalent transmission path of the target TCP stream; alternatively, the multi-path set includes a plurality of non-equivalent transmission paths of the target TCP stream.
  21. A data transmission apparatus, comprising:
    a receiving unit, configured to receive a first data packet, where the first data packet includes a first packet segment and a Flowlet identifier of the first packet segment, and the first data packet belongs to a target TCP flow;
    the determining unit is used for determining a target TCP stream to which the first data packet belongs and acquiring forwarding information matched with the target TCP stream; the forwarding information comprises a reference Flowlet identification and a reference forwarding path of the target TCP flow; the reference Flowlet identifier is a first Flowlet identifier corresponding to a second message segment currently, and the second message segment is the last message segment adjacent to the first message segment in the target TCP stream; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second message segment and the first Flowlet identifier;
    a comparison unit, configured to compare the Flowlet identifier of the first packet segment with the first Flowlet identifier;
    and the forwarding unit is used for determining whether to forward the first message segment through the first forwarding path according to the comparison result.
  22. The apparatus of claim 21, wherein the switch maintains a forwarding information table comprising forwarding information for M TCP flows, M being an integer greater than or equal to 1, wherein the forwarding information for each TCP flow comprises five-tuple hash values for the corresponding TCP flow; the determining unit is specifically configured to:
    Calculating a five-tuple hash value of the first data packet according to the five-tuple information of the first data packet;
    and searching forwarding information matched with the target TCP flow from the forwarding information table according to the five-tuple hash value of the first data packet.
  23. The apparatus according to claim 21 or 22, wherein the forwarding unit is specifically configured to:
    if the Flowlet identification of the first message segment is the same as the first Flowlet identification, forwarding the first data packet through the first forwarding path;
    and if the Flowlet identification of the first message segment is different from the first Flowlet identification, determining a second forwarding path for the first data packet, and forwarding through the second forwarding path.
  24. The apparatus of claim 23, wherein the apparatus further comprises:
    and the updating unit is used for updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier and updating the reference forwarding path to the second forwarding path if the Flowlet identifier of the first message segment is different from the first Flowlet identifier and is the second Flowlet identifier.
  25. A host comprising a processor, a memory, and a communication interface, wherein the memory is configured to store data processing program code, and wherein the processor is configured to invoke the data processing program code to perform the method of any of claims 1-8.
  26. A switch comprising a processor, a memory and a communication interface, wherein the memory is configured to store data transfer program code, and wherein the processor is configured to invoke the data transfer program code to perform the method of any of claims 9-12.
  27. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a host, implements the method of any of the preceding claims 1-8.
  28. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a switch, implements the method of any of the preceding claims 9-12.
  29. A computer program comprising instructions which, when executed by a host computer, cause the host computer to perform the method of any one of claims 1-8.
  30. A computer program comprising instructions which, when executed by a switch, cause the switch to perform the method of any of claims 9-12.
CN202080105542.9A 2020-09-30 2020-09-30 Data processing and transmitting method and related equipment Pending CN116325708A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/119708 WO2022067791A1 (en) 2020-09-30 2020-09-30 Data processing method, data transmission method, and related device

Publications (1)

Publication Number Publication Date
CN116325708A true CN116325708A (en) 2023-06-23

Family

ID=80949468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080105542.9A Pending CN116325708A (en) 2020-09-30 2020-09-30 Data processing and transmitting method and related equipment

Country Status (2)

Country Link
CN (1) CN116325708A (en)
WO (1) WO2022067791A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115277504B (en) * 2022-07-11 2024-04-05 京东科技信息技术有限公司 Network traffic monitoring method, device and system
CN115277568A (en) * 2022-07-20 2022-11-01 重庆星环人工智能科技研究院有限公司 Data sending method, device, equipment and storage medium
CN118118561A (en) * 2022-11-29 2024-05-31 华为技术有限公司 Method and device for packaging transport layer data packet
CN116366478B (en) * 2023-06-01 2023-08-15 湖北省楚天云有限公司 Data packet contrast deduplication method based on FPGA
CN116708280B (en) * 2023-08-08 2023-10-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Data center network multipath transmission method based on disorder tolerance

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10778584B2 (en) * 2013-11-05 2020-09-15 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US10505849B1 (en) * 2015-07-02 2019-12-10 Cisco Technology, Inc. Network traffic load balancing
CN107634912B (en) * 2016-07-19 2020-04-28 华为技术有限公司 Load balancing method, device and equipment
CN108270687B (en) * 2016-12-30 2020-08-25 华为技术有限公司 Load balancing processing method and device
CN110061929B (en) * 2019-03-10 2021-12-28 天津大学 Data center load balancing method for asymmetric network
CN110460537B (en) * 2019-06-28 2023-01-24 天津大学 Packet set-based data center asymmetric topology flow scheduling method

Also Published As

Publication number Publication date
WO2022067791A1 (en) 2022-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination