WO2022067791A1 - Data processing method, data transmission method, and related device - Google Patents

Data processing method, data transmission method, and related device

Info

Publication number
WO2022067791A1
Authority
WO
WIPO (PCT)
Prior art keywords
segment
flowlet
packet
target
flow
Prior art date
Application number
PCT/CN2020/119708
Other languages
French (fr)
Chinese (zh)
Inventor
顾华玺
刁兴龙
余晓杉
唐德智
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202080105542.9A (CN116325708A)
Priority to PCT/CN2020/119708 (WO2022067791A1)
Publication of WO2022067791A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/026 Capturing of monitoring data using flow identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0858 One way delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Definitions

  • The present application relates to the field of communication technologies, and in particular to a data processing method, a data transmission method, and related devices.
  • Data transmission equipment needs to balance the traffic load quickly and accurately in order to improve its forwarding performance and enhance the reliability of the network, so as to better serve users.
  • Equal-cost multi-path (ECMP) routing is a commonly used load balancing method.
  • ECMP includes a packet-based (Packet) path selection method and a flow-based (Flow) path selection method.
  • The packet-based path selection method can achieve load balancing, but because the delays of the different paths differ, packets arrive out of order at the receiving end and must be reordered.
  • In the flow-based path selection method, the outgoing interface (packet forwarding path) for forwarding a packet is determined by a hash algorithm, and the receiving end does not need to reorder packets.
  • FIG. 1 is a schematic diagram of dividing a TCP flow into Flowlets in the prior art. For example, after the data packets of the current TCP flow reach the switch, the flow is detected as containing 5 Flowlets. The time difference between data packets of the same Flowlet reaching the switch is generally small, while the time difference between data packets of different Flowlets reaching the switch is comparatively large.
  • The time gap (timegap) between the first data packet and the second data packet of Flowlet 1 is less than the predetermined threshold (timeout), so the switch regards these two data packets as belonging to the same Flowlet.
  • The time difference between the last data packet of Flowlet 1 and the first data packet of Flowlet 2 is greater than the predetermined threshold, so the switch regards the first data packet of Flowlet 2 as the start of a new Flowlet. That is, if the time difference between two adjacent data packets of the same TCP flow reaching the switch is less than the predetermined time interval (timeout), the switch regards the two data packets as belonging to the same Flowlet.
  • The existing Flowlet-granularity load balancing scheme detects Flowlets at the switch based on a fixed time interval. However, the load in a data transmission network (such as a data center network) changes dynamically and unpredictably, and a fixed time interval is difficult to adapt to a dynamically changing network load.
  • When the time interval is too small, the number of detected Flowlets increases and the processing granularity of load balancing becomes finer, which increases the risk of out-of-order data packets; when the time interval is too large, the number of detected Flowlets decreases, the processing granularity of load balancing becomes too coarse, and the effect of load balancing is not significant.
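  • The prior-art detection described above can be illustrated with a minimal Python sketch. The function name, the arrival times, and the three timeout values are hypothetical, and treating a gap exactly equal to the timeout as a new Flowlet is an assumption, since the text only covers the "less than" and "greater than" cases.

```python
def split_into_flowlets(arrival_times, timeout):
    """Group packet arrival times into Flowlets using one fixed timeout."""
    flowlets = []
    for t in arrival_times:
        # A gap below the timeout keeps the packet in the current Flowlet.
        # Assumption: a gap equal to the timeout starts a new Flowlet.
        if flowlets and t - flowlets[-1][-1] < timeout:
            flowlets[-1].append(t)
        else:
            flowlets.append([t])
    return flowlets

# Hypothetical arrival times (microseconds) of one TCP flow at the switch.
arrivals = [0, 10, 20, 300, 310, 900, 905, 1500]

small = split_into_flowlets(arrivals, timeout=5)
medium = split_into_flowlets(arrivals, timeout=100)
large = split_into_flowlets(arrivals, timeout=1000)
```

  • With these made-up numbers, the smallest timeout puts every packet in its own Flowlet and the largest one keeps the whole flow in a single Flowlet, which mirrors the two failure modes of a fixed interval.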
  • Embodiments of the present invention provide a data processing method, a data transmission method, and related devices, so as to improve efficiency and accuracy in the data transmission process.
  • In a first aspect, an embodiment of the present invention provides a data processing method, which is applied to a host and may include:
  • generating a first segment; determining the target TCP flow to which the first segment belongs; obtaining the timestamp of the first segment, and obtaining target flow information matching the target TCP flow, where the target flow information includes the time threshold corresponding to the target TCP flow and the timestamp of a second segment in the target TCP flow; the second segment is the previous segment adjacent to the first segment in the target TCP flow; the time threshold is the difference between a first path delay and a second path delay, where the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
  • comparing the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold; and determining, according to the comparison result, whether to divide the first segment and the second segment into the same Flowlet.
  • In this embodiment of the present invention, the host side first determines which TCP flow the first segment belongs to (for example, by determining the source port), and then obtains the target flow information matching that TCP flow. The target flow information includes various information about the target TCP flow, such as the timestamp of the previous adjacent segment (that is, the second segment) and the time threshold used to divide Flowlets in the target TCP flow.
  • Further, the host side compares the difference between the timestamps of the first segment to be sent and the previous adjacent second segment with the time threshold, thereby determining whether to divide the first segment and the previous adjacent segment into the same Flowlet. The time threshold is calculated dynamically from the delays of the paths in the multipath set of the target TCP flow, for example from the difference between the maximum delay and the minimum delay, which are updated in real time from the historical segments received by the host (ACK packets with the same triple or quintuple information as the target TCP flow).
  • That is, the time threshold used to divide Flowlets changes dynamically and is adjusted according to the real-time data transmission delay of the corresponding TCP flow, so it can always adapt to dynamic network load changes. This avoids the prior-art problem that a switch detecting Flowlets based on a fixed time interval has difficulty adapting to a dynamic network load.
  • In summary, the embodiment of the present invention uses the delay feedback of the network paths on the host side to dynamically configure the time interval for detecting and dividing Flowlets, so that the Flowlet granularity matches the state of the network paths, which reduces the risk of out-of-order data packets and ensures the effect of load balancing.
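  • As a rough illustration of how such a threshold follows the network state, the sketch below computes it as the largest uplink path delay minus the smallest. The function name and the delay values are hypothetical; the text does not prescribe units or data structures.

```python
def time_threshold(uplink_delays):
    """Flowlet time threshold: largest minus smallest uplink path delay."""
    return max(uplink_delays) - min(uplink_delays)

# Hypothetical per-path uplink delays (microseconds) at two moments.
light_load = [100, 110, 120]
heavy_load = [100, 110, 400]

threshold_light = time_threshold(light_load)
threshold_heavy = time_threshold(heavy_load)
```

  • One way to read this choice: a segment sent more than (largest delay minus smallest delay) after its predecessor cannot overtake it even if the predecessor took the slowest path and the new segment takes the fastest one, so the threshold grows when one path becomes congested.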
  • In a possible implementation, the determining the target TCP flow to which the first segment belongs includes: determining, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
  • In this embodiment of the present invention, by identifying the source port number in the quintuple information of a segment, the host can identify which TCP flow the segment to be sent (that is, the first segment) belongs to, and can then obtain the flow information of that TCP flow (including the timestamp of the previous adjacent segment and the time threshold used to divide Flowlets). Based on this flow information, the host determines whether the segment to be sent belongs to the same Flowlet as the previous segment or is divided into a new Flowlet.
  • In a possible implementation, the host maintains a flow information table, and the flow information table includes flow information of N TCP flows, where N is an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of the corresponding TCP flow;
  • the obtaining target flow information matching the target TCP flow includes: searching the flow information table, according to the flow index of the target TCP flow, for the target flow information matching the target TCP flow.
  • In this embodiment of the present invention, the host side maintains a flow information table of one or more TCP flows (for example, the currently active TCP flows). The flow information of each TCP flow may include the index of the TCP flow, the time threshold described in the first aspect, and the timestamp of the most recent segment preceding the segment currently to be sent. That is, the host can maintain the flow information of all currently active TCP flows, so that when a segment needs to be sent, the host can look up the matching target flow information in the flow information table according to the flow index of the TCP flow to which the segment belongs, and then perform the subsequent Flowlet division.
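  • A flow information table of this kind might be sketched as a dictionary keyed by flow index. Using the source port directly as the flow index is an assumption made here for brevity, and the class name, field names, and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FlowInfo:
    flow_index: int          # index of the TCP flow (assumed: its source port)
    time_threshold: float    # current Flowlet time threshold
    last_timestamp: float    # timestamp of the previous adjacent segment

# Hypothetical table of currently active TCP flows, keyed by flow index.
flow_table = {
    50001: FlowInfo(50001, time_threshold=20.0, last_timestamp=1000.0),
    50002: FlowInfo(50002, time_threshold=300.0, last_timestamp=980.0),
}

def lookup_flow_info(table, source_port):
    """Return the flow information of the flow identified by source_port."""
    return table.get(source_port)

target = lookup_flow_info(flow_table, 50002)
unknown = lookup_flow_info(flow_table, 1)
```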
  • In a possible implementation, the determining, according to the comparison result, whether to divide the first segment and the second segment into the same Flowlet includes: if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, dividing the first segment and the second segment into the same Flowlet; if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, dividing the first segment into a new Flowlet.
  • In this embodiment of the present invention, if the difference between the timestamps of the first segment to be sent and the previous adjacent second segment in the target TCP flow is less than or equal to the time threshold corresponding to the target TCP flow (the time threshold changes dynamically), the first segment and the second segment are considered to meet the condition for being sent in the same Flowlet, and the first segment is divided into the same Flowlet as the second segment. Similarly, if that difference is greater than the time threshold corresponding to the target TCP flow, the first segment and the second segment are considered not to meet the condition for being sent in the same Flowlet, and the first segment is divided into a new Flowlet.
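  • The comparison rule can be written as a one-line predicate. The sketch below uses a hypothetical function name and made-up timestamps; it only illustrates that a difference equal to the threshold still counts as the same Flowlet.

```python
def in_same_flowlet(first_ts, second_ts, time_threshold):
    """True if the first segment stays in the Flowlet of the second segment."""
    return (first_ts - second_ts) <= time_threshold

# Hypothetical timestamps (microseconds) with a threshold of 20.
same = in_same_flowlet(first_ts=1015, second_ts=1000, time_threshold=20)
boundary = in_same_flowlet(first_ts=1020, second_ts=1000, time_threshold=20)
new = in_same_flowlet(first_ts=1021, second_ts=1000, time_threshold=20)
```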
  • In a possible implementation, the target flow information further includes a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to the second segment; the method further includes: generating a first data packet, where the first data packet includes the first segment and the Flowlet identifier of the first segment; if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier; if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
  • In this embodiment of the present invention, the Flowlet identifier corresponding to a segment can be set while the segment is being encapsulated, so that the switch side can identify which Flowlet a data packet belongs to through the Flowlet identifier and decide which path to send it through. For example, when the first segment is about to enter the data link layer where the switch is located, it needs to be further encapsulated; in this case, a bit is set in the encapsulated data packet that the switch uses to identify which Flowlet the segment belongs to.
  • When the Flowlet identifiers are the same, the switch side forwards the first data packet and the second data packet corresponding to the second segment through the same path.
  • The embodiment of the present invention divides Flowlets on the host side and can use a bit (for example, 1 bit) in the reserved field of the transport layer header to mark the Flowlet. The switch can identify the Flowlet from the header field alone, which is efficient and has low hardware overhead. It also ensures that the same Flowlet is not segmented again no matter how many switch hops it traverses in the network, which reduces the risk of out-of-order packets.
  • In a possible implementation, the method further includes: if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, updating the reference Flowlet identifier to the second Flowlet identifier.
  • In this embodiment of the present invention, the flow information of each TCP flow in the flow information table maintained by the host side further includes the reference Flowlet identifier of that TCP flow. That is, the flow information table records the current Flowlet ID of each TCP flow, so that the corresponding Flowlet ID can be set for the segment to be sent.
  • If the reference Flowlet ID is the first Flowlet ID (that is, the Flowlet ID corresponding to the second segment) and the first segment is divided into the same Flowlet as the second segment, the Flowlet ID of the first segment is also marked as the first Flowlet ID, and the reference Flowlet ID remains the first Flowlet ID. If the reference Flowlet ID is the first Flowlet ID and the first segment is divided into a different Flowlet from the second segment (that is, the first segment is divided into a new Flowlet), the Flowlet ID of the first segment is marked as the second Flowlet ID, and the reference Flowlet ID must then be updated to the second Flowlet ID.
  • Optionally, the reference Flowlet ID can alternate between 0 and 1; that is, the Flowlet IDs of two adjacent Flowlets alternate between 0 and 1, so a single bit is enough to indicate accurately whether different packets belong to the same Flowlet.
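  • The 1-bit marking can be sketched as follows. This is a simplified illustration with hypothetical names: the threshold is held constant for the whole run, whereas the text describes it as changing dynamically, and the initial reference ID of 0 is an assumption.

```python
def tag_segments(timestamps, time_threshold):
    """Assign a 1-bit Flowlet ID to each segment of one TCP flow."""
    reference_id = 0       # reference Flowlet ID (assumed to start at 0)
    last_ts = None
    tagged = []
    for ts in timestamps:
        if last_ts is not None and ts - last_ts > time_threshold:
            reference_id ^= 1   # new Flowlet: flip the reference ID
        tagged.append((ts, reference_id))
        last_ts = ts
    return tagged

# Hypothetical send timestamps (microseconds) with a threshold of 20.
tags = tag_segments([0, 10, 20, 100, 105, 200], time_threshold=20)
ids = [flowlet_id for _, flowlet_id in tags]
```

  • Because only adjacent Flowlets need to be told apart, flipping a single bit at each boundary is sufficient; the bit value itself carries no meaning beyond "same as" or "different from" the previous segment.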
  • In a possible implementation, the method further includes: receiving a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow; determining the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value of the target ACK packet and its timestamp echo reply value; comparing the uplink path delay of the target ACK packet with the delays of the uplink paths in the multipath set of the target TCP flow; if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the current maximum delay in the multipath set, updating the first path delay to the uplink path delay of the target ACK packet; if the uplink path delay of the target ACK packet is less than the delay of the uplink path with the current minimum delay in the multipath set, updating the second path delay to the uplink path delay of the target ACK packet.
  • In this embodiment of the present invention, the time threshold for dividing Flowlets may be calculated from the difference between the maximum uplink path delay and the minimum uplink path delay of the historical packets received by the target TCP flow (or by TCP flows in the same network session as the target TCP flow). That is, the time threshold is a value that changes in real time and is adjusted dynamically according to the network transmission load.
  • Specifically, the transmission delay of the uplink path of an ACK packet is calculated as the difference between the timestamp value of the target ACK packet and its timestamp echo reply value. Based on the historical values of the uplink path transmission delay, the current maximum uplink path delay is determined and used as the first path delay, and the current minimum uplink path delay is determined and used as the second path delay. Finally, the difference between the first path delay and the second path delay is used as the time threshold for dividing Flowlets in the target TCP flow.
  • In summary, the time threshold used to divide Flowlets differs between TCP flows and between states of the same TCP flow; it changes dynamically and is adjusted according to the real-time transmission delay of the data in the corresponding TCP flow, so it can always adapt to dynamic network load changes.
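  • The ACK-driven update might look like the following minimal sketch. The class and method names are hypothetical, and starting with no recorded delays (so that the first ACK sets both extremes) is an assumption; the text does not say how the delays are initialized or aged out.

```python
class DelayTracker:
    """Tracks the largest and smallest uplink path delay seen for one flow."""

    def __init__(self):
        self.max_delay = None   # first path delay
        self.min_delay = None   # second path delay

    def on_ack(self, ts_value, ts_echo_reply):
        # Uplink path delay as described: timestamp value minus echo reply.
        delay = ts_value - ts_echo_reply
        if self.max_delay is None or delay > self.max_delay:
            self.max_delay = delay
        if self.min_delay is None or delay < self.min_delay:
            self.min_delay = delay
        return delay

    def time_threshold(self):
        if self.max_delay is None:
            return None   # assumption: no threshold before the first ACK
        return self.max_delay - self.min_delay

# Hypothetical (timestamp value, echo reply) pairs from three ACK packets.
tracker = DelayTracker()
for ts_value, ts_echo_reply in [(1120, 1000), (1300, 1200), (1810, 1500)]:
    tracker.on_ack(ts_value, ts_echo_reply)
```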
  • In a possible implementation, the multipath set includes multiple equal-cost transmission paths of the target TCP flow; or, the multipath set includes multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or, the multipath set includes multiple non-equal-cost transmission paths of the target TCP flow.
  • In this embodiment of the present invention, the multiple transmission paths in the multipath set of the target TCP flow may all be equal-cost paths. In this case, the first path delay is the delay of the uplink path with the largest delay among these equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among these equal-cost paths.
  • Alternatively, the multiple transmission paths in the multipath set of the target TCP flow may include equal-cost paths or non-equal-cost paths. In this case, the first path delay is the delay of the uplink path with the largest delay among these equal-cost or non-equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among them.
  • Whether the multiple paths in the multipath set are equal-cost depends on the topology type of the network that the host accesses; the embodiment of the present invention can be applied to all network types with multipath transmission.
  • In a second aspect, an embodiment of the present invention provides a data transmission method, which is applied to a switch and may include:
  • receiving a first data packet, where the first data packet includes a first segment and a Flowlet identifier of the first segment; determining the target TCP flow to which the first data packet belongs, and obtaining forwarding information matching the target TCP flow, where the forwarding information includes a reference Flowlet identifier of the target TCP flow and a reference forwarding path; the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to a second segment, and the second segment is the previous segment adjacent to the first segment in the target TCP flow; the reference forwarding path is the first forwarding path of a second data packet, and the second data packet includes the second segment and the first Flowlet identifier; comparing the Flowlet identifier of the first segment with the first Flowlet identifier; and determining, according to the comparison result, whether to forward the first segment through the first forwarding path.
  • In this embodiment of the present invention, after receiving a data packet, the switch side identifies the Flowlet identifier in the data packet, compares it with the Flowlet identifier of the previous adjacent data packet in the target TCP flow to which the first data packet belongs, and on this basis determines whether the first data packet needs to be forwarded through the forwarding path of the second data packet. That is, the switch side does not need to divide data packets into Flowlets according to the time interval between received data packets. Instead, it uses the Flowlet identification bit contained in the received data packet to identify directly whether the data packet to be sent belongs to the same Flowlet as the previous adjacent data packet in the same TCP flow, and thereby decides whether to continue forwarding through the forwarding path of that adjacent data packet or to treat the data packet as a new Flowlet and decide a new forwarding path for it.
  • In a possible implementation, the switch maintains a forwarding information table, and the forwarding information table includes forwarding information of M TCP flows, where M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes the five-tuple hash value of the corresponding TCP flow; the determining the target TCP flow to which the first data packet belongs, and obtaining the forwarding information matching the target TCP flow, includes: calculating the five-tuple hash value of the first data packet according to the five-tuple information of the first data packet; and searching the forwarding information table, according to the five-tuple hash value of the first data packet, for the forwarding information matching the target TCP flow.
  • In this embodiment of the present invention, the switch side maintains a forwarding information table that includes forwarding information of one or more TCP flows (for example, the currently active TCP flows) on the hosts connected to it, and the forwarding information of each TCP flow may in turn include the five-tuple hash value of the TCP flow. That is, the switch can maintain the forwarding information of all currently active TCP flows, so that when a data packet needs to be sent, the switch can search the forwarding information table, according to the five-tuple hash value of the data packet, for the forwarding information (including the reference Flowlet identifier, the forwarding path, and so on) matching that hash value, and use it to forward the data packet.
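  • A software analogue of this lookup is sketched below. The text does not name a hash function, so CRC32 from the Python standard library is an assumed stand-in, and the addresses, ports, and table layout are hypothetical.

```python
import zlib

def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, protocol):
    """Hash a five-tuple into a table key (CRC32 is an assumed choice)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(key)

# Hypothetical forwarding information table: hash -> forwarding information.
flow_a = ("10.0.0.1", "10.0.1.1", 50001, 80, "TCP")
forwarding_table = {
    five_tuple_hash(*flow_a): {"reference_flowlet_id": 0, "path": "port-1"},
}

def lookup_forwarding_info(table, packet_five_tuple):
    """Return forwarding information for the packet's flow, or None."""
    return table.get(five_tuple_hash(*packet_five_tuple))

hit = lookup_forwarding_info(forwarding_table, flow_a)
miss = lookup_forwarding_info(
    forwarding_table, ("10.0.0.2", "10.0.1.1", 50001, 80, "TCP"))
```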
  • In a possible implementation, the determining, according to the comparison result, whether to forward the first segment through the first forwarding path includes: if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, forwarding the first data packet through the first forwarding path; if the Flowlet identifier of the first segment is different from the first Flowlet identifier, determining a second forwarding path for the first data packet and forwarding the first data packet through the second forwarding path.
  • In this embodiment of the present invention, when the switch identifies that the Flowlet identifier of the first data packet is the same as that of the previous adjacent data packet in the target TCP flow, the switch forwards the first data packet on the same path as the second data packet; when the switch identifies that the Flowlet identifier of the first data packet is different from that of the previous adjacent data packet in the target TCP flow, it determines a new forwarding path for the first data packet and forwards the first data packet through the new forwarding path. It should be noted that the second forwarding path may be the same as or different from the first forwarding path, depending on the decision of the switch.
  • In a possible implementation, the method further includes: if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is the second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path.
  • In this embodiment of the present invention, when the Flowlet identifiers differ, the switch needs to treat the first data packet as the start of a new Flowlet and update the reference Flowlet ID of the TCP flow to the Flowlet ID of the latest data packet, that is, the second Flowlet ID.
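  • The switch-side decision and update can be sketched in a few lines. The function name, the dictionary layout, and the path selection policy (here a simple rotation over hypothetical port names) are all assumptions; the text leaves the choice of the second forwarding path to the switch.

```python
def forward(entry, packet_flowlet_id, choose_path):
    """Pick the forwarding path for a packet and update the flow's entry.

    entry: dict with 'reference_flowlet_id' and 'path' for the packet's flow.
    choose_path: callable returning a path for a new Flowlet (assumed policy).
    """
    if packet_flowlet_id != entry["reference_flowlet_id"]:
        # New Flowlet: select a path (it may equal the old one) and update.
        entry["reference_flowlet_id"] = packet_flowlet_id
        entry["path"] = choose_path()
    return entry["path"]

# Hypothetical policy: rotate over the available egress ports.
ports = iter(["port-2", "port-3"])
entry = {"reference_flowlet_id": 0, "path": "port-1"}

paths = [forward(entry, fid, lambda: next(ports)) for fid in [0, 0, 1, 1, 0]]
```

  • Packets carrying the reference Flowlet ID keep the stored path, so the switch never needs a timer; a path change can only happen at the boundaries that the host marked.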
  • In a third aspect, an embodiment of the present invention provides a data processing apparatus, which may include:
  • a first generating unit, configured to generate a first segment;
  • a first determining unit, configured to determine the target TCP flow to which the first segment belongs;
  • an obtaining unit, configured to obtain the timestamp of the first segment, and obtain target flow information matching the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and the timestamp of a second segment in the target TCP flow; the second segment is the previous segment adjacent to the first segment in the target TCP flow; the time threshold is the difference between a first path delay and a second path delay, where the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
  • a first comparison unit, configured to compare the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold;
  • a Flowlet dividing unit, configured to determine, according to the comparison result, whether to divide the first segment and the second segment into the same Flowlet.
  • In a possible implementation, the first determining unit is specifically configured to:
  • determine, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
  • In a possible implementation, the apparatus further includes:
  • a maintenance unit, configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of the corresponding TCP flow;
  • the obtaining unit is specifically configured to: search the flow information table, according to the flow index of the target TCP flow, for the target flow information matching the target TCP flow.
  • In a possible implementation, the Flowlet dividing unit is specifically configured to:
  • if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, divide the first segment and the second segment into the same Flowlet; if the difference is greater than the time threshold, divide the first segment into a new Flowlet.
  • In a possible implementation, the target flow information further includes a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to the second segment;
  • the apparatus further includes:
  • a second generating unit, configured to generate a first data packet, where the first data packet includes the first segment and the Flowlet identifier of the first segment;
  • if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier;
  • if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is the second Flowlet identifier.
  • In a possible implementation, the apparatus further includes:
  • a first updating unit, configured to update the reference Flowlet identifier to the second Flowlet identifier if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold.
  • In a possible implementation, the apparatus further includes:
  • a receiving unit, configured to receive a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow;
  • a second determining unit, configured to determine the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value of the target ACK packet and its timestamp echo reply value;
  • a second comparison unit, configured to compare the uplink path delay of the target ACK packet with the delays of the uplink paths in the multipath set of the target TCP flow;
  • a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the current maximum delay in the multipath set;
  • a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is less than the delay of the uplink path with the current minimum delay in the multipath set.
  • In a possible implementation, the multipath set includes multiple equal-cost transmission paths of the target TCP flow; or, the multipath set includes multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or, the multipath set includes multiple non-equal-cost transmission paths of the target TCP flow.
  • an embodiment of the present invention provides a data processing apparatus, which may include:
  • a receiving unit configured to receive a first data packet, where the first data packet includes a first message segment and a Flowlet identifier of the first message segment, and the first data packet belongs to a target TCP flow;
  • a determining unit configured to determine the target TCP flow to which the first data packet belongs, and obtain forwarding information matching the target TCP flow;
  • the forwarding information includes a reference Flowlet identifier of the target TCP flow and a reference forwarding path; wherein , the reference Flowlet identifier is currently the first Flowlet identifier corresponding to the second message segment, and the second message segment is the previous message segment adjacent to the first message segment in the target TCP flow;
  • the reference forwarding path is a first forwarding path of a second data packet, and the second data packet includes the second packet segment and the first Flowlet identifier;
  • a comparison unit configured to compare the Flowlet identifier of the first segment with the first Flowlet identifier;
  • a forwarding unit configured to determine whether to forward the first packet segment through the first forwarding path according to the comparison result.
  • the switch maintains a forwarding information table, and the forwarding information table includes forwarding information of M TCP flows, where M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes the five-tuple hash value corresponding to that TCP flow; the determining unit is specifically used for:
  • the forwarding information matching the target TCP flow is searched from the forwarding information table.
  • the forwarding unit is specifically used for:
  • a second forwarding path is determined for the first data packet, and forwarded through the second forwarding path.
  • the apparatus further includes:
  • an update unit configured to, if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is a second Flowlet identifier, update the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier and update the reference forwarding path to the second forwarding path.
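  • The lookup, comparison, forwarding, and update behavior of the units above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the application: the names `FlowletSwitch` and `ForwardingInfo` are hypothetical, CRC32 stands in for the unspecified five-tuple hash, round-robin stands in for the unspecified way of determining the second forwarding path, and the handling of a flow with no table entry is assumed.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

FiveTuple = Tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, protocol


@dataclass
class ForwardingInfo:
    ref_flowlet_id: int  # reference Flowlet identifier
    ref_path: str        # reference forwarding path


class FlowletSwitch:
    def __init__(self, paths: List[str]) -> None:
        self.paths = paths
        self.table: Dict[int, ForwardingInfo] = {}  # keyed by five-tuple hash
        self._cursor = 0

    def _pick_path(self) -> str:
        # Assumption: round-robin is only a placeholder path selector.
        path = self.paths[self._cursor % len(self.paths)]
        self._cursor += 1
        return path

    def forward(self, flow: FiveTuple, flowlet_id: int) -> str:
        key = zlib.crc32(repr(flow).encode())  # hash collisions are ignored here
        info = self.table.get(key)
        if info is None:
            # Assumption: an unknown flow gets a fresh entry.
            info = ForwardingInfo(flowlet_id, self._pick_path())
            self.table[key] = info
        elif flowlet_id != info.ref_flowlet_id:
            # Different Flowlet: choose a new path and update the references.
            info.ref_flowlet_id = flowlet_id
            info.ref_path = self._pick_path()
        # Same Flowlet: keep using the reference forwarding path.
        return info.ref_path


switch = FlowletSwitch(["path-1", "path-2"])
flow = ("10.0.0.1", 40001, "10.0.1.9", 80, "TCP")
print([switch.forward(flow, tag) for tag in (0, 0, 1, 1, 0)])
```

  • In this sketch, packets carrying the same Flowlet identifier as the stored reference keep the reference path, so packets within one Flowlet are not reordered by the switch.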
  • the present application provides a semiconductor chip, which may include the data processing apparatus provided by any one of the implementation manners of the third aspect.
  • the present application provides a semiconductor chip, which may include the data transmission apparatus provided by any one of the implementation manners of the fourth aspect.
  • the present application provides a semiconductor chip, which may include: the data processing device provided by any one of the implementation manners of the third aspect, an internal memory coupled to the data processing device, and an external memory.
  • the present application provides a semiconductor chip, which may include: the data transmission device provided by any one of the implementation manners of the fourth aspect, an internal memory coupled to the data transmission device, and an external memory.
  • the present application provides a system-on-chip SoC chip, where the SoC chip includes the data processing apparatus provided in any one of the implementation manners of the third aspect, an internal memory and an external memory coupled to the data processing apparatus.
  • the SoC chip may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a system-on-chip SoC chip, where the SoC chip includes the data transmission device provided by any one of the implementation manners of the fourth aspect, an internal memory and an external memory coupled to the data transmission device.
  • the SoC chip may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a chip system, where the chip system includes the data processing apparatus provided by any one of the implementation manners of the foregoing third aspect.
  • the chip system further includes a memory for storing necessary or related program instructions and data during the operation of the data processing apparatus.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a chip system, where the chip system includes the data transmission device provided by any one of the implementation manners of the fourth aspect.
  • the chip system further includes a memory for storing necessary or related program instructions and data during the operation of the data transmission device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a data processing apparatus, the processing apparatus having the function of implementing any one of the data processing methods in the above-mentioned first aspect.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides a data transmission device, the data transmission device having the function of implementing any one of the data transmission methods in the above-mentioned second aspect.
  • This function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides a host, where the host includes a processor, and the processor is configured to execute the data processing method provided by any one of the implementation manners of the foregoing first aspect.
  • the host may also include memory, coupled to the processor, which holds program instructions and data necessary for the host.
  • the host may also include a communication interface for the host to communicate with other devices or communication networks.
  • the present application provides a switch, where the switch includes a processor, and the processor is configured to execute the data transmission method provided by any one of the implementation manners of the foregoing second aspect.
  • the switch may also include a memory, coupled to the processor, which holds program instructions and data necessary for the switch.
  • the switch may also include a communication interface for the switch to communicate with other devices or communication networks.
  • the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a host, the processing method flow of the multi-core processor described in any implementation of the second aspect above is implemented.
  • the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a switch, the processing method flow of the multi-core processor described in any implementation of the fourth aspect above is implemented.
  • an embodiment of the present invention provides a computer program, where the computer program includes instructions that, when the computer program is executed by a multi-core processor, enable a host to execute the processing method flow of the multi-core processor described in any implementation of the second aspect above.
  • an embodiment of the present invention provides a computer program, where the computer program includes instructions that, when the computer program is executed by a multi-core processor, enable a switch to execute the processing method flow of the multi-core processor described in any implementation of the fourth aspect above.
  • FIG. 1 is a schematic diagram of dividing a TCP flow into Flowlets in the prior art.
  • FIG. 2 is a schematic diagram of an architecture of a network transmission system provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a topology structure of a data center network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a computer network OSI model and a TCP/IP model provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention.
  • FIG. 6A is a schematic diagram of a first data packet and a second data packet in the same Flowlet according to an embodiment of the present invention.
  • FIG. 6B is a schematic diagram of a first data packet and a second data packet in different Flowlets according to an embodiment of the present invention.
  • FIG. 6C is a schematic flowchart of dividing and marking a Flowlet by an additional layer protocol according to an embodiment of the present invention.
  • FIG. 6D is a schematic flowchart of an additional layer protocol dynamically updating the segmentation threshold of a Flowlet according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a data transmission apparatus provided by an embodiment of the present invention.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device may be components.
  • One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between 2 or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, in a distributed system, and/or across a network, such as the Internet, which interacts with other systems via signals).
  • Equal-Cost Multipath Routing (ECMP): there are multiple paths with the same cost to the same destination address, where the same cost means that the number of hops (that is, the number of switches passed through) is the same.
  • in a network topology that follows the equal-cost multi-path model, such as a data center network, all possible transmission paths between the same pair of source and destination hosts are equal-cost paths.
  • the traffic sent to a destination IP or destination network segment can be shared across different paths to achieve network load balancing, and when some of the paths are faulty, other paths are used instead to complete the forwarding process, which provides routing redundancy and backup.
  • with conventional routing, the data packets sent to a destination address can use only one of the links; the other links are in a backup or invalid state, and switching between them in a dynamic routing environment takes a certain amount of time. An equal-cost multi-path routing protocol can use multiple links at the same time in this network environment, which not only increases the transmission bandwidth but also backs up the data transmission of a failed link without delay or packet loss.
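  • The load sharing and failover behavior described above can be illustrated with a short sketch. It assumes hash-based path selection over the set of usable equal-cost paths, which is a common ECMP technique but is not specified in this passage; the function name `select_ecmp_path`, the flow key format, and the path names are hypothetical.

```python
import zlib
from typing import AbstractSet, Sequence


def select_ecmp_path(flow_key: str, paths: Sequence[str],
                     failed: AbstractSet[str] = frozenset()) -> str:
    """Pick one equal-cost path for a flow, skipping paths marked as failed."""
    alive = [p for p in paths if p not in failed]
    if not alive:
        raise RuntimeError("no usable equal-cost path")
    # Assumption: CRC32 of the flow key stands in for the device's hash function.
    index = zlib.crc32(flow_key.encode()) % len(alive)
    return alive[index]


paths = ["Path 1", "Path 2", "Path 3", "Path 4"]
key = "10.0.0.1:40001->10.0.1.9:80/TCP"
chosen = select_ecmp_path(key, paths)
backup = select_ecmp_path(key, paths, failed={chosen})
print(chosen, backup)
```

  • The same flow key always maps to the same path while the set of usable paths is unchanged; once the chosen path is marked as failed, the flow is moved to one of the remaining paths.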
  • Transmission Control Protocol (TCP): TCP is a connection-oriented, reliable, byte-stream-based transport layer communication protocol.
  • TCP is designed to accommodate a layered protocol hierarchy that supports multiple network applications. Reliable communication services are provided between pairs of processes in host computers connected to different but interconnected computer communication networks relying on TCP. TCP assumes that it can get simple, possibly unreliable, datagram service from lower-level protocols. In principle, TCP should be able to operate over a wide variety of communication systems from hardwired to packet-switched or circuit-switched networks.
  • Network flow (Flow): a collection of data packets with the same quintuple within a period of time is called a network flow, where the quintuple contains the source IP address, source port number, destination IP address, and destination port number of the two communicating parties, and the transport layer protocol.
  • a network session is a collection of multiple network flows, and multiple network flows have the same triplet (source address, destination address, transport layer protocol).
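  • The difference between a network flow (same quintuple) and a network session (same triplet) can be illustrated as follows. This is a minimal sketch; the `Packet` type, the helper names, and the addresses are hypothetical and serve only as an example.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def flow_key(p: Packet) -> Tuple[str, int, str, int, str]:
    """Quintuple: identifies a network flow."""
    return (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)


def session_key(p: Packet) -> Tuple[str, str, str]:
    """Triplet: identifies a network session."""
    return (p.src_ip, p.dst_ip, p.protocol)


def group_by(packets: List[Packet],
             key_fn: Callable[[Packet], Hashable]) -> Dict[Hashable, List[Packet]]:
    groups: Dict[Hashable, List[Packet]] = defaultdict(list)
    for p in packets:
        groups[key_fn(p)].append(p)
    return dict(groups)


packets = [
    Packet("10.0.0.1", 40001, "10.0.1.9", 80, "TCP"),
    Packet("10.0.0.1", 40001, "10.0.1.9", 80, "TCP"),
    Packet("10.0.0.1", 40002, "10.0.1.9", 80, "TCP"),
]
flows = group_by(packets, flow_key)
sessions = group_by(packets, session_key)
print(len(flows), len(sessions))  # two flows belonging to one session
```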
  • Flowlet (flow slice/microflow/small flow): can be understood as a packet group composed of multiple packets sent continuously in a Flow; each Flow includes multiple Flowlets.
  • when packets are forwarded based on the Flowlet mechanism, the multiple packets included in a Flowlet can be forwarded based on Flowlet flow table entries.
  • Different Flowlets correspond to different Flowlet flow table entries.
  • the Flowlet flow table entry is used to indicate packet forwarding paths of multiple packets included in each Flowlet.
  • Transmission Control Protocol/Internet Protocol (TCP/IP) refers to a protocol suite that can realize information transmission between multiple different networks.
  • TCP/IP refers not only to the two protocols TCP and IP but to a protocol suite composed of FTP, SMTP, TCP, UDP, IP, and other protocols; it is called TCP/IP because TCP and IP are the most representative protocols in the suite.
  • FIG. 2 is a schematic diagram of an architecture of a network transmission system provided by an embodiment of the present application. Please refer to FIG. 2 .
  • the architecture of the network transmission system mainly includes: a host 10, a switch (SWITCH) 20, and the Internet.
  • the host 10 can be classified as a source host or a destination host according to whether it is a sender or a receiver.
  • the source host can be connected to the Internet through the switch 20 to communicate with the destination host.
  • the host 10 can be any computing device that generates data and has a network access function.
  • any computer connected to the Internet can be called a host, and each host has a unique IP address.
  • the host 10 may specifically be various types of devices, such as a server, a personal computer, a tablet computer, a mobile phone, a personal digital assistant, a smart wearable device, and an unmanned terminal.
  • the source host needs to encapsulate the application data into data packets (such as TCP/IP packets), and then hand them over to the next data link layer (such as a switch).
  • the host 10 also has the functions of dividing the packet segment by Flowlet, identifying the Flowlet, and dynamically configuring the time threshold for dividing the Flowlet.
  • the switch 20 is a network device that performs the function of encapsulating and forwarding data packets based on MAC (hardware address of the network card) identification. It can "learn" the MAC address and store it in the internal address table. By establishing a temporary exchange path between the sender and receiver of the data frame, the data frame can directly reach the destination address from the source address.
  • the functions of switch 20 may include physical addressing, network topology, error checking, frame sequence, and flow control, among others.
  • the switch 20 also recognizes the Flowlet division and Flowlet identification performed on packet segments on the host 10 side and, based on the Flowlet identifiers thus assigned, forwards the same Flowlet or different Flowlets in a data flow accordingly; refer to the descriptions of the subsequent related embodiments for details, which are not repeated here.
  • the source host 10 uses the Transmission Control Protocol (TCP), processes the data through the data processing method in this application, and then sends the message to a message forwarding device of the routing and switching network.
  • the message forwarding device (such as switch, router, etc.) adopts ECMP technology and forwards the message through the data transmission method in this application, and finally forwards the message to the destination host 10, thereby achieving the effect of load balancing processing.
  • the data processing method or the data transmission method in the embodiment of the present invention may be applicable to a transmission mechanism based on TCP/IP.
  • the data transmission method in the present invention is not limited to data center networks; it is also applicable to any network with multiple paths, such as an Internet Service Provider (ISP) network. As long as the network topology provides multiple network paths between any two communicating source and destination nodes (that is, the source host and the destination host), the technical solutions in this application can be applied to perform dynamic load balancing at Flowlet granularity.
  • for a data center network, the characteristics of its network topology determine that, within the same network session (that is, with the same triplet of source address, destination address, and transport layer protocol), the one or more paths included in the corresponding multi-path set are equal-cost paths; for other types of networks, such as an ISP network, the one or more paths included in the multi-path set corresponding to the same network session may or may not be equal-cost. Therefore, the one or more paths included in the multi-path set of the target TCP flow described in this application may or may not be equal-cost, depending on the topology type of the network to which the host is connected.
  • FIG. 2 is only an exemplary implementation in the embodiment of the present application, and the network architecture in the embodiment of the present invention includes but is not limited to the above network architecture.
  • FIG. 3 is a schematic diagram of a data center network topology provided by an embodiment of the present application.
  • the data center network mainly includes: a Core layer, an Aggregation layer, an Access layer, and a Point of Delivery (POD) layer.
  • the source host can communicate with the destination host over TCP through the switches and the core network.
  • the Point of Delivery (POD) layer consists of multiple PODs, each of which can include servers, storage, and network devices.
  • the Top of Rack (ToR) mode is a way of cabling server cabinets in the data center. When the ToR mode is used for cabling, 1 to 2 access switches are deployed at the top of each server cabinet.
  • Access layer: access switches physically connect to the servers and are generally placed at the top of the rack, so they are also called ToR (Top of Rack) switches; this layer is also known as the Edge layer.
  • Aggregation layer: aggregation switches connect the access switches and provide other services such as firewall (FW), server load balancing (Server Load Balancer, SLB), Secure Sockets Layer offload (SSL offload), intrusion detection, and network analysis.
  • Core layer: core switches provide high-speed forwarding of packets in and out of the data center, and provide connectivity for multiple aggregation layers.
  • the core switch provides an elastic L3 routing network for the entire network.
  • taking TOR1 in Pod1 as an example, it has at least 4 equal-cost paths (ECMP) to access the Internet: TOR1 in Pod1 can access the Internet through at least the equal-cost paths Path 1, Path 2, Path 3, and Path 4.
  • the data transmission method applied to the switch side can be applied to the switches of any of the above-mentioned layers (Access layer, Aggregation layer, or Core layer); that is, along the entire forwarding path of a data packet from the source host to the destination host, all switches participating in forwarding can implement any one of the data transmission methods provided in this application.
  • the data center network topology in FIG. 3 above is only an exemplary implementation in the embodiments of the present application, and the data center network topology in the embodiments of the present invention includes but is not limited to the above network architecture.
  • FIG. 4 is a schematic diagram of a computer network OSI model and a TCP/IP model provided by an embodiment of the present application.
  • in both the OSI model and the TCP/IP model shown in FIG. 4, an additional layer is added between the transport layer and the network layer.
  • the additional layer is mainly used to divide, mark and set related parameters of Flowlets in the TCP flow.
  • the OSI eight-layer network model provided by the embodiment of the present invention consists of layers 1 to 8 from bottom to top, namely the physical layer, the data link layer, the network layer, the additional layer, the transport layer, the session layer, the presentation layer, and the application layer;
  • the TCP/IP model provided by the embodiment of the present invention can be simplified into 5 layers from bottom to top, namely the network interface layer, the network layer, the additional layer, the transport layer, and the application layer.
  • Application layer: the layer closest to the user in the OSI reference model. It provides application interfaces for computer users, provides various network services directly to users, and offers a rich system application interface to user application software.
  • Common application layer network service protocols are: Hyper Text Transfer Protocol (HTTP), Hyper Text Transfer Protocol over Secure Socket Layer (HTTPS), File Transfer Protocol (FTP), Post Office Protocol Version 3 (POP3), Simple Mail Transfer Protocol (SMTP), etc.
  • the transport layer establishes the end-to-end link of the host.
  • the role of the transport layer is to provide end-to-end reliable and transparent data transmission services for the upper-layer protocols, including dealing with issues such as error control and flow control.
  • This layer shields the details of the data communication of the lower layer from the upper layer, so that the upper layer user only sees a reliable data path between the two transmission entities from host to host, which can be controlled and set by the user.
  • TCP/UDP is at this layer.
  • the additional layer in the embodiment of the present invention is used for dividing and marking Flowlets and setting related parameters for the TCP flow.
  • the additional layer protocol dynamically configures the segmentation threshold of the Flowlet according to the delay feedback of the network paths, and divides the segments of the TCP flow into Flowlets according to the dynamic segmentation threshold. Since the Flowlet division is completed on the host side, the present invention can use a 1-bit reserved field of the transport layer header to mark adjacent Flowlets of the same TCP flow (the present invention names this 1-bit field FL_Tag) and thereby convey the division result of the Flowlets to the switches in the network; a switch identifies a Flowlet according to the flag bit in the header of the data packet.
  • the functions on the host side mainly involve the above-mentioned application layer, presentation layer, session layer, transport layer, and additional layer.
  • the additional layer described in this application may be deployed as a single layer, or may be deployed in the above-mentioned existing transport layer. That is, the functions implemented by the additional layer are combined into the transport layer for implementation, which is not specifically limited in this embodiment of the present invention.
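  • The host-side marking performed by the additional layer can be sketched as follows. This is a minimal sketch under stated assumptions: a new Flowlet is assumed to start when the gap between adjacent segments is strictly greater than the threshold, and adjacent Flowlets are assumed to be distinguished by flipping the 1-bit FL_Tag; the class name `FlowletMarker`, its fields, and the microsecond values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowletMarker:
    """Hypothetical per-flow state kept by the additional layer on the host."""
    threshold: float                       # current segmentation threshold
    last_timestamp: Optional[float] = None
    fl_tag: int = 0                        # 1-bit tag written into the header

    def mark(self, timestamp: float) -> int:
        """Return the FL_Tag value for a segment generated at `timestamp`."""
        if self.last_timestamp is not None:
            gap = timestamp - self.last_timestamp
            # Assumption: a gap above the threshold starts a new Flowlet,
            # which is signalled by flipping the 1-bit tag.
            if gap > self.threshold:
                self.fl_tag ^= 1
        self.last_timestamp = timestamp
        return self.fl_tag


marker = FlowletMarker(threshold=40)           # threshold in microseconds
timestamps_us = [0, 10, 20, 100, 110, 200]     # illustrative segment timestamps
print([marker.mark(t) for t in timestamps_us])
```

  • With a single bit, only adjacent Flowlets of the same TCP flow can be told apart, which is sufficient for a switch that compares each packet only with the reference identifier of the previous one.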
  • Network layer: this layer establishes the connection between two nodes through IP addressing, selects appropriate routing and switching nodes for the packets sent by the transport layer at the source end, and delivers them correctly, according to the address, to the transport layer at the destination end. It is also known as the IP protocol layer.
  • the network layer realizes the entire transmission process of data from any node to any other node according to the network layer address information contained in the data; that is, its main function is to complete message transmission between hosts in the network, using the services of the data link layer to transmit each message from the source to the destination.
  • the functions involved in the switch in the embodiment of the present invention correspond to the network layer.
  • Data link layer: its functions include Cyclic Redundancy Check (CRC), error notification, network topology, flow control, etc. It combines bits into bytes and bytes into frames, uses link-layer addresses (Ethernet uses MAC addresses) to access the medium, and performs error detection. Between adjacent nodes connected by physical links, a data link in a logical sense is established, and point-to-point or point-to-multipoint direct communication of data is realized on the data link.
  • the data link layer is responsible for the reliable transmission of data between a host and its Interface Message Processor (IMP), and between IMPs.
  • the main function of the physical layer is to complete the original bit stream transmission between adjacent nodes. That is, it is responsible for sending and receiving data in the form of a bit stream. In fact, the transmission of the final signal is realized through the physical layer.
  • Commonly used physical-layer devices and transmission media include hubs, repeaters, modems, network cables, twisted pairs, coaxial cables, etc.
  • when application layer data is sent to the network through the protocol stack, the protocol of each layer must add a data header, which is called encapsulation.
  • each protocol layer has a different name for the data unit. For example, it is called a message at the application layer, a segment at the transport layer, a datagram or packet at the network layer, and a frame at the link layer.
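  • The encapsulation and naming described above can be illustrated with a toy example. It is only a sketch: the header labels `TCP|`, `IP|`, and `ETH|` are placeholders rather than real header formats, and the function name `encapsulate` is hypothetical.

```python
from typing import List, Tuple

# (layer, name of the data unit at that layer, placeholder header bytes)
LAYERS: List[Tuple[str, str, bytes]] = [
    ("transport", "segment", b"TCP|"),
    ("network", "packet", b"IP|"),
    ("data link", "frame", b"ETH|"),
]


def encapsulate(message: bytes) -> List[Tuple[str, bytes]]:
    """Return the data unit produced at each layer, from top to bottom."""
    units = [("message", message)]
    data = message
    for _layer, unit_name, header in LAYERS:
        data = header + data  # each layer prepends its own header
        units.append((unit_name, data))
    return units


for name, data in encapsulate(b"hello"):
    print(name, data)
```

  • Each lower layer treats the whole data unit of the layer above as its payload, so the frame that finally leaves the host carries all of the headers added on the way down.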
  • FIG. 4 is only an exemplary implementation in the embodiments of the present application, and the network models and functions involved in the embodiments of the present invention include but are not limited to the above models and functions.
  • a Flowlet is actually a micro-Flow.
  • a flow can be divided into many Flowlets.
  • the same Flowlet has the same quintuple information, that is, the source IP, destination IP, source port, destination port and transport layer protocol are all the same.
  • Multiple Packets continuously sent in a flow are regarded as a Flowlet, and the Flowlet mechanism is applied to select a path, so that the multiple Packets included in the Flowlet are forwarded based on the selected path.
  • packets within the same Flowlet are forwarded over the same forwarding path (not over different equal-cost paths), whereas different Flowlets can be forwarded over different but equal-cost paths (that is, equal-cost multi-path) or over unequal-cost paths, depending on the type of network topology of the network to which the host is connected.
  • equal-cost multi-path refers to paths with the same number of switch hops among the forwarding paths of the data packet; unequal-cost multi-path refers to paths with unequal numbers of switch hops among the forwarding paths of the data packet.
  • a Flow can be regarded as a composition of multiple Flowlets.
  • Flowlet-based load balancing introduces an intermediate granularity that is neither a Packet nor a Flow: a Flowlet is larger than a Packet and smaller than a Flow. That is, a Flowlet can be considered a microflow composed of one or more Packets in the same Flow.
  • FIG. 5 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention.
  • the method may be applied to the network architecture described in FIG. 2 or FIG. 3, wherein the host 10 may be used to support and perform steps S501 to S504 of the method flow shown in FIG. 5.
  • the following description will be made from the side of the host 10 (source host) with reference to FIG. 3 .
  • the method may include the following steps S501-S504, and optionally, may further include steps S505-S506.
  • Step S501 Generate a first packet segment, and determine the target TCP flow to which the first packet segment belongs.
  • when a host (which can be called a source host) needs to send a message to another host (which can be called a destination host), the source host first generates locally a data packet that conforms to the relevant protocol standards, and then sends it to the destination host through switches and other devices.
  • the process of generating a data packet mainly involves an application layer (including an application layer, a presentation layer, and a session layer), a transport layer and a network layer.
  • the message enters the transport layer after being encapsulated by the application layer on the source host side, where a segment conforming to the transport layer protocol (that is, the first message segment) is generated, such as a TCP segment conforming to the TCP protocol. That is, on the host side, once the TCP segment has completed the encapsulation of the transport layer header field (that is, the first segment in the embodiment of the present invention has been generated), the additional layer in the present application is triggered (as shown in FIG. 2).
  • a TCP flow represents the data transmission process of a certain business process, that is, from the TCP three-way handshake → end of data transmission → connection release. The quintuple information of the same TCP flow is the same; the quintuple information includes source IP, destination IP, source port, destination port, and transport layer protocol.
  • the host determines the target TCP flow to which the first message segment belongs according to the source port number in the first message segment. That is, the source port numbers corresponding to different TCP flows must be different, so whether different message segments belong to the same TCP flow can be determined from the source ports in the message segments. For example, the source host may determine which target TCP flow the first message segment belongs to according to the source port information in the first message segment.
  • Step S502 Obtain the timestamp of the first packet segment, and obtain target flow information matching the target TCP flow.
  • the Flowlet division and identification of the first segment depends on the relationship between the timestamp difference of the first segment and the adjacent previous segment, on the one hand, and the corresponding time threshold, on the other, so the matching flow information needs to be obtained.
  • after the source host determines the TCP flow to which the first message segment currently to be sent belongs, it further obtains the timestamp of the first message segment and obtains the target flow information matching the target TCP flow, for use in the subsequent division and identification of the Flowlet.
  • the timestamp of the message segment is usually time information added when the message segment is encapsulated in the transport layer, and the time information represents the moment when the message segment is generated (for the destination host, it can also be understood as the moment when the source host sends the segment). For example, when the host needs to send data to the destination host, it encapsulates the sending time into the timestamp item of the data. Both the source host and the destination host can use the timestamp to know when the data was sent, so as to calculate (or measure) the network delay, calculate the time consumed by service processing, etc.
  • the host can obtain the timestamp of the first packet segment by obtaining the timestamp value carried in the first packet segment, or obtain the timestamp of the first packet segment according to the current timestamp of the system, and The obtaining step may be completed after the first packet segment is generated and before step S503, and the specific execution time point is not limited.
  • the target flow information includes a time threshold corresponding to the target TCP flow and a timestamp of the second packet segment in the target TCP flow.
  • The second packet segment is the previous packet segment adjacent to the first packet segment in the target TCP flow (that is, the packet segment in the target TCP flow whose timestamp value is earlier than, and adjacent to, that of the first packet segment). The timestamp of the second packet segment can be obtained by the host from the timestamp value carried in the second packet segment, or from the moment recorded by the system at that time; that is, the timestamp of the first packet segment and the timestamp of the second packet segment may be obtained using the same standard.
  • The time threshold is the difference between the first path delay and the second path delay, where the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow.
  • the multipath set of the target TCP flow may include multiple transmission paths corresponding to the target TCP flow, that is, multiple possible upstream transmission paths between the source IP, source port, destination IP, and destination port of the target TCP flow.
  • The multipath set of the target TCP flow may further include multiple transmission paths corresponding to TCP flows in the same network session as the target TCP flow, that is, multiple possible transmission paths between the source IP and the destination IP.
  • In other words, the multipath set of the target TCP flow may include multiple transmission paths corresponding to the target TCP flow itself, and may further include multiple transmission paths corresponding to TCP flows that share the triple information (source address, destination address, transport layer protocol) of the target TCP flow. One TCP flow may correspond to one multipath set, or multiple TCP flows in the same network session may correspond to the same multipath set. Therefore, each TCP flow may maintain its own time threshold for dividing Flowlets, or multiple TCP flows may jointly maintain one time threshold for dividing Flowlets.
  • The uplink path refers to the path from the sender (the source host) to the receiver (the destination host). The uplink path delay refers to the total delay between the moment a packet segment is sent from the source host and the moment the source host receives the acknowledgment that the destination host returns once the packet segment has arrived (the destination host sends the acknowledgment immediately after receiving the data).
  • Where the multiple transmission paths in the multipath set of the target TCP flow are all equal-cost paths, the first path delay is the delay of the uplink path with the largest delay among these equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among these equal-cost paths.
  • Where the multiple transmission paths in the multipath set of the target TCP flow include equal-cost paths or non-equal-cost paths, the first path delay is the delay of the uplink path with the largest delay among these equal-cost or non-equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among them.
  • In a network of the equal-cost multipath model (such as a data center network), the multipath sets of different TCP flows between the same sender and receiver are actually the same, so the time threshold can be calculated from the uplink path delays of historical packet segments that share the five-tuple or the triple of the target TCP flow.
  • For example, the network topology in Fig. 3 is a data center network, which belongs to the equal-cost multipath model, so all paths in the multipath set of the target TCP flow in this network are equal-cost paths, such as Path 1, Path 2, Path 3 and Path shown in Figure 3.
  • The host maintains a flow information table, and the flow information table includes flow information of N TCP flows, where N is an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of the corresponding TCP flow. Acquiring the target flow information matching the target TCP flow includes: searching the flow information table, according to the flow index of the target TCP flow, for the target flow information matching the target TCP flow.
  • The additional layer protocol needs to maintain a flow information table, FlowInfoTable, which records the flow information of each TCP flow used when Flowlets are divided.
  • Each TCP flow occupies one entry in the FlowInfoTable, as shown in Table 1. Each entry may contain six items: SrcPort, LstFLTag, LstTS, TTDiff, TripTime_max, and TripTime_min, where:
  • The TCP flow index (SrcPort) item is used to index each TCP flow, that is, it is the label of the TCP flow. This item corresponds to the flow index of the TCP flow described in this application.
  • The previous-packet field value (LstFLTag) item is the FL_Tag field value of the previous packet segment of the TCP flow, that is, whether the Flowlet identifier of the packet just sent is 0 or 1. This item corresponds to the reference Flowlet identifier described in this application.
  • The LstTS item is the timestamp value of the previous packet segment of the TCP flow, that is, the timestamp value (added by the transport layer) of the data packet just sent. Its unit is usually at the microsecond (us) level. This item corresponds to the timestamp value of the second packet segment described in this application.
  • The TTDiff item is the time threshold used by the TCP flow to divide Flowlets; each flow maintains its time threshold independently, and its unit is usually at the microsecond (us) level. It is the difference between the TripTime_max item and the TripTime_min item shown in Table 1. For example, if the value of the TripTime_max item is 58 and the value of the TripTime_min item is 31, the value of the TTDiff item is 27; if the value of the TripTime_max item is 49 and the value of the TripTime_min item is 36, the value of the TTDiff item is 13. This item corresponds to the time threshold described in this application.
  • TripTime_max is the delay of the uplink path with the largest delay in the multipath set corresponding to the TCP flow (the uplink path refers to the path from the sender to the receiver), and its unit is usually at the microsecond (us) level. This item corresponds to the first path delay described in this application.
  • TripTime_min is the delay of the uplink path with the smallest delay in the multipath set corresponding to the TCP flow, and its unit is usually at the microsecond (us) level. This item corresponds to the second path delay described in this application.
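  • As an illustration only, the six items of a FlowInfoTable entry could be modelled as follows. The field names follow the items listed above; the dictionary keyed by source port, the helper add_flow, and the port numbers 5001 and 5002 are assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlowInfoEntry:
    """One FlowInfoTable entry; field names follow the six items in the text."""
    SrcPort: int        # flow index (source port of the TCP flow)
    LstFLTag: int       # FL_Tag (0 or 1) of the previous packet segment
    LstTS: int          # timestamp of the previous packet segment, microseconds
    TTDiff: int         # time threshold for dividing Flowlets, microseconds
    TripTime_max: int   # largest uplink path delay seen so far, microseconds
    TripTime_min: int   # smallest uplink path delay seen so far, microseconds


# Assumption: a dict keyed by source port stands in for the table.
flow_info_table: Dict[int, FlowInfoEntry] = {}


def add_flow(src_port: int, trip_max: int, trip_min: int) -> FlowInfoEntry:
    """Hypothetical helper: create an entry with TTDiff = max - min."""
    entry = FlowInfoEntry(src_port, 0, 0, trip_max - trip_min, trip_max, trip_min)
    flow_info_table[src_port] = entry
    return entry


# The two numeric examples given for the TTDiff item.
add_flow(5001, 58, 31)
add_flow(5002, 49, 36)
print(flow_info_table[5001].TTDiff, flow_info_table[5002].TTDiff)
```

  • Keying the table by source port mirrors the statement that the source port alone distinguishes the TCP flows of one host.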
  • Step S503 Compare the difference between the timestamp of the first packet segment and the timestamp of the second packet segment with the time threshold.
  • After obtaining the timestamp of the first packet segment and the timestamp of the second packet segment, the host compares the difference between the two timestamps with the time threshold in the target flow information, to determine whether the time difference between the first packet segment and the second packet segment exceeds the maximum time interval corresponding to the target TCP flow to which they belong, that is, the time threshold.
  • Step S504 According to the comparison result, determine whether to divide the first packet segment and the second packet segment into the same Flowlet.
  • According to the comparison result of step S503, it is determined whether to divide the first packet segment and the second packet segment into the same Flowlet. Packet segments of the same Flowlet in a TCP flow are forwarded through the same path, whereas the forwarding path needs to be decided again for a different Flowlet.
  • If the difference between the timestamp of the first packet segment and the timestamp of the second packet segment is less than the time threshold, the first packet segment is divided into the same Flowlet as the second packet segment; if the difference between the timestamp of the first packet segment and the timestamp of the second packet segment is greater than the time threshold, the first packet segment is divided into a new Flowlet.
  • In other words, if the difference between the timestamps of the first packet segment to be sent and the adjacent previous second packet segment in the target TCP flow is smaller than the time threshold corresponding to the target TCP flow (the time threshold changes dynamically), the two packet segments are considered to meet the condition for being sent in the same Flowlet, and the first packet segment is divided into the same Flowlet as the second packet segment. Likewise, if that difference is greater than the time threshold corresponding to the target TCP flow, the two packet segments are considered not to meet the condition for being sent in the same Flowlet, and the first packet segment is divided into a new Flowlet.
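  • A minimal sketch of the decision in step S504 follows. The function name same_flowlet is hypothetical, timestamps are assumed to be integers in microseconds, and a gap exactly equal to the threshold is treated as "same Flowlet", which is an assumption because the text only covers the less-than and greater-than cases.

```python
def same_flowlet(cur_ts: int, prev_ts: int, threshold: int) -> bool:
    """Return True if the current packet segment stays in the previous Flowlet.

    cur_ts    -- timestamp of the first (current) packet segment, microseconds
    prev_ts   -- timestamp of the second (previous) packet segment, microseconds
    threshold -- time threshold of the target TCP flow, microseconds
    """
    gap = cur_ts - prev_ts
    # A gap greater than the threshold starts a new Flowlet.
    # Assumption: a gap equal to the threshold keeps the same Flowlet.
    return not (gap > threshold)


print(same_flowlet(110, 100, 27))  # gap of 10 us against a 27 us threshold
print(same_flowlet(140, 100, 27))  # gap of 40 us against a 27 us threshold
```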
  • FIG. 6A is a schematic diagram of a first data packet and a second data packet in the same Flowlet provided by an embodiment of the present invention, where the first data packet is the data packet obtained after the first packet segment is encapsulated by the additional layer, and the second data packet is the data packet obtained after the second packet segment is encapsulated by the additional layer.
  • If the difference between the timestamp values of the first packet segment and the second packet segment is less than the time threshold, the host side divides the first packet segment and the second packet segment into the same Flowlet (Flowlet5 in the figure), that is, the corresponding first data packet and second data packet are both divided into Flowlet5.
  • FIG. 6B is a schematic diagram of a first data packet and a second data packet in different Flowlets according to an embodiment of the present invention.
  • If the difference between the timestamp values of the first packet segment and the second packet segment is greater than the time threshold, the host side divides the first packet segment and the second packet segment into different Flowlets (Flowlet5 and Flowlet4 in the figure, respectively), that is, the corresponding first data packet and second data packet are divided into Flowlet5 and Flowlet4. It can be understood that, in this case, the first data packet is the first data packet of the new Flowlet.
  • The embodiment of the present invention dynamically configures the time interval used to detect Flowlets by combining the delay feedback information of the network paths, so as to ensure that the Flowlet granularity matches the state of the network paths.
  • this embodiment of the present invention may further include the following method steps S505-S506.
  • Step S505 Generate a first data packet, where the first data packet includes the first packet segment and the Flowlet identifier of the first packet segment.
  • After the source host side encapsulates the first packet segment through the additional layer and the network layer, it generates a first data packet that includes the first packet segment and the Flowlet identifier of the first packet segment. That is, in the process of generating the first data packet, in addition to encapsulating the headers of the relevant protocols, the Flowlet identifier of the packet segment also needs to be encapsulated.
  • The Flowlet identifier of the first packet segment may be encapsulated in the Flowlet identifier bit of the header.
  • The target flow information also includes the reference Flowlet identifier of the target TCP flow (corresponding to the LstFLTag field in Table 1 above). The reference Flowlet identifier is currently the first Flowlet identifier corresponding to the second packet segment; that is, when the most recently sent packet segment is the second packet segment, the reference Flowlet identifier is the first Flowlet identifier corresponding to the second packet segment.
  • If the difference between the timestamp of the first packet segment and the timestamp of the second packet segment is less than the time threshold, the Flowlet identifier of the first packet segment is the first Flowlet identifier; if the difference between the timestamp of the first packet segment and the timestamp of the second packet segment is greater than the time threshold, the Flowlet identifier of the first packet segment is the second Flowlet identifier.
  • The Flowlet identifier corresponding to a packet segment can be set during the encapsulation process, so that, once the packet segment has been encapsulated into a data packet, the switch side can identify which Flowlet the data packet belongs to through the Flowlet identifier and decide which path to send it through. For example, when the first data packet is about to enter the data link layer where the switch is located, the first packet segment needs to be further encapsulated; in this case, a bit is set in the encapsulated data packet so that the switch can identify which Flowlet the packet segment belongs to.
  • When the first data packet carries the same Flowlet identifier as the second data packet corresponding to the second packet segment, the switch side forwards the first data packet and the second data packet through the same path.
  • The embodiment of the present invention divides Flowlets on the host side and can use a bit (for example, 1 bit) in the reserved field of the transport layer header to mark the Flowlet. The switch can identify the Flowlet from the header field alone, with high efficiency and low hardware overhead. This also ensures that the same Flowlet is not split again no matter how many switch hops it passes through in the network, reducing the risk of out-of-order packets.
  • Step S506 If the difference between the timestamp of the first packet segment and the timestamp of the second packet segment is greater than the time threshold, update the reference Flowlet identifier to the second Flowlet identifier.
  • The flow information of each TCP flow in the flow information table maintained by the host side also includes the reference Flowlet identifier of that TCP flow; that is, the current Flowlet identifier of each TCP flow is maintained in the flow information table so that the corresponding Flowlet identifier can be set for the packet segment to be sent.
  • If the reference Flowlet identifier is the first Flowlet identifier (that is, the Flowlet identifier corresponding to the second packet segment) and the first packet segment is divided into the same Flowlet as the second packet segment, the Flowlet identifier of the first packet segment is also marked as the first Flowlet identifier, and the reference Flowlet identifier remains the first Flowlet identifier. If the reference Flowlet identifier is the first Flowlet identifier and the first packet segment and the second packet segment are divided into different Flowlets (that is, the first packet segment is divided into a new Flowlet), the Flowlet identifier of the first packet segment is marked as the second Flowlet identifier, and the reference Flowlet identifier needs to be updated to the second Flowlet identifier.
  • The reference Flowlet identifier can alternate between 0 and 1; that is, the Flowlet identifiers of two adjacent Flowlets take the values 0 and 1 alternately, so a single bit is enough to indicate whether different packets belong to the same Flowlet.
  • the host also updates the first path delay or the second path delay according to the received ACK packet, so as to update the time threshold of the target TCP flow in real time.
  • The host receives a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow; determines the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value of the target ACK packet and the timestamp echo reply value; and compares the uplink path delay of the target ACK packet with the uplink path delays of the historical ACK packets in the target TCP flow. If the uplink path delay of the target ACK packet is greater than the maximum value of the uplink path delays of the historical ACK packets, the first path delay is updated to the uplink path delay of the target ACK packet; if the uplink path delay of the target ACK packet is less than the minimum value among the uplink path delays of the historical ACK packets, update the second path delay to the up
  • The time threshold for dividing Flowlets may be calculated from the difference between the maximum uplink path delay and the minimum uplink path delay of the historical packets received by the target TCP flow (or by the TCP flows in the same network session as the target TCP flow); that is, the time threshold is a value that changes in real time and is dynamically adjusted according to the network transmission load.
  • Specifically, the transmission delay of the uplink path of an ACK packet is calculated as the difference between the timestamp value of the target ACK packet and the timestamp echo reply value. Based on the historical values of the uplink path transmission delay, a current maximum uplink path delay is determined and used as the first path delay, and a current minimum uplink path delay is determined and used as the second path delay. Finally, the difference between the first path delay and the second path delay is used as the time threshold for dividing Flowlets in the target TCP flow.
  • the time threshold used to divide Flowlets is dynamically changed, and is dynamically adjusted according to the real-time transmission delay of data in the corresponding TCP flow, so it can always adapt to dynamic network load changes.
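  • The ACK-driven update described above can be sketched as follows. The function names and the dictionary keys "max", "min" and "threshold" are hypothetical; the delay formula (timestamp value minus timestamp echo reply value) and the max/min update rule are taken from the text, and all values are assumed to be integers in microseconds.

```python
def uplink_delay(ts_value: int, ts_echo: int) -> int:
    """Uplink path delay of an ACK: timestamp value minus timestamp echo reply."""
    return ts_value - ts_echo


def update_path_delays(delays: dict, ts_value: int, ts_echo: int) -> dict:
    """Update the first (max) and second (min) path delay from one ACK packet."""
    d = uplink_delay(ts_value, ts_echo)
    if d > delays["max"]:
        delays["max"] = d          # first path delay: largest uplink delay
    if d < delays["min"]:
        delays["min"] = d          # second path delay: smallest uplink delay
    # The time threshold is the difference between the two path delays.
    delays["threshold"] = delays["max"] - delays["min"]
    return delays


state = {"max": 58, "min": 31, "threshold": 27}
update_path_delays(state, ts_value=1060, ts_echo=1000)  # delay of 60 us
update_path_delays(state, ts_value=2025, ts_echo=2000)  # delay of 25 us
print(state)
```

  • In this sketch the threshold is recomputed on every ACK; the text also describes a periodic reconfiguration, which would replace that last assignment.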
  • FIG. 6C is a schematic flowchart of an additional layer protocol dividing and marking Flowlets according to an embodiment of the present invention. Based on the flow information table maintained by the host in Table 1 above, the following exemplarily describes how, after a TCP packet segment has completed the encapsulation of the transport layer header fields, the function of the additional layer is triggered and Flowlet division and marking are performed. The process may include the following steps:
  • The additional layer first indexes, in the FlowInfoTable (the flow information table described in Table 1), the entry corresponding to the TCP flow of the packet segment (denoted as [SrcPort]).
  • It then judges whether the packet segment (such as the first packet segment) meets the Flowlet segmentation condition, that is, it compares the difference between the timestamp value of the packet segment (such as the timestamp value of the first packet segment) and the timestamp value of the previous packet segment recorded in the corresponding entry of the TCP flow with the segmentation threshold of the Flowlet (that is, whether CurTS - [SrcPort].LstTS > [SrcPort].TTDiff).
  • If the condition is met, the current packet segment (such as the first packet segment) is regarded as the first packet segment of a new Flowlet and is marked in the FL_Tag bit of the header: the value of the FL_Tag bit of the packet segment header is set to the opposite of the value of the LstFLTag item in the entry, and the value of the LstFLTag item in the entry is then updated.
  • Otherwise, the current packet segment (such as the first packet segment) is regarded as a subsequent packet segment of the previous Flowlet and is marked in the FL_Tag bit of the header: the value of the FL_Tag bit of the packet segment header is set to the same value as the LstFLTag item in the entry.
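  • The division and marking procedure of FIG. 6C can be sketched as below. The item names LstFLTag, LstTS and TTDiff follow Table 1; the function name mark_segment and the use of plain dictionaries are assumptions. The comparison direction (a gap greater than TTDiff starts a new Flowlet) is taken from the greater-than rule stated for step S504, and refreshing LstTS after every packet segment is an assumption inferred from LstTS being described as the timestamp of the packet just sent.

```python
def mark_segment(table: dict, src_port: int, cur_ts: int) -> int:
    """Return the FL_Tag bit (0 or 1) to write into the segment header."""
    entry = table[src_port]                 # index the entry by source port
    if cur_ts - entry["LstTS"] > entry["TTDiff"]:
        # First segment of a new Flowlet: flip the bit and record it.
        fl_tag = 1 - entry["LstFLTag"]
        entry["LstFLTag"] = fl_tag
    else:
        # Subsequent segment of the previous Flowlet: reuse the bit.
        fl_tag = entry["LstFLTag"]
    # Assumption: the entry always remembers the latest segment's timestamp.
    entry["LstTS"] = cur_ts
    return fl_tag


table = {5001: {"LstFLTag": 0, "LstTS": 100, "TTDiff": 27}}
tags = [mark_segment(table, 5001, ts) for ts in (110, 120, 160, 170, 250)]
print(tags)
```

  • With a 27 us threshold, the gaps of 40 us (120 to 160) and 80 us (170 to 250) each flip the bit, while the 10 us gaps keep it unchanged.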
  • FIG. 6D is a schematic flowchart of an additional layer protocol dynamically updating the segmentation threshold of Flowlets according to an embodiment of the present invention, based on the flow information table maintained by the host in Table 1 above. It should be noted that, because the data center network topology provides multiple equal-cost paths for the same pair of source and destination hosts, this embodiment of the present invention first uses the timestamps carried by ACK packets to continuously obtain the uplink path delays of the equal-cost paths (the equal-cost paths of the same TCP flow, and possibly also those of the same network session), and records the maximum and minimum uplink path delays. The maximum delay difference between equal-cost paths is then represented by the difference between the maximum and minimum uplink path delays, and the TTDiff parameter is periodically configured based on this maximum delay difference.
  • The following exemplarily describes the implementation process by which the additional layer protocol dynamically configures the TTDiff parameter used for dividing Flowlets, which may specifically include the following steps:
  • For each received ACK packet, the host side uses the destination port information carried by the ACK packet to index, in the FlowInfoTable, the entry (denoted as [SrcPort]) corresponding to the TCP flow associated with the ACK packet.
  • The uplink path delay is then obtained from two fields of the ACK packet: the Timestamp field (the timestamp value) and the TimestampEcho field (the timestamp echo reply value).
  • The additional layer protocol may also periodically update the TTDiff items of all entries in the information table: the value of the TTDiff item of each entry is configured as the difference between the TripTime_max item and the TripTime_min item of that entry, and the values of the TripTime_max item and the TripTime_min item are reset at the same time, so that an expired maximum or minimum value does not persist indefinitely.
  • The update period may be set on the order of the network round-trip delay (about 100-200 microseconds), the reset value of the TripTime_max item may be set to zero, and the reset value of the TripTime_min item may be set to a large value (such as the maximum value that can be represented by 4 bytes).
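  • The periodic reconfiguration can be sketched as follows. The reset values (zero, and the largest value representable in 4 bytes) come from the text; the function name refresh_thresholds, the dictionary layout, and the guard that keeps the previous TTDiff when no ACK arrived during the period are assumptions of this sketch.

```python
RESET_MAX = 0              # reset value for TripTime_max
RESET_MIN = 2**32 - 1      # largest value representable in 4 bytes


def refresh_thresholds(table: dict) -> None:
    """Periodic task: recompute TTDiff for every entry, then reset max/min."""
    for entry in table.values():
        # Assumption: if no ACK updated the entry in this period, max is
        # still below min, so the previous TTDiff is kept unchanged.
        if entry["TripTime_max"] >= entry["TripTime_min"]:
            entry["TTDiff"] = entry["TripTime_max"] - entry["TripTime_min"]
        entry["TripTime_max"] = RESET_MAX
        entry["TripTime_min"] = RESET_MIN


table = {5001: {"TTDiff": 0, "TripTime_max": 58, "TripTime_min": 31}}
refresh_thresholds(table)   # TTDiff becomes 58 - 31
refresh_thresholds(table)   # no new samples: TTDiff is left as it was
print(table[5001])
```

  • In a real host this function would be driven by a timer whose period is on the order of the round-trip delay; the timer itself is omitted here.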
  • Flowlet technology can effectively address the problems faced by data center network load balancing, such as hash collisions, blocking of mice flows, and asymmetry.
  • Most existing Flowlet-level solutions detect and forward Flowlets at the switch based on a fixed time interval, but a fixed time interval cannot match the constantly changing traffic load of the data center network, which leads to uneven load distribution in the network.
  • This application proposes to pre-segment traffic at the terminal host based on a time threshold that adaptively changes with the path load, and then distribute fine-grained Flowlets to the network. After the switch recognizes the Flowlets, it can execute any routing algorithm to further balance the load.
  • FIG. 7 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention.
  • The method can be applied to the switch in the network architecture described in FIG. 2 or FIG. 3, where the switch 20 can be used to support and perform steps S701 to S704 of the method flow shown in FIG. 7.
  • The following description is given from the switch side with reference to FIG. 3.
  • the method may include the following steps S701-S704.
  • Step S701 Receive the first data packet.
  • the first data packet includes a first packet segment and a Flowlet identifier of the first packet segment.
  • Step S702 Determine the target TCP flow to which the first data packet belongs, and obtain forwarding information matching the target TCP flow.
  • The forwarding information includes a reference Flowlet identifier of the target TCP flow and a reference forwarding path, where the reference Flowlet identifier is currently the first Flowlet identifier corresponding to a second packet segment, and the second packet segment is the previous packet segment adjacent to the first packet segment in the target TCP flow; the reference forwarding path is the first forwarding path of the second data packet, and the second data packet includes the second packet segment and the first Flowlet identifier.
  • The switch maintains a forwarding information table, and the forwarding information table includes forwarding information of M TCP flows, where M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes the five-tuple hash value of the corresponding TCP flow. Determining the target TCP flow to which the first data packet belongs and obtaining the forwarding information matching the target TCP flow comprises: calculating the five-tuple hash value of the first data packet according to the five-tuple information of the first data packet; and searching the forwarding information table, according to the five-tuple hash value of the first data packet, for the forwarding information matching the target TCP flow.
  • The switch side maintains a forwarding information table that includes forwarding information of one or more TCP flows (for example, currently active TCP flows) on the hosts connected to it, and the forwarding information of each TCP flow may in turn include the five-tuple hash value of the TCP flow. That is, the switch can maintain the forwarding information of all currently active TCP flows, so that when a data packet needs to be sent, the switch can use the five-tuple hash value of the data packet to search the forwarding information table for the forwarding information (including the reference Flowlet identifier, forwarding path, and so on) that matches this hash value, and use it to forward the data packet.
  • Step S703 Compare the Flowlet identifier of the first segment with the first Flowlet identifier
  • Step S704 According to the comparison result, determine whether to forward the first packet segment through the first forwarding path.
  • The switch side reads the Flowlet identifier in the data packet and, according to the Flowlet identifier, determines whether the first data packet and the adjacent data packet in the target TCP flow to which it belongs have the same Flowlet identifier; based on this, it determines whether the first data packet needs to be forwarded through the forwarding path of the second data packet. That is, the switch side does not need to divide Flowlets according to the time interval between received data packets. Instead, it uses the Flowlet identification bit contained in the received data packet to identify directly whether the data packet to be sent belongs to the same Flowlet as the previous adjacent data packet in the same TCP flow, and so decides whether to continue forwarding through the forwarding path of the adjacent data packet or to treat the data packet as the start of a new Flowlet and decide a new forwarding path for it.
  • the forwarding path referred to in the embodiment of the present invention refers to the forwarding port currently determined by each switch. That is, the complete forwarding path of a data packet may actually need to be jointly determined by the forwarding ports that are determined separately by the multi-hop switch. Therefore, in the embodiment of the present invention, on the switch side actually means that each switch in the multi-hop switch executes the above data transmission method, and then finally determines the complete forwarding path of the first data packet.
  • If the Flowlet identifier of the first packet segment is the same as the first Flowlet identifier, the first data packet is forwarded through the first forwarding path; if the Flowlet identifier of the first packet segment is different from the first Flowlet identifier, a second forwarding path is determined for the first data packet and the first data packet is forwarded through the second forwarding path.
  • When the switch identifies that the first data packet has the same Flowlet identifier as the adjacent data packet in the target TCP flow to which it belongs, the switch forwards the first data packet and the second data packet on the same path; when the switch identifies that the first data packet has a different Flowlet identifier from the adjacent data packet in the target TCP flow to which it belongs, it determines a new forwarding path for the first data packet and forwards the first data packet through the new forwarding path. It should be noted that the second forwarding path may be the same as or different from the first forwarding path, depending on the decision result of the switch.
  • The method further includes: if the Flowlet identifier of the first packet segment is different from the first Flowlet identifier and is the second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path.
  • the switch needs to divide the first data packet into a new Flowlet, and needs to update the reference Flowlet ID of the TCP flow to which it belongs to the Flowlet ID corresponding to the current latest data packet, that is, the second Flowlet ID.
  • The Flowlet is identified at the switch according to the identification bit of the data packet.
  • The switch implements the above functions through the Flowlet forwarding information table (Flowlet Table), whose format is shown in Table 2.
  • Each entry in the Flowlet forwarding information table may contain three items: Entry, FLTag, and Port, where:
  • the Entry item records the hash value of the quintuple (source IP, destination IP, source port, destination port, and protocol number) of the data packet, and uses this hash value to index the corresponding entry of the TCP stream in the forwarding table. Wherein, this item corresponds to the quintuple hash value described in this application.
  • the FLTag item records the Flowlet flag bit information, which is used for the identification of adjacent Flowlets, wherein this item corresponds to the reference Flowlet identifier described in this application.
  • the Port item records the information of the forwarding port. Wherein, this item corresponds to the forwarding information described in this application.
  • the following exemplarily describes the implementation process of identifying the Flowlet according to the identification bit of the data packet on the switch side, and the implementation process may include the following main steps:
  • the switch For each arriving data packet, the switch must first identify which TCP flow the data packet belongs to, and then identify which Flowlet of the TCP flow the data packet belongs to.
  • the switch first performs a hash operation on the quintuple of the data packet to obtain a hash value, and then searches the Flowlet forwarding table for the forwarding table entry corresponding to the Entry item with the same hash value to determine that the arriving data packet belongs to Which TCP stream.
  • the hash value is key1, key2, etc. in Table 1 above.
  • the data packet belongs to the current flow burst, and the data packet is forwarded to the output port indicated by the Port item of the entry.
  • the switch performs a quintuple hash on packet B and obtains the hash value key2; assume that the Flowlet identifier value of packet B is 0. The switch uses the hash value key2 to index the corresponding entry in the forwarding information table, and compares the FLTag value in that entry with the Flowlet identifier value of packet B.
  • the main protection points of this application may include the following points:
  • each flow has an exclusive entry in the information table, and the entry includes the relevant information when splitting the Flowlet.
  • the host uses the timestamp option in ACK packets to continuously obtain one-way path delay information, and then calculates the delay of the uplink path with the largest delay in the multi-path set (which may include equal-cost paths or non-equal-cost paths); the maximum delay difference is periodically set as the time threshold for dividing Flowlets, so that the time threshold dynamically adapts to the path load, and the information table is updated.
  • the current segment is regarded as the first segment of the new Flowlet.
  • a bit of a reserved field in the TCP header is used as a flag bit to distinguish different Flowlets of the same flow. The flag bits of all segments in the same Flowlet have the same value, and the flag bits of adjacent Flowlets have opposite values.
  • the switch can identify each Flowlet according to the five-tuple hash and the one-bit flag bit in the header of the transport layer or the additional layer, and can use any routing algorithm to complete the forwarding of the data packet.
  • FIG. 8 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention.
  • the data processing apparatus 80 may include a first generating unit 801, a first determining unit 802, an obtaining unit 803, a first comparing unit 804, and a Flowlet dividing unit 805, wherein the detailed description of each unit is as follows.
  • a first generating unit 801, configured to generate a first segment
  • a first determining unit 802 configured to determine the target TCP flow to which the first packet segment belongs
  • an obtaining unit 803, configured to obtain the timestamp of the first segment, and obtain target flow information matching the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and the timestamp of the second segment in the target TCP flow; wherein the second segment is the previous segment adjacent to the first segment in the target TCP flow, and the time threshold is the difference between a first path delay and a second path delay;
  • the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow;
  • the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
  • a first comparison unit 804 configured to compare the difference between the timestamp of the first packet segment and the timestamp of the second packet segment with the time threshold
  • the flowlet dividing unit 805 is configured to determine, according to the comparison result, whether to divide the first packet segment and the second packet segment into the same Flowlet.
  • the first determining unit is specifically configured to:
  • the target TCP flow to which the first message segment belongs is determined according to the source port number of the first message segment.
  • the apparatus further includes:
  • a maintenance unit configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, where N is an integer greater than or equal to 1, wherein the flow information of each TCP flow includes a flow index corresponding to the TCP flow;
  • the obtaining unit is specifically configured to: search the target flow information matching the target TCP flow from the flow information table according to the flow index of the target TCP flow.
  • the Flowlet dividing unit is specifically configured to:
  • the first packet segment is divided into a new Flowlet.
  • the target flow information further includes a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently the first Flowlet identifier corresponding to the second segment;
  • the apparatus further includes:
  • a second generating unit configured to generate a first data packet, where the first data packet includes the first packet segment and a Flowlet identifier of the first packet segment;
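  • if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier;
  • if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is the second Flowlet identifier.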
  • the apparatus further includes:
  • a first update unit configured to update the reference Flowlet identifier to the second Flowlet identifier if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold.
  • the apparatus further includes:
  • a receiving unit configured to receive a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP stream;
  • a second determining unit configured to determine the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value of the target ACK packet and the timestamp echo response value;
  • a second comparison unit configured to compare the uplink path delay of the target ACK packet with the uplink path delay in the multipath set of the target TCP flow
  • a second updating unit configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path with the current largest delay in the multipath set;
  • a third updating unit configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path with the current smallest delay in the multipath set.
  • the multi-path set includes multiple equal-cost transmission paths of the target TCP flow; or, the multi-path set includes multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or, the multi-path set includes multiple non-equal-cost transmission paths of the target TCP flow.
  • FIG. 9 is a schematic structural diagram of a data transmission apparatus provided by an embodiment of the present invention.
  • the data transmission apparatus 90 may include a receiving unit 901, a determining unit 902, a comparing unit 903, and a forwarding unit 904, wherein the detailed description of each unit is as follows.
  • a receiving unit 901 configured to receive a first data packet, where the first data packet includes a first message segment and a Flowlet identifier of the first message segment, and the first data packet belongs to a target TCP flow;
  • a determining unit 902 configured to determine the target TCP flow to which the first data packet belongs, and obtain forwarding information matching the target TCP flow;
  • the forwarding information includes a reference Flowlet identifier of the target TCP flow and a reference forwarding path;
  • the reference Flowlet identifier is currently the first Flowlet identifier corresponding to the second packet segment, and the second packet segment is the previous packet segment adjacent to the first packet segment in the target TCP flow;
  • the reference forwarding path is a first forwarding path of a second data packet, and the second data packet includes the second packet segment and the first Flowlet identifier;
  • a comparison unit 903 configured to compare the Flowlet identifier of the first segment with the first Flowlet identifier
  • the forwarding unit 904 is configured to determine, according to the comparison result, whether to forward the first packet segment through the first forwarding path.
  • the switch maintains a forwarding information table, and the forwarding information table includes forwarding information of M TCP flows, where M is an integer greater than or equal to 1, wherein the forwarding information of each TCP flow Including the five-tuple hash value corresponding to the TCP flow; the determining unit is specifically used for:
  • the forwarding information matching the target TCP flow is searched from the forwarding information table.
  • the forwarding unit is specifically used for:
  • a second forwarding path is determined for the first data packet, and forwarded through the second forwarding path.
  • the apparatus further includes:
  • an update unit, configured to, if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is the second Flowlet identifier, update the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and update the reference forwarding path to the second forwarding path.
  • An embodiment of the present invention further provides a host, wherein the host includes a processor, a memory, and a communication interface, wherein the memory is used for storing data processing program code, and the processor is used for calling the data processing program code to execute part or all of the steps of any one of the data processing methods described in the above method embodiments.
  • An embodiment of the present invention further provides a switch, wherein the switch includes a processor, a memory, and a communication interface, wherein the memory is used to store data transmission program code, and the processor is used to call the data transmission program code to execute part or all of the steps of any one of the data transmission methods described in the above method embodiments.
  • Embodiments of the present invention further provide a computer-readable storage medium, wherein the computer-readable storage medium may store a program, and when the program is executed by a host, part or all of the steps of any one of the data processing methods described in the above method embodiments are performed.
  • the embodiment of the present invention also provides a computer program, the computer program includes instructions, when the computer program is executed by the switch, the switch can execute part or all of the steps of any data transmission method.
  • the disclosed apparatus may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the above-mentioned units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features can be ignored, or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • if the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc., specifically a processor in the computer device) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium may include any medium that can store program code, such as a USB flash drive (U disk), a removable hard disk, a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

Abstract

A data processing method, a data transmission method, and a related device. The data processing method is applied to a host and comprises: generating a first message segment, and determining a target TCP flow to which the first message segment belongs; obtaining target flow information matching the target TCP flow, the target flow information comprising a time threshold corresponding to the target TCP flow, and a timestamp of a second message segment in the target TCP flow; comparing a difference value between a timestamp of the first message segment and the timestamp of the second message segment, with the time threshold; and determining, according to the comparison result, whether to classify the first message segment and the second message segment into a same Flowlet. By means of the method, the efficiency and accuracy of data transmission can be improved.

Description

A data processing and transmission method and related device
Technical Field
The present application relates to the field of communication technologies, and in particular, to a data processing and transmission method and related device.
Background
At present, as data transmission services become faster and impose real-time requirements, data transmission devices need to balance traffic load quickly and accurately in order to improve forwarding performance, enhance network reliability, and thereby serve users better.
At present, equal-cost multi-path routing (ECMP) is a commonly used load balancing method. ECMP includes a packet-based path selection mode and a flow-based path selection mode. Packet-based path selection can achieve load balancing, but because the delays of the different paths differ, packets may arrive out of order at the receiving end and must be reordered. In flow-based path selection, the outgoing interface (the packet forwarding path) is determined according to a hash algorithm, so the receiving end does not need to reorder packets. However, different flows have different rates (for example, an elephant flow occupies a large bandwidth and a mice flow occupies a small bandwidth), and the flows transmitted on different paths also differ. When the rates of the flows transmitted on different paths are unequal, the load may become unbalanced.
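To make the flow-based mode concrete, the following is a minimal sketch of hash-based path selection. The function name `ecmp_select_path`, the five-tuple layout, and the use of SHA-256 are all assumptions made for illustration; the source only states that a hash algorithm determines the outgoing interface, and real switches typically use a hardware hash.

```python
import hashlib

def ecmp_select_path(five_tuple, num_paths):
    """Map a flow's five-tuple to one of num_paths output paths.

    Hypothetical helper: SHA-256 is an illustrative stand-in for the
    unspecified hash algorithm.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Every packet of the same flow yields the same digest, hence the same path.
    return int.from_bytes(digest[:4], "big") % num_paths

# (source IP, destination IP, source port, destination port, protocol number)
flow = ("10.0.0.1", "10.0.0.2", 40000, 80, 6)
path = ecmp_select_path(flow, 4)
```

Because the result depends only on the five-tuple, all packets of a flow stay on one path and arrive in order, but an elephant flow and a mice flow can land on paths with very different load.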
To achieve better load balancing, a load balancing method based on the Flowlet mechanism has been proposed. FIG. 1 is a schematic diagram of dividing a TCP flow into Flowlets in the prior art. For example, after the data packets of the current TCP flow reach the switch, the flow is detected as containing 5 Flowlets. The arrival time difference between data packets within the same Flowlet is generally small, whereas the arrival time difference between data packets of different Flowlets is more pronounced. The time difference (timegap) between the first and second data packets of Flowlet 1 is less than the predetermined threshold (timeout), so the switch regards these two data packets as belonging to the same Flowlet. By contrast, the time difference between the last data packet of Flowlet 1 and the first data packet of Flowlet 2 is greater than the predetermined threshold, so the switch regards the first data packet of Flowlet 2 as the start of a new Flowlet. That is, if the arrival time difference at the switch between two adjacent data packets of the same TCP flow is less than the predetermined time interval (timeout), the switch regards the two data packets as belonging to the same Flowlet.
In summary, existing Flowlet-granularity load balancing schemes detect Flowlets at the switch based on a fixed time interval. However, the load in a data transmission network (such as a data center network) changes dynamically and unpredictably, and a fixed time interval is difficult to adapt to a dynamically changing network load. When the time interval is too small, the number of detected Flowlets increases and the load balancing granularity becomes finer, which tends to increase the risk of packet reordering; when the time interval is too large, the number of detected Flowlets decreases, the granularity becomes too coarse, and the load balancing effect is not significant.
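The prior-art detection and its sensitivity to the fixed interval can be sketched as follows. The function name `split_flowlets_fixed` and the sample arrival times are hypothetical; the handling of a gap exactly equal to the timeout is also an assumption, since the text above only covers the "less than" and "greater than" cases.

```python
def split_flowlets_fixed(arrival_times, timeout):
    """Group the packet arrival times of one flow into Flowlets.

    A packet joins the current Flowlet when its gap to the previous
    packet is less than the fixed timeout; otherwise it starts a new one.
    """
    flowlets = []
    for t in arrival_times:
        if flowlets and t - flowlets[-1][-1] < timeout:
            flowlets[-1].append(t)
        else:
            flowlets.append([t])
    return flowlets

arrivals = [0, 1, 2, 10, 11]                      # illustrative time units
balanced = split_flowlets_fixed(arrivals, 5)      # two Flowlets
too_small = split_flowlets_fixed(arrivals, 1)     # every packet is its own Flowlet
too_large = split_flowlets_fixed(arrivals, 20)    # the whole flow is one Flowlet
```

The same arrival pattern yields five, two, or one Flowlet depending only on the configured timeout, which is the mismatch with changing network load that the passage describes.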
Summary of the Invention
Embodiments of the present invention provide a data processing and transmission method and related device, so as to improve the efficiency and accuracy of data transmission.
In a first aspect, an embodiment of the present invention provides a data processing method, applied to a host, which may include:
Generating a first segment, and determining a target TCP flow to which the first segment belongs; obtaining a timestamp of the first segment, and obtaining target flow information matching the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and a timestamp of a second segment in the target TCP flow; the second segment is the previous segment adjacent to the first segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow; comparing the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold; and determining, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet.
In this embodiment of the present invention, for a newly generated first segment that is to be sent, the host first determines which TCP flow the first segment belongs to (for example, by its source port), and then obtains the target flow information matching that TCP flow. The target flow information contains several items about the target TCP flow, such as the timestamp of the adjacent previous segment (that is, the second segment) and the time threshold used to divide Flowlets in the target TCP flow. The host then compares the difference between the timestamps of the first segment and the adjacent previous second segment with the time threshold, and thereby decides whether to place the first segment in the same Flowlet as the previous segment. The time threshold is calculated dynamically from the delays of the paths in the multipath set of the target TCP flow, for example, as the difference between the maximum delay and the minimum delay, both updated in real time from the historical segments received by the host (ACK packets carrying the same triple or quintuple information as the target TCP flow). In other words, the time threshold is a value that changes in real time and is adjusted dynamically according to the network transmission load.
That is to say, in this embodiment of the present invention, for different TCP flows, or for data packets of the same TCP flow in different states, the time threshold used to divide Flowlets changes dynamically and is adjusted according to the real-time transmission delay of the data in the corresponding TCP flow. It can therefore always adapt to changes in network load, which avoids the prior-art problem that Flowlet detection at the switch based on a fixed time interval is difficult to adapt to a dynamic network load. In summary, the embodiment of the present invention uses the delay feedback of the network paths on the host side to dynamically configure the time interval used to detect and divide Flowlets, so that the Flowlet granularity matches the state of the network paths, the risk of packet reordering is reduced, and the load balancing effect is ensured.
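The dynamic threshold can be sketched as a small tracker that the host updates on each ACK. The class name `PathDelayTracker` and its methods are hypothetical; the sketch assumes the uplink path delay of each ACK has already been measured (the apparatus description in this document derives it from the ACK timestamp value and the timestamp echo reply value), and it omits any aging or periodic reset of the recorded extremes.

```python
class PathDelayTracker:
    """Per-flow record of the largest and smallest uplink path delay seen."""

    def __init__(self):
        self.max_delay = None  # first path delay (largest)
        self.min_delay = None  # second path delay (smallest)

    def on_ack(self, uplink_delay):
        """Update the extremes with the delay measured from one ACK."""
        if self.max_delay is None or uplink_delay > self.max_delay:
            self.max_delay = uplink_delay
        if self.min_delay is None or uplink_delay < self.min_delay:
            self.min_delay = uplink_delay

    def time_threshold(self):
        """Threshold = largest delay minus smallest delay (0 before any ACK)."""
        if self.max_delay is None:
            return 0
        return self.max_delay - self.min_delay

tracker = PathDelayTracker()
for delay_us in (120, 80, 200):   # illustrative delays in microseconds
    tracker.on_ack(delay_us)
threshold = tracker.time_threshold()
```

Using the delay spread as the gap means a new Flowlet only starts when enough time has passed for the previous Flowlet to drain even on the slowest path, which is why the threshold tracks path load.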
In a possible implementation, determining the target TCP flow to which the first segment belongs includes: determining, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
In this embodiment of the present invention, the source port number in the quintuple information of the segment is used to identify which TCP flow the segment currently to be sent (that is, the first segment) belongs to, so that the flow information of that TCP flow can be obtained (including the timestamp of the adjacent previous segment and the time threshold used to divide Flowlets). Based on this flow information, the host can then determine whether the segment to be sent belongs to the same Flowlet as the previous segment or should be placed in a new Flowlet.
In a possible implementation, the host maintains a flow information table, and the flow information table includes flow information of N TCP flows, where N is an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of the corresponding TCP flow; obtaining the target flow information matching the target TCP flow includes: searching the flow information table, according to the flow index of the target TCP flow, for the target flow information matching the target TCP flow.
In this embodiment of the present invention, the host maintains a flow information table for one or more TCP flows (for example, the currently active TCP flows). The table includes the flow information of each such TCP flow, and each flow's information may include the index of the TCP flow, the time threshold described in the first aspect, and the timestamp of the most recent segment preceding the segment currently to be sent. In other words, the host can maintain the flow information of all currently active TCP flows, so that when a segment needs to be sent, the host can look up the matching target flow information in the table by the flow index of the TCP flow to which the segment belongs, and then perform the subsequent Flowlet division.
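One way to picture the flow information table is a dictionary keyed by flow index. The names `FlowInfo`, `flow_table`, and `lookup_flow`, the use of the source port as the flow index, and the units are assumptions for illustration; the source lists the fields but not a concrete layout.

```python
from dataclasses import dataclass

@dataclass
class FlowInfo:
    """Hypothetical per-flow record held by the host."""
    flow_index: int       # index identifying the TCP flow
    time_threshold: int   # current Flowlet time threshold
    last_segment_ts: int  # timestamp of the most recent segment sent

flow_table = {}  # flow_index -> FlowInfo, one entry per active TCP flow

def lookup_flow(flow_index):
    """Return the flow information matching flow_index, or None if absent."""
    return flow_table.get(flow_index)

# Assumed here: the source port doubles as the flow index.
flow_table[40000] = FlowInfo(flow_index=40000, time_threshold=120, last_segment_ts=1000)
info = lookup_flow(40000)
```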
In a possible implementation, determining, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet includes: if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, placing the first segment and the second segment in the same Flowlet; if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, placing the first segment in a new Flowlet.
In this embodiment of the present invention, if the difference between the timestamps of the first segment currently to be sent and the adjacent previous second segment in the target TCP flow is less than or equal to the time threshold corresponding to the target TCP flow (which changes dynamically), the two segments are considered to satisfy the condition for being sent in the same Flowlet, and the first segment is placed in the same Flowlet as the second segment. Likewise, if that difference is greater than the time threshold corresponding to the target TCP flow, the two segments are considered not to satisfy the condition for being sent in the same Flowlet, and the first segment is placed in a new Flowlet.
In a possible implementation, the target flow information further includes a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently the first Flowlet identifier corresponding to the second segment; the method further includes: generating a first data packet, where the first data packet includes the first segment and the Flowlet identifier of the first segment; if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier; if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
In this embodiment of the present invention, when the first segment is further encapsulated for transmission over the network, the Flowlet identifier corresponding to the segment can be set during encapsulation. After the segment has been encapsulated into a data packet, the switch can use the Flowlet identifier to determine which Flowlet the data packet belongs to, and thus decide which path to send it on. For example, when the first segment is about to enter the data link layer where the switch is located, it must be further encapsulated; at this point, one bit is set in the encapsulated data packet so that the switch can identify which Flowlet the segment belongs to. When the first segment and the second segment have the same Flowlet identifier, the switch forwards the first data packet on the same path as the second data packet corresponding to the second segment. In summary, the embodiment of the present invention divides Flowlets on the host side and can use a bit (for example, 1 bit) in a reserved field of the transport layer header to mark the Flowlet. The switch can identify the Flowlet from the header field alone, which is efficient and has low hardware overhead. It also ensures that a Flowlet is never split again, no matter how many switch hops it traverses in the network, which reduces the risk of packet reordering.
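A switch-side sketch of this behavior is shown below, using the Entry (five-tuple hash), FLTag, and Port items that the document lists for the Flowlet forwarding table. The class `FlowletSwitch`, the `choose_port` callback, and the random default are hypothetical; the source leaves the routing algorithm used for a new Flowlet open.

```python
import random

class FlowletSwitch:
    """Forward packets by five-tuple hash plus a one-bit Flowlet tag."""

    def __init__(self, ports, choose_port=None):
        self.ports = ports
        self.table = {}  # five-tuple hash -> [FLTag, Port]
        # Any routing algorithm may be plugged in; random choice is a placeholder.
        self.choose_port = choose_port or (lambda: random.choice(self.ports))

    def forward(self, flow_hash, fl_tag):
        """Return the output port for a packet with the given hash and tag."""
        entry = self.table.get(flow_hash)
        if entry is not None and entry[0] == fl_tag:
            return entry[1]            # same Flowlet: keep the recorded path
        port = self.choose_port()      # new flow or new Flowlet: pick a path
        self.table[flow_hash] = [fl_tag, port]
        return port

picks = iter([1, 2])                   # deterministic port choices for the demo
switch = FlowletSwitch([1, 2, 3], choose_port=lambda: next(picks))
ports_used = [switch.forward("key2", tag) for tag in (0, 0, 1, 1)]
```

The switch keeps no timers: a change in the tag bit is the only signal that a new Flowlet has started, so every hop makes the same split the host made.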
In a possible implementation, the method further includes: if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, updating the reference Flowlet identifier to the second Flowlet identifier.
In this embodiment of the present invention, the flow information of each TCP flow in the flow information table maintained on the host side further includes a reference Flowlet identifier of that TCP flow. That is, the flow information table records the identifier of the current Flowlet of each TCP flow, so that the corresponding Flowlet identifier can be set for a segment to be sent. For example, assume that the reference Flowlet identifier is the first Flowlet identifier (that is, the Flowlet identifier corresponding to the second segment). When the first segment and the second segment are placed in the same Flowlet, the Flowlet identifier of the first segment is also marked as the first Flowlet identifier, and the reference Flowlet identifier remains the first Flowlet identifier. When the first segment and the second segment are placed in different Flowlets (that is, the first segment is placed in a new Flowlet), the Flowlet identifier of the first segment is marked as the second Flowlet identifier, and the reference Flowlet identifier needs to be updated to the second Flowlet identifier. Optionally, the reference Flowlet identifier may toggle between 0 and 1, that is, the Flowlet identifiers of two adjacent Flowlets alternate between 0 and 1, so a single bit is enough to indicate accurately whether different data packets belong to the same Flowlet.
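The host-side rule described above (compare the timestamp gap with the time threshold, then either reuse or toggle the 1-bit reference Flowlet identifier) can be illustrated with a minimal Python sketch. The class and function names (`FlowInfo`, `tag_segment`), the integer microsecond timestamps, and the step that records each segment's timestamp as the new "previous segment" timestamp are assumptions made for illustration; the text does not prescribe a data layout.

```python
from dataclasses import dataclass


@dataclass
class FlowInfo:
    """One entry of a host-side flow information table (illustrative field names)."""
    time_threshold: int      # Flowlet gap threshold, in microseconds
    last_timestamp: int      # timestamp of the previous ("second") segment
    ref_flowlet_id: int = 0  # reference Flowlet identifier, a single bit


def tag_segment(flow: FlowInfo, timestamp: int) -> int:
    """Return the Flowlet identifier for a new ("first") segment.

    A gap greater than the threshold starts a new Flowlet, which toggles
    the 1-bit reference identifier; otherwise the identifier is reused.
    """
    gap = timestamp - flow.last_timestamp
    if gap > flow.time_threshold:
        flow.ref_flowlet_id ^= 1  # switch between 0 and 1
    # Assumption: this segment becomes the previous segment for the next one.
    flow.last_timestamp = timestamp
    return flow.ref_flowlet_id


flow = FlowInfo(time_threshold=500, last_timestamp=10_000)
ids = [tag_segment(flow, t) for t in (10_300, 11_000, 11_500, 12_500)]
print(ids)
```

In this run the gaps are 300, 700, 500 and 1000 microseconds, so only the second and fourth segments open a new Flowlet.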
In a possible implementation, the method further includes: receiving a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow; determining an uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value and the timestamp echo reply value of the target ACK packet; comparing the uplink path delay of the target ACK packet with the delays of the uplink paths in the multipath set of the target TCP flow; if the uplink path delay of the target ACK packet is greater than the delay of the uplink path that currently has the largest delay in the multipath set, updating the first path delay to the uplink path delay of the target ACK packet; and if the uplink path delay of the target ACK packet is less than the delay of the uplink path that currently has the smallest delay in the multipath set, updating the second path delay to the uplink path delay of the target ACK packet.
In this embodiment of the present invention, the time threshold used to divide Flowlets may be calculated as the difference between the maximum uplink path delay and the minimum uplink path delay of historical segments received on the target TCP flow (or on a TCP flow in the same network session as the target TCP flow). In other words, the time threshold is a value that changes in real time and is adjusted dynamically according to the network transmission load. Specifically, each time the host receives an ACK packet belonging to the target TCP flow (that is, with the same destination port number), or an ACK packet of a TCP flow in the same network session as the target TCP flow (that is, with the same destination address or the same destination network segment), it calculates the transmission delay of the uplink path of that ACK packet as the difference between the timestamp value and the timestamp echo reply value of the ACK packet. From the historical uplink path delays of all received ACK packets, the host determines the current maximum uplink path delay and uses it as the first path delay, and determines the current minimum uplink path delay and uses it as the second path delay. The difference between the first path delay and the second path delay is then used to obtain the time threshold for dividing Flowlets in the target TCP flow. As a result, the time threshold used to divide Flowlets varies between TCP flows, and between different states of the same TCP flow, and is adjusted according to the real-time transmission delay of the data in the corresponding TCP flow, so it can continuously adapt to changes in network load.
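The threshold update can be sketched as follows, taking the first path delay to be the running maximum and the second path delay to be the running minimum, as in the claimed method. The timestamp value and timestamp echo reply value correspond to the TSval and TSecr fields of the TCP timestamps option. The class name `PathDelays`, the use of `None` before the first ACK arrives, and the integer tick units are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathDelays:
    """Running delay extremes for one TCP flow (illustrative names)."""
    first_path_delay: Optional[int] = None   # largest uplink delay seen so far
    second_path_delay: Optional[int] = None  # smallest uplink delay seen so far

    def on_ack(self, ts_val: int, ts_ecr: int) -> None:
        """Update the extremes from one ACK's TSval and TSecr fields."""
        delay = ts_val - ts_ecr
        if self.first_path_delay is None or delay > self.first_path_delay:
            self.first_path_delay = delay
        if self.second_path_delay is None or delay < self.second_path_delay:
            self.second_path_delay = delay

    def time_threshold(self) -> Optional[int]:
        """Flowlet time threshold: first path delay minus second path delay."""
        if self.first_path_delay is None or self.second_path_delay is None:
            return None  # assumption: no threshold until an ACK has been seen
        return self.first_path_delay - self.second_path_delay


delays = PathDelays()
for ts_val, ts_ecr in [(1050, 1000), (1130, 1010), (1060, 1030)]:
    delays.on_ack(ts_val, ts_ecr)
print(delays.time_threshold())
```

Because the threshold is a difference of two delays measured the same way, any constant offset between the two hosts' timestamp clocks cancels out, provided both clocks tick at the same rate.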
In a possible implementation, the multipath set includes multiple equal-cost transmission paths of the target TCP flow; or the multipath set includes multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or the multipath set includes multiple non-equal-cost transmission paths of the target TCP flow.
In this embodiment of the present invention, when the network accessed by the host uses an equal-cost multipath model, all transmission paths in the multipath set of the target TCP flow are equal-cost paths. In this case, the first path delay is the delay of the uplink path with the largest delay among these equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among them. When the network accessed by the host uses a general multipath model, the transmission paths in the multipath set of the target TCP flow may include both equal-cost paths and non-equal-cost paths. In this case, the first path delay is the delay of the uplink path with the largest delay among these equal-cost or non-equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among them. In summary, whether the paths in the multipath set are equal-cost depends on the topology type of the network that the host accesses, and the embodiments of the present invention are applicable to all network types in which multipath transmission exists.
In a second aspect, an embodiment of the present invention provides a data transmission method, applied to a switch, which may include:
receiving a first data packet, where the first data packet includes a first segment and a Flowlet identifier of the first segment; determining the target TCP flow to which the first data packet belongs, and obtaining forwarding information that matches the target TCP flow, where the forwarding information includes a reference Flowlet identifier and a reference forwarding path of the target TCP flow, the reference Flowlet identifier is currently a first Flowlet identifier corresponding to a second segment, the second segment is the previous segment adjacent to the first segment in the target TCP flow, the reference forwarding path is a first forwarding path of a second data packet, and the second data packet includes the second segment and the first Flowlet identifier; comparing the Flowlet identifier of the first segment with the first Flowlet identifier; and determining, according to the comparison result, whether to forward the first segment over the first forwarding path.
In this embodiment of the present invention, after receiving a data packet, the switch reads the Flowlet identifier in the data packet and uses it to determine whether the first data packet has the same Flowlet identifier as the adjacent data packet in the target TCP flow to which it belongs. On that basis, the switch decides whether the first data packet should be forwarded over the forwarding path of the second data packet. That is, the switch does not need to divide data packets into Flowlets according to the time interval between received packets. Instead, it uses the Flowlet identifier bit carried in the received data packet to determine directly whether the packet to be sent belongs to the same Flowlet as the previous adjacent packet of the same TCP flow, and thereby decides whether to keep forwarding over the path of the adjacent packet or to assign the packet to a new Flowlet and select a new forwarding path for it.
In a possible implementation, the switch maintains a forwarding information table, and the forwarding information table includes forwarding information of M TCP flows, where M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes a five-tuple hash value of the corresponding TCP flow. The determining the target TCP flow to which the first data packet belongs and obtaining forwarding information that matches the target TCP flow includes: calculating a five-tuple hash value of the first data packet according to five-tuple information of the first data packet; and searching the forwarding information table, according to the five-tuple hash value of the first data packet, for the forwarding information that matches the target TCP flow.
In this embodiment of the present invention, the switch maintains a forwarding information table that includes forwarding information of one or more TCP flows (for example, currently active TCP flows) on the hosts connected to the switch, and the forwarding information of each TCP flow may include the five-tuple hash value of that flow. That is, the switch can maintain forwarding information for all currently active TCP flows. When a data packet needs to be sent, the switch can use the five-tuple hash value of the packet to find the matching forwarding information in the forwarding information table (including the reference Flowlet identifier, the forwarding path, and so on), and then forward the packet accordingly.
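The lookup described above can be sketched with a dictionary keyed by a five-tuple hash. The text does not name a hash function; CRC-32 from Python's standard `zlib` module is used here purely as a stand-in, and the names `ForwardingInfo`, `five_tuple_hash`, and `lookup`, as well as the representation of a path as an egress port number, are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (source IP, source port, destination IP, destination port, protocol number)
FiveTuple = Tuple[str, int, str, int, int]


@dataclass
class ForwardingInfo:
    """Forwarding information of one TCP flow (illustrative field names)."""
    ref_flowlet_id: int  # reference Flowlet identifier
    ref_path: int        # reference forwarding path, here an egress port number


def five_tuple_hash(five_tuple: FiveTuple) -> int:
    """Hash the five-tuple; CRC-32 is an assumed stand-in hash function."""
    data = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(data)


def lookup(table: Dict[int, ForwardingInfo],
           five_tuple: FiveTuple) -> Optional[ForwardingInfo]:
    """Return the forwarding information matching the packet's flow, if any."""
    return table.get(five_tuple_hash(five_tuple))


table: Dict[int, ForwardingInfo] = {}
flow = ("10.0.0.1", 40000, "10.0.1.2", 80, 6)
table[five_tuple_hash(flow)] = ForwardingInfo(ref_flowlet_id=0, ref_path=2)
print(lookup(table, flow))
```

A real switch would also need a policy for hash collisions and for ageing out inactive flows; the text does not describe either, so neither is modelled here.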
In a possible implementation, the determining, according to the comparison result, whether to forward the first segment over the first forwarding path includes: if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, forwarding the first data packet over the first forwarding path; and if the Flowlet identifier of the first segment is different from the first Flowlet identifier, determining a second forwarding path for the first data packet and forwarding the first data packet over the second forwarding path.
In this embodiment of the present invention, when the switch finds that the first data packet has the same Flowlet identifier as the adjacent data packet in the target TCP flow to which it belongs, the switch forwards the first data packet on the same path as the second data packet. When the switch finds that the first data packet has a different Flowlet identifier from the adjacent data packet in the target TCP flow, the switch determines a new forwarding path for the first data packet and forwards it over the new forwarding path. It should be noted that the second forwarding path may be the same as or different from the first forwarding path, depending on the decision made by the switch.
In a possible implementation, the method further includes: if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is a second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path.
In this embodiment of the present invention, when the Flowlet identifiers of the first data packet and the second data packet are different, the first data packet does not belong to the same Flowlet as the previous adjacent second data packet in the TCP flow. The switch therefore needs to assign the first data packet to a new Flowlet and update the reference Flowlet identifier of the TCP flow to the Flowlet identifier of the most recent data packet, that is, the second Flowlet identifier.
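The switch-side decision (reuse the reference path when the identifiers match, otherwise choose a path and update both reference values) can be sketched as below. How the switch chooses the second forwarding path is left open in the text, so it is passed in as a hypothetical `pick_path` callback; `ForwardingInfo` and `forward` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ForwardingInfo:
    """Forwarding information of one TCP flow (illustrative field names)."""
    ref_flowlet_id: int  # reference Flowlet identifier
    ref_path: int        # reference forwarding path, here an egress port number


def forward(info: ForwardingInfo, pkt_flowlet_id: int,
            pick_path: Callable[[], int]) -> int:
    """Return the path on which to forward a packet of this flow."""
    if pkt_flowlet_id == info.ref_flowlet_id:
        # Same Flowlet as the previous packet: keep the same path.
        return info.ref_path
    # New Flowlet: select a path (it may equal the old one) and
    # update the reference identifier and reference path.
    new_path = pick_path()
    info.ref_flowlet_id = pkt_flowlet_id
    info.ref_path = new_path
    return new_path


info = ForwardingInfo(ref_flowlet_id=0, ref_path=1)
paths = [
    forward(info, 0, lambda: 3),  # same Flowlet, stays on path 1
    forward(info, 1, lambda: 3),  # new Flowlet, moves to path 3
    forward(info, 1, lambda: 5),  # same Flowlet again, stays on path 3
]
print(paths)
```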
In a third aspect, an embodiment of the present invention provides a data processing apparatus, which may include:
a first generating unit, configured to generate a first segment;
a first determining unit, configured to determine the target TCP flow to which the first segment belongs;
an obtaining unit, configured to obtain a timestamp of the first segment and to obtain target flow information that matches the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and a timestamp of a second segment in the target TCP flow, the second segment is the previous segment adjacent to the first segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
a first comparison unit, configured to compare the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold; and
a Flowlet division unit, configured to determine, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet.
In a possible implementation, the first determining unit is specifically configured to:
determine, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
In a possible implementation, the apparatus further includes:
a maintenance unit, configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, and the flow information of each TCP flow includes a flow index of the corresponding TCP flow;
the obtaining unit is specifically configured to search the flow information table, according to the flow index of the target TCP flow, for the target flow information that matches the target TCP flow.
In a possible implementation, the Flowlet division unit is specifically configured to:
if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, place the first segment and the second segment in the same Flowlet; and
if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, place the first segment in a new Flowlet.
In a possible implementation, the target flow information further includes a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently a first Flowlet identifier corresponding to the second segment; the apparatus further includes:
a second generating unit, configured to generate a first data packet, where the first data packet includes the first segment and a Flowlet identifier of the first segment; where
if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier; and
if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
In a possible implementation, the apparatus further includes:
a first updating unit, configured to update the reference Flowlet identifier to the second Flowlet identifier if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold.
In a possible implementation, the apparatus further includes:
a receiving unit, configured to receive a target ACK packet, where the target ACK packet is an ACK packet with the same destination port number or the same destination address as the target TCP flow;
a second determining unit, configured to determine an uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value and the timestamp echo reply value of the target ACK packet;
a second comparison unit, configured to compare the uplink path delay of the target ACK packet with the delays of the uplink paths in the multipath set of the target TCP flow;
a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path that currently has the largest delay in the multipath set; and
a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is less than the delay of the uplink path that currently has the smallest delay in the multipath set.
In a possible implementation, the multipath set includes multiple equal-cost transmission paths of the target TCP flow; or the multipath set includes multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or the multipath set includes multiple non-equal-cost transmission paths of the target TCP flow.
In a fourth aspect, an embodiment of the present invention provides a data transmission apparatus, which may include:
a receiving unit, configured to receive a first data packet, where the first data packet includes a first segment and a Flowlet identifier of the first segment, and the first data packet belongs to a target TCP flow;
a determining unit, configured to determine the target TCP flow to which the first data packet belongs and to obtain forwarding information that matches the target TCP flow, where the forwarding information includes a reference Flowlet identifier and a reference forwarding path of the target TCP flow, the reference Flowlet identifier is currently a first Flowlet identifier corresponding to a second segment, the second segment is the previous segment adjacent to the first segment in the target TCP flow, the reference forwarding path is a first forwarding path of a second data packet, and the second data packet includes the second segment and the first Flowlet identifier;
a comparison unit, configured to compare the Flowlet identifier of the first segment with the first Flowlet identifier; and
a forwarding unit, configured to determine, according to the comparison result, whether to forward the first segment over the first forwarding path.
In a possible implementation, the switch maintains a forwarding information table, and the forwarding information table includes forwarding information of M TCP flows, where M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes a five-tuple hash value of the corresponding TCP flow; the determining unit is specifically configured to:
calculate a five-tuple hash value of the first data packet according to five-tuple information of the first data packet; and
search the forwarding information table, according to the five-tuple hash value of the first data packet, for the forwarding information that matches the target TCP flow.
In a possible implementation, the forwarding unit is specifically configured to:
if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, forward the first data packet over the first forwarding path; and
if the Flowlet identifier of the first segment is different from the first Flowlet identifier, determine a second forwarding path for the first data packet and forward the first data packet over the second forwarding path.
In a possible implementation, the apparatus further includes:
an updating unit, configured to: if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is a second Flowlet identifier, update the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and update the reference forwarding path to the second forwarding path.
In a fifth aspect, this application provides a semiconductor chip, which may include the data processing apparatus provided in any implementation of the third aspect.
In a sixth aspect, this application provides a semiconductor chip, which may include the data transmission apparatus provided in any implementation of the fourth aspect.
In a seventh aspect, this application provides a semiconductor chip, which may include: the data processing apparatus provided in any implementation of the third aspect, and an internal memory and an external memory coupled to the data processing apparatus.
In an eighth aspect, this application provides a semiconductor chip, which may include: the data transmission apparatus provided in any implementation of the fourth aspect, and an internal memory and an external memory coupled to the data transmission apparatus.
In a ninth aspect, this application provides a system-on-chip (SoC) chip, where the SoC chip includes the data processing apparatus provided in any implementation of the third aspect, and an internal memory and an external memory coupled to the data processing apparatus. The SoC chip may consist of a chip, or may include a chip and other discrete components.
In a tenth aspect, this application provides a system-on-chip (SoC) chip, where the SoC chip includes the data transmission apparatus provided in any implementation of the fourth aspect, and an internal memory and an external memory coupled to the data transmission apparatus. The SoC chip may consist of a chip, or may include a chip and other discrete components.
In an eleventh aspect, this application provides a chip system, where the chip system includes the data processing apparatus provided in any implementation of the third aspect. In a possible design, the chip system further includes a memory, and the memory is configured to store program instructions and data that are necessary for or related to the operation of the data processing apparatus. The chip system may consist of a chip, or may include a chip and other discrete components.
In a twelfth aspect, this application provides a chip system, where the chip system includes the data transmission apparatus provided in any implementation of the fourth aspect. In a possible design, the chip system further includes a memory, and the memory is configured to store program instructions and data that are necessary for or related to the operation of the data transmission apparatus. The chip system may consist of a chip, or may include a chip and other discrete components.
In a thirteenth aspect, this application provides a data processing apparatus that has the function of implementing any data processing method of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function.
In a fourteenth aspect, this application provides a data transmission apparatus that has the function of implementing any data transmission method of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function.
In a fifteenth aspect, this application provides a host, where the host includes a processor, and the processor is configured to execute the data processing method provided in any implementation of the first aspect. The host may further include a memory coupled to the processor, which stores the program instructions and data necessary for the host. The host may further include a communication interface, through which the host communicates with other devices or a communication network.
In a sixteenth aspect, this application provides a switch, where the switch includes a processor, and the processor is configured to execute the data transmission method provided in any implementation of the second aspect. The switch may further include a memory coupled to the processor, which stores the program instructions and data necessary for the switch. The switch may further include a communication interface, through which the switch communicates with other devices or a communication network.
第十七方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,该计算机程序被主机执行时实现上述第二方面中任意一项所述的多核处理器的处理方法流程。In a seventeenth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a host, implements the multi-core processor described in any one of the second aspect above processing method flow.
第十八方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,该计算机程序被交换机执行时实现上述第四方面中任意一项所述的多核处理器的处理方法流程。In an eighteenth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a switch, implements the multi-core processor described in any one of the fourth aspect above processing method flow.
第十九方面,本发明实施例提供了一种计算机程序,该计算机程序包括指令,当该计算机程序被多核处理器执行时,使得主机可以执行上述第二方面中任意一项所述的多核处理器的处理方法流程。In a nineteenth aspect, an embodiment of the present invention provides a computer program, where the computer program includes instructions, when the computer program is executed by a multi-core processor, so that a host can execute the multi-core processing described in any one of the second aspect above The processing method flow of the device.
第二十方面,本发明实施例提供了一种计算机程序,该计算机程序包括指令,当该计算机程序被多核处理器执行时,使得交换机可以执行上述第四方面中任意一项所述的多核处理器的处理方法流程。In a twentieth aspect, an embodiment of the present invention provides a computer program, where the computer program includes instructions, when the computer program is executed by a multi-core processor, the switch can perform the multi-core processing described in any one of the fourth aspect above The processing method flow of the device.
Description of Drawings
Figure 1 is a schematic diagram of dividing a TCP flow into Flowlets in the prior art.
Figure 2 is a schematic diagram of a network transmission system architecture provided by an embodiment of this application.
Figure 3 is a schematic diagram of a data center network topology provided by an embodiment of this application.
Figure 4 is a schematic diagram of a computer network OSI model and a TCP/IP model provided by an embodiment of this application.
Figure 5 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention.
Figure 6A is a schematic diagram of a first data packet and a second data packet in the same Flowlet, provided by an embodiment of the present invention.
Figure 6B is a schematic diagram of a first data packet and a second data packet in different Flowlets, provided by an embodiment of the present invention.
Figure 6C is a schematic flowchart of an additional layer protocol dividing and marking Flowlets, provided by an embodiment of the present invention.
Figure 6D is a schematic flowchart of an additional layer protocol dynamically updating the Flowlet segmentation threshold, provided by an embodiment of the present invention.
Figure 7 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention.
Figure 8 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention.
Figure 9 is a schematic structural diagram of a data transmission apparatus provided by an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a specific order. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units; it optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device. A reference to an "embodiment" in this document means that a particular feature, structure, or characteristic described in connection with that embodiment may be included in at least one embodiment of this application. Occurrences of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described in this document may be combined with other embodiments.
The terms "component", "module", "system", and the like used in this specification denote a computer-related entity: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or a thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media that store various data structures. A component may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, which interacts with other systems by way of the signal).
First, some terms used in this application are explained to aid understanding by those skilled in the art.
(1) Equal-cost multipath routing (Equal-Cost Multipath Routing, ECMP) means that multiple paths with the same cost exist to the same destination address, where "the same cost" means that the paths traverse the same number of hops, that is, the same number of switches. For example, in a network topology that follows the equal-cost multipath model (such as a data center network), all possible transmission paths between a given pair of source and destination hosts are equal-cost paths. When a device supports equal-cost routing, traffic sent to the destination IP address or destination network segment can be shared across different paths, which balances the network load; if some of the paths fail, the other paths take over forwarding, which provides route redundancy. With traditional routing, data packets sent to the destination address can use only one of the links while the other links remain in a backup or inactive state, and switching between links in a dynamic routing environment takes a certain amount of time. An equal-cost multipath routing protocol can use multiple links at the same time in such a network, which not only increases the transmission bandwidth but also allows the traffic of a failed link to be taken over without delay or packet loss.
(2) The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol. TCP is designed to fit into a layered protocol hierarchy that supports multi-network applications. Pairs of processes in host computers attached to distinct but interconnected computer communication networks rely on TCP for reliable communication. TCP assumes that it can obtain a simple, potentially unreliable datagram service from the lower-level protocols. In principle, TCP should be able to operate above a wide range of communication systems, from hard-wired connections to packet-switched or circuit-switched networks.
(3) A network flow (Flow), also called a flow for short, is the set of data packets that share the same five-tuple over a period of time. The five-tuple consists of the source IP address, source port number, destination IP address, destination port number, and transport-layer protocol of the two communicating parties.
(4) A network session is a set of multiple network flows that share the same triplet (source address, destination address, transport-layer protocol).
(5) A flow slice, micro-flow, or small flow (Flowlet) can be understood as a group of packets consisting of multiple packets sent consecutively within one Flow; each Flow includes multiple Flowlets. When packets are forwarded under the Flowlet mechanism, the multiple packets in a Flowlet can be forwarded based on a Flowlet flow table entry. Different Flowlets correspond to different Flowlet flow table entries. A Flowlet flow table entry indicates the forwarding path of the multiple packets included in the corresponding Flowlet.
(6) Transmission Control Protocol/Internet Protocol (TCP/IP) refers to a protocol suite that enables information to be transmitted across multiple different networks. TCP/IP does not refer only to the two protocols TCP and IP; it refers to a protocol suite made up of FTP, SMTP, TCP, UDP, IP, and other protocols. It is called TCP/IP only because TCP and IP are the most representative protocols in the suite. TCP is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol.
(7) An Internet service provider (ISP) network is the network of a telecommunications operator that provides Internet access services, information services, and value-added services to a broad base of users.
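The relationship between terms (1), (3), and (4) above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `Packet` type, the helper names, and the sample addresses are hypothetical, and the hash-based path choice in `ecmp_pick` is a common way of spreading flows over equal-cost paths that is assumed here rather than taken from this application.

```python
import hashlib
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # transport-layer protocol, e.g. "TCP"


def flow_key(pkt):
    """Five-tuple that identifies a network flow (term 3)."""
    return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)


def session_key(pkt):
    """Triplet that identifies a network session (term 4)."""
    return (pkt.src_ip, pkt.dst_ip, pkt.protocol)


def group_by(packets, key_fn):
    """Group packets under the key returned by key_fn."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[key_fn(pkt)].append(pkt)
    return dict(groups)


def ecmp_pick(pkt, equal_cost_paths):
    """Assumed hash-based ECMP: map the five-tuple onto one equal-cost path."""
    raw = "|".join(map(str, flow_key(pkt))).encode()
    digest = hashlib.sha256(raw).digest()
    index = int.from_bytes(digest[:4], "big") % len(equal_cost_paths)
    return equal_cost_paths[index]


packets = [
    Packet("10.0.0.1", 40000, "10.0.1.9", 80, "TCP"),
    Packet("10.0.0.1", 40000, "10.0.1.9", 80, "TCP"),
    Packet("10.0.0.1", 40001, "10.0.1.9", 80, "TCP"),
]
flows = group_by(packets, flow_key)        # two flows: the source ports differ
sessions = group_by(packets, session_key)  # one session: the triplet is shared
```

Because the path is derived from the five-tuple alone, every packet of a flow maps to the same path in this sketch; the Flowlet mechanism of term (5) is what allows one flow to be spread over several paths.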
To make the embodiments of this application easier to understand, the network transmission system architecture on which they are based is described first. Figure 2 is a schematic diagram of a network transmission system architecture provided by an embodiment of this application. As shown in Figure 2, the architecture mainly includes a host 10, a switch (SWITCH) 20, and the Internet. A host 10 is either a source host or a destination host, depending on whether it is the sender or the receiver; the source host can connect to the Internet through the switch 20 and thereby communicate with the destination host.
The host 10 may be any computing device that generates data and has a network access function. For example, any computer connected to the Internet can be called a host, and each host has a unique IP address. Specifically, the host 10 may be a server, a personal computer, a tablet computer, a mobile phone, a personal digital assistant, a smart wearable device, an unmanned driving terminal, or another such device. When two hosts (such as a source host and a destination host) need to communicate and transfer data, the source host encapsulates the application data into data packets (such as TCP/IP packets) and hands them to the next layer down, the data link layer (for example, a switch), where they are further encapsulated into frames; the switch and similar devices then deliver the data accurately from the source host to the destination host according to the MAC address. In the embodiments of the present invention, the host 10 also divides segments into Flowlets, marks the Flowlets, and dynamically configures the time threshold used to divide Flowlets. For details, see the descriptions of the subsequent embodiments; they are not repeated here.
The switch 20 is a network device that encapsulates and forwards data packets based on MAC (the hardware address of a network interface card) identification. It can "learn" MAC addresses and store them in an internal address table; by establishing a temporary switching path between the sender and the receiver of a data frame, it lets the frame travel directly from the source address to the destination address. The functions of the switch 20 may include physical addressing, network topology, error checking, frame sequencing, and flow control. In the embodiments of the present invention, the switch 20 also relies on the Flowlet division and Flowlet marking that the host 10 performs on segments: based on the Flowlet identifiers that have already been assigned, it forwards the packets of the same Flowlet or of different Flowlets within a data flow. For details, see the descriptions of the subsequent embodiments; they are not repeated here.
For example, the source host 10 uses a protocol such as the Transmission Control Protocol (TCP) and processes data by using the data processing method of this application, and then sends packets to a packet forwarding device in the routing and switching network. The packet forwarding devices in the routing and switching network (such as switches and routers) use ECMP and forward the packets by using the data transmission method of this application until the packets reach the destination host 10, which achieves load balancing.
The data processing method and the data transmission method in the embodiments of the present invention are applicable to TCP/IP-based transmission mechanisms. The data transmission method of the present invention is not limited to data center networks; it also applies to any network in which multiple paths exist, such as an ISP (Internet Service Provider) network. Such a topology provides multiple network paths between any two communicating source and destination nodes (that is, the source host and the destination host), so the technical solutions of this application can be applied to perform dynamic load balancing at Flowlet granularity.
It should be noted that, in a data center network, the characteristics of the network topology mean that, within one network session (that is, for TCP flows that share the same triplet of source address, destination address, and transport-layer protocol), the one or more paths in the corresponding multipath set are all equal-cost paths. In other types of networks, for example within one network session in an ISP network, the one or more paths in the corresponding multipath set may or may not be of equal cost. Therefore, depending on the topology type of the network to which the host is connected, the one or more paths in the multipath set of the target TCP flow described in this application may or may not be of equal cost.
It can be understood that the network architecture in Figure 2 is only an exemplary implementation of the embodiments of this application; the network architectures in the embodiments of the present invention include but are not limited to this architecture.
Figure 3 is a schematic diagram of a data center network topology provided by an embodiment of this application. The data center network mainly includes a core layer, an aggregation layer, an access layer, and a point of delivery (POD) layer. The source host can communicate with the destination host over TCP through the switches and the core network. Specifically:
The point of delivery (POD) layer consists of multiple PODs, each of which may include servers, storage, and network devices. Top of Rack (ToR) is one way of cabling server cabinets in a data center: with ToR cabling, one or two access switches are deployed at the top of each server cabinet.
Access layer: access switches physically connect the servers. They are usually located at the top of the rack, so they are also called ToR (Top of Rack) switches; this layer is also called the edge layer.
Aggregation layer: aggregation switches interconnect the access switches and also provide other services, such as firewall (FW), server load balancing (Server Load Balancer, SLB), Secure Sockets Layer offload (SSL offload), intrusion detection, and network analysis.
Core layer: core switches provide high-speed forwarding for packets entering and leaving the data center and provide connectivity for multiple aggregation layers. The core switches usually provide a resilient L3 routed network for the entire network.
For example, as shown in Figure 3, TOR1 in Pod1 has at least four equal-cost paths (ECMP) to the Internet: path 1, path 2, path 3, and path 4.
It should be noted that, in the embodiments of the present invention, the data transmission method applied on the switch side can be applied to the switches of each of the layers above (access layer, aggregation layer, or core layer). That is, along the entire forwarding path of a data packet from the source host to the destination host, every switch that participates in forwarding can implement any of the data transmission methods provided in this application.
It can be understood that the data center network topology in Figure 3 is only an exemplary implementation of the embodiments of this application; the data center network topologies in the embodiments of the present invention include but are not limited to this architecture.
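As a rough illustration of how a ToR switch can end up with four equal-cost paths to the Internet, the following Python sketch enumerates the loop-free paths in a small layered topology and checks that they all traverse the same number of switches. The topology, node names, and link layout are hypothetical and are not the exact topology of Figure 3.

```python
def simple_paths(graph, src, dst, visited=()):
    """Yield every loop-free path from src to dst using depth-first search."""
    visited = visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for neighbor in graph.get(src, []):
        if neighbor not in visited:
            yield from simple_paths(graph, neighbor, dst, visited)


# Hypothetical upward links only: ToR -> aggregation -> core -> Internet.
topology = {
    "TOR1": ["AGG1", "AGG2"],
    "AGG1": ["CORE1", "CORE2"],
    "AGG2": ["CORE3", "CORE4"],
    "CORE1": ["INTERNET"],
    "CORE2": ["INTERNET"],
    "CORE3": ["INTERNET"],
    "CORE4": ["INTERNET"],
}

paths = list(simple_paths(topology, "TOR1", "INTERNET"))
# Every node except the final destination is a switch on the path.
switch_hops = {len(path) - 1 for path in paths}
is_equal_cost = len(switch_hops) == 1
```

In this assumed topology each path crosses one ToR switch, one aggregation switch, and one core switch, so all four paths have the same cost in the hop-count sense used in this application.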
Figure 4 is a schematic diagram of a computer network OSI model and a TCP/IP model provided by an embodiment of this application. In the embodiments of the present invention, an additional layer is added between the transport layer and the network layer of the prior-art OSI model or TCP/IP model. The additional layer is mainly used to divide the TCP flow into Flowlets, mark the Flowlets, and set the related parameters. Specifically, the eight-layer OSI network model provided by the embodiments of the present invention comprises, from bottom to top, layers 1 to 8: the physical layer, the data link layer, the network layer, the additional layer, the transport layer, the session layer, the presentation layer, and the application layer. The TCP/IP model provided by the embodiments of the present invention can be simplified, from bottom to top, to layers 1 to 5: the network interface layer, the network layer, the additional layer, the transport layer, and the application layer. The layers are described below.
(1) Application layer
This is the layer of the OSI reference model closest to the user. It provides application interfaces for computer users, provides users with various network services directly, and offers a rich set of system application interfaces to user application software. Common application-layer network service protocols include the Hypertext Transfer Protocol (HTTP), Hypertext Transfer Protocol Secure (Hyper Text Transfer Protocol over Secure Socket Layer, HTTPS), the File Transfer Protocol (FTP), Post Office Protocol version 3 (POP3), and the Simple Mail Transfer Protocol (SMTP).
(2) Presentation layer
This layer is responsible for encoding and converting data so that the application layer works correctly. It converts data formats to ensure that application-layer data generated by one system can be recognized and understood by the application layer of another system. Computers on a network may use different data representations, so data formats must be converted during transmission. For computers that use different data representations to communicate and exchange data, the transmitted data is represented by abstract data structures during communication, while each machine continues to use its own standard encoding internally. Managing these abstract data structures, converting the machine's internal encoding into a transfer syntax suitable for network transmission at the sender, and performing the reverse conversion at the receiver are all done by the presentation layer.
(3) Session layer
This layer is responsible for establishing, maintaining, and controlling sessions, distinguishing between sessions, and providing services in three communication modes: simplex, half duplex, and full duplex. For example, it establishes, manages, and terminates sessions between the two communicating parties and determines whether the two parties should begin a communication initiated by one of them.
(4) Transport layer
This layer is responsible for segmenting and reassembling data and for implementing end-to-end logical connections. The transport layer establishes the end-to-end connection between hosts; its role is to provide reliable and transparent end-to-end data transmission services to the upper-layer protocols, including error control and flow control. It hides the details of lower-layer data communication from the upper layers, so that an upper-layer user sees only a reliable host-to-host data path between the two transport entities that the user can control and configure. TCP and UDP operate at this layer.
(5) Additional layer
The additional layer in the embodiments of the present invention is used to divide the TCP flow into Flowlets, mark the Flowlets, and set the related parameters. The additional layer protocol dynamically configures the Flowlet segmentation threshold according to delay feedback from the network paths, and divides the segments of the TCP flow into Flowlets according to that dynamic threshold. Because Flowlet division is completed on the host side, the present invention can use a 1-bit reserved field in the transport-layer header to mark adjacent Flowlets of the same TCP flow (the present invention names this 1-bit field FL_Tag) and thereby convey the division result to the switches in the network; a switch then identifies Flowlets according to this flag bit in the packet header. In the embodiments of the present invention, the host-side functions mainly involve the application layer, presentation layer, session layer, transport layer, and additional layer described above.
It should be noted that the additional layer described in this application may be deployed as a separate layer or within the existing transport layer described above, in which case the functions of the additional layer are implemented as part of the transport layer. The embodiments of the present invention do not specifically limit this.
(6) Network layer
This layer is responsible for managing network addresses, locating devices, and determining routes. It establishes the connection between two nodes through IP addressing, selects suitable routes and switching nodes for the packets handed down by the transport layer at the source, and delivers them correctly, according to their addresses, to the transport layer at the destination. It is what is usually called the IP protocol layer. Specifically, the network layer carries data from any node to any other node based on the network-layer address information contained in the data; that is, its main function is to transmit packets between hosts in the network, using the services of the data link layer to carry each packet from the source to the destination. The functions of the switch in the embodiments of the present invention correspond to the network layer.
(7) Data link layer
This layer is responsible for preparing physical transmission, cyclic redundancy check (CRC), error notification, network topology, flow control, and so on. It combines bits into bytes and bytes into frames, uses link-layer addresses (MAC addresses in Ethernet) to access the medium, and performs error detection. Between adjacent nodes connected by a physical link, it establishes a data link in the logical sense, over which data is communicated directly in point-to-point or point-to-multipoint mode. In a wide area network, the data link layer is responsible for reliable data transfer between a host and an interface message processor (IMP) and between IMPs. In a local area network, the data link layer is responsible for reliable data transmission between hosts.
(8) Physical layer
This layer converts logical "0" and "1" into physical signals (optical or electrical) suited to the transmission medium, and handles the sending, receiving, and propagation of those signals over the medium. Its main function is to transfer the raw bit stream between adjacent nodes, that is, to send and receive data as a bit stream. The final transmission of signals is carried out by the physical layer. Common physical-layer transmission media and devices include hubs, repeaters, modems, network cables, twisted pairs, coaxial cables, and the like.
It should be noted that, in the simplified TCP/IP model, when application layer data is sent to the network through the protocol stack, each protocol layer adds its own header to the data, which is called encapsulation. Different protocol layers use different names for the data unit: it is called a message at the application layer, a segment at the transport layer, a datagram or packet at the network layer, and a frame at the link layer.
It can be understood that the network model and functions in FIG. 4 above are merely an exemplary implementation in the embodiments of the present application; the network models and functions involved in the embodiments of the present invention include but are not limited to the above.
First, to aid understanding of the embodiments of the present invention, the Flow and the Flowlet involved in this application are described further. As shown in FIG. 1 above, a Flowlet is in effect a micro-flow. One Flow can be divided into many Flowlets. All packets of the same Flowlet share the same five-tuple information, that is, the same source IP, destination IP, source port, destination port, and transport layer protocol. Multiple packets sent consecutively in a flow are treated as one Flowlet, and the Flowlet mechanism is applied for path selection so that the packets in that Flowlet are forwarded over the selected path. In this application, different data packets in the same Flowlet are forwarded over exactly the same forwarding path (the identical path, not an equal-cost alternative), whereas different Flowlets in the same TCP flow may be forwarded over different but equal-cost paths (that is, equal-cost multipath) or over unequal-cost paths, depending on the topology type of the network to which the host is connected. Here, equal-cost multipath may include forwarding paths with the same number of switch hops, and unequal-cost multipath refers to forwarding paths with different numbers of switch hops.
In other words, a Flow can be regarded as being composed of multiple Flowlets. Flowlet-based load balancing introduces an intermediate granularity that is neither a packet nor a Flow: a Flowlet is larger than a packet and smaller than a Flow, and can be regarded as a micro-flow made up of one or more packets of the same Flow.
Based on the network architecture provided in FIG. 2 or FIG. 3 and the computer network model provided in FIG. 4, and in combination with the data transmission method provided in this application, the technical problems raised in this application are analyzed and solved in detail below.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a data transmission method provided by an embodiment of the present invention. The method may be applied to the network architecture described in FIG. 2 or FIG. 3 above, in which the host 10 may be used to support and perform steps S501 to S504 of the method flow shown in FIG. 5. The following description is given from the side of the host 10 (the source host) with reference to FIG. 3. The method may include the following steps S501 to S504 and, optionally, steps S505 and S506.
Step S501: Generate a first segment, and determine the target TCP flow to which the first segment belongs.
Specifically, at the sending end, when a host (the source host) needs to send a message to another host (the destination host), the source host first generates locally a data packet that conforms to the relevant protocol standards and then sends it to the destination host through switches and the like. On the host side, generating a data packet mainly involves the application layer (covering the application, presentation, and session layers), the transport layer, and the network layer. For example, when an application on the source host needs to send a message to the destination host, the message is encapsulated by the application layer on the source host, enters the transport layer, and is turned into a segment that conforms to the transport layer protocol (the first segment), for example a TCP segment. That is, on the host side, once the transport layer header fields of the TCP segment have been encapsulated (the first segment of the embodiments of the present invention has been generated), the functions of the additional layer of this application are triggered (as described for FIG. 2, not repeated here): the first segment is then assigned to and marked with a Flowlet, and the time threshold used for dividing Flowlets is configured dynamically. Specifically, the source host first determines, through the additional layer, the TCP flow to which the first segment belongs, and then obtains, according to that TCP flow, the information used to divide and identify Flowlets for the first segment. In this application, a TCP flow represents the data transmission of one service process, from the TCP three-way handshake through the end of data transmission to connection release. All segments of the same TCP flow share the same five-tuple information, which comprises the source IP, destination IP, source port, destination port, and transport layer protocol.
Optionally, the host determines the target TCP flow to which the first segment belongs according to the source port number in the first segment. Because different TCP flows necessarily have different source port numbers, the source port carried in a segment can be used to determine whether different segments belong to the same TCP flow. For example, the source host may determine which target TCP flow the first segment belongs to from the source port information in the first segment.
Step S502: Obtain the timestamp of the first segment, and obtain target flow information matching the target TCP flow.
Specifically, in this application, dividing and identifying the Flowlet for the first segment depends on how the difference between the timestamp of the first segment and that of the adjacent previous segment compares with the corresponding time threshold, so the matching flow information must be obtained. After the source host determines the TCP flow to which the first segment to be sent belongs, it obtains the timestamp of the first segment and the target flow information matching the target TCP flow, for use in the subsequent Flowlet division and identification. It should be noted that, in the embodiments of the present invention, the timestamp of a segment is usually the time information added when the segment is encapsulated at the transport layer; it represents the moment at which the segment was generated (from the destination host's point of view, it can also be understood as the moment at which the source host sent the segment). For example, when a host needs to send data to a destination host, it encapsulates the sending time into the timestamp field of the data, so that both the source host and the destination host can learn from the timestamp when the data was sent, which makes it possible to calculate (or measure) network delay, service processing time, and the like. Optionally, the host may obtain the timestamp of the first segment from the timestamp value carried in the first segment, or from the current system timestamp. This may be done at any point after the first segment is generated and before step S503; the exact time is not limited. The target flow information includes the time threshold corresponding to the target TCP flow and the timestamp of the second segment in the target TCP flow, where:
The second segment is the segment immediately preceding the first segment in the target TCP flow (that is, the segment in the target TCP flow whose timestamp is earlier than, and adjacent to, that of the first segment). The timestamp of the second segment may be obtained by the host from the timestamp value carried in the second segment, or from the time recorded by the system at that moment; it is sufficient that the timestamps of the first and second segments are obtained in the same way.
The time threshold is the difference between a first path delay and a second path delay. The first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in that set. The multipath set of the target TCP flow may include the multiple transmission paths corresponding to the target TCP flow, that is, the possible uplink transmission paths between the source IP and source port and the destination IP and destination port of the target TCP flow. Optionally, the multipath set may further include the transmission paths of TCP flows that belong to the same network session as the target TCP flow, that is, the possible uplink transmission paths between the source IP and the destination IP. In other words, the multipath set may include the transmission paths of the target TCP flow itself, and may further include the transmission paths of TCP flows that share the target TCP flow's triplet information (source address, destination address, transport layer protocol). Accordingly, one TCP flow may correspond to one multipath set, or multiple TCP flows in the same network session may correspond to the same multipath set; likewise, each TCP flow may maintain its own time threshold for dividing Flowlets, or multiple TCP flows may jointly maintain one. Here, an uplink path is a path in the direction from the sending end (the source host) to the receiving end (the destination host), and the uplink path delay is the total time that elapses from when the source host sends a segment until the source host receives the acknowledgment of that segment from the destination host (the destination host sends the acknowledgment immediately on receiving the data).
Further optionally, when the network to which the host is connected follows the equal-cost multipath model, all transmission paths in the multipath set of the target TCP flow are equal-cost paths. In this case, the first path delay is the delay of the uplink path with the largest delay among these equal-cost paths, and the second path delay is the delay of the uplink path with the smallest delay among them. When the network to which the host is connected follows a conventional multipath model, the transmission paths in the multipath set may include both equal-cost and unequal-cost paths. In this case, the first path delay is the delay of the uplink path with the largest delay among all of these paths, and the second path delay is the delay of the uplink path with the smallest delay among them. It should be noted that data packets sent from the source host necessarily share the same source address. If the destination address, or the destination network segment, is also the same, then in a network of the equal-cost multipath model (such as a data center network) different TCP flows between the sending end and the receiving end in effect share the same multipath set. The time threshold can therefore be calculated from the uplink path delays of historical segments whose five-tuple or triplet is the same as that of the target TCP flow.
For example, the network topology shown in FIG. 3 is a data center network, which follows the equal-cost multipath model. All paths in the multipath set of a target TCP flow in this network are therefore equal-cost paths, such as Path 1, Path 2, Path 3 and Path shown in FIG. 3.
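The time threshold described above can be sketched as a simple max-minus-min over the measured uplink path delays. This is only an illustration: the function name `time_threshold`, the path labels, and the sample delay values (in microseconds) are assumptions, and how the delays themselves are measured is not shown here.

```python
def time_threshold(uplink_delays_us):
    """Return (largest delay - smallest delay) over a multipath set.

    uplink_delays_us: iterable of uplink path delays in microseconds.
    The name and signature are hypothetical, chosen for illustration.
    """
    delays = list(uplink_delays_us)
    if not delays:
        raise ValueError("at least one path delay is required")
    first_path_delay = max(delays)   # path with the largest delay
    second_path_delay = min(delays)  # path with the smallest delay
    return first_path_delay - second_path_delay


# Hypothetical delays for the paths of one multipath set.
delays = {"path_a": 31, "path_b": 44, "path_c": 58, "path_d": 39}
threshold = time_threshold(delays.values())
print(threshold)
```

With a single path in the set the threshold is zero, so every non-zero gap between segments would start a new Flowlet; whether that is desirable is a design question the sketch does not settle.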
In a possible implementation, the host maintains a flow information table that includes the flow information of N TCP flows, where N is an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of that TCP flow. Obtaining the target flow information matching the target TCP flow includes: looking up, in the flow information table, the target flow information matching the target TCP flow according to the flow index of the target TCP flow. For example, to implement the above functions of the embodiments of the present invention, the additional layer protocol needs to maintain a flow information table, FlowInfoTable, which records the flow information used when dividing each TCP flow into Flowlets. Each TCP flow occupies one entry in FlowInfoTable, as shown in Table 1.
Table 1
Figure PCTCN2020119708-appb-000001
In Table 1 above, each entry may contain six items: SrcPort, LstFLTag, LstTS, TTDiff, TripTime_max, and TripTime_min, where:
(1) The TCP flow index (SrcPort) item is used to index each TCP flow, that is, it is the label of the TCP flow. This item corresponds to the flow index of the TCP flow described in this application.
(2) The previous-packet tag (LstFLTag) item is the FL_Tag field value of the previous segment of the TCP flow, that is, whether the Flowlet identifier of the packet that was just sent is 0 or 1. This item corresponds to the reference Flowlet identifier described in this application.
(3) The LstTS item is the timestamp value of the previous segment of the TCP flow, that is, the timestamp of the packet that was just sent (the timestamp already added at the transport layer). Its unit is usually on the order of microseconds (us). This item corresponds to the timestamp value of the second segment described in this application.
(4) The TTDiff item is the time threshold that the TCP flow uses to divide Flowlets; each flow maintains its own time threshold, whose unit is usually on the order of microseconds (us). It is the difference between the TripTime_max item and the TripTime_min item shown in Table 1. For example, if TripTime_max is 58 and TripTime_min is 31, TTDiff is 27; if TripTime_max is 49 and TripTime_min is 36, TTDiff is 13. This item corresponds to the time threshold described in this application.
(5) The TripTime_max item is the delay of the uplink path with the largest delay in the multipath set corresponding to the TCP flow (an uplink path is a path in the direction from the sending end to the receiving end). Its unit is usually on the order of microseconds (us). This item corresponds to the first path delay described in this application.
(6) The TripTime_min item is the delay of the uplink path with the smallest delay in the multipath set corresponding to the TCP flow. Its unit is usually on the order of microseconds (us). This item corresponds to the second path delay described in this application.
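One way to picture a FlowInfoTable entry with the six items above is the following sketch. It is an assumption-laden illustration rather than the layout of Table 1: the class name `FlowInfoEntry`, the snake_case field names, the `lookup` helper, the port number 50000, and the choice to derive TTDiff on demand instead of storing it are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlowInfoEntry:
    """One entry per TCP flow; times are in microseconds."""
    src_port: int        # SrcPort: flow index
    lst_fl_tag: int      # LstFLTag: Flowlet tag (0 or 1) of the previous segment
    lst_ts: int          # LstTS: timestamp of the previous segment
    trip_time_max: int   # TripTime_max: largest uplink path delay
    trip_time_min: int   # TripTime_min: smallest uplink path delay

    @property
    def tt_diff(self) -> int:
        """TTDiff: time threshold, derived here rather than stored (assumption)."""
        return self.trip_time_max - self.trip_time_min


def lookup(table: Dict[int, FlowInfoEntry], src_port: int) -> Optional[FlowInfoEntry]:
    """Find the entry of the flow identified by its source port, if any."""
    return table.get(src_port)


# A table keyed by source port, holding one hypothetical flow.
flow_info_table: Dict[int, FlowInfoEntry] = {
    50000: FlowInfoEntry(src_port=50000, lst_fl_tag=0, lst_ts=1000,
                         trip_time_max=58, trip_time_min=31),
}
entry = lookup(flow_info_table, 50000)
print(entry.tt_diff)
```

Keying the table by source port follows the optional flow-identification rule given for step S501; an implementation that cannot rely on unique source ports would need a wider key such as the full five-tuple.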
Step S503: Compare the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold.
Specifically, after the target TCP flow to which the first segment belongs has been determined and the target flow information corresponding to that flow has been found (for example, one entry of Table 1 above), the timestamp of the second segment can be read from that information. The difference between the timestamp of the first segment (determined from the first segment) and the timestamp of the second segment is then compared with the time threshold in the target flow information. The comparison shows whether the time difference between the first segment and the second segment exceeds the maximum interval, that is, the time threshold, of the target TCP flow to which the segment belongs.
Step S504: Determine, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet.
Specifically, the comparison result of step S503 determines whether the first segment and the second segment are placed in the same Flowlet. Packets of the same Flowlet in a TCP flow are forwarded over the same path, whereas the forwarding path must be decided anew for a different Flowlet.
In a possible implementation, if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the first segment and the second segment are placed in the same Flowlet; if the difference is greater than the time threshold, the first segment is placed in a new Flowlet. In the embodiments of the present invention, if the timestamp difference between the first segment to be sent and the adjacent previous (second) segment of the same target TCP flow does not exceed the time threshold of that flow (the time threshold changes dynamically), the two segments are considered to satisfy the condition for being sent in the same Flowlet, and the first segment is placed in the same Flowlet as the second segment. Similarly, if the timestamp difference is greater than the time threshold of the target TCP flow, the two segments are considered not to satisfy that condition, and the first segment is placed in a new Flowlet.
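The comparison in steps S503 and S504 reduces to a single inequality, sketched below. The function name `same_flowlet` and the sample timestamps are assumptions made for illustration; all values are in microseconds.

```python
def same_flowlet(first_ts_us, second_ts_us, threshold_us):
    """Return True if the first segment stays in the second segment's Flowlet.

    The first segment joins the same Flowlet when its timestamp gap to the
    previous (second) segment is less than or equal to the time threshold;
    a larger gap starts a new Flowlet.
    """
    return (first_ts_us - second_ts_us) <= threshold_us


# Hypothetical timestamps with a threshold of 27 us.
print(same_flowlet(1020, 1000, 27))  # gap of 20 us
print(same_flowlet(1027, 1000, 27))  # gap equal to the threshold
print(same_flowlet(1028, 1000, 27))  # gap of 28 us
```

The boundary case matters: a gap exactly equal to the threshold keeps the segment in the current Flowlet, matching the "less than or equal to" wording of the implementation above.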
FIG. 6A is a schematic diagram, provided by an embodiment of the present invention, of a first data packet and a second data packet in the same Flowlet. In FIG. 6A, the first data packet is the first segment after encapsulation by the additional layer, and the second data packet is the second segment after encapsulation by the additional layer. Assuming that the difference between the timestamp values of the first segment and the second segment is less than or equal to the time threshold, the host places the first segment and the second segment in the same Flowlet (Flowlet5 in the figure); that is, the corresponding first and second data packets are placed in the same Flowlet5.
FIG. 6B is a schematic diagram, provided by an embodiment of the present invention, of a first data packet and a second data packet in different Flowlets. In FIG. 6B, assuming that the difference between the timestamp values of the first segment and the second segment is greater than the time threshold, the host places the first segment and the second segment in different Flowlets (Flowlet5 and Flowlet4, respectively, in the figure); that is, the corresponding first and second data packets are placed in Flowlet5 and Flowlet4. It can be understood that the first data packet is then the first data packet of the new Flowlet.
In existing Flowlet-granularity load balancing schemes, a fixed detection interval adapts poorly to dynamic network load. To address this, the embodiments of the present invention use delay feedback from the network paths to dynamically configure the time interval used for detecting Flowlets, so that the Flowlet granularity matches the state of the network paths.
Optionally, the embodiments of the present invention may further include the following method steps S505 and S506.
Step S505: Generate a first data packet, where the first data packet includes the first segment and the Flowlet identifier of the first segment.
Specifically, after the source host encapsulates the first segment at the additional layer and the network layer, it generates the first data packet, which includes the first segment and the Flowlet identifier of the first segment. That is, in generating the first data packet, the Flowlet identifier of the segment must be encapsulated in addition to the headers of the relevant protocols. Optionally, the Flowlet identifier of the first segment may be carried in a Flowlet identifier bit in the header.
The target flow information further includes the reference Flowlet identifier of the target TCP flow (corresponding to the LstFLTag field in Table 1 above). Assume that the reference Flowlet identifier is currently the first Flowlet identifier, corresponding to the second segment; that is, when the most recently sent segment is the second segment, the reference Flowlet identifier is the first Flowlet identifier of the second segment. If the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier; if the difference is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
In the embodiments of the present invention, when the first segment is further encapsulated for transmission over the network, the Flowlet identifier of the segment can be set during encapsulation. After the segment has been encapsulated into a data packet, the switch can use the Flowlet identifier to recognize which Flowlet the data packet belongs to and thus decide which path to send it over. For example, when the first segment is about to enter the data link layer where the switch resides, it must be encapsulated further; at this point, one bit is set in the encapsulated data packet as the identifier bit by which the switch recognizes which Flowlet the segment belongs to. When the first segment and the second segment have the same Flowlet identifier, the switch forwards the first data packet and the second data packet (corresponding to the second segment) over the same path. In summary, the embodiments of the present invention divide Flowlets on the host side and can mark a Flowlet with a bit (for example, 1 bit) in the reserved field of the transport layer header. The switch can recognize a Flowlet from the header field alone, which is efficient and has low hardware overhead. This also ensures that the same Flowlet is never split again no matter how many switch hops it traverses in the network, which reduces the risk of packet reordering.
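Marking a Flowlet with one bit of the transport layer header's reserved field might look like the sketch below. The text does not say which reserved bit is used, so the mask `FL_TAG_MASK = 0x08` (the highest of the four reserved bits in byte 12 of a TCP header) is purely an assumption, as are the helper names `set_fl_tag` and `get_fl_tag`. The sketch also leaves the TCP checksum untouched, which a real sender would have to recompute.

```python
FL_TAG_MASK = 0x08  # assumed position: one reserved bit in TCP header byte 12


def set_fl_tag(tcp_header: bytes, tag: int) -> bytes:
    """Return a copy of the header with the (assumed) FL_Tag bit set to tag."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    if tag not in (0, 1):
        raise ValueError("the Flowlet tag is a single bit")
    hdr = bytearray(tcp_header)
    if tag:
        hdr[12] |= FL_TAG_MASK           # set the bit
    else:
        hdr[12] &= ~FL_TAG_MASK & 0xFF   # clear the bit
    return bytes(hdr)


def get_fl_tag(tcp_header: bytes) -> int:
    """Read the (assumed) FL_Tag bit, as a switch might do."""
    return 1 if tcp_header[12] & FL_TAG_MASK else 0


# Minimal 20-byte header: data offset 5 (0x50 in byte 12), everything else zero.
header = bytes(12) + bytes([0x50]) + bytes(7)
tagged = set_fl_tag(header, 1)
print(get_fl_tag(tagged), hex(tagged[12]))
```

Because the tag lives in a fixed header position, a switch needs only a mask-and-compare on each packet to detect a Flowlet boundary, which is the low-overhead property the paragraph above describes.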
Step S506: If the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, update the reference Flowlet identifier to the second Flowlet identifier.
Specifically, the flow information of each TCP flow in the flow information table maintained on the host side further includes the reference Flowlet identifier of that flow; that is, the table records the identifier of each TCP flow's current Flowlet so that the corresponding Flowlet identifier can be set for a segment to be sent. For example, assume the reference Flowlet identifier is the first Flowlet identifier (that is, the Flowlet identifier of the second segment). When the first segment and the second segment are assigned to the same Flowlet, the first segment is also marked with the first Flowlet identifier, and the reference Flowlet identifier remains the first Flowlet identifier. When the first segment and the second segment are assigned to different Flowlets (that is, the first segment starts a new Flowlet), the first segment is marked with the second Flowlet identifier, and the reference Flowlet identifier must be updated to the second Flowlet identifier. Optionally, the reference Flowlet identifier may toggle between 0 and 1, so that adjacent Flowlets carry alternating values of 0 and 1; a single bit is therefore enough to indicate accurately whether different data packets belong to the same Flowlet.
In a possible implementation, the host also updates the first path delay or the second path delay according to received ACK packets, so that the time threshold of the target TCP flow is updated in real time. Specifically, the host receives a target ACK packet, which is an ACK packet with the same destination port number or the same destination address as the target TCP flow; determines the uplink path delay of the target ACK packet, which is the difference between the packet's timestamp value and its timestamp echo reply value; and compares this delay with the uplink path delays of historical ACK packets in the target TCP flow. If the uplink path delay of the target ACK packet is greater than the maximum of the historical uplink path delays, the first path delay is updated to the uplink path delay of the target ACK packet; if it is less than the minimum of the historical uplink path delays, the second path delay is updated to the uplink path delay of the target ACK packet.
In this embodiment of the present invention, the time threshold used to divide Flowlets may be calculated as the difference between the maximum and minimum uplink path delays of historical segments received in the target TCP flow (or in a TCP flow in the same network session as the target TCP flow); that is, the time threshold is a value that changes in real time and is adjusted dynamically according to the network transmission load. Specifically, each time the host receives an ACK packet belonging to the target TCP flow (same destination port number), or an ACK packet of a TCP flow in the same network session as the target TCP flow (same destination address or same destination network segment), it calculates the uplink path transmission delay of that ACK packet as the difference between the packet's timestamp value and its timestamp echo reply value. From the history of uplink path delays of all received ACK packets, it determines the current maximum uplink path delay, which serves as the first path delay, and the current minimum uplink path delay, which serves as the second path delay. The difference between the first path delay and the second path delay then gives the time threshold used to divide Flowlets in the target TCP flow. In other words, after each target ACK packet is received, the host checks whether the first path delay or the second path delay needs to be updated. As a result, the time threshold differs between TCP flows, and between different states of the same TCP flow, and is adjusted according to the real-time transmission delay of data in the corresponding flow, so that it continues to adapt to dynamic changes in network load.
Refer to FIG. 6C, which is a schematic flowchart of an additional-layer protocol dividing and marking Flowlets according to an embodiment of the present invention. Based on the flow information table maintained by the host in Table 1 above, the following describes, by way of example, how the additional-layer function is triggered after a TCP segment's transport-layer header fields have been encapsulated, and how Flowlets are then divided and marked. The process may include the following steps:
1. After a TCP segment (for example, the first segment) enters the transport layer, the entry for its TCP flow (denoted [SrcPort]) is first looked up in the FlowInfoTable (the flow information table described in Table 1) using the source port carried by the segment.
2. The timestamp value carried by the segment (denoted CurTS) is then obtained.
3. Next, it is determined whether the segment satisfies the Flowlet split condition, that is, how the difference between the segment's timestamp and the timestamp of the previous segment recorded in the TCP flow's entry compares with the Flowlet split threshold (CurTS - [SrcPort].LstTS ≥ [SrcPort].TTDiff).
4. If the condition is met, the current segment is treated as the first segment of a new Flowlet and is marked in the FL_Tag bit of its header: the FL_Tag bit is set to the opposite of the LstFLTag value in the entry, and the LstFLTag value in the entry is then updated;
5. If the condition is not met, the current segment is treated as a continuation of the previous Flowlet and is marked in the FL_Tag bit of its header: the FL_Tag bit is set to the same value as LstFLTag in the entry. After the additional layer has divided and marked the Flowlets, the TCP segment is passed to the network layer.
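The host-side tagging steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `FlowEntry` class, the `tag_segment` function, and the dictionary-based table are hypothetical; updating LstTS after every segment is assumed (the steps only say the entry records the previous segment's timestamp); and the comparison uses the strict "greater than" form of step S506, whereas step 3 writes the condition with ≥.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    LstTS: int = 0      # timestamp of the previous segment of this flow
    LstFLTag: int = 0   # Flowlet tag of the previous segment (0 or 1)
    TTDiff: int = 0     # current Flowlet split threshold


# Hypothetical stand-in for FlowInfoTable, keyed by source port.
flow_info_table = {}


def tag_segment(src_port: int, cur_ts: int) -> int:
    """Return the FL_Tag value to write into the segment header."""
    # Step 1: index the entry by source port (created on first use, assumed).
    entry = flow_info_table.setdefault(src_port, FlowEntry(LstTS=cur_ts))
    # Steps 3-4: a gap above the threshold starts a new Flowlet, so the
    # tag is flipped and the flipped value is stored back in the entry.
    if cur_ts - entry.LstTS > entry.TTDiff:
        entry.LstFLTag ^= 1
    # Step 5 is implicit: otherwise the stored tag is reused unchanged.
    entry.LstTS = cur_ts  # assumed: remember this segment's timestamp
    return entry.LstFLTag


flow_info_table[5000] = FlowEntry(LstTS=100, LstFLTag=0, TTDiff=50)
for ts in (120, 170, 230, 240, 400):
    print(ts, tag_segment(5000, ts))
```

With a threshold of 50, the gaps of 20 and 50 keep the tag at 0, the gap of 60 flips it to 1, and the later gap of 160 flips it back to 0, so adjacent Flowlets alternate between the two tag values.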
Refer to FIG. 6D, which is a schematic flowchart of an additional-layer protocol dynamically updating the Flowlet split threshold according to an embodiment of the present invention. The description is based on the flow information table maintained by the host in Table 1 above. Note first that the data center network topology provides multiple equal-cost paths between the same pair of source and destination hosts. This embodiment therefore uses the timestamps carried in ACK packets to continuously obtain the uplink path delays of the equal-cost paths (those within the same TCP flow, and optionally those within the same network session), records the maximum and minimum uplink path delays, uses their difference to represent the maximum delay difference between equal-cost paths, and periodically configures the TTDiff parameter based on that maximum delay difference. The following describes, by way of example, how the additional-layer protocol dynamically configures the TTDiff parameter that governs Flowlet division. The process may include the following steps:
1. As shown in FIG. 6D, when an ACK packet returned from the destination host (for example, the target ACK packet) enters the additional-layer protocol of the source host, the host uses the destination port carried in the packet to look up the corresponding entry (denoted [SrcPort]) in the FlowInfoTable. This is the same entry as that of the TCP flow associated with the ACK packet.
2. The timestamps carried in the ACK packet are read, namely the timestamp value field (denoted Timestamp) and the timestamp echo reply field (denoted TimestampEcho). The difference between the two represents the uplink path delay (denoted TripTime).
3. The calculated uplink path delay is compared with the maximum uplink path delay ([SrcPort].TripTime_max) and the minimum uplink path delay ([SrcPort].TripTime_min) recorded in the entry.
4. If the uplink path delay is greater than the maximum uplink path delay recorded in the entry, [SrcPort].TripTime_max is updated to the uplink path delay;
5. If the uplink path delay is less than the minimum uplink path delay recorded in the entry, [SrcPort].TripTime_min is updated to the uplink path delay.
Optionally, the additional-layer protocol may also periodically update the TTDiff item of every entry in the information table. In this embodiment, TTDiff is set to the difference between the entry's TripTime_max and TripTime_min, and TripTime_max and TripTime_min are reset at the same time so that stale maximum or minimum values do not persist.
In this embodiment, the update period may be set to the order of magnitude of the network round-trip delay (approximately 100 to 200 microseconds), the reset value of TripTime_max may be set to zero, and the reset value of TripTime_min may be set to a large value (for example, the maximum value representable in 4 bytes).
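The ACK-driven delay tracking and the periodic TTDiff refresh can be sketched as follows. The `DelayEntry` class and the functions `on_ack` and `periodic_update` are hypothetical names for this example. Two behaviours are assumptions not stated in the description: TTDiff is left unchanged when no ACK arrived during a period, and timestamp wraparound is ignored.

```python
from dataclasses import dataclass

MAX_U32 = 0xFFFFFFFF  # largest value representable in 4 bytes


@dataclass
class DelayEntry:
    TripTime_max: int = 0        # reset value: zero
    TripTime_min: int = MAX_U32  # reset value: a large number
    TTDiff: int = 0              # Flowlet split threshold


def on_ack(entry: DelayEntry, timestamp: int, timestamp_echo: int) -> int:
    """Process one ACK: derive the uplink delay and update the extremes."""
    trip_time = timestamp - timestamp_echo  # step 2
    if trip_time > entry.TripTime_max:      # step 4
        entry.TripTime_max = trip_time
    if trip_time < entry.TripTime_min:      # step 5
        entry.TripTime_min = trip_time
    return trip_time


def periodic_update(entry: DelayEntry) -> None:
    """Run once per update period (on the order of one network RTT)."""
    # Assumed guard: with no samples, max (0) is below min (MAX_U32),
    # so the previous threshold is kept rather than made negative.
    if entry.TripTime_max >= entry.TripTime_min:
        entry.TTDiff = entry.TripTime_max - entry.TripTime_min
    # Reset so that stale extremes do not persist into the next period.
    entry.TripTime_max = 0
    entry.TripTime_min = MAX_U32


entry = DelayEntry()
for ts, echo in ((1050, 1000), (1130, 1100), (1290, 1200)):
    on_ack(entry, ts, echo)
periodic_update(entry)
print(entry.TTDiff)
```

In this run the three ACKs yield delays of 50, 30, and 90, so the period ends with a threshold of 90 - 30 = 60 and the extremes are reset for the next period.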
Flowlet technology effectively addresses problems such as hash collisions, mice-flow blocking, and asymmetry in data center network load balancing. Most existing Flowlet-level schemes detect and forward Flowlets at the switch based on a fixed time interval, but a fixed interval cannot keep matching the dynamically changing traffic load of a data center network, which leads to uneven load distribution in the network. This application proposes pre-splitting traffic at the end host based on a time threshold that adapts to path load, and then distributing the fine-grained Flowlets into the network. After identifying a Flowlet, the switch can run any routing algorithm to balance the load further.
Refer to FIG. 7, which is a schematic flowchart of a data transmission method according to an embodiment of the present invention. The method can be applied to a switch in the network architecture described in FIG. 2 or FIG. 3 above; the switch 20 can support and perform steps S701 to S704 of the method shown in FIG. 7. The following description is given from the switch side with reference to FIG. 3. The method may include the following steps S701 to S704.
Step S701: Receive a first data packet.
Specifically, the first data packet includes a first segment and the Flowlet identifier of the first segment.
Step S702: Determine the target TCP flow to which the first data packet belongs, and obtain forwarding information matching the target TCP flow.
Specifically, the forwarding information includes a reference Flowlet identifier and a reference forwarding path of the target TCP flow. The reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to a second segment; the second segment is the segment immediately preceding the first segment in the target TCP flow. The reference forwarding path is the first forwarding path, on which a second data packet was forwarded; the second data packet includes the second segment and the first Flowlet identifier.
In a possible implementation, the switch maintains a forwarding information table that includes forwarding information for M TCP flows, where M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes the five-tuple hash value of that flow. Determining the target TCP flow to which the first data packet belongs and obtaining the forwarding information matching the target TCP flow includes: calculating the five-tuple hash value of the first data packet from its five-tuple information; and looking up, in the forwarding information table, the forwarding information matching the target TCP flow according to that hash value. In this embodiment, the switch maintains a forwarding information table containing forwarding information for one or more TCP flows (for example, currently active TCP flows) of the hosts connected to it, and the forwarding information of each TCP flow may include the flow's five-tuple hash value. That is, the switch can maintain forwarding information for all currently active TCP flows, so that when a data packet needs to be sent, the switch can use the packet's five-tuple hash value to find the matching forwarding information in the table (including the reference Flowlet identifier and the forwarding path) and forward the packet accordingly.
Step S703: Compare the Flowlet identifier of the first segment with the first Flowlet identifier.
Step S704: Determine, according to the comparison result, whether to forward the first segment through the first forwarding path.
Specifically, after receiving a data packet, the switch reads the Flowlet identifier in the packet and uses it to determine whether the first data packet carries the same Flowlet identifier as the adjacent data packet in the target TCP flow to which it belongs. On that basis it decides whether the first data packet should be forwarded on the forwarding path of the second data packet. In other words, the switch does not need to divide data packets into Flowlets according to the time interval between received packets. Instead, it uses the Flowlet flag bit contained in each received packet to identify directly whether the packet to be sent belongs to the same Flowlet as the preceding adjacent packet of the same TCP flow, and thus decides whether to keep forwarding on that packet's path or to assign the packet to a new Flowlet and choose a new forwarding path for it.
It should be noted that the forwarding path referred to in this embodiment is the forwarding port that each switch can currently decide on. The complete forwarding path of a data packet may in practice be determined jointly by the forwarding ports chosen separately by the switches at multiple hops. Therefore, in this embodiment, "on the switch side" means that each of the switches along the multi-hop path performs the above data transmission method, which ultimately determines the complete forwarding path of the first data packet.
In a possible implementation, if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, the first data packet is forwarded through the first forwarding path; if the Flowlet identifier of the first segment differs from the first Flowlet identifier, a second forwarding path is determined for the first data packet, and the packet is forwarded through the second forwarding path. In this embodiment, when the switch finds that the first data packet carries the same Flowlet identifier as the adjacent data packet in the target TCP flow to which it belongs, it forwards the first data packet on the same path as the second data packet; when the identifiers differ, it determines a new forwarding path for the first data packet and forwards the packet on that path. It should be noted that the second forwarding path may be the same as or different from the first forwarding path, depending on the switch's decision.
In a possible implementation, the method further includes: if the Flowlet identifier of the first segment differs from the first Flowlet identifier and is a second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path. In this embodiment, when the Flowlet identifiers of the first data packet and the second data packet differ, the first data packet does not belong to the same Flowlet as the second data packet, which is the preceding adjacent packet in the same TCP flow. The switch therefore needs to assign the first data packet to a new Flowlet and update the reference Flowlet identifier of the TCP flow to the Flowlet identifier of the latest data packet, that is, the second Flowlet identifier.
In summary, in this embodiment, after Flowlets are divided and marked on the host side, the switch identifies Flowlets according to the flag bit of each data packet. The switch implements this function through a Flowlet forwarding information table (Flowlet Table), whose format is shown in Table 2:
Table 2
Figure PCTCN2020119708-appb-000002
In Table 2 above, each entry of the Flowlet forwarding information table may contain three items: Entry, FLTag, and Port, where:
(1) The Entry item records the hash value of the data packet's five-tuple (source IP, destination IP, source port, destination port, protocol number); this hash value is used to index the TCP flow's entry in the forwarding table. This item corresponds to the five-tuple hash value described in this application.
(2) The FLTag item records the Flowlet flag bit and is used to distinguish adjacent Flowlets. This item corresponds to the reference Flowlet identifier described in this application.
(3) The Port item records the forwarding port. This item corresponds to the forwarding path information (the reference forwarding path) described in this application.
Based on the forwarding information table maintained by the switch in Table 2 above, the following describes, by way of example, how the switch identifies Flowlets according to the flag bit of each data packet. The process may include the following main steps:
1. For each arriving data packet, the switch first identifies which TCP flow the packet belongs to, and then identifies which Flowlet of that TCP flow it belongs to.
2. The switch first hashes the packet's five-tuple to obtain a hash value, and then searches the Flowlet forwarding table for the entry whose Entry item equals that hash value, in order to determine which TCP flow the arriving packet belongs to. Examples of hash values are key1 and key2 in Table 2 above.
3. The switch then reads the FL_Tag bit in the packet header and compares it with the current value of the FLTag item in the forwarding table entry; that is, it compares the Flowlet identifier carried in the packet with the reference Flowlet identifier recorded in the entry of the corresponding TCP flow.
4. If they are equal, the packet belongs to the current flow burst and is forwarded to the output port indicated by the entry's Port item.
5. If they are not equal, the packet starts a new flow burst of the flow. The FLTag item of the forwarding table entry is updated to the value of the packet's FL_Tag bit, a load-balancing decision is made for the new Flowlet, and the result is saved in the entry's Port item.
6. When determining which TCP flow an arriving packet belongs to, if no forwarding table entry is found for the hash value, the packet belongs to a new TCP flow and is its first packet. In this case a new forwarding table entry is created, its FLTag item is set to the value of the packet's Flowlet flag bit, a load-balancing decision is made, and the result is saved in the entry's Port item.
For example, suppose a data packet A newly arrives at the switch. The switch hashes its five-tuple and obtains the hash value key1; suppose further that the Flowlet identifier of packet A is 0. The switch uses key1 to index the corresponding entry in the forwarding information table and compares the entry's FLTag value with packet A's Flowlet identifier. Because 0 = 0, packet A is forwarded to the output port indicated by the entry's Port item (the first forwarding path described in this application).
Suppose a data packet B newly arrives at the switch. The switch hashes its five-tuple and obtains the hash value key2; suppose further that the Flowlet identifier of packet B is 0. The switch uses key2 to index the corresponding entry in the forwarding information table and compares the entry's FLTag value with packet B's Flowlet identifier. Because 1 ≠ 0, packet B is the first packet of a new Flowlet of the TCP flow. Packet B therefore cannot be forwarded to the output port currently indicated by the entry's Port item; a new routing decision must be made for packet B to select a new output port (the second forwarding path described in this application), and the Port item is updated with the result.
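The switch-side lookup and the two examples above can be sketched in Python as follows. The names `FlowletEntry`, `five_tuple_hash`, and `forward`, the use of CRC32 as the hash function, the port numbers, and the `choose_port` callback standing in for the load-balancing decision are all assumptions made for illustration; the description leaves the hash function and the routing algorithm open.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FlowletEntry:
    FLTag: int  # reference Flowlet identifier of the flow
    Port: int   # output port chosen for the current Flowlet


# Hypothetical stand-in for the Flowlet Table, keyed by five-tuple hash.
flowlet_table = {}


def five_tuple_hash(src_ip, dst_ip, src_port, dst_port, proto) -> int:
    """Hash the five-tuple; CRC32 is an arbitrary choice for this sketch."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key)


def forward(entry_key, fl_tag: int, choose_port) -> int:
    """Return the output port for a packet with the given hash and FL_Tag."""
    entry = flowlet_table.get(entry_key)
    if entry is None:
        # Step 6: first packet of a new TCP flow, so create an entry.
        entry = FlowletEntry(FLTag=fl_tag, Port=choose_port())
        flowlet_table[entry_key] = entry
    elif entry.FLTag != fl_tag:
        # Step 5: new Flowlet, so record the tag and re-run load balancing.
        entry.FLTag = fl_tag
        entry.Port = choose_port()
    # Step 4: same Flowlet, so the stored port is reused unchanged.
    return entry.Port


# Packets A and B from the examples, with hypothetical port numbers.
flowlet_table["key1"] = FlowletEntry(FLTag=0, Port=1)
flowlet_table["key2"] = FlowletEntry(FLTag=1, Port=2)
print(forward("key1", 0, lambda: 9))  # packet A: tags match, stored port kept
print(forward("key2", 0, lambda: 3))  # packet B: tags differ, new port chosen
```

Keeping the load-balancing decision behind a callback reflects the point that the switch may apply any routing algorithm once a Flowlet boundary has been identified.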
In summary, the main points of protection of this application may include the following:
1. An additional-layer protocol is added at the transport layer or at a newly added additional layer, and an information table is maintained. Each flow has its own entry in the table, and the entry holds the information needed to split the flow into Flowlets.
2. Using the timestamp option in ACK packets, the host continuously obtains one-way path delay information, from which the delay of the highest-delay uplink path in the multipath set (which may include equal-cost and non-equal-cost paths) can be calculated. The maximum delay difference is periodically set as the time threshold for splitting Flowlets, so that the threshold adapts dynamically to path load, and the information table is updated.
3. When a flow is split, once the difference between the timestamp of the current segment and the timestamp of the flow's previous segment exceeds the configured time threshold, the current segment is treated as the first segment of a new Flowlet. One bit of the reserved field in the TCP header is used as a flag bit to distinguish the Flowlets of a flow: all segments within one Flowlet carry the same flag value, and adjacent Flowlets carry opposite flag values.
4. The switch can identify each Flowlet from the five-tuple hash and the one-bit flag in the transport-layer or additional-layer header, and can use any routing algorithm to forward the data packets.
The methods of the embodiments of the present invention have been described in detail above; the related apparatuses of the embodiments of the present invention are provided below.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention. The data processing apparatus 80 may include a first generating unit 801, a first determining unit 802, an obtaining unit 803, a first comparing unit 804, and a Flowlet dividing unit 805. The units are described in detail as follows.
a first generating unit 801, configured to generate a first segment;
a first determining unit 802, configured to determine the target TCP flow to which the first segment belongs;
an obtaining unit 803, configured to obtain the timestamp of the first segment and to obtain target flow information matching the target TCP flow, where the target flow information includes a time threshold corresponding to the target TCP flow and the timestamp of a second segment in the target TCP flow; wherein the second segment is the segment immediately preceding the first segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
a first comparing unit 804, configured to compare the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold;
a Flowlet dividing unit 805, configured to determine, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet.
In a possible implementation, the first determining unit is specifically configured to:
determine, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
In a possible implementation, the apparatus further includes:
a maintenance unit, configured to maintain a flow information table, where the flow information table includes flow information of N TCP flows, N is an integer greater than or equal to 1, and the flow information of each TCP flow includes the flow index of the corresponding TCP flow;
the obtaining unit is specifically configured to look up, in the flow information table and according to the flow index of the target TCP flow, the target flow information matching the target TCP flow.
In a possible implementation, the Flowlet dividing unit is specifically configured to:
if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, place the first segment and the second segment in the same Flowlet;
if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, place the first segment in a new Flowlet.
In a possible implementation, the target flow information further includes a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to the second segment; the apparatus further includes:
a second generating unit, configured to generate a first data packet, where the first data packet includes the first segment and the Flowlet identifier of the first segment; wherein,
if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier;
if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
In a possible implementation, the apparatus further includes:
a first updating unit, configured to update the reference Flowlet identifier to the second Flowlet identifier if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold.
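The host-side behaviour of the comparing, dividing, generating and updating units can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `FlowEntry` and `tag_segment` are hypothetical, timestamps are integers in microseconds, the flow table is keyed by source port, and the one-bit flag is flipped with XOR to give adjacent Flowlets opposite values.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    """One row of the host's flow information table (field names are illustrative)."""
    threshold_us: int        # time threshold: largest minus smallest uplink delay
    last_timestamp_us: int   # timestamp of the previous segment of this flow
    flowlet_flag: int = 0    # reference Flowlet identifier, a single bit


def tag_segment(entry: FlowEntry, timestamp_us: int) -> int:
    """Return the Flowlet flag for a new segment and update the flow entry."""
    gap = timestamp_us - entry.last_timestamp_us
    if gap > entry.threshold_us:
        # Gap exceeds the threshold: the segment starts a new Flowlet,
        # so the reference flag is flipped.
        entry.flowlet_flag ^= 1
    # Otherwise the segment stays in the current Flowlet and keeps the flag.
    entry.last_timestamp_us = timestamp_us
    return entry.flowlet_flag


# Flow table keyed by source port (assumed key; 49152 is an arbitrary example).
flow_table = {49152: FlowEntry(threshold_us=80, last_timestamp_us=0)}
timestamps_us = [10, 30, 200, 220, 400]
flags = [tag_segment(flow_table[49152], t) for t in timestamps_us]
print(flags)  # [0, 0, 1, 1, 0]
```

With a threshold of 80 microseconds, the gaps of 170 and 180 microseconds each open a new Flowlet, while the shorter gaps keep the current flag.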
In a possible implementation, the apparatus further includes:
a receiving unit, configured to receive a target ACK packet, where the target ACK packet is an ACK packet that has the same destination port number or the same destination address as the target TCP flow;
a second determining unit, configured to determine the uplink path delay of the target ACK packet, where the uplink path delay is the difference between the timestamp value (TSval) and the timestamp echo reply (TSecr) value of the target ACK packet;
a second comparing unit, configured to compare the uplink path delay of the target ACK packet with the uplink path delays in the multipath set of the target TCP flow;
a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path that currently has the largest delay in the multipath set;
a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path that currently has the smallest delay in the multipath set.
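The delay bookkeeping performed by the second determining unit and the second and third updating units can be sketched as below. The class name `DelayTracker`, the handling of the very first sample, and the assumption that the two timestamps are in comparable units are illustrative choices; the sketch also never ages out old samples, whereas the description refreshes the threshold periodically.

```python
from typing import Optional


class DelayTracker:
    """Tracks the largest and smallest uplink path delay seen for one flow."""

    def __init__(self) -> None:
        self.max_delay: Optional[int] = None  # first path delay
        self.min_delay: Optional[int] = None  # second path delay

    def on_ack(self, ts_val: int, ts_ecr: int) -> int:
        """Process one ACK; the delay is timestamp value minus echo reply value."""
        delay = ts_val - ts_ecr
        if self.max_delay is None or delay > self.max_delay:
            self.max_delay = delay
        if self.min_delay is None or delay < self.min_delay:
            self.min_delay = delay
        return delay

    def threshold(self) -> int:
        """Time threshold for Flowlet splitting: max delay minus min delay."""
        if self.max_delay is None or self.min_delay is None:
            return 0  # no samples yet (assumed default)
        return self.max_delay - self.min_delay


tracker = DelayTracker()
for ts_val, ts_ecr in [(1050, 1000), (2120, 2000), (3040, 3000)]:
    tracker.on_ack(ts_val, ts_ecr)
print(tracker.max_delay, tracker.min_delay, tracker.threshold())  # 120 40 80
```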
In a possible implementation, the multipath set includes multiple equal-cost transmission paths of the target TCP flow; or the multipath set includes multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or the multipath set includes multiple non-equal-cost transmission paths of the target TCP flow.
It should be noted that, for the functions of the functional units in the data processing apparatus 80 described in this embodiment of the present invention, reference may be made to the description of steps S501 to S506 in the method embodiment shown in FIG. 5 above; details are not repeated here.
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a data transmission apparatus provided by an embodiment of the present invention. The data transmission apparatus 90 may include a receiving unit 901, a determining unit 902, a comparing unit 903, and a forwarding unit 904. The units are described in detail as follows.
a receiving unit 901, configured to receive a first data packet, where the first data packet includes a first segment and the Flowlet identifier of the first segment, and the first data packet belongs to a target TCP flow;
a determining unit 902, configured to determine the target TCP flow to which the first data packet belongs and to obtain forwarding information matching the target TCP flow, where the forwarding information includes a reference Flowlet identifier and a reference forwarding path of the target TCP flow; wherein the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to a second segment, and the second segment is the segment immediately preceding the first segment in the target TCP flow; the reference forwarding path is the first forwarding path, over which a second data packet was forwarded, and the second data packet includes the second segment and the first Flowlet identifier;
a comparing unit 903, configured to compare the Flowlet identifier of the first segment with the first Flowlet identifier;
a forwarding unit 904, configured to determine, according to the comparison result, whether to forward the first segment through the first forwarding path.
In a possible implementation, the switch maintains a forwarding information table, where the forwarding information table includes forwarding information of M TCP flows, M is an integer greater than or equal to 1, and the forwarding information of each TCP flow includes the five-tuple hash value of the corresponding TCP flow; the determining unit is specifically configured to:
calculate the five-tuple hash value of the first data packet according to the five-tuple information of the first data packet;
look up, in the forwarding information table and according to the five-tuple hash value of the first data packet, the forwarding information matching the target TCP flow.
In a possible implementation, the forwarding unit is specifically configured to:
if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, forward the first data packet through the first forwarding path;
if the Flowlet identifier of the first segment is different from the first Flowlet identifier, determine a second forwarding path for the first data packet and forward the first data packet through the second forwarding path.
In a possible implementation, the apparatus further includes:
an updating unit, configured to, if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is a second Flowlet identifier, update the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier and update the reference forwarding path to the second forwarding path.
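The switch-side units above can be sketched as a single lookup-and-forward routine. This is an illustrative sketch under assumptions, not the implementation of the application: `ForwardingEntry`, `five_tuple_hash` and `forward` are hypothetical names, CRC32 stands in for whatever hash the switch uses, a round-robin chooser stands in for "any routing algorithm", and creating an entry for the first packet of an unknown flow is an assumed behaviour.

```python
import itertools
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

FiveTuple = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


@dataclass
class ForwardingEntry:
    flowlet_flag: int  # reference Flowlet identifier
    path: int          # reference forwarding path (an output port index here)


def five_tuple_hash(five_tuple: FiveTuple) -> int:
    """Hash the five-tuple; CRC32 is an assumed stand-in for the switch's hash."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key)


def forward(table: Dict[int, ForwardingEntry], five_tuple: FiveTuple,
            flowlet_flag: int, choose_path: Callable[[], int]) -> int:
    """Return the path for a packet, re-routing only when the Flowlet flag changes."""
    key = five_tuple_hash(five_tuple)
    entry = table.get(key)
    if entry is None:
        # First packet of this flow (assumed handling): create the entry.
        entry = ForwardingEntry(flowlet_flag, choose_path())
        table[key] = entry
    elif entry.flowlet_flag != flowlet_flag:
        # New Flowlet: pick a second path and update the reference values.
        entry.flowlet_flag = flowlet_flag
        entry.path = choose_path()
    # Same Flowlet: keep the reference forwarding path.
    return entry.path


ports = itertools.cycle([0, 1, 2])           # round-robin over three uplinks
table: Dict[int, ForwardingEntry] = {}
flow = ("10.0.0.1", "10.0.0.2", 49152, 80, "tcp")
used = [forward(table, flow, flag, lambda: next(ports)) for flag in [0, 0, 1, 1, 0]]
print(used)  # [0, 0, 1, 1, 2]
```

Packets that carry the same flag stay on one path, so reordering can only occur at Flowlet boundaries, where the host has already ensured a gap larger than the path delay difference.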
It should be noted that, for the functions of the functional units in the data transmission apparatus 90 described in this embodiment of the present invention, reference may be made to the description of steps S701 to S704 in the method embodiment shown in FIG. 7 above; details are not repeated here.
An embodiment of the present invention further provides a host. The host includes a processor, a memory, and a communication interface, where the memory is configured to store data processing program code, and the processor is configured to invoke the data processing program code to perform some or all of the steps of any one of the data processing methods described in the foregoing method embodiments.
An embodiment of the present invention further provides a switch. The switch includes a processor, a memory, and a communication interface, where the memory is configured to store data transmission program code, and the processor is configured to invoke the data transmission program code to perform some or all of the steps of any one of the data transmission methods described in the foregoing method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium may store a program, and when the program is executed by a host, some or all of the steps of any one of the data processing methods described in the foregoing method embodiments are performed.
An embodiment of the present invention further provides a computer program. The computer program includes instructions, and when the computer program is executed by a switch, the switch is enabled to perform some or all of the steps of any one of the data transmission methods.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of actions. However, those skilled in the art should understand that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like, and specifically may be a processor in the computer device) to perform all or some of the steps of the methods in the embodiments of this application. The aforementioned storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
The foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (30)

  1. A data processing method, characterized in that it is applied to a host and comprises:
    generating a first segment, and determining the target TCP flow to which the first segment belongs;
    obtaining the timestamp of the first segment, and obtaining target flow information matching the target TCP flow, the target flow information comprising a time threshold corresponding to the target TCP flow and the timestamp of a second segment in the target TCP flow; wherein the second segment is the segment immediately preceding the first segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
    comparing the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold;
    determining, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet.
  2. The method according to claim 1, characterized in that the determining the target TCP flow to which the first segment belongs comprises:
    determining, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
  3. The method according to claim 1 or 2, characterized in that the host maintains a flow information table, the flow information table comprises flow information of N TCP flows, N is an integer greater than or equal to 1, and the flow information of each TCP flow comprises the flow index of the corresponding TCP flow; the obtaining target flow information matching the target TCP flow comprises:
    looking up, in the flow information table and according to the flow index of the target TCP flow, the target flow information matching the target TCP flow.
  4. The method according to any one of claims 1-3, characterized in that the determining, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet comprises:
    if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, placing the first segment and the second segment in the same Flowlet;
    if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, placing the first segment in a new Flowlet.
  5. The method according to any one of claims 1-4, characterized in that the target flow information further comprises a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to the second segment; the method further comprises:
    generating a first data packet, the first data packet comprising the first segment and the Flowlet identifier of the first segment; wherein,
    if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier;
    if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
  6. The method according to claim 5, characterized in that the method further comprises:
    if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, updating the reference Flowlet identifier to the second Flowlet identifier.
  7. The method according to any one of claims 1-6, characterized in that the method further comprises:
    receiving a target ACK packet, the target ACK packet being an ACK packet that has the same destination port number or the same destination address as the target TCP flow;
    determining the uplink path delay of the target ACK packet, the uplink path delay being the difference between the timestamp value (TSval) and the timestamp echo reply (TSecr) value of the target ACK packet;
    comparing the uplink path delay of the target ACK packet with the uplink path delays in the multipath set of the target TCP flow;
    if the uplink path delay of the target ACK packet is greater than the delay of the uplink path that currently has the largest delay in the multipath set, updating the first path delay to the uplink path delay of the target ACK packet;
    if the uplink path delay of the target ACK packet is smaller than the delay of the uplink path that currently has the smallest delay in the multipath set, updating the second path delay to the uplink path delay of the target ACK packet.
  8. The method according to any one of claims 1-7, characterized in that the multipath set comprises multiple equal-cost transmission paths of the target TCP flow; or the multipath set comprises multiple equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or the multipath set comprises multiple non-equal-cost transmission paths of the target TCP flow.
  9. A data transmission method, characterized in that it is applied to a switch and comprises:
    receiving a first data packet, the first data packet comprising a first segment and the Flowlet identifier of the first segment;
    determining the target TCP flow to which the first data packet belongs, and obtaining forwarding information matching the target TCP flow; the forwarding information comprising a reference Flowlet identifier and a reference forwarding path of the target TCP flow; wherein the reference Flowlet identifier is currently the first Flowlet identifier, which corresponds to a second segment, and the second segment is the segment immediately preceding the first segment in the target TCP flow; the reference forwarding path is the first forwarding path, over which a second data packet was forwarded, and the second data packet comprises the second segment and the first Flowlet identifier;
    comparing the Flowlet identifier of the first segment with the first Flowlet identifier;
    determining, according to the comparison result, whether to forward the first segment through the first forwarding path.
  10. The method according to claim 9, characterized in that the switch maintains a forwarding information table, the forwarding information table comprises forwarding information of M TCP flows, M is an integer greater than or equal to 1, and the forwarding information of each TCP flow comprises the five-tuple hash value of the corresponding TCP flow; the determining the target TCP flow to which the first data packet belongs and obtaining forwarding information matching the target TCP flow comprises:
    calculating the five-tuple hash value of the first data packet according to the five-tuple information of the first data packet;
    looking up, in the forwarding information table and according to the five-tuple hash value of the first data packet, the forwarding information matching the target TCP flow.
  11. The method according to claim 9 or 10, characterized in that the determining, according to the comparison result, whether to forward the first segment through the forwarding path comprises:
    if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, forwarding the first data packet through the first forwarding path;
    if the Flowlet identifier of the first segment is different from the first Flowlet identifier, determining a second forwarding path for the first data packet and forwarding the first data packet through the second forwarding path.
  12. The method according to claim 11, characterized in that the method further comprises:
    if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is a second Flowlet identifier, updating the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier, and updating the reference forwarding path to the second forwarding path.
  13. A data processing apparatus, characterized in that it comprises:
    a first generating unit, configured to generate a first segment;
    a first determining unit, configured to determine the target TCP flow to which the first segment belongs;
    an obtaining unit, configured to obtain the timestamp of the first segment and to obtain target flow information matching the target TCP flow, the target flow information comprising a time threshold corresponding to the target TCP flow and the timestamp of a second segment in the target TCP flow; wherein the second segment is the segment immediately preceding the first segment in the target TCP flow, the time threshold is the difference between a first path delay and a second path delay, the first path delay is the delay of the uplink path with the largest delay in the multipath set of the target TCP flow, and the second path delay is the delay of the uplink path with the smallest delay in the multipath set of the target TCP flow;
    a first comparing unit, configured to compare the difference between the timestamp of the first segment and the timestamp of the second segment with the time threshold;
    a Flowlet dividing unit, configured to determine, according to the comparison result, whether to place the first segment and the second segment in the same Flowlet.
  14. The apparatus according to claim 13, wherein the first determining unit is specifically configured to:
    determine, according to the source port number of the first segment, the target TCP flow to which the first segment belongs.
  15. The apparatus according to claim 13 or 14, wherein the apparatus further comprises:
    a maintenance unit, configured to maintain a flow information table, wherein the flow information table comprises flow information of N TCP flows, N is an integer greater than or equal to 1, and the flow information of each TCP flow comprises a flow index of the corresponding TCP flow;
    the obtaining unit is specifically configured to: look up, in the flow information table and according to the flow index of the target TCP flow, the target flow information matching the target TCP flow.
  16. The apparatus according to any one of claims 13-15, wherein the Flowlet dividing unit is specifically configured to:
    if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, place the first segment and the second segment in the same Flowlet;
    if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, place the first segment in a new Flowlet.
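The division rule recited in claims 13 and 16 can be illustrated with a minimal Python sketch. The function names `same_flowlet` and `split_into_flowlets`, and the use of plain integer timestamps in arbitrary units, are assumptions made for illustration only; the claims do not prescribe any particular implementation.

```python
from typing import List


def same_flowlet(ts_first: int, ts_second: int, time_threshold: int) -> bool:
    """Return True if the first (newer) segment stays in the Flowlet of the
    second (previous adjacent) segment, i.e. the gap does not exceed the threshold."""
    return (ts_first - ts_second) <= time_threshold


def split_into_flowlets(timestamps: List[int], time_threshold: int) -> List[List[int]]:
    """Group an ordered list of segment timestamps of one TCP flow into Flowlets."""
    flowlets: List[List[int]] = []
    for ts in timestamps:
        if flowlets and same_flowlet(ts, flowlets[-1][-1], time_threshold):
            flowlets[-1].append(ts)   # gap <= threshold: same Flowlet
        else:
            flowlets.append([ts])     # first segment, or gap > threshold: new Flowlet
    return flowlets


# The threshold is the difference between the largest and smallest uplink path delay.
max_delay, min_delay = 80, 30
threshold = max_delay - min_delay     # 50
result = split_into_flowlets([0, 10, 20, 100, 105], threshold)
```

With a threshold of 50, the gap of 80 between the third and fourth segments opens a new Flowlet, while a gap exactly equal to the threshold keeps both segments together.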
  17. The apparatus according to any one of claims 13-16, wherein the target flow information further comprises a reference Flowlet identifier of the target TCP flow, and the reference Flowlet identifier is currently a first Flowlet identifier corresponding to the second segment; the apparatus further comprises:
    a second generating unit, configured to generate a first data packet, wherein the first data packet comprises the first segment and a Flowlet identifier of the first segment; wherein
    if the difference between the timestamp of the first segment and the timestamp of the second segment is less than or equal to the time threshold, the Flowlet identifier of the first segment is the first Flowlet identifier;
    if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold, the Flowlet identifier of the first segment is a second Flowlet identifier.
  18. The apparatus according to claim 17, wherein the apparatus further comprises:
    a first updating unit, configured to update the reference Flowlet identifier to the second Flowlet identifier if the difference between the timestamp of the first segment and the timestamp of the second segment is greater than the time threshold.
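The host-side bookkeeping of claims 15, 17 and 18 can be sketched as a per-flow table lookup followed by tagging. In this hedged Python sketch, `FlowInfo`, `tag_segment` and the dictionary-based flow table are hypothetical names; deriving the second Flowlet identifier by incrementing the reference identifier, and storing the new timestamp as the next "previous segment" timestamp, are assumptions that the claims do not specify.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlowInfo:
    time_threshold: int   # largest minus smallest uplink path delay
    last_timestamp: int   # timestamp of the previous adjacent segment
    ref_flowlet_id: int   # reference Flowlet identifier of the flow


def tag_segment(flow_table: Dict[int, FlowInfo], flow_index: int, timestamp: int) -> int:
    """Return the Flowlet identifier to carry in the data packet for this segment."""
    info = flow_table[flow_index]             # look up flow info by flow index
    if timestamp - info.last_timestamp > info.time_threshold:
        info.ref_flowlet_id += 1              # assumed scheme for a "second" identifier
    info.last_timestamp = timestamp           # this segment becomes the previous one
    return info.ref_flowlet_id


flow_table = {7: FlowInfo(time_threshold=50, last_timestamp=100, ref_flowlet_id=1)}
id_a = tag_segment(flow_table, 7, 120)   # gap 20  <= 50: keeps identifier 1
id_b = tag_segment(flow_table, 7, 200)   # gap 80  >  50: new identifier 2
```

Keeping the reference identifier in the table means each segment needs only one lookup and one comparison before the data packet is generated.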
  19. The apparatus according to any one of claims 13-18, wherein the apparatus further comprises:
    a receiving unit, configured to receive a target ACK packet, wherein the target ACK packet is an ACK packet whose destination port number or destination address is the same as that of the target TCP flow;
    a second determining unit, configured to determine an uplink path delay of the target ACK packet, wherein the uplink path delay is the difference between the timestamp value and the timestamp echo reply value of the target ACK packet;
    a second comparison unit, configured to compare the uplink path delay of the target ACK packet with the delays of the uplink paths in the multipath set of the target TCP flow;
    a second updating unit, configured to update the first path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is greater than the delay of the uplink path that currently has the largest delay in the multipath set;
    a third updating unit, configured to update the second path delay to the uplink path delay of the target ACK packet if the uplink path delay of the target ACK packet is less than the delay of the uplink path that currently has the smallest delay in the multipath set.
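The ACK-driven delay tracking of claim 19 amounts to maintaining a running maximum and minimum. The following Python sketch is illustrative only: `PathDelays` and `on_ack` are hypothetical names, timestamps are plain integers, and the sketch simply takes the difference of the two timestamp fields as the claim recites, without modelling clock synchronisation between the two hosts.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    max_delay: int   # first path delay: largest uplink delay seen so far
    min_delay: int   # second path delay: smallest uplink delay seen so far

    @property
    def time_threshold(self) -> int:
        """Time threshold used for Flowlet division."""
        return self.max_delay - self.min_delay


def on_ack(delays: PathDelays, ts_value: int, ts_echo_reply: int) -> int:
    """Update the tracked delays from one ACK and return its uplink path delay."""
    uplink_delay = ts_value - ts_echo_reply
    if uplink_delay > delays.max_delay:
        delays.max_delay = uplink_delay      # new largest uplink delay
    if uplink_delay < delays.min_delay:
        delays.min_delay = uplink_delay      # new smallest uplink delay
    return uplink_delay


delays = PathDelays(max_delay=30, min_delay=10)
d1 = on_ack(delays, ts_value=145, ts_echo_reply=100)   # 45 raises the maximum
d2 = on_ack(delays, ts_value=205, ts_echo_reply=200)   # 5 lowers the minimum
```

Because the threshold is derived from the two tracked values, every ACK that widens the delay spread also widens the gap required to start a new Flowlet.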
  20. The apparatus according to any one of claims 13-19, wherein the multipath set comprises a plurality of equal-cost transmission paths of the target TCP flow; or the multipath set comprises a plurality of equal-cost transmission paths and non-equal-cost transmission paths of the target TCP flow; or the multipath set comprises a plurality of non-equal-cost transmission paths of the target TCP flow.
  21. A data transmission apparatus, comprising:
    a receiving unit, configured to receive a first data packet, wherein the first data packet comprises a first segment and a Flowlet identifier of the first segment, and the first data packet belongs to a target TCP flow;
    a determining unit, configured to determine the target TCP flow to which the first data packet belongs and to obtain forwarding information matching the target TCP flow; wherein the forwarding information comprises a reference Flowlet identifier of the target TCP flow and a reference forwarding path; the reference Flowlet identifier is currently a first Flowlet identifier corresponding to a second segment, and the second segment is the previous segment adjacent to the first segment in the target TCP flow; the reference forwarding path is a first forwarding path of a second data packet, and the second data packet comprises the second segment and the first Flowlet identifier;
    a comparison unit, configured to compare the Flowlet identifier of the first segment with the first Flowlet identifier;
    a forwarding unit, configured to determine, according to the comparison result, whether to forward the first segment through the first forwarding path.
  22. The apparatus according to claim 21, wherein the switch maintains a forwarding information table, the forwarding information table comprises forwarding information of M TCP flows, M is an integer greater than or equal to 1, and the forwarding information of each TCP flow comprises a five-tuple hash value of the corresponding TCP flow; the determining unit is specifically configured to:
    calculate a five-tuple hash value of the first data packet according to five-tuple information of the first data packet;
    look up, in the forwarding information table and according to the five-tuple hash value of the first data packet, the forwarding information matching the target TCP flow.
  23. The apparatus according to claim 21 or 22, wherein the forwarding unit is specifically configured to:
    if the Flowlet identifier of the first segment is the same as the first Flowlet identifier, forward the first data packet through the first forwarding path;
    if the Flowlet identifier of the first segment is different from the first Flowlet identifier, determine a second forwarding path for the first data packet and forward the first data packet through the second forwarding path.
  24. The apparatus according to claim 23, wherein the apparatus further comprises:
    an updating unit, configured to, if the Flowlet identifier of the first segment is different from the first Flowlet identifier and is a second Flowlet identifier, update the reference Flowlet identifier of the target TCP flow to the second Flowlet identifier and update the reference forwarding path to the second forwarding path.
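The switch-side behaviour of claims 21-24 (hash lookup, identifier comparison, path selection and table update) can be sketched as follows. This Python sketch makes several labelled assumptions: `ForwardingInfo`, `five_tuple_hash`, `forward` and the `pick_path` callback are hypothetical names, CRC32 stands in for whatever hash a real switch would use, and the round-robin path choice in the usage lines is only a placeholder because the claims do not say how the second forwarding path is selected.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

FiveTuple = Tuple[str, str, int, int, int]   # src IP, dst IP, src port, dst port, protocol


@dataclass
class ForwardingInfo:
    ref_flowlet_id: int   # reference Flowlet identifier of the flow
    ref_path: int         # reference forwarding path (e.g. an egress port index)


def five_tuple_hash(five_tuple: FiveTuple) -> int:
    """Hash the five-tuple; CRC32 is an assumed stand-in for the switch's hash."""
    return zlib.crc32("|".join(str(field) for field in five_tuple).encode())


def forward(table: Dict[int, ForwardingInfo], five_tuple: FiveTuple,
            flowlet_id: int, pick_path: Callable[[int], int]) -> int:
    """Return the path for a data packet, updating the table on a Flowlet change."""
    info = table[five_tuple_hash(five_tuple)]
    if flowlet_id != info.ref_flowlet_id:
        info.ref_flowlet_id = flowlet_id            # record the second Flowlet identifier
        info.ref_path = pick_path(info.ref_path)    # choose the second forwarding path
    return info.ref_path


flow = ("10.0.0.1", "10.0.0.2", 40000, 80, 6)
table = {five_tuple_hash(flow): ForwardingInfo(ref_flowlet_id=1, ref_path=0)}
next_of_four = lambda current: (current + 1) % 4    # placeholder path selection
path_same = forward(table, flow, 1, next_of_four)   # same Flowlet: path 0
path_new = forward(table, flow, 2, next_of_four)    # new Flowlet: path 1
```

Since the switch only compares identifiers carried in the packet, it needs no timers of its own; the timing decision has already been made by the sending host.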
  25. A host, comprising a processor, a memory and a communication interface, wherein the memory is configured to store data processing program code, and the processor is configured to invoke the data processing program code to perform the method according to any one of claims 1-8.
  26. A switch, comprising a processor, a memory and a communication interface, wherein the memory is configured to store data transmission program code, and the processor is configured to invoke the data transmission program code to perform the method according to any one of claims 9-12.
  27. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a host, the method according to any one of claims 1-8 is implemented.
  28. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a switch, the method according to any one of claims 9-12 is implemented.
  29. A computer program, wherein the computer program comprises instructions, and when the computer program is executed by a host, the host is caused to perform the method according to any one of claims 1-8.
  30. A computer program, wherein the computer program comprises instructions, and when the computer program is executed by a switch, the switch is caused to perform the method according to any one of claims 9-12.
PCT/CN2020/119708 2020-09-30 2020-09-30 Data processing method, data transmission method, and related device WO2022067791A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080105542.9A CN116325708A (en) 2020-09-30 2020-09-30 Data processing and transmitting method and related equipment
PCT/CN2020/119708 WO2022067791A1 (en) 2020-09-30 2020-09-30 Data processing method, data transmission method, and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/119708 WO2022067791A1 (en) 2020-09-30 2020-09-30 Data processing method, data transmission method, and related device

Publications (1)

Publication Number Publication Date
WO2022067791A1 true WO2022067791A1 (en) 2022-04-07

Family

ID=80949468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119708 WO2022067791A1 (en) 2020-09-30 2020-09-30 Data processing method, data transmission method, and related device

Country Status (2)

Country Link
CN (1) CN116325708A (en)
WO (1) WO2022067791A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127797A1 (en) * 2013-11-05 2015-05-07 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
CN107634912A (en) * 2016-07-19 2018-01-26 华为技术有限公司 Load-balancing method, device and equipment
CN108270687A (en) * 2016-12-30 2018-07-10 华为技术有限公司 A kind of load balance process method and device
CN110061929A (en) * 2019-03-10 2019-07-26 天津大学 For data center's load-balancing method of asymmetrical network
CN110460537A (en) * 2019-06-28 2019-11-15 天津大学 Data center's asymmetric topology down-off dispatching method based on packet set
US20200067839A1 (en) * 2015-07-02 2020-02-27 Cisco Technology, Inc. Network traffic load balancing

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115277504A (en) * 2022-07-11 2022-11-01 京东科技信息技术有限公司 Network traffic monitoring method, device and system
CN115277504B (en) * 2022-07-11 2024-04-05 京东科技信息技术有限公司 Network traffic monitoring method, device and system
CN115277568A (en) * 2022-07-20 2022-11-01 重庆星环人工智能科技研究院有限公司 Data sending method, device, equipment and storage medium
CN116366478A (en) * 2023-06-01 2023-06-30 湖北省楚天云有限公司 Data packet contrast deduplication method based on FPGA
CN116366478B (en) * 2023-06-01 2023-08-15 湖北省楚天云有限公司 Data packet contrast deduplication method based on FPGA
CN116708280A (en) * 2023-08-08 2023-09-05 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Data center network multipath transmission method based on disorder tolerance
CN116708280B (en) * 2023-08-08 2023-10-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Data center network multipath transmission method based on disorder tolerance

Also Published As

Publication number Publication date
CN116325708A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2022067791A1 (en) Data processing method, data transmission method, and related device
US11411770B2 (en) Virtual port channel bounce in overlay network
US20220263767A1 (en) Network Congestion Notification Method, Agent Node, and Computer Device
US9680746B2 (en) Source routing with fabric switches in an ethernet fabric network
US10355879B2 (en) Virtual extensible LAN tunnel keepalives
US8179808B2 (en) Network path tracing method
KR101317969B1 (en) Inter-node link aggregation system and method
US8050180B2 (en) Network path tracing method
US9154394B2 (en) Dynamic latency-based rerouting
US8306039B2 (en) Methods and systems for automatic transport path selection for multi-homed entities in stream control transmission protocol
US9614759B2 (en) Systems and methods for providing anycast MAC addressing in an information handling system
US20160359592A1 (en) Techniques for determining network anomalies in data center networks
US10361954B2 (en) Method and apparatus for processing modified packet
US9059922B2 (en) Network traffic distribution
US20140198793A1 (en) Traffic forwarding in a point multi-point link aggregation using a link selector data table
US9425893B1 (en) Methods and apparatus for implementing optical integrated routing with traffic protection
US9548930B1 (en) Method for improving link selection at the borders of SDN and traditional networks
US10931530B1 (en) Managing routing resources of a network
TWI721103B (en) Cluster accurate speed limiting method and device
WO2022253087A1 (en) Data transmission method, node, network manager, and system
US20220294712A1 (en) Using fields in an encapsulation header to track a sampled packet as it traverses a network
WO2023226633A1 (en) Fault processing method, and related device and system
WO2015096512A1 (en) Packet transmission method and device based on trill network
EP4325800A1 (en) Packet forwarding method and apparatus
WO2022052800A1 (en) Communication system, data processing method and related device

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20955814; Country of ref document: EP; Kind code of ref document: A1)