WO2007064712A2 - Transmission control protocol (tcp) congestion control using transmission delay components - Google Patents

Info

Publication number
WO2007064712A2
WO2007064712A2 PCT/US2006/045702 US2006045702W WO2007064712A2 WO 2007064712 A2 WO2007064712 A2 WO 2007064712A2 US 2006045702 W US2006045702 W US 2006045702W WO 2007064712 A2 WO2007064712 A2 WO 2007064712A2
Authority
WO
WIPO (PCT)
Prior art keywords
window
rtt
max
round trip
trip time
Prior art date
Application number
PCT/US2006/045702
Other languages
French (fr)
Other versions
WO2007064712A3 (en
Inventor
Guglielmo M. Morandin
Original Assignee
Cisco Technology, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cisco Technology, Inc. filed Critical Cisco Technology, Inc.
Priority to EP06838585.5A priority Critical patent/EP1955460B1/en
Publication of WO2007064712A2 publication Critical patent/WO2007064712A2/en
Publication of WO2007064712A3 publication Critical patent/WO2007064712A3/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/22 Traffic shaping
    • H04L47/225 Determination of shaping rate, e.g. using a moving window
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • H04L1/1867 Arrangements specially adapted for the transmitter end
    • H04L1/187 Details of sliding window management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0864 Round trip delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/127 Avoiding congestion; Recovering from congestion by using congestion prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/27 Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H04L47/283 Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/163 In-band adaptation of TCP data exchange; In-band control procedures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04J MULTIPLEX COMMUNICATION
    • H04J3/00 Time-division multiplex systems
    • H04J3/02 Details
    • H04J3/06 Synchronising arrangements
    • H04J3/0635 Clock or time synchronisation in a network
    • H04J3/0682 Clock or time synchronisation in a network by delay compensation, e.g. by compensation of propagation delay or variations thereof, by ranging

Definitions

  • the present invention generally relates to congestion control. More specifically, the present invention provides techniques and mechanisms for improving the transmission control protocol (TCP), particularly for transmitting data such as storage application data.
  • TCP transmission control protocol
  • TCP provides reliability, network adaptability, congestion control and flow control.
  • Reliability is generally provided by using mechanisms such as sequence numbers to enable retransmission.
  • Network adaptability and flow control are generally provided by using mechanisms such as windows. A window limits the amount of data that can be transmitted onto a network.
  • TCP congestion control mechanisms work well for many types of data transmissions.
  • conventional TCP congestion control mechanisms often do not work adequately for delay sensitive or bursty data, such as data associated with an Internet Protocol (IP) Storage Application, especially when the bandwidth, delay, and optimal window sizes of a connection are large.
  • IP Internet Protocol
  • TCP does not work adequately for transferring data associated with Storage Area Networks (SANs).
  • Some improvements to TCP such as FastTCP, described in "FastTCP: Motivation, Architecture, Algorithms, Performance" by Chen Jin, David Wei, and Steven Low, IEEE Infocom, March 2004, Hong Kong, address some concerns associated with TCP but still have a number of limitations.
  • a maximum send window is adjusted using forward queuing delay and maximum bandwidth parameters. Reverse queuing delay and the number of packet drops are not typically factored into generation of the maximum send window, even though recent packet drops do cause the send window to decrease.
  • By controlling the maximum send window size using an estimate of the forward congestion delay, network buffer occupation is bounded and a congestion window is effectively varied using rate shaping.
  • the congestion window size gradually increases based at least partially on the number of recently acknowledged bytes.
  • a technique for performing congestion control using a transmission control protocol is provided.
  • a forward delay component of a round trip time associated with sending data associated with a flow from a source node and receiving an acknowledgment from a destination node is determined using the transmission control protocol (TCP).
  • a maximum window is adjusted by using the forward delay component and an observed minimum round trip time.
  • a network device is configured to perform congestion control using a transmission control protocol (TCP).
  • the network device includes an interface and a processor.
  • the interface is coupled to an Internet Protocol (IP) network.
  • the processor is operable to determine a forward delay component of a round trip time associated with sending data associated with a flow and receiving an acknowledgment from a destination node.
  • the destination node is connected to the interface using the transmission control protocol (TCP).
  • the processor is also configured to adjust a maximum window by using the forward delay component and an observed minimum round trip time.
  • the window size is decreased even before any packet drop is detected.
  • Figure 1 is a diagrammatic representation showing network nodes that can use the techniques of the present invention.
  • Figure 2 is a diagrammatic representation showing a TCP transmission stream.
  • Figure 3 is a diagrammatic representation showing a TCP sliding window.
  • Figure 4 is a flow process diagram showing one technique for updating a window.
  • Figure 5 is a diagrammatic representation of a device that can use the techniques of the present invention.
  • the techniques of the present invention can be applied to different variations and flavors of TCP as well as to alternatives to TCP and other network protocols that have a congestion control component.
  • numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
  • the transmission control protocol is a transport layer protocol that provides full-duplex, stream-oriented, connections allowing for reliable transmissions, network adaptation, and flow control.
  • TCP provides transmission of streams of bytes in full-duplex. Traffic flows in both the forward and reverse directions. Only during connection start and close sequences can TCP exhibit asymmetric behavior. Data transmissions are organized into different connections.
  • TCP arranges for retransmission if it determines that data has been lost. Plain TCP learns about delay characteristics associated with a network and attempts to adjust its operation to maximize throughput by adjusting its retransmission timer.
  • TCP uses 32-bit sequence numbers that identify bytes in the data stream. Each TCP packet includes the starting sequence number of the data in that packet, and the sequence number (also referred to as an acknowledgment number) of the last byte received from the remote peer. Forward and reverse sequence numbers are independent, and each TCP peer tracks both its own sequence numbering and the numbering being used by the remote peer. TCP also uses a number of flags to manage connections.
  • TCP provides adaptability and flow control by using windows.
  • TCP attempts to manage the amount of data transmitted onto a network.
  • a window limits the amount of data not yet acknowledged that can be transmitted onto a network.
  • no other data can be sent.
  • the window slides and additional data can be sent. If no acknowledgment is received after a predetermined time out period, the oldest packet is assumed to have been lost and the data is retransmitted.
  • TCP varies the size of the window based on whether or not an acknowledgment is received.
  • Any window that is varied in size based on transmission characteristics is referred to herein as a congestion window.
  • a congestion window grows by one segment every time a positive acknowledgment is received. Consequently, the sender not only can send new data based on the acknowledgment being received but can also send new data based on the increased window size.
  • the scheme is often too aggressive, as the quickly growing window size will eventually cause too much data to be transmitted onto the network and lead to packet drops.
  • the congestion window typically shrinks to a single segment every time the sender is idle for more than a retransmission timeout.
  • the congestion window then gradually grows based on successful transmissions.
  • the congestion window grows linearly if the TCP Congestion Avoidance scheme is being used and the congestion window grows exponentially if the TCP Slow Start scheme is being used.
  • Congestion Avoidance and Slow Start are described in RFC 2001.
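  • The two growth modes described above can be sketched as follows. This is a simplified per-round-trip model, not an implementation of RFC 2001: it assumes every segment in the window is acknowledged each round trip, and the threshold name ssthresh comes from the RFC rather than from this text.

```python
def grow_window(cwnd: int, ssthresh: int) -> int:
    """Return the window (in segments) after one fully acknowledged round trip."""
    if cwnd < ssthresh:
        # Slow Start: one extra segment per acknowledgment, so the
        # window doubles every round trip (exponential growth).
        return cwnd * 2
    # Congestion Avoidance: roughly one extra segment per round trip
    # (linear growth).
    return cwnd + 1


def simulate(rtts: int, ssthresh: int, cwnd: int = 1) -> list:
    """Record the window size at the start and after each round trip."""
    history = [cwnd]
    for _ in range(rtts):
        cwnd = grow_window(cwnd, ssthresh)
        history.append(cwnd)
    return history
```

  • Starting from a single segment with ssthresh set to 16, the window reaches 16 segments after four round trips and then gains only one segment per round trip, which illustrates why a collapsed window recovers slowly.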
  • the growth of the congestion window is often slow, particularly because the congestion window shrinks to a very small size. Consequently, neither Congestion Avoidance nor Slow Start is effective for bursty data such as data from storage applications in storage area networks.
  • the conservative growth of the congestion window may also not be suitable for delay sensitive data such as real-time video data.
  • TCP typically performs no traffic shaping of any sort. As a congestion window is growing in size, bursty traffic can be transmitted suddenly onto a network without regard to any traffic shaping parameters. Consequently, bursty traffic can end up flooding network queues and at the minimum creating unnecessary delay affecting other traffic and risking buffer overflow.
  • FastTCP, developed by the Networking Group led by Steven Low at CalTech, is described in "FastTCP: Motivation, Architecture, Algorithms, Performance" by Chen Jin, David Wei, and Steven Low, IEEE Infocom, March 2004, Hong Kong.
  • FastTCP updates the maximum transmit window size using round trip times and the current value of the window. Window sizes for transmissions from a source for a particular flow are adjusted using round trip time. Packet drops are no longer considered a primary factor in adjusting window sizes, even though FastTCP honors the standard TCP semantics and shrinks window sizes when packet drops are detected. Round trip times provide a workable mechanism for adjusting window sizes and transmit rates.
  • the techniques of the present invention recognize that round trip times are sensitive to congestion happening in the forward direction from a source to a destination as well as congestion happening in the reverse direction from the destination back to the source. It is recognized that it would be preferable to consider only forward direction congestion, since forward direction congestion or forward queuing is what should affect window sizes and transmission rates from a source. Conventional TCP and FastTCP both are sensitive to reverse direction congestion. Reverse direction congestion or reverse queuing should only affect window sizes and transmission rates from other network nodes, including the peer.
  • FastTCP also does not put any bound on network buffer consumption.
  • aggregate network buffer usage grows linearly with the number of FastTCP connections in a network.
  • the techniques of the present invention recognize that this is not desirable because it means there is no upper bound on the amount of buffer required in network nodes.
  • all FastTCP connections congest the same queue, and if the queue length is insufficient to accommodate them, drops still occur, defeating some of the primary benefits of FastTCP.
  • the techniques of the present invention adjust window sizes using at least the forward component of network delay. That is, forward direction delay associated with queuing for transmission from a source to a destination is considered a primary factor in adjusting window sizes.
  • a maximum window size as well as a congestion window size are calculated using values associated with forward direction delay.
  • the amount of buffer consumed in a network by a particular flow is controlled in order to limit the total amount of buffer usage. Transmission rates are also controlled using traffic shaping and managed window size changes.
  • a variable rate shaper is used to pace packet introduction onto the network to avoid bursts, such as bursts associated with data from a disk array. The variable rate shaper maximum rate is calculated based on the maximum window size and the measured round trip time.
  • the congestion window is controlled based on the forward component of the network delay, as opposed to the full round trip time.
  • the amount of bottleneck buffer consumed by each connection is also controlled in order to limit the total amount of buffer usage. Buffer usage no longer grows linearly based on the number of flows.
  • the techniques and mechanisms of the present invention provide a number of improvements to conventional TCP and FastTCP.
  • Various embodiments of the present invention will be referred to herein as Internet Protocol Storage (IPS) FastTCP.
  • IPS Internet Protocol Storage
  • the techniques and mechanisms of the present invention provide relatively smooth traffic flow, reduced probability of packet drops and required retransmissions, reduced bottleneck queuing, and more optimal bandwidth usage. Transmission windows are not distorted by congestion on the reverse path and total buffer usage in the network is bounded.
  • the techniques and mechanisms of the present invention can be applied to a variety of networks, including IP storage associated networks, where packet drops are particularly undesirable.
  • the techniques of the present invention increase overall performance by reducing the probability of drops and increasing overall bandwidth usage.
  • Figure 1 is a diagrammatic representation showing a network topology that can use the techniques of the present invention.
  • a storage area network 101 includes hosts 121 and 123 along with storage node 125.
  • Storage node 125 may include a disk or tape array.
  • the storage area network 101 can also include multiple fibre channel switches.
  • the storage area network 101 is coupled to an IP network 103 through a tunneling switch 111.
  • Storage area network 105 includes host 127, storage 129, as well as other fibre channel switches and tunneling switch 113.
  • the tunneling switches 111 and 113 allow the formation of a tunnel to transmit storage network data over an IP network 103.
  • improvements to TCP can be implemented at any source originating traffic or at any destination receiving traffic.
  • improvements to TCP can be implemented at hosts in a storage area network.
  • improvements to TCP can be implemented at tunneling switches connecting storage area networks to an IP network.
  • the techniques can be implemented anywhere TCP is implemented. TCP typically allows the transmission of data using windows.
  • Figure 2 is a diagrammatic representation showing a window.
  • a data stream is separated into different parts.
  • Portion 211 is data that has been sent and acknowledged.
  • Portion 213 is data that has been sent but not yet acknowledged. In some examples, this portion 213 includes data that has been retransmitted one or more times. This part is often referred to as the flight size 203.
  • Another part of the data stream is referred to as the usable window 205.
  • the usable window is the portion 215 that can be sent but has not yet been sent over the network.
  • the usable window has a non-zero size when space is available in window 201.
  • the flight size 203 and the usable window 205 together are referred to as a window 201, transmission window, or congestion window.
  • the data stream can also include data 217 that can not yet be sent over the network.
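  • The partition of the data stream described for Figure 2 can be sketched as simple byte-offset arithmetic. The class and field names below are hypothetical, and 32-bit sequence number wraparound is ignored for clarity.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    """Sender-side view of the stream; field names are hypothetical."""
    acked_upto: int  # end of portion 211: bytes sent and acknowledged
    sent_upto: int   # end of portion 213: bytes sent so far
    window: int      # size of window 201 in bytes

    @property
    def flight_size(self) -> int:
        # Portion 213: sent but not yet acknowledged.
        return self.sent_upto - self.acked_upto

    @property
    def usable_window(self) -> int:
        # Portion 215: may be sent now; zero when the window is full.
        return max(0, self.window - self.flight_size)

    def send(self, nbytes: int) -> int:
        """Send as much as the usable window allows; return bytes sent."""
        sent = min(nbytes, self.usable_window)
        self.sent_upto += sent
        return sent
```

  • With 3000 bytes in flight and an 8000-byte window, the usable window is 5000 bytes; any request beyond that corresponds to data 217, which cannot yet be sent.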
  • Figure 3 is a diagrammatic representation showing a sliding window. As data is acknowledged, a previous window 303 slides over to a current window 301 position. As data is transmitted and acknowledged, the current window continues to move to incorporate more data that has not been sent. For example, an acknowledgment may be sent for every 2 packets received by a destination. Each acknowledgment detected can shift the current window 301 over a predetermined amount. The window typically slides by the amount of data that has been acknowledged. The window typically changes size based on slow start and congestion control mechanisms.
  • Figure 4 is a simplified flow process diagram showing a technique for adjusting a congestion window. It should be noted that the flow processes depicted herein are merely examples of techniques of the present invention. A variety of other techniques can also be applied. Some of the processes depicted are optional while other processes may be added while remaining within the scope of the present invention. In some instances, details associated with the process operations may not be described in order not to unnecessarily obscure the present invention.
  • space is available if a window is not full of data sent but not yet acknowledged. If no space is available, the sender waits for acknowledgments corresponding to sent packets or for sent packet time outs. If space is available in a window, data is transmitted 405. In one example, data is continuously transmitted as long as space is available in a window. Various counters, timers, and sequence numbers can be maintained to allow for reliable retransmission. As data is transmitted, a decreasing amount of space is available in the window for transmission until acknowledgments are received.
  • the window now has more space to incorporate new data to be transmitted.
  • the window slides to include new data for transmission at 417.
  • the window is enlarged by one segment.
  • the window is collapsed to a very small size, often to one or two segments at 415. This is often a very drastic remedy for addressing dropped packets.
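  • The conventional behavior outlined for Figure 4 can be sketched as a small state machine. This is an illustration only: the segment size of 1460 bytes and all names are assumptions, and retransmission bookkeeping is omitted.

```python
SEGMENT = 1460  # bytes per segment; assumed value for illustration


class ConventionalWindow:
    """Toy model: grow by one segment per ACK, collapse on timeout."""

    def __init__(self, size: int = SEGMENT):
        self.size = size      # current window size in bytes
        self.in_flight = 0    # bytes sent but not yet acknowledged

    def has_space(self) -> bool:
        return self.in_flight < self.size

    def send(self, nbytes: int) -> int:
        """Transmit while space is available; return bytes actually sent."""
        sent = min(nbytes, max(0, self.size - self.in_flight))
        self.in_flight += sent
        return sent

    def on_ack(self, acked: int) -> None:
        # The window slides past the acknowledged data and is
        # enlarged by one segment.
        self.in_flight = max(0, self.in_flight - acked)
        self.size += SEGMENT

    def on_timeout(self) -> None:
        # Drastic remedy: collapse the window to a single segment.
        self.size = SEGMENT
```

  • The asymmetry is visible in the model: growth is one segment per acknowledgment, while a single timeout discards all accumulated growth.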
  • FTP file transfer protocol
  • a congestion window is adjusted every time TCP input is invoked.
  • a maximum send window (snd_hiwat) is adjusted once per round trip time, after the previous window has been fully acknowledged.
  • FastTCP adjusts window sizes based on the delay associated with network buffering and changes the window size to the smaller of either twice the current window size or a window size modified by a control function dependent on round trip times. FastTCP adjusts window sizes using the following equation:
  • (Equation 1) where: w is a congestion window; RTT is the round trip time; baseRTT is the observed minimum round trip time; delay(t) is a delay associated with network queuing; and α is an empirically determined control constant used to govern congestion window changes.
  • the parameter α is held constant. It can be shown that at equilibrium α is the average sum of the buffer occupation for the connection. Since the metric used to determine congestion is the round trip time, FastTCP is sensitive to congestion happening in both directions of the path between two network endpoints. The TCP flows converge to a global stability point, but the configuration of flows does not necessarily maximize usage of the available bandwidth. For example, the resulting delay may all be associated with delay on the reverse path or the return path. The techniques of the present invention recognize that delay on the return path or the reverse path should not affect congestion window sizes.
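  • The equation image for the FastTCP update did not survive in this text. As a hedged illustration, the sketch below uses the window update published in the cited FastTCP paper, which matches the description above (the smaller of twice the current window or a value driven by baseRTT/RTT). The smoothing constant gamma comes from that paper and is not defined in this text; the function name is hypothetical.

```python
def fast_tcp_window(w: float, rtt: float, base_rtt: float,
                    alpha: float, gamma: float = 1.0) -> float:
    """One FastTCP-style update of the congestion window w.

    rtt and base_rtt share one time unit; alpha is in the same unit as w.
    gamma in (0, 1] is a smoothing constant (assumption, from the paper).
    """
    # Control function: scale the window by baseRTT/RTT and add alpha.
    target = (base_rtt / rtt) * w + alpha
    # Never more than double the window in one update.
    return min(2.0 * w, (1.0 - gamma) * w + gamma * target)
```

  • At equilibrium the update leaves w unchanged, which requires alpha = w · (RTT − baseRTT) / RTT. That quantity is the share of the window sitting in queues, and because it is computed from the full round trip time, delay on the reverse path shrinks the window just as forward delay does.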
  • the forward delay component is associated with a reverse delay component and in many instances, it will be recognized that simple substitutions will allow calculation of a maximum window using forward delay components.
  • Reverse delay can be calculated in a variety of manners. In one example, accurately synchronized source and destination nodes will allow determination of forward and reverse delays. In other examples, reverse delay can be determined using the techniques and mechanisms described in concurrently filed U.S. Patent Application No. 11/291,251, titled METHODS AND APPARATUS FOR DETERMINING REVERSE PATH DELAY (Atty Docket No. CISCP465) by Guglielmo M. Morandin, the entirety of which is incorporated by reference for all purposes.
  • reverse delay is calculated by maintaining accurate measurements at a source node and receiving timestamp packets from a destination node.
  • Any packet or acknowledgment including a timestamp is referred to herein as a timestamp packet.
  • a timestamp packet is a typical acknowledgment with timestamp information.
  • the destination node has a timestamp speed at which timestamps are incremented, for example once every 10 ms.
  • the reverse delay is the difference between the expected timestamp and the received timestamp value, divided by the timestamp speed.
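  • The timestamp arithmetic can be sketched as below. Two assumptions are made here that the text does not state: the timestamp speed is expressed in ticks per second (a 10 ms tick corresponds to 100 ticks per second), and the expected timestamp has already been computed by the source node. All names are hypothetical.

```python
def reverse_delay(expected_ts: int, received_ts: int,
                  ticks_per_second: float) -> float:
    """Reverse delay in seconds from a timestamp packet.

    expected_ts: timestamp the source expects the destination clock to
                 show when the packet arrives (assumed to be available).
    received_ts: timestamp value carried in the timestamp packet.
    ticks_per_second: timestamp speed of the destination (assumed unit).
    """
    return (expected_ts - received_ts) / ticks_per_second
```

  • For example, a packet stamped 5 ticks earlier than expected on a clock running at 100 ticks per second yields a reverse delay of about 50 ms. The resolution is limited to one tick, so coarse timestamp clocks give coarse estimates.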
  • the techniques and mechanisms of the present invention provide a maximum window that is used to adjust a congestion window. It should be noted that the maximum window is merely a target window and not necessarily an absolute maximum. According to various embodiments, the maximum window is adjusted using the forward delay component associated with a round trip time. Other factors such as an observed minimum round trip time or a maximum bandwidth can also be considered.
  • the maximum window is calculated using the following equation:
  • RTT(t) = baseRTT + qdelay(t) + rdelay(t)
  • (Equation 3) where: cwnd is a congestion window; snd_hiwat is the maximum window;
  • RTT(t) is the round trip time
  • baseRTT is the observed minimum round trip time
  • rdelay(t) is a reverse delay component of the round trip time
  • qdelay(t) is the forward delay component of the round trip time
  • L is a target buffer occupation
  • max_bw is a maximum bandwidth
  • K_p is an empirically determined proportional controller constant between 0 and 1.
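  • The image of Equation 3 did not survive in this text, so the sketch below is an assumed reading, not the claimed formula. It takes the FastTCP form with the forward delay substituted for the full queuing delay, and computes α with a proportional controller that compares the target occupation L with the estimate qdelay · max_bw. Function names are hypothetical; only the symbols listed in the where-clause are taken from the text.

```python
def forward_delay(rtt: float, base_rtt: float, rdelay: float) -> float:
    """qdelay(t), from RTT(t) = baseRTT + qdelay(t) + rdelay(t)."""
    return max(0.0, rtt - base_rtt - rdelay)


def alpha(qdelay: float, target_occupancy: float,
          max_bw: float, k_p: float) -> float:
    """Assumed proportional controller: K_p * (L - qdelay * max_bw).

    qdelay * max_bw estimates the current bottleneck buffer occupation
    in bytes when max_bw is in bytes per second.
    """
    return k_p * (target_occupancy - qdelay * max_bw)


def max_window(cwnd: float, base_rtt: float, qdelay: float,
               target_occupancy: float, max_bw: float, k_p: float) -> float:
    """Assumed form of snd_hiwat: reverse delay does not appear at all."""
    scaled = cwnd * base_rtt / (base_rtt + qdelay)
    return scaled + alpha(qdelay, target_occupancy, max_bw, k_p)
```

  • Under this reading, rising forward delay lowers the target in two ways: the scaling term shrinks, and α falls as the estimated occupation approaches L. A change in rdelay alone leaves snd_hiwat untouched once qdelay has been separated out.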
  • the maximum window is evaluated every time a current window is acknowledged.
  • the baseRTT is used to determine the network delay associated with a network free of traffic.
  • congestion delay is merely one source of delay in the network.
  • Other sources of delay include router processing and forwarding delays, source and destination delays, latency delays, etc.
  • the techniques of the present invention provide mechanisms for measuring forward delay independent of baseRTT.
  • the techniques and mechanisms of the present invention are immune to congestion occurring in the reverse direction.
  • the equilibrium point is achieved when each connection maintains α bytes queued in the forward direction.
  • the max_bw or maximum bandwidth is a static property of the various network nodes and links in a network and does not depend on the amount of traffic present in the network. Consequently, the max_bw can be empirically determined and/or manually configured. The value can also be estimated and autoconfigured in other examples.
  • the FastTCP buffer utilization is proportional to the number of active flows.
  • the techniques of the present invention can impose an upper bound on buffer usage by using a variable α function. This avoids a significant drawback of FastTCP.
  • the function α determines the equilibrium point for optimized distribution of network bandwidth. In general, it is desirable to reduce the value of the α function when many flows are traversing the same bottleneck buffer, and raise it again if the number of flows decreases. Unfortunately, without collaboration from the bottleneck, it is not possible to determine the number of flows. Furthermore, in general, not all the flows might be willing to send at the same rate, so a simple count of the flows would not be the best input to the α function.
  • a simple proportional controller is used, although more complex controllers such as proportional integral derivative (PID) controllers can also be used.
  • PID proportional integral derivative
  • In Equation 3, given a priori knowledge of the bottleneck bandwidth, a target buffer occupation L is compared to an estimate of the current buffer occupation obtained from the measured qdelay. An increase in qdelay is considered a consequence of increased amounts of traffic, generated by other sources, converging on the bottleneck and causing the bottleneck buffer occupation to grow.
  • the proportional controller in Equation 3 applies negative feedback to the system, reducing α and consequently reducing the contribution of each flow to the bottleneck buffer utilization.
  • AIMD Additive Increase/ Multiplicative Decrease
  • a proportional controller preferably has both good fairness and a quick response to changing network conditions.
  • the techniques of the present invention provide a proportional controller that allows a system to reach a stable equilibrium point in which all the connections on a bottleneck achieve the same constant throughput, not just a variable throughput whose average is the ideal one.
  • the proportional controller also calculates a new value for every RTT and is based on the last estimate of the queuing delay. This is important to avoid drops when network conditions change, for example when a new flow activates, and to be able to consume all the available bandwidth when flows depart.
  • the techniques of the present invention reduce burstiness by applying variable rate shaping and by allowing the congestion window to grow only gradually.
  • the packet transmission process is shaped according to the following equation:
  • (Equation 5) where: max_rate is a transmission rate; snd_hiwat is the maximum window; and baseRTT is the observed minimum round trip time.
  • Shaping is implemented by smoothly delaying packet departures in order to obtain the desired send rate. Of course new packets are sent only if they belong to the usable portion of the window.
  • techniques of the present invention not only provide more efficient and effective ways for determining congestion window sizes but also provide a variable shaper to smooth bursts of data.
  • the shaper is programmed to a rate 1/32 faster than the average one achievable using the new target window, but still shapes at a rate close enough to the achievable rate in order to better spread the transmission of packets.
  • the correcting factor allows full utilization of the usable window portion.
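  • The image of Equation 5 is missing from this text. The sketch below assumes the simplest form consistent with its where-clause, max_rate = snd_hiwat / baseRTT, and applies the 1/32 correcting factor as a multiplicative headroom. Both choices and all function names are assumptions for illustration.

```python
def shaper_rate(snd_hiwat: float, base_rtt: float,
                headroom: float = 1.0 / 32.0) -> float:
    """Shaping rate in bytes per second (assumed form of Equation 5).

    snd_hiwat is in bytes and base_rtt in seconds; the headroom makes the
    shaper slightly faster than the average rate the window can sustain.
    """
    return (snd_hiwat / base_rtt) * (1.0 + headroom)


def departure_times(n_packets: int, packet_bytes: int,
                    rate: float, start: float = 0.0) -> list:
    """Evenly spaced departure times that pace packets at the given rate."""
    interval = packet_bytes / rate
    return [start + i * interval for i in range(n_packets)]
```

  • For a 64000-byte target window and a 100 ms baseRTT, the average achievable rate is 640000 bytes per second and the shaper runs at about 660000. Pacing spreads a burst from a disk array across the round trip instead of releasing it back to back.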
  • shaping is important when the target window is lower than bottleneck_bandwidth · rtt, i.e. below equilibrium, or in the presence of other competing TCP connections. It should be noted that even if the congestion window is held constant, and the packets are initially evenly spread across the whole RTT, packets tend to cluster in time in the absence of shaping, resulting in rate variations from the average, and network buffer oscillations.
  • the congestion window is updated for every positive acknowledgment received.
  • the change in the congestion window is calculated using the following equation:
  • (Equation 6) where: cwnd is a congestion window; snd_hiwat is the maximum window; qdelay(t) is the forward delay component of the round trip time;
  • L is a target buffer occupation
  • max_bw is a maximum bandwidth
  • K_p is an empirically determined proportional controller constant between 0 and 1.
  • the techniques and mechanisms of the present invention recognize that growing the congestion window using conventional TCP mechanisms is inadequate. Growth according to available TCP mechanisms is too slow and reduction is too drastic. Consequently, the techniques of the present invention provide for more gradual changes to congestion window sizes.
  • Another corner case can happen after an IPS FastTCP failure, i.e. after a drop event and the subsequent recovery.
  • the number of bytes acknowledged could have a value significantly larger than normal, resulting in a sudden window growth that could cause other drops.
  • the number of bytes acknowledged is limited.
  • the variable holding the number of bytes acknowledged, when used to grow the congestion window cwnd is limited to 16000 bytes. This value has been empirically chosen, but different values might prove effective. Of course the actual value of acked is used as usual to slide the window and discard newly acknowledged data.
  • Equation 6 gives only the target cwnd growth over the next round trip time.
  • Equation 8 allows a gradual update every time a positive acknowledgment is processed. However, more state information than usual needs to be kept in the TCP control block.
  • the techniques of the present invention also recognize that in some instances, the baseRTT variable may have to be recalculated. For example, the base round trip time for a particular source destination pair may change when network topology changes. One trivial case happens when the delay is decreasing. In that case it is sufficient to use the new observed RTT as the new baseRTT. A difficult case is one that results from increases in the propagation delay, because the increase could also result from network congestion.
  • the techniques of the present invention recognize that the variable baseRTT should not be increased as a result of network congestion.
  • the TCP stack keeps track of two recently observed RTTs in order to properly update the baseRTT variable and uses the following equation:
  • baseRTT = min(min_rtt, prev_min_rtt)
  • (Equation 9) where: prev_min_rtt is the best estimate of baseRTT on a period prior to the previous 30 second epoch; and min_rtt is the smallest RTT observed in the last 30 second epoch.
  • the prev_min_rtt and min_rtt values are compared periodically. In some examples, the values are compared every 30 seconds. Only if min_rtt is sufficiently bigger than prev_min_rtt is the value of prev_min_rtt set to min_rtt; otherwise no change is made.
  • the rationale is that a large amount of congestion is unlikely to persist for 30 seconds, so during the 30 second interval the measured round trip time should oscillate and reach a value sufficiently close to the previously observed minimum. If instead the increased round trip time is due to a longer path, the observed RTT does not oscillate close to the old minimum, but is more likely to be exactly the new (bigger) minimum, since congestion is unlikely because all connections are trying to adapt to the longer path.
  • the min_rtt value can be updated using the following equations:
  • min_delta is the minimum time increase that is considered still due to congestion.
  • the min_rtt value can be adjusted when significant routing changes occur.
  • a source can control the send process of a destination by modulating windows associated with the destination.
  • the techniques of the present invention can be used to pace acknowledgements to a destination so that windows at a destination can be modulated based on computations at a source.
  • the techniques of the present invention can be implemented on a variety of devices such as hosts and switches.
  • the improvements to TCP can be implemented at any source originating traffic or destination receiving traffic. To be effective, it does not need to be implemented on both.
  • the improvements to TCP can also be implemented at tunneling switches used to transmit storage application data over IP networks.
  • FIG. 5 is a diagrammatic representation of one example of a fibre channel switch that can be used to implement techniques of the present invention. Although one particular configuration will be described, it should be noted that a wide variety of switch and router configurations are available.
  • the tunneling switch 501 may include one or more supervisors 511. According to various embodiments, the supervisor 511 has its own processor, memory, and storage resources.
  • Line cards 503, 505, and 507 can communicate with an active supervisor 511 through interface circuitry 583, 585, and 587 and the backplane 515.
  • each line card includes a plurality of ports that can act as either input ports or output ports for communication with external fibre channel network entities 551 and 553.
  • the backplane 515 can provide a communications channel for all traffic between line cards and supervisors.
  • Individual line cards 503 and 507 can also be coupled to external fibre channel network entities 551 and 553 through fibre channel ports 543 and 547.
  • External fibre channel network entities 551 and 553 can be nodes such as other fibre channel switches, disks, RAIDS, tape libraries, or servers. It should be noted that the switch can support any number of line cards and supervisors. In the embodiment shown, only a single supervisor is connected to the backplane 515 and the single supervisor communicates with many different line cards.
  • the active supervisor 511 may be configured or designed to run a plurality of applications such as routing, domain manager, system manager, and utility applications.
  • the switch also includes line cards 575 and 577 with IP interfaces 565 and 567.
  • the IP port 565 is coupled to an external IP network entity 555.
  • each IP line card includes a plurality of ports that can act as either input ports or output ports for communication with external IP entities 555. These IP entities could be IP switches or routers, or directly attached network endnodes.
  • the line cards 575 and 577 can also be coupled to the backplane 515 through interface circuitry 595 and 597.
  • the switch can have a single IP port and a single fibre channel port.
  • two fibre channel switches used to form an FCIP tunnel each have one fibre channel line card and one IP line card.
  • Each fibre channel line card connects to an external fibre channel network entity and each IP line card connects to a shared IP network.
  • a fibre channel switch performs gateway functions between hosts or disks connected to an IP network and hosts or disks connected to a fibre channel network. In various embodiments, the techniques of the present invention do not need to be implemented by both TCP end points in order to be effective.
  • the above-described embodiments may be implemented in a variety of network devices (e.g., servers) as well as in a variety of mediums.
  • instructions and data for implementing the above-described invention may be stored on a disk drive, a hard drive, a floppy disk, a server computer, or a remotely networked computer. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
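
The baseRTT maintenance described earlier in this list (Equation 9 and the periodic comparison of min_rtt and prev_min_rtt) can be illustrated with a minimal Python sketch. The equations that update min_rtt are not reproduced in this text, so the threshold test `min_rtt > prev_min_rtt + min_delta` is an assumption, as are the class name `BaseRttTracker`, its method names, and the choice to carry a decreased minimum forward at the end of an epoch.

```python
EPOCH_SECONDS = 30.0  # comparison period given in the text


class BaseRttTracker:
    """Hypothetical helper that tracks baseRTT across 30 second epochs."""

    def __init__(self, first_rtt, min_delta, now=0.0):
        self.prev_min_rtt = first_rtt  # best estimate from earlier epochs
        self.min_rtt = first_rtt       # smallest RTT seen in the current epoch
        self.min_delta = min_delta     # increase still attributed to congestion
        self.epoch_start = now

    @property
    def base_rtt(self):
        # Equation 9: baseRTT = min(min_rtt, prev_min_rtt)
        return min(self.min_rtt, self.prev_min_rtt)

    def observe(self, rtt, now):
        """Record one RTT sample taken at time `now` (seconds)."""
        if now - self.epoch_start >= EPOCH_SECONDS:
            self._close_epoch(now)
            self.min_rtt = rtt  # the new epoch starts with this sample
        else:
            self.min_rtt = min(self.min_rtt, rtt)

    def _close_epoch(self, now):
        if self.min_rtt > self.prev_min_rtt + self.min_delta:
            # Assumed test: the RTT never came close to the old minimum for a
            # whole epoch, so the path is treated as longer, not congested.
            self.prev_min_rtt = self.min_rtt
        elif self.min_rtt < self.prev_min_rtt:
            # Assumption: a decreased delay is carried forward directly.
            self.prev_min_rtt = self.min_rtt
        self.epoch_start = now
```

In this sketch, an epoch whose smallest RTT stays within min_delta of the old minimum leaves prev_min_rtt untouched, which matches the rationale that congestion is unlikely to persist for a full epoch.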

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

According to the present invention, methods and apparatus are provided to improve the Transmission Control Protocol (TCP) for data such as delay sensitive or bursty data. A maximum send window is adjusted using forward queuing delay and maximum bandwidth parameters. Reverse queuing delay and the number of packet drops are not factored into generation of the maximum send window. Network buffer occupation is bounded and a congestion window is effectively varied using rate shaping and gradual size increases based at least partially on the number of acknowledged packets.

Description

TRANSMISSION CONTROL PROTOCOL (TCP) CONGESTION CONTROL USING TRANSMISSION DELAY COMPONENTS
BACKGROUND OF THE INVENTION
1. Field of the Invention.
The present invention generally relates to congestion control. More specifically, the present invention provides techniques and mechanisms for improving the transmission control protocol (TCP), particularly for transmitting data such as storage application data.
2. Description of Related Art
TCP provides reliability, network adaptability, congestion control and flow control. Reliability is generally provided by using mechanisms such as sequence numbers to enable retransmission. Network adaptability and flow control are generally provided by using mechanisms such as windows. A window limits the amount of data that can be transmitted onto a network.
Conventional TCP congestion control mechanisms work well for many types of data transmissions. However, conventional TCP congestion control mechanisms often do not work adequately for delay sensitive or bursty data, such as data associated with an Internet Protocol (IP) Storage Application, especially when the bandwidth, delay, and optimal window sizes of a connection are large. In one example, TCP does not work adequately for transferring data associated with Storage Area Networks (SANs). Some improvements to TCP such as FastTCP, described in "FastTCP: Motivation, Architecture, Algorithms, Performance" by Chen Jin, David Wei, and Steven Low, IEEE Infocom, March 2004, Hong Kong, address some concerns associated with TCP but still have a number of limitations.
Consequently, it is desirable to provide techniques for improving TCP to allow more effective and efficient transmission of data such as delay sensitive and bursty data, in order to greatly reduce the probability of packet drops while minimizing delay.
SUMMARY OF THE INVENTION
According to the present invention, methods and apparatus are provided to improve the Transmission Control Protocol (TCP) for data such as delay sensitive or bursty data. A maximum send window is adjusted using forward queuing delay and maximum bandwidth parameters. Reverse queuing delay and the number of packet drops are not typically factored into generation of the maximum send window, even though recent packet drops do cause the send window to decrease. By controlling the maximum send window size using an estimate of the forward congestion delay, network buffer occupation is bounded and a congestion window is effectively varied using rate shaping. The congestion window size gradually increases based at least partially on the number of recently acknowledged bytes.
In one embodiment, a technique for performing congestion control using a transmission control protocol (TCP) is provided. A forward delay component of a round trip time associated with sending data associated with a flow from a source node and receiving an acknowledgment from a destination node is determined using the transmission control protocol (TCP). A maximum window is adjusted by using the forward delay component and an observed minimum round trip time.
In another embodiment, a network device is configured to perform congestion control using a transmission control protocol (TCP). The network device includes an interface and a processor. The interface is coupled to an Internet Protocol (IP) network. The processor is operable to determine a forward delay component of a round trip time associated with sending data associated with a flow and receiving an acknowledgment from a destination node. The destination node is connected to the interface using the transmission control protocol (TCP). The processor is also configured to adjust a maximum window by using the forward delay component and an observed minimum round trip time.
In some embodiments, when the estimated forward congestion delay increases, the window size is decreased even before any packet drop is detected. A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which are illustrative of specific embodiments of the present invention.
Figure 1 is a diagrammatic representation showing network nodes that can use the techniques of the present invention.
Figure 2 is a diagrammatic representation showing a TCP transmission stream.
Figure 3 is a diagrammatic representation showing a TCP sliding window.
Figure 4 is a flow process diagram showing one technique for updating a window.
Figure 5 is a diagrammatic representation of a device that can use the techniques of the present invention.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.
For example, the techniques of the present invention will be described in the context of the transmission control protocol (TCP). However, it should be noted that the techniques of the present invention can be applied to different variations and flavors of TCP as well as to alternatives to TCP and other network protocols that have a congestion control component. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
Furthermore, techniques and mechanisms of the present invention will sometimes be described in singular form for clarity. However, it should be noted that some embodiments can include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise. For example, a processor is used in a variety of contexts. However, it will be appreciated that multiple processors can also be used while remaining within the scope of the present invention.
The transmission control protocol (TCP) is a transport layer protocol that provides full-duplex, stream-oriented, connections allowing for reliable transmissions, network adaptation, and flow control. TCP provides transmission of streams of bytes in full-duplex. Traffic flows in both the forward and reverse directions. Only during connection start and close sequences can TCP exhibit asymmetric behavior. Data transmissions are organized into different connections.
Reliability is provided using sequence numbers to track what data has been transmitted and received for each particular connection. TCP arranges for retransmission if it determines that data has been lost. Plain TCP learns about delay characteristics associated with a network and attempts to adjust its operation to maximize throughput by adjusting its retransmission timer. TCP uses 32-bit sequence numbers that identify bytes in the data stream. Each TCP packet includes the starting sequence number of the data in that packet, and the sequence number (also referred to as an acknowledgment number) of the last byte received from the remote peer. Forward and reverse sequence numbers are independent, and each TCP peer tracks both its own sequence numbering and the numbering being used by the remote peer. TCP also uses a number of flags to manage connections.
TCP provides adaptability and flow control by using windows. To avoid continuously overflowing various network buffers, TCP attempts to manage the amount of data transmitted onto a network. In typical instances, a window limits the amount of data not yet acknowledged that can be transmitted onto a network. When the window is full of data transmitted but not yet acknowledged, no other data can be sent. When an acknowledgment is received, the window slides and additional data can be sent. If no acknowledgment is received after a predetermined time out period, the oldest packet is assumed to have been lost and the data is retransmitted. Some TCP flavors use more sophisticated techniques that take advantage of the selective acknowledgement options.
In some instances, TCP varies the size of the window based on whether or not an acknowledgment is received. Any window that is varied in size based on transmission characteristics is referred to herein as a congestion window. In one example, a congestion window grows by one segment every time a positive acknowledgment is received. Consequently, the sender not only can send new data based on the acknowledgment being received but can also send new data based on the increased window size. However, the scheme is often too aggressive, as the quickly growing window size will eventually cause too much data to be transmitted onto the network and lead to packet drops. Similarly, the congestion window typically shrinks to a single segment every time the sender is idle for more than a retransmission timeout.
The congestion window then gradually grows based on successful transmissions. The congestion window grows linearly if the TCP Congestion Avoidance scheme is being used and the congestion window grows exponentially if the TCP Slow Start scheme is being used. Congestion Avoidance and Slow Start are described in RFC 2001. However, the growth of the congestion window is often slow, particularly because the congestion window shrinks to a very small size. Consequently, neither Congestion Avoidance nor Slow Start are effective for bursty data such as data from storage applications in storage area networks. The conservative growth of the congestion window may also not be suitable for delay sensitive data such as real-time video data. Furthermore, TCP typically performs no traffic shaping of any sort. As a congestion window is growing in size, bursty traffic can be transmitted suddenly onto a network without regard to any traffic shaping parameters. Consequently, bursty traffic can end up flooding network queues and at the minimum creating unnecessary delay affecting other traffic and risking buffer overflow.
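
The two growth schemes named above can be sketched per acknowledgment as follows. This is a minimal illustration of the conventional behavior described in RFC 2001, not of the techniques of the present invention; the function names, the `ssthresh` parameter name, and the 1460 byte segment size are assumptions made for the example.

```python
MSS = 1460  # assumed segment size in bytes


def grow_cwnd(cwnd, ssthresh, mss=MSS):
    """Return the congestion window (bytes) after one positive acknowledgment."""
    if cwnd <= ssthresh:
        # Slow Start: one segment per ACK, so the window roughly doubles
        # every round trip time (exponential growth).
        return cwnd + mss
    # Congestion Avoidance: about one segment per round trip time in total
    # (linear growth), spread over the ACKs of that round trip.
    return cwnd + max(1, mss * mss // cwnd)


def on_retransmit_timeout(cwnd, mss=MSS):
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    ssthresh = max(cwnd // 2, 2 * mss)
    return mss, ssthresh  # the window collapses to a single segment
```

After a timeout the window restarts from one segment, which is why the text notes that growth is often slow for bursty traffic.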
FastTCP, developed by the Networking Group led by Steven Low at CalTech, is described in "FastTCP: Motivation, Architecture, Algorithms, Performance" by Chen Jin, David Wei, and Steven Low, IEEE Infocom, March 2004, Hong Kong. FastTCP updates the maximum transmit window size using round trip times and the current value of the window. Window sizes for transmissions from a source for a particular flow are adjusted using round trip time. Packet drops are no longer considered as a primary factor in adjusting window sizes, even though FastTCP honors the standard TCP semantics and shrinks window sizes when packet drops are detected. Round trip times provide a workable mechanism for adjusting window sizes and transmit rates.
However, the techniques of the present invention recognize that round trip times are sensitive to congestion happening in the forward direction from a source to a destination as well as congestion happening in the reverse direction from the destination back to the source. It is recognized that it would be preferable to consider only forward direction congestion, since forward direction congestion or forward queuing is what should affect window sizes and transmission rates from a source. Conventional TCP and FastTCP both are sensitive to reverse direction congestion. Reverse direction congestion or reverse queuing should only affect window sizes and transmission rates from other network nodes, including the peer.
FastTCP also does not put any bound on network buffer consumption. In fact, aggregate network buffer usage grows linearly with the number of FastTCP connections in a network. The techniques of the present invention recognize that this is not desirable because it provides that there is no upper bound to the amount of buffer required in network nodes. In the presence of a single bottleneck, all FastTCP connections congest the same queue, and if the queue length is insufficient to accommodate them, drops still occur, defeating some of the primary benefits of FastTCP.
Consequently, the techniques of the present invention adjust window sizes using at least the forward component of network delay. That is, forward direction delay associated with queuing for transmission from a source to a destination is considered a primary factor in adjusting window sizes. A maximum window size as well as a congestion window size are calculated using values associated with forward direction delay. The amount of buffer consumed in a network by a particular flow is controlled in order to limit the total amount of buffer usage. Transmission rates are also controlled using traffic shaping and managed window size changes. A variable rate shaper is used to pace packet introduction onto the network to avoid bursts, such as bursts associated with data from a disk array. The variable rate shaper maximum rate is calculated based on the maximum window size and the measured round trip time. The congestion window is controlled based on the forward component of the network delay, as opposed to the full round trip time. The amount of bottleneck buffer consumed by each connection is also controlled in order to limit the total amount of buffer usage. Buffer usage no longer grows linearly based on the number of flows.
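
A minimal sketch of the variable rate shaper calculation follows. It assumes the average achievable rate is the maximum window divided by the measured round trip time, and applies the 1/32 speed-up that this document gives for the shaper elsewhere. The function names and the units (bytes and seconds) are assumptions made for the example.

```python
def shaper_rate(snd_hiwat, rtt, correction=1.0 / 32.0):
    """Shaper rate in bytes per second for a target window and measured RTT."""
    average_rate = snd_hiwat / rtt  # rate achievable with the target window
    # Shape slightly faster than the average so the usable window
    # portion can still be fully utilized.
    return average_rate * (1.0 + correction)


def packet_interval(packet_bytes, rate):
    """Seconds to wait between packets of the given size at the shaped rate."""
    return packet_bytes / rate
```

For example, a 64000 byte window over a 100 ms round trip gives an average of 640000 bytes per second and a shaped rate of 660000 bytes per second, so 1500 byte packets are released roughly every 2.3 ms instead of in a single burst.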
The techniques and mechanisms of the present invention provide a number of improvements to conventional TCP and FastTCP. Various embodiments of the present invention will be referred to herein as Internet Protocol Storage (IPS) FastTCP. The techniques and mechanisms of the present invention provide relatively smooth traffic flow, reduced probability of packet drops and required retransmissions, reduced bottleneck queuing, and more optimal bandwidth usage. Transmission windows are not distorted by congestion on the reverse path and total buffer usage in the network is bounded. The techniques and mechanisms of the present invention can be applied to a variety of networks, including IP storage associated networks, where packet drops are particularly undesirable. The techniques of the present invention increase overall performance by reducing the probability of drops and increasing overall bandwidth usage.
Figure 1 is a diagrammatic representation showing a network topology that can use the techniques of the present invention. A storage area network 101 includes hosts 121 and 123 along with storage node 125. Storage node 125 may include a disk or tape array. The storage area network 101 can also include multiple fibre channel switches. The storage area network 101 is coupled to an IP network 103 through a tunneling switch 111. Storage area network 105 includes host 127, storage 129, as well as other fibre channel switches and tunneling switch 113. The tunneling switches 111 and 113 allow the formation of a tunnel to transmit storage network data over an IP network 103.
According to various embodiments, improvements to TCP can be implemented at any source originating traffic or at any destination receiving traffic. For example, improvements to TCP can be implemented at hosts in a storage area network. In another example, improvements to TCP can be implemented at tunneling switches connecting storage area networks to an IP network. The techniques can be implemented anywhere TCP is implemented. TCP typically allows the transmission of data using windows.
Figure 2 is a diagrammatic representation showing a window. A data stream is separated into different parts. Portion 211 is data that has been sent and acknowledged. Portion 213 is data that has been sent but not yet acknowledged. In some examples, this portion 213 includes data that has been retransmitted one or more times. This part is often referred to as the flight size 203. Another part of the data stream is referred to as the usable window 205. The usable window is the portion 215 that can be sent but has not yet been sent over the network.
In some examples, the usable window has a non-zero size when space is available in window 201. When the usable window 205 diminishes to nearly 0 and the flight size 203 encompasses nearly all of window 201, no additional data can be sent. The flight size 203 and the usable window 205 together are referred to as a window 201, transmission window, or congestion window. The data stream can also include data 217 that cannot yet be sent over the network.
Figure 3 is a diagrammatic representation showing a sliding window. As data is acknowledged, a previous window 303 slides over to a current window 301 position. As data is transmitted and acknowledged, the current window continues to move to incorporate more data that has not been sent. For example, an acknowledgment may be sent for every 2 packets received by a destination. Each acknowledgment detected can shift the current window 301 over a predetermined amount. The window typically slides by the amount of data that has been acknowledged. The window typically changes size based on slow start and congestion control mechanisms.
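
The relationship between the window, the flight size, and the usable window can be sketched as below. This is an illustration only: the class `SendWindow` and the field names `snd_una` and `snd_nxt` are assumptions, and 32-bit sequence number wrap-around is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    window: int       # size of window 201 in bytes
    snd_una: int = 0  # first byte sent but not yet acknowledged
    snd_nxt: int = 0  # next byte to be sent

    @property
    def flight_size(self):
        # Data sent but not yet acknowledged (flight size 203).
        return self.snd_nxt - self.snd_una

    @property
    def usable(self):
        # Data that may still be sent (usable window 205).
        return max(0, self.window - self.flight_size)

    def send(self, nbytes):
        """Send up to nbytes and return how many bytes were actually sent."""
        sent = min(nbytes, self.usable)
        self.snd_nxt += sent
        return sent

    def ack(self, ack_no):
        """Slide the window on a cumulative acknowledgment up to ack_no."""
        if self.snd_una < ack_no <= self.snd_nxt:
            self.snd_una = ack_no
```

The window slides by exactly the amount of newly acknowledged data, and the usable portion reopens by the same amount.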
Figure 4 is a simplified flow process diagram showing a technique for adjusting a congestion window. It should be noted that the flow processes depicted herein are merely examples of techniques of the present invention. A variety of other techniques can also be applied. Some of the processes depicted are optional while other processes may be added while remaining within the scope of the present invention. In some instances, details associated with the process operations may not be described in order not to unnecessarily obscure the present invention.
At 401, it is determined if space is available in a window. According to various embodiments, space is available if a window is not full of data sent but not yet acknowledged. If no space is available, the sender waits for acknowledgments corresponding to sent packets or for sent packet time outs. If space is available in a window, data is transmitted 405. In one example, data is continuously transmitted as long as space is available in a window. Various counters, timers, and sequence numbers can be maintained to allow for reliable retransmission. As data is transmitted, a decreasing amount of space is available in the window for transmission until acknowledgments are received.
If an acknowledgment is received at 411, the window now has more space to incorporate new data to be transmitted. In one example, if an acknowledgment is received at 411, the window slides to include new data for transmission at 417. In some instances, the window is enlarged by one segment. However, if no acknowledgment is received after one or more retransmit attempts at 413, the window is collapsed to a very small size, often to one or two segments at 415. This is often a very drastic remedy for addressing dropped packets. This is an effective solution for conventional protocols such as the file transfer protocol (FTP), which is more concerned with effectively transferring a file than it is with realtime handling of bursty data.
However, the techniques of the present invention recognize that the window need not be adjusted as drastically. The techniques of the present invention provide mechanisms for intelligently adjusting the window size, based on bandwidth availability and forward delay components. According to various embodiments, a congestion window is adjusted every time TCP input is invoked. A maximum send window (snd_hiwat) is adjusted once per round trip time, after the previous window has been fully acknowledged.
FastTCP adjusts window sizes based on the delay associated with network buffering and changes the window size to the smaller of either twice the current window size or a window size modified by a control function dependent on round trip times. FastTCP adjusts window sizes using the following equation:
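\[ w \leftarrow \min\left\{ 2w,\; (1-\gamma)\,w + \gamma\left( \frac{\mathit{baseRTT}}{\mathit{RTT}}\,w + \alpha(w, \mathit{delay}) \right) \right\} \]
\[ \mathit{RTT} = \mathit{baseRTT} + \mathit{delay} \]
\[ 0 < \gamma \le 1 \]
\[ \alpha(w, \mathit{delay}) = \begin{cases} \alpha & \text{if } \mathit{delay} > 0 \\ \alpha w & \text{if } \mathit{delay} = 0 \end{cases} \]
(Equation 1) where: w is a congestion window; RTT is the round trip time; baseRTT is the observed minimum round trip time; delay(t) is a delay associated with network queuing; γ is an empirically determined control constant used to govern congestion window changes.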
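
The FastTCP update can be sketched in a few lines. The sketch assumes Equation 1 has the form published in the cited FastTCP paper, namely w ← min{2w, (1 − γ)w + γ(baseRTT/RTT · w + α(w, delay))}, with α(w, delay) equal to α when delay > 0 and to αw when delay = 0. The function and parameter names are chosen for the example only.

```python
def fast_tcp_update(w, base_rtt, delay, alpha, gamma):
    """Return the next FastTCP window, under the assumed form of Equation 1."""
    rtt = base_rtt + delay
    # alpha(w, delay): a constant once queuing delay is seen, else scaled by w.
    a = alpha if delay > 0 else alpha * w
    target = (base_rtt / rtt) * w + a
    # Never more than double the window in a single update.
    return min(2 * w, (1 - gamma) * w + gamma * target)
```

With no queuing delay the window simply doubles (the 2w bound applies). Once delay appears, the window moves toward baseRTT/RTT · w + α, regardless of whether that delay arose on the forward or the reverse path.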
Once some extra delay is detected, that is, when the average RTT becomes larger than the baseRTT, the parameter α is held constant. It can be shown that at equilibrium α is the average sum of the buffer occupation for the connection. Since the metric used to determine congestion is the round trip time, FastTCP is sensitive to congestion happening in both directions of the path between two network endpoints. The TCP flows converge to a global stability point, but the configuration of flows does not necessarily maximize usage of the available bandwidth. For example, the resulting delay may all be associated with delay on the reverse path or the return path. The techniques of the present invention recognize that delay on the return path or the reverse path should not affect congestion window sizes.
In the presence of N FastTCP connections all using the same α, the total average buffer occupation is up to N·α. This is not desirable because it essentially implies that there is no upper bound to the amount of buffer required in network nodes, unless the maximum number of concurrent connections is known in advance. Even when an upper bound on the number of connections is known, the required buffer occupation might be bigger than the affordable buffer size. In the presence of common single bottleneck congestion, all the connections congest the same queue, and if the maximum queue length is insufficient to accommodate them, drops happen, thereby defeating the usefulness of FastTCP.
The forward delay component is associated with a reverse delay component and in many instances, it will be recognized that simple substitutions will allow calculation of a maximum window using forward delay components. Reverse delay can be calculated in a variety of manners. In one example, accurately synchronized source and destination nodes will allow determination of forward and reverse delays. In other examples, reverse delay can be determined using the techniques and mechanisms described in concurrently filed U.S. Patent Application No. 11/291,251, titled METHODS AND APPARATUS FOR DETERMINING REVERSE PATH DELAY (Atty Docket No. CISCP465) by Guglielmo M. Morandin, the entirety of which is incorporated by reference for all purposes.
According to various embodiments, reverse delay is calculated by maintaining accurate measurements at a source node and receiving timestamp packets from a destination node. Any packet or acknowledgment including a timestamp is referred to herein as a timestamp packet. In some instances, a timestamp packet is a typical acknowledgment with timestamp information. The destination node has a timestamp speed at which timestamp packets are incremented, for example 10 ms or 1 ms. When a timestamp packet is received, it is determined whether the timestamp expected in the packet is different from the timestamp value included in the packet. In one particular example, the reverse delay is the timestamp value subtracted from the timestamp expected, all divided by the timestamp speed.
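
The reverse delay calculation in the particular example above can be sketched as follows. The sketch assumes the timestamp speed is expressed in ticks per second (100 for a 10 ms tick), and that the expected timestamp is extrapolated from a reference sample taken when the reverse path was idle. That extrapolation and the names `expected_timestamp`, `ref_ts`, and `ref_time` are hypothetical and are not taken from the referenced application.

```python
def expected_timestamp(ref_ts, ref_time, now, ticks_per_second):
    """Timestamp the peer would carry now if there were no reverse queuing.

    ref_ts and ref_time are an assumed reference pair: a peer timestamp and
    the local time (seconds) at which it arrived over an idle reverse path.
    """
    return ref_ts + (now - ref_time) * ticks_per_second


def reverse_delay(ts_expected, ts_value, ticks_per_second):
    """Reverse delay in seconds: (expected - received) / timestamp speed."""
    return (ts_expected - ts_value) / ticks_per_second
```

With a 10 ms tick, a packet arriving 2 seconds after the reference is expected to carry a timestamp 200 ticks higher. A value that is 3 ticks lower than expected corresponds to about 30 ms of reverse delay.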
The techniques and mechanisms of the present invention provide a maximum window that is used to adjust a congestion window. It should be noted that the maximum window is merely a target window and not necessarily an absolute maximum. According to various embodiments, the maximum window is adjusted using the forward delay component associated with a round trip time. Other factors such as an observed minimum round trip time or a maximum bandwidth can also be considered.
According to various embodiments, the maximum window is calculated using the following equation:
$$\mathit{snd\_hiwat}(t) = \mathit{cwnd}(t-1) \cdot \frac{\mathit{baseRTT} + \mathit{rdelay}(t-1)}{\mathit{RTT}(t-1)} + \alpha(\mathit{max\_bw}, \mathit{qdelay}(t-1))$$
$$\mathit{RTT}(t) = \mathit{baseRTT} + \mathit{qdelay}(t) + \mathit{rdelay}(t)$$
(Equation 2)
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
(Equation 3)
where:
cwnd is a congestion window;
snd_hiwat is the maximum window;
RTT(t) is the round trip time;
baseRTT is the observed minimum round trip time;
rdelay(t) is a reverse delay component of the round trip time;
qdelay(t) is the forward delay component of the round trip time;
L is a target buffer occupation;
max_bw is a maximum bandwidth; and
Kp is an empirically determined proportional controller constant between 0 and 1.
Equation 2 can also be rewritten in the following form:
$$\mathit{snd\_hiwat}(t) = \mathit{cwnd}(t-1) \cdot \frac{\mathit{bRTT}(t-1)}{\mathit{bRTT}(t-1) + \mathit{qdelay}(t-1)} + \alpha(\mathit{max\_bw}, \mathit{qdelay}(t-1))$$
$$\mathit{bRTT}(t) = \mathit{baseRTT} + \mathit{rdelay}(t)$$
(Equation 4)
In some examples, the maximum window is evaluated every time a current window is acknowledged. The baseRTT is used to determine the network delay associated with a network free of traffic. The techniques and mechanisms of the present invention recognize that congestion delay is merely one source of delay in the network. Other sources of delay include router processing and forwarding delays, source and destination delays, latency delays, etc. The techniques of the present invention provide mechanisms for measuring forward delay independent of baseRTT.
With the introduction of rdelay in the control equation, the techniques and mechanisms of the present invention are immune to congestion occurring in the reverse direction. The equilibrium point is achieved when each connection maintains α bytes queued in the forward direction.
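As a rough illustration of Equations 2 and 3, the following Python sketch computes the target window from the previous congestion window and the measured delay components. The function names, argument names, units (bytes, seconds, bytes per second) and sample values are assumptions chosen for illustration; they are not taken from any particular TCP stack.

```python
def alpha(L: float, max_bw: float, qdelay: float, kp: float) -> float:
    """Equation 3: proportional controller on the estimated buffer occupation."""
    if not 0 < kp <= 1:
        raise ValueError("Kp must satisfy 0 < Kp <= 1")
    headroom = L - max_bw * qdelay  # target occupation minus estimated occupation
    return headroom * kp if headroom > 0 else 0.0


def snd_hiwat(cwnd_prev: float, base_rtt: float, rdelay_prev: float,
              qdelay_prev: float, L: float, max_bw: float, kp: float) -> float:
    """Equation 2: target (maximum) window for the next round trip."""
    rtt_prev = base_rtt + qdelay_prev + rdelay_prev
    scaled = cwnd_prev * (base_rtt + rdelay_prev) / rtt_prev
    return scaled + alpha(L, max_bw, qdelay_prev, kp)


if __name__ == "__main__":
    # 100 Mbit/s bottleneck (12.5e6 bytes/s), 64 kB target occupation, Kp = 0.5.
    common = dict(L=64_000, max_bw=12_500_000, kp=0.5)
    # No forward queuing: the result does not depend on the reverse delay.
    print(snd_hiwat(100_000, 0.1, 0.00, 0.0, **common))
    print(snd_hiwat(100_000, 0.1, 0.05, 0.0, **common))
    # With 2 ms of forward queuing the window is scaled down and alpha shrinks.
    print(snd_hiwat(100_000, 0.1, 0.05, 0.002, **common))
```

With no forward queuing delay the scaling ratio is 1 whatever the reverse delay is, which is how the sketch reflects the immunity to reverse-direction congestion.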
The max_bw or maximum bandwidth is a static property of the various network nodes and links in a network and does not depend on the amount of traffic present in the network. Consequently, the max_bw can be empirically determined and/or manually configured. The value can also be estimated and autoconfigured in other examples.
The FastTCP buffer utilization is proportional to the number of active flows. However, the techniques of the present invention can impose an upper bound on buffer usage by using a variable α function. This avoids a significant drawback of FastTCP. The function α determines the equilibrium point for optimized distribution of network bandwidth. In general, it is desirable to reduce the value of the α function when many flows are traversing the same bottleneck buffer, and raise it again if the number of flows decreases. Unfortunately, without collaboration from the bottleneck, it is not possible to determine the number of flows. Furthermore, in general, not all the flows might be willing to send at the same rate, so a simple count of the flows would not be the best input to the α function. Consequently, the techniques and mechanisms of the present invention use the forward delay component to adapt the α function. According to various embodiments, a simple proportional controller is used, although more complex controllers such as proportional integral derivative (PID) controllers can also be used. As noted above in Equation 3, given a-priori knowledge of the bottleneck bandwidth, a target buffer occupation L is compared to an estimate of the current buffer occupation obtained from the measured qdelay. An increase in qdelay is considered a consequence of increased amounts of traffic, generated by other sources, converging on the bottleneck and causing the bottleneck buffer occupation to grow. The proportional controller in Equation 3 applies negative feedback to the system, reducing α and consequently reducing the contribution of each flow to the bottleneck buffer utilization. Since the measured quantity used to calculate α is the qdelay, a quantity whose average is going to be the same for all the connections sharing a single bottleneck, it is guaranteed that all these connections, at equilibrium, will tend to use the same value of α, resulting in good fairness properties.
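The bound on buffer usage can be made concrete with a short calculation. The derivation is not given in the text and is offered here only as a sketch under two assumptions: the bottleneck drains at exactly max_bw, so the total queue is Q = max_bw · qdelay, and each of the N flows keeps α bytes queued at equilibrium, so Q = N·α. Substituting into Equation 3 gives Q = N·Kp·(L − Q), hence Q = N·Kp·L / (1 + N·Kp), which stays below L for any N. The Python names below are hypothetical.

```python
def fixed_alpha_queue(n_flows: int, alpha_fixed: float) -> float:
    """Total queue with a constant alpha per flow: grows without bound in N."""
    return n_flows * alpha_fixed


def variable_alpha_queue(n_flows: int, L: float, kp: float) -> float:
    """Equilibrium queue when every flow uses alpha = Kp * (L - Q).

    Solves Q = N * Kp * (L - Q) for Q (assumes Q = max_bw * qdelay).
    """
    return n_flows * kp * L / (1 + n_flows * kp)


if __name__ == "__main__":
    L, kp = 64_000.0, 0.5
    for n in (1, 10, 100, 1000):
        q = variable_alpha_queue(n, L, kp)
        per_flow = kp * (L - q)  # alpha each flow settles on
        print(n, fixed_alpha_queue(n, 32_000.0), round(q, 1), round(per_flow, 1))
```

Under these assumptions the per-flow α shrinks as flows are added while the total occupation approaches L from below.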
An Additive Increase/Multiplicative Decrease (AIMD) scheme similar to the one used in conventional TCP was evaluated in order to adjust α, but it was difficult to obtain acceptable fairness. To achieve it in reasonable timescales, the rate of adjustment needs to be high, but a high rate of adjustment causes oscillations in the bottleneck queue size that can lead to packet drops.
Consequently, the techniques of the present invention recognize that a proportional controller preferably has both good fairness and a quick response to changing network conditions. According to various embodiments, the techniques of the present invention provide a proportional controller that allows a system to reach a stable equilibrium point in which all the connections on a bottleneck achieve the same constant throughput, not just a variable throughput whose average is the ideal one. The proportional controller also calculates a new value for every RTT and is based on the last estimate of the queuing delay. This is important to avoid drops when network conditions change, for example when a new flow activates, and to be able to consume all the available bandwidth when flows depart. In typical networks, there are usually other sources of delay besides congestion and propagation times, and in order to overcome these extra delays a minimum positive α is used to achieve full rate.
While FastTCP calculates a new congestion window every RTT, the techniques of the present invention recognize that it is not necessarily a good idea to immediately increase the window to the new value, because the resulting burst could cause packet drops and result in extra queueing delay that increases the measured RTT. This would occur even though the window is still smaller than the one that would be reached on equilibrium. Furthermore, bursts cause decreased performance since the increased RTT, even though temporary, limits the growth of the window towards the value it would reach at equilibrium.
Consequently, the techniques of the present invention reduce burstiness by applying variable rate shaping and by allowing the congestion window to grow only gradually. The packet transmission process is shaped according to the following equation:
$$\mathit{max\_rate} = \frac{\mathit{snd\_hiwat}}{\mathit{baseRTT}} \left(1 + \frac{1}{32}\right)$$
(Equation 5)
where:
max_rate is a transmission rate;
snd_hiwat is the maximum window; and
baseRTT is the observed minimum round trip time.
Shaping is implemented by smoothly delaying packet departures in order to obtain the desired send rate. Of course new packets are sent only if they belong to the usable portion of the window.
According to various embodiments, techniques of the present invention not only provide more efficient and effective ways for determining congestion window sizes but also provide a variable shaper to smooth bursts of data. The shaper is programmed to a rate 1/32 faster than the average one achievable using the new target window, but still shapes at a rate close enough to the achievable rate in order to better spread the transmission of packets. The correcting factor allows full utilization of the usable window portion. According to various embodiments, shaping is important when the target window is lower than bottleneck_bandwidth · rtt, i.e. below equilibrium, or in the presence of other competing TCP connections. It should be noted that even if the congestion window is held constant, and the packets are initially evenly spread across the whole RTT, packets tend to cluster in time in the absence of shaping, resulting in rate variations from the average, and network buffer oscillations.
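A minimal Python sketch of the shaper rate in Equation 5 follows, together with one simple way of spacing packet departures at that rate. The pacing helper is an assumption made for illustration (the text only says departures are smoothly delayed), and all names and units (bytes, seconds) are hypothetical.

```python
from typing import List, Sequence


def max_rate(snd_hiwat: float, base_rtt: float) -> float:
    """Equation 5: shaper rate in bytes per second, 1/32 above snd_hiwat/baseRTT."""
    if base_rtt <= 0:
        raise ValueError("base_rtt must be positive")
    return (snd_hiwat / base_rtt) * (1 + 1 / 32)


def departure_times(start: float, packet_sizes: Sequence[int], rate: float) -> List[float]:
    """Space packets so that their long-run send rate equals `rate`.

    Each packet departs once the previous one has 'used up' size/rate seconds.
    This even-spacing policy is an illustrative assumption.
    """
    times = []
    t = start
    for size in packet_sizes:
        times.append(t)
        t += size / rate
    return times


if __name__ == "__main__":
    rate = max_rate(132_000, 0.1)
    print(rate)
    print(departure_times(0.0, [1460] * 4, rate))
```

In a real sender the shaper would additionally release a packet only if it falls within the usable portion of the window.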
According to various embodiments, the congestion window is updated for every positive acknowledgment received. The change in the congestion window is calculated using the following equation:
$$\Delta \mathit{cwnd} = \frac{\min(\mathit{snd\_hiwat} - \mathit{cwnd}, \alpha)}{4}$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
(Equation 6)
where:
cwnd is a congestion window;
snd_hiwat is the maximum window;
qdelay(t) is the forward delay component of the round trip time;
L is a target buffer occupation;
max_bw is a maximum bandwidth; and
Kp is an empirically determined proportional controller constant between 0 and 1.
However, the techniques and mechanisms of the present invention recognize that growing the congestion window using conventional TCP mechanisms is inadequate. Growth according to available TCP mechanisms is too slow and reduction is too drastic. Consequently, the techniques of the present invention provide for more gradual changes to congestion window sizes.
The increase in congestion window size occurring every RTT is scaled by a factor of 1/4 in order to dampen the window oscillation dynamics. It should be noted that a variety of reduction factors can be used, but 4 is a reduction factor that is friendly to hardware bit shifting. Besides the reduction to 1/4 of the gap between snd_hiwat and cwnd, this relation also covers scenarios after an idle period where the gap between snd_hiwat and cwnd is quite large. After an idle or partially idle period (lasting less than one rtt), snd_hiwat reflects a window used in the past and cwnd tracks the amount of data that is still pending in the network, in order to reduce burstiness. In this case it is not desirable to grow cwnd towards snd_hiwat, since snd_hiwat does not appropriately represent the network congestion state. In such cases the cwnd is only grown by α/4.
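The per-RTT target growth of Equation 6 can be sketched as follows. The sketch assumes integer byte counts so that the division by 4 can be written as a right shift by two bits; the function name and sample values are hypothetical.

```python
def delta_cwnd(snd_hiwat: int, cwnd: int, alpha: int) -> int:
    """Equation 6: target change of cwnd over the next round trip, in bytes.

    The gap to the target window is capped at alpha, then divided by 4.
    For Python ints, `>> 2` is floor division by 4 (also for negative gaps).
    """
    return min(snd_hiwat - cwnd, alpha) >> 2


if __name__ == "__main__":
    # Small gap: grow by a quarter of the gap.
    print(delta_cwnd(101_000, 100_000, 19_500))
    # Large gap (e.g. after an idle period): growth is capped at alpha / 4.
    print(delta_cwnd(500_000, 100_000, 19_500))
    # Target below the current window: the change is negative.
    print(delta_cwnd(90_000, 100_000, 19_500))
```

The cap at α is what keeps a stale, large snd_hiwat from pulling cwnd up quickly after an idle period.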
Another corner case can happen after an IPS FastTCP failure, i.e. after a drop event and the subsequent recovery. In such a case it is common that a large number of bytes is acknowledged by a single packet, so the number of bytes acknowledged could have a value significantly larger than normal, resulting in a sudden window growth that could cause other drops. To improve the stability of IPS FastTCP after a drop event, the number of bytes acknowledged is limited. According to various embodiments, the variable holding the number of bytes acknowledged, when used to grow the congestion window cwnd, is limited to 16000 bytes. This value has been empirically chosen, but different values might prove effective. Of course the actual value of acked is used as usual to slide the window and discard newly acknowledged data.
The above Equation 6 gives only the target cwnd growth over the next round trip time. To perform a gradual increase of cwnd, the following equations can be used:
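$$\Delta \mathit{actcwnd} = \frac{\frac{1}{4} \min(\mathit{snd\_hiwat} - \mathit{cwnd}, \alpha)}{\mathit{cwnd}}$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
(Equation 7)
$$\mathit{cwnd} = \mathit{cwnd} + \Delta \mathit{actcwnd} \cdot \mathit{acked}$$
(Equation 8)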
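Equations 7 and 8, combined with the limit on acknowledged bytes described earlier, can be sketched as a single per-acknowledgment update. The function name, the use of floating-point byte counts, and the sample values are assumptions for illustration; the 16000-byte limit is the empirically chosen value mentioned in the text.

```python
ACKED_GROWTH_LIMIT = 16_000  # bytes of `acked` that may contribute to window growth


def on_positive_ack(cwnd: float, snd_hiwat: float, alpha: float, acked: int) -> float:
    """Return the updated congestion window after `acked` bytes are acknowledged."""
    if cwnd <= 0:
        raise ValueError("cwnd must be positive")
    # Limit the bytes used for growth (the real `acked` still slides the window).
    growth_bytes = min(acked, ACKED_GROWTH_LIMIT)
    # Equation 7: window change per acknowledged byte.
    delta_act = 0.25 * min(snd_hiwat - cwnd, alpha) / cwnd
    # Equation 8: gradual update.
    return cwnd + delta_act * growth_bytes


if __name__ == "__main__":
    cwnd = 100_000.0
    for _ in range(5):  # five acknowledgments of 1000 bytes each
        cwnd = on_positive_ack(cwnd, 132_000.0, 19_500.0, 1_000)
    print(cwnd)
```

Spreading the update over the acknowledgments of one round trip approximates the Equation 6 target growth without a single large step.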
Equation 8 allows a gradual update every time a positive acknowledgment is processed. However, more state information than usual needs to be kept in the TCP control block.
The techniques of the present invention also recognize that in some instances, the baseRTT variable may have to be recalculated. For example, the base round trip time for a particular source-destination pair may change when network topology changes. One trivial case happens when the delay is decreasing. In that case it is sufficient to use the new observed RTT as the new baseRTT. A difficult case is one that results from increases in the propagation delay, because the increase could also result from network congestion. The techniques of the present invention recognize that the variable baseRTT should not be increased as a result of network congestion.
According to various embodiments, the TCP stack keeps track of two recently observed RTTs in order to properly update the baseRTT variable and uses the following equation:
$$\mathit{baseRTT} = \min(\mathit{min\_rtt}, \mathit{prev\_min\_rtt})$$
(Equation 9)
where:
prev_min_rtt is the best estimate of baseRTT on a period prior to the previous 30 seconds epoch; and
min_rtt is the smallest RTT observed in the last 30 seconds epoch.
According to various embodiments, the prev_min_rtt and min_rtt are compared periodically. In some examples, the values are compared every 30 seconds. Only if min_rtt is sufficiently bigger than prev_min_rtt is prev_min_rtt set to min_rtt; otherwise no change is made. The rationale is that a large amount of congestion is unlikely to persist for 30 seconds, so during the 30 second interval the measured round trip time should oscillate and reach a value sufficiently close to the previously observed minimum. If instead the increased round trip time is due to a longer path, the observed RTT does not oscillate close to the old minimum, but is more likely to be exactly the new (bigger) minimum, since congestion is unlikely because all connections are trying to adapt to the longer path. In the presence of pre-existing congestion on the new path, the observed RTT will still oscillate, but will be bigger than the one measured on the old path. If the baseRTT is overestimated, drops are likely, but the subsequent path underutilization will result in a good baseRTT estimate. In some examples, the min_rtt value can be updated using the following equations:
$$\mathit{min\_rtt} - \mathit{prev\_min\_rtt} > \mathit{min\_delta} \Rightarrow \mathit{prev\_min\_rtt} = \mathit{min\_rtt}$$
(Equation 10)
$$\mathit{min\_delta} = L \cdot 4 / \mathit{max\_bw}$$
(Equation 11)
where: min_delta is the minimum time increase that is considered still due to congestion.
The min_rtt value can be adjusted when significant routing changes occur.
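The baseRTT bookkeeping of Equations 9 through 11 can be sketched as a small Python class. The class and method names are hypothetical, and two details are assumptions that the text does not spell out: min_rtt is reset at the start of every epoch, and prev_min_rtt is lowered when an epoch ends with a smaller minimum (following the remark that a decreasing delay can simply be adopted as the new baseRTT).

```python
import math


class BaseRttTracker:
    """Tracks min_rtt and prev_min_rtt over fixed epochs (30 s in the text)."""

    def __init__(self, first_rtt: float, L: float, max_bw: float) -> None:
        self.prev_min_rtt = first_rtt
        self.min_rtt = first_rtt
        self.min_delta = L * 4 / max_bw  # Equation 11

    def observe(self, rtt: float) -> None:
        """Record one RTT sample within the current epoch."""
        self.min_rtt = min(self.min_rtt, rtt)

    @property
    def base_rtt(self) -> float:
        return min(self.min_rtt, self.prev_min_rtt)  # Equation 9

    def end_epoch(self) -> None:
        """Compare the two minima at the end of an epoch."""
        if math.isinf(self.min_rtt):
            return  # no samples this epoch: nothing to compare
        if self.min_rtt - self.prev_min_rtt > self.min_delta:
            self.prev_min_rtt = self.min_rtt  # Equation 10: path got longer
        elif self.min_rtt < self.prev_min_rtt:
            self.prev_min_rtt = self.min_rtt  # assumption: keep a lower minimum
        self.min_rtt = math.inf  # assumption: start the next epoch afresh


if __name__ == "__main__":
    tracker = BaseRttTracker(0.100, L=64_000, max_bw=12_500_000)
    for rtt in (0.150, 0.152):  # every sample is well above the old minimum
        tracker.observe(rtt)
    tracker.end_epoch()
    print(tracker.base_rtt)
```

An increase smaller than min_delta is treated as congestion and leaves prev_min_rtt unchanged, so baseRTT is not inflated by queuing.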
It should be noted that the techniques of the present invention can also be implemented by traffic destinations. In some examples, a source can control the send process of a destination by modulating windows associated with the destination. The techniques of the present invention can be used to pace acknowledgements to a destination so that windows at a destination can be modulated based on computations at a source.
The techniques of the present invention can be implemented on a variety of devices such as hosts and switches. In some examples, the improvements to TCP can be implemented at any source originating traffic or destination receiving traffic. To be effective, it does not need to be implemented on both. In other examples, the improvements to TCP can also be implemented at tunneling switches used to transmit storage application data over IP networks.
Figure 5 is a diagrammatic representation of one example of a fibre channel switch that can be used to implement techniques of the present invention. Although one particular configuration will be described, it should be noted that a wide variety of switch and router configurations are available. The tunneling switch 501 may include one or more supervisors 511. According to various embodiments, the supervisor 511 has its own processor, memory, and storage resources.
Line cards 503, 505, and 507 can communicate with an active supervisor 511 through interface circuitry 583, 585, and 587 and the backplane 515. According to various embodiments, each line card includes a plurality of ports that can act as either input ports or output ports for communication with external fibre channel network entities 551 and 553. The backplane 515 can provide a communications channel for all traffic between line cards and supervisors. Individual line cards 503 and 507 can also be coupled to external fibre channel network entities 551 and 553 through fibre channel ports 543 and 547.
External fibre channel network entities 551 and 553 can be nodes such as other fibre channel switches, disks, RAIDS, tape libraries, or servers. It should be noted that the switch can support any number of line cards and supervisors. In the embodiment shown, only a single supervisor is connected to the backplane 515 and the single supervisor communicates with many different line cards. The active supervisor 511 may be configured or designed to run a plurality of applications such as routing, domain manager, system manager, and utility applications.
According to various embodiments, the switch also includes line cards 575 and 577 with IP interfaces 565 and 567. In one example, the IP port 565 is coupled to an external IP network entity 555. According to various embodiments, each IP line card includes a plurality of ports that can act as either input ports or output ports for communication with external IP entities 555. These IP entities could be IP switches or routers, or directly attached network endnodes. The line cards 575 and 577 can also be coupled to the backplane 515 through interface circuitry 595 and 597.
According to various embodiments, the switch can have a single IP port and a single fibre channel port. In one embodiment, two fibre channel switches used to form an FCIP tunnel each have one fibre channel line card and one IP line card. Each fibre channel line card connects to an external fibre channel network entity and each IP line card connects to a shared IP network. In another embodiment, a fibre channel switch performs gateway functions between hosts or disks connected to an IP network and hosts or disks connected to a fibre channel network. In various embodiments, the techniques of the present invention do not need to be implemented by both TCP end points in order to be effective.
In addition, although an exemplary switch is described, the above-described embodiments may be implemented in a variety of network devices (e.g., servers) as well as in a variety of mediums. For instance, instructions and data for implementing the above-described invention may be stored on a disk drive, a hard drive, a floppy disk, a server computer, or a remotely networked computer. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
While the invention has been particularly shown and described with reference to specific embodiments thereof, it will be understood by those skilled in the art that changes in the form and details of the disclosed embodiments may be made without departing from the spirit or scope of the invention. For example, embodiments of the present invention may be employed with a variety of network protocols and architectures. It is therefore intended that the invention be interpreted to include all variations and equivalents that fall within the true spirit and scope of the present invention.

Claims

WHAT IS CLAIMED IS:
1. A method for performing congestion control using a transmission control protocol (TCP), the method comprising: determining a forward delay component of a round trip time associated with sending data associated with a flow from a source node and receiving an acknowledgment from a destination node using the transmission control protocol (TCP); adjusting a maximum window by using the forward delay component and an observed minimum round trip time.
2. The method of claim 1, wherein the maximum window is adjusted by also using a maximum bandwidth.
3. The method of claim 2, wherein the maximum bandwidth is the capacity of the slowest link along the path from the source node to the destination node.
4. The method of claim 1, wherein the maximum window is calculated using the following equation:
$$\mathit{snd\_hiwat}(t) = \mathit{cwnd}(t-1) \cdot \frac{\mathit{baseRTT} + \mathit{rdelay}(t-1)}{\mathit{RTT}(t-1)} + \alpha(\mathit{max\_bw}, \mathit{qdelay}(t-1))$$
$$\mathit{RTT}(t) = \mathit{baseRTT} + \mathit{qdelay}(t) + \mathit{rdelay}(t)$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
where:
cwnd is a congestion window;
snd_hiwat is the maximum window;
RTT(t) is the round trip time;
baseRTT is the observed minimum round trip time;
rdelay(t) is a reverse delay component of the round trip time;
qdelay(t) is the forward delay component of the round trip time;
L is a target buffer occupation;
max_bw is a maximum bandwidth; and
Kp is an empirically determined proportional controller constant between 0 and 1.
5. The method of claim 4, wherein determining the forward delay component comprises determining the reverse delay component.
6. The method of claim 4, wherein network buffer occupation is bound.
7. The method of claim 1, wherein a variable rate shaper is used to determine a transmission rate.
8. The method of claim 7, wherein the transmission rate is determined using the maximum window and the observed minimum round trip time.
9. The method of claim 8, wherein the transmission rate is determined using the following equation:
$$\mathit{max\_rate} = \frac{\mathit{snd\_hiwat}}{\mathit{baseRTT}} \left(1 + \frac{1}{32}\right)$$
where:
max_rate is a transmission rate;
snd_hiwat is the maximum window; and
baseRTT is the observed minimum round trip time.
10. The method of claim 1, wherein a congestion window is updated using an amount of data acknowledged.
11. The method of claim 10, wherein the change in the congestion window is determined using the maximum window, the congestion window, the forward delay component, and a maximum bandwidth.
12. The method of claim 11, wherein the total change in the congestion window is determined using the following equation:
$$\Delta \mathit{cwnd} = \frac{\min(\mathit{snd\_hiwat} - \mathit{cwnd}, \alpha)}{4}$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
where:
cwnd is a congestion window;
snd_hiwat is the maximum window;
qdelay(t) is the forward delay component of the round trip time;
L is a target buffer occupation;
max_bw is the maximum bandwidth; and
Kp is an empirically determined proportional controller constant between 0 and 1.
13. The method of claim 12, wherein the congestion window cwnd is changed every time a positive acknowledgement is received using the following equation:
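$$\Delta \mathit{actcwnd} = \frac{\frac{1}{4} \min(\mathit{snd\_hiwat} - \mathit{cwnd}, \alpha)}{\mathit{cwnd}}$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
$$\mathit{cwnd} = \mathit{cwnd} + \Delta \mathit{actcwnd} \cdot \mathit{acked}$$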
14. The method of claim 4, wherein the baseRTT is determined by measuring the minimum round trip time and by using the following equation:
$$\mathit{baseRTT} = \min(\mathit{min\_rtt}, \mathit{prev\_min\_rtt})$$
where:
prev_min_rtt is the best estimate of baseRTT on a period prior to the previous 30 seconds epoch;
min_rtt is the smallest rtt observed in the last 30 seconds epoch; and
$$\mathit{min\_rtt} - \mathit{prev\_min\_rtt} > \mathit{min\_deltat} \Rightarrow \mathit{prev\_min\_rtt} = \mathit{min\_rtt}$$
$$\mathit{min\_deltat} = L \cdot 4 / \mathit{max\_bw}$$
15. A network device configured to perform congestion control using a transmission control protocol (TCP), the network device comprising: an interface coupled to an Internet Protocol (IP) network; and a processor operable to determine a forward delay component of a round trip time associated with sending data associated with a flow and receiving an acknowledgment from a destination node connected to the interface using the transmission control protocol (TCP), wherein the processor is further configured to adjust a maximum window by using the forward delay component and an observed minimum round trip time.
16. The network device of claim 15, wherein the maximum window is adjusted by also using a maximum bandwidth.
17. The network device of claim 16, wherein the maximum bandwidth is the capacity of the slowest link along the path from the source node to the destination node.
18. The network device of claim 15, wherein the maximum window is calculated using the following equation:
$$\mathit{snd\_hiwat}(t) = \mathit{cwnd}(t-1) \cdot \frac{\mathit{baseRTT} + \mathit{rdelay}(t-1)}{\mathit{RTT}(t-1)} + \alpha(\mathit{max\_bw}, \mathit{qdelay}(t-1))$$
$$\mathit{RTT}(t) = \mathit{baseRTT} + \mathit{qdelay}(t) + \mathit{rdelay}(t)$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
where:
cwnd is a congestion window;
snd_hiwat is the maximum window;
RTT(t) is the round trip time;
baseRTT is the observed minimum round trip time;
rdelay(t) is a reverse delay component of the round trip time;
qdelay(t) is the forward delay component of the round trip time;
L is a target buffer occupation;
max_bw is a maximum bandwidth; and
Kp is an empirically determined proportional controller constant between 0 and 1.
19. The network device of claim 18, wherein determining the forward delay component comprises determining the reverse delay component.
20. The network device of claim 18, wherein network buffer occupation is bound.
21. The network device of claim 15, wherein a variable rate shaper is used to determine a transmission rate.
22. The network device of claim 21, wherein the transmission rate is determined using the maximum window and the observed minimum round trip time.
23. The network device of claim 22, wherein the transmission rate is determined using the following equation:
$$\mathit{max\_rate} = \frac{\mathit{snd\_hiwat}}{\mathit{baseRTT}} \left(1 + \frac{1}{32}\right)$$
where:
max_rate is a transmission rate;
snd_hiwat is the maximum window; and
baseRTT is the observed minimum round trip time.
24. The network device of claim 15, wherein a congestion window is updated using the amount of data acknowledged.
25. The network device of claim 24, wherein the change in the congestion window is determined using the maximum window, the congestion window, the forward delay component, and the maximum bandwidth.
26. The network device of claim 25, wherein the total change in the congestion window is determined using the following equation:
$$\Delta \mathit{cwnd} = \frac{\min(\mathit{snd\_hiwat} - \mathit{cwnd}, \alpha)}{4}$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
where:
cwnd is a congestion window;
snd_hiwat is the maximum window;
qdelay(t) is the forward delay component of the round trip time;
L is a target buffer occupation;
max_bw is a maximum bandwidth; and
Kp is an empirically determined proportional controller constant between 0 and 1.
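27. The network device of claim 26, wherein the congestion window cwnd is changed every time a positive acknowledgement is received using the following equation:
$$\Delta \mathit{actcwnd} = \frac{\frac{1}{4} \min(\mathit{snd\_hiwat} - \mathit{cwnd}, \alpha)}{\mathit{cwnd}}$$
$$\alpha = \begin{cases} (L - \mathit{max\_bw} \cdot \mathit{qdelay}) \cdot Kp, & L - \mathit{max\_bw} \cdot \mathit{qdelay} > 0 \\ 0, & L - \mathit{max\_bw} \cdot \mathit{qdelay} \le 0 \end{cases} \qquad 0 < Kp \le 1$$
$$\mathit{cwnd} = \mathit{cwnd} + \Delta \mathit{actcwnd} \cdot \mathit{acked}$$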
28. The network device of claim 18, wherein the baseRTT is determined by measuring the minimum round trip time and by using the following equation:
$$\mathit{baseRTT} = \min(\mathit{min\_rtt}, \mathit{prev\_min\_rtt})$$
where:
prev_min_rtt is the best estimate of baseRTT on a period prior to the previous 30 seconds epoch;
min_rtt is the smallest rtt observed in the last 30 seconds epoch; and
$$\mathit{min\_rtt} - \mathit{prev\_min\_rtt} > \mathit{min\_deltat} \Rightarrow \mathit{prev\_min\_rtt} = \mathit{min\_rtt}$$
$$\mathit{min\_deltat} = L \cdot 4 / \mathit{max\_bw}$$
29. A system for performing congestion control, the system comprising: means for determining a forward delay component of a round trip time associated with sending data associated with a flow and receiving an acknowledgment from a destination node using the transmission control protocol (TCP); means for adjusting a maximum window by using the forward delay component and an observed minimum round trip time.
PCT/US2006/045702 2005-11-30 2006-11-29 Transmission control protocol (tcp) congestion control using transmission delay components WO2007064712A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP06838585.5A EP1955460B1 (en) 2005-11-30 2006-11-29 Transmission control protocol (tcp) congestion control using transmission delay components

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/291,201 US7760633B2 (en) 2005-11-30 2005-11-30 Transmission control protocol (TCP) congestion control using transmission delay components
US11/291,201 2005-11-30

Publications (2)

Publication Number Publication Date
WO2007064712A2 true WO2007064712A2 (en) 2007-06-07
WO2007064712A3 WO2007064712A3 (en) 2007-12-06

Family

ID=38087343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/045702 WO2007064712A2 (en) 2005-11-30 2006-11-29 Transmission control protocol (tcp) congestion control using transmission delay components

Country Status (3)

Country Link
US (3) US7760633B2 (en)
EP (1) EP1955460B1 (en)
WO (1) WO2007064712A2 (en)

Families Citing this family (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7760633B2 (en) * 2005-11-30 2010-07-20 Cisco Technology, Inc. Transmission control protocol (TCP) congestion control using transmission delay components
US7643418B1 (en) * 2006-03-24 2010-01-05 Packeteer, Inc. Aggregate rate control using PID
TW200816719A (en) * 2006-08-23 2008-04-01 Matsushita Electric Ind Co Ltd Communication equipment
US8296430B2 (en) 2007-06-18 2012-10-23 International Business Machines Corporation Administering an epoch initiated for remote memory access
US9065839B2 (en) * 2007-10-02 2015-06-23 International Business Machines Corporation Minimally buffered data transfers between nodes in a data communications network
US20090113308A1 (en) * 2007-10-26 2009-04-30 Gheorghe Almasi Administering Communications Schedules for Data Communications Among Compute Nodes in a Data Communications Network of a Parallel Computer
US8054822B2 (en) * 2008-01-28 2011-11-08 Alcatel Lucent Synchronization of call traffic in the forward direction over backhaul links
US8385207B2 (en) * 2008-05-27 2013-02-26 International Business Machines Corporation Method and apparatus for end-to-end network congestion management
US8140704B2 (en) * 2008-07-02 2012-03-20 International Busniess Machines Corporation Pacing network traffic among a plurality of compute nodes connected using a data communications network
US9143462B2 (en) * 2009-04-10 2015-09-22 International Business Machines Corporation Large send support in layer 2 switch to enhance effectiveness of large receive on NIC and overall network throughput
US8310930B2 (en) 2009-06-05 2012-11-13 New Jersey Institute Of Technology Allocating bandwidth in a resilient packet ring network by PI controller
US8089878B2 (en) * 2009-06-05 2012-01-03 Fahd Alharbi Allocating bandwidth in a resilient packet ring network by P controller
US8966110B2 (en) * 2009-09-14 2015-02-24 International Business Machines Corporation Dynamic bandwidth throttling
US8730799B2 (en) * 2010-03-03 2014-05-20 Akamai Technologies, Inc. Dynamic adjustment of receive window utilized by a transmitting device
US8234400B2 (en) 2010-03-16 2012-07-31 Microsoft Corporation Shaping virtual machine communication traffic
US8365186B2 (en) 2010-04-14 2013-01-29 International Business Machines Corporation Runtime optimization of an application executing on a parallel computer
US8504730B2 (en) 2010-07-30 2013-08-06 International Business Machines Corporation Administering connection identifiers for collective operations in a parallel computer
US8565120B2 (en) 2011-01-05 2013-10-22 International Business Machines Corporation Locality mapping in a distributed processing system
US9317637B2 (en) 2011-01-14 2016-04-19 International Business Machines Corporation Distributed hardware device simulation
US9558013B2 (en) * 2011-04-12 2017-01-31 Citrix Systems, Inc. Responsive scroller controls in server-hosted applications
US9391911B1 (en) * 2011-07-15 2016-07-12 Google Inc. Congestion window modification
US8689228B2 (en) 2011-07-19 2014-04-01 International Business Machines Corporation Identifying data communications algorithms of all other tasks in a single collective operation in a distributed processing system
US9250948B2 (en) 2011-09-13 2016-02-02 International Business Machines Corporation Establishing a group of endpoints in a parallel computer
WO2012149762A1 (en) * 2011-09-22 2012-11-08 华为技术有限公司 Congestion control method and equipment
US9014264B1 (en) * 2011-11-10 2015-04-21 Google Inc. Dynamic media transmission rate control using congestion window size
US10404562B2 (en) 2012-10-22 2019-09-03 Texas State University Optimization of retransmission timeout boundary
US10122645B2 (en) 2012-12-07 2018-11-06 Cisco Technology, Inc. Output queue latency behavior for input queue based device
CN103117922B (en) * 2013-02-20 2014-06-11 浪潮电子信息产业股份有限公司 Implementation method of message search by double sliding windows
US9628406B2 (en) * 2013-03-13 2017-04-18 Cisco Technology, Inc. Intra switch transport protocol
US9860185B2 (en) 2013-03-14 2018-01-02 Cisco Technology, Inc. Intra switch transport protocol
CA2920122A1 (en) * 2013-08-08 2015-02-12 Ricoh Company, Limited Program, communication quality estimation method, information processing apparatus, communication quality estimation system, and storage medium
EP3319281B1 (en) 2014-04-23 2020-08-19 Bequant S.L. Method and apparatus for network congestion control based on transmission rate gradients
CN104188107B (en) * 2014-07-30 2019-01-08 深圳市合元科技有限公司 The baking-type smoking apparatus of controllable tobacco heating amount
CN104168284B (en) * 2014-08-25 2019-02-05 联想(北京)有限公司 A kind of data transmission method and the first electronic equipment
CN104270217B (en) * 2014-09-19 2018-09-14 国家电网公司 A method of realizing Enhanced time synchronizing process link delay fault-tolerance
US20160164788A1 (en) * 2014-12-05 2016-06-09 Qualcomm Incorporated Egress Rate Shaping To Reduce Burstiness In Application Data Delivery
CN105991462B (en) * 2015-03-02 2019-05-28 华为技术有限公司 Sending method, sending device and the system of transmission control protocol TCP data packet
CN105721333B (en) * 2016-01-21 2019-02-15 全时云商务服务股份有限公司 A kind of data transmission device and method
US10070403B2 (en) 2016-03-09 2018-09-04 Mueller International, Llc Time beacons
US10582347B2 (en) 2016-04-14 2020-03-03 Mueller International, Llc SMS communication for cellular node
US10097411B2 (en) 2016-05-23 2018-10-09 Mueller International, Llc Node migration
US10200947B2 (en) 2016-07-11 2019-02-05 Mueller International, Llc Asymmetrical hail timing
CN109309934B (en) * 2017-07-27 2021-01-15 华为技术有限公司 Congestion control method and related equipment
CN107787014B (en) * 2017-10-30 2021-04-13 沈阳理工大学 Method for controlling congestion of satellite network transmission control layer based on forward time delay
US10267652B1 (en) * 2018-01-23 2019-04-23 Mueller International, Llc Node communication with unknown network ID
CN109996210B (en) * 2018-04-02 2020-07-24 京东方科技集团股份有限公司 Congestion window control method, device and equipment for Internet of vehicles
CN110120921B (en) * 2019-05-13 2022-07-01 深圳市赛为智能股份有限公司 Congestion avoidance method, apparatus, computer device and storage medium
CN112311725B (en) * 2019-07-26 2022-01-11 华为技术有限公司 Data processing method and device and terminal
CN111130923B (en) * 2019-11-29 2021-07-09 北京达佳互联信息技术有限公司 Network bandwidth determining method and device, electronic equipment and storage medium
CN111371692B (en) * 2020-03-13 2020-11-27 中科驭数(北京)科技有限公司 Window control method and device based on TCP (Transmission control protocol) and electronic equipment
CN113141315B (en) * 2021-04-20 2022-12-27 上海卓易科技股份有限公司 Congestion control method and equipment
CN113438181B (en) * 2021-08-26 2021-11-09 北京邮电大学 Network congestion control method and device
US20230344768A1 (en) * 2022-04-22 2023-10-26 Huawei Technologies Co., Ltd. System and method for a scalable source notification mechanism for in-network events
CN115022247B (en) * 2022-06-02 2023-10-20 成都卫士通信息产业股份有限公司 Flow control transmission method, device, equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7643430B2 (en) 2005-11-30 2010-01-05 Cisco Technology, Inc. Methods and apparatus for determining reverse path delay

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6105064A (en) * 1997-05-30 2000-08-15 Novell, Inc. System for placing packets on network for transmission from sending endnode to receiving endnode at times which are determined by window size and metering interval
CA2249152C (en) * 1998-09-30 2003-07-08 Northern Telecom Limited Apparatus for and method of managing bandwidth for a packet-based connection
US8457627B2 (en) * 1999-08-24 2013-06-04 Gogo Llc Traffic scheduling system for wireless communications
JP2001237882A (en) * 2000-02-23 2001-08-31 Nec Corp Packet size controller in packet data transfer and its control method
US6757245B1 (en) 2000-06-01 2004-06-29 Nokia Corporation Apparatus, and associated method, for communicating packet data in a network including a radio-link
US7185082B1 (en) * 2000-08-09 2007-02-27 Microsoft Corporation Fast dynamic measurement of connection bandwidth using at least a pair of non-compressible packets having measurable characteristics
US7200111B2 (en) * 2000-08-31 2007-04-03 The Regents Of The University Of California Method for improving TCP performance over wireless links
US7304951B2 (en) 2000-11-21 2007-12-04 North Carolina State University Methods and systems for rate-based flow control between a sender and a receiver
US7099273B2 (en) * 2001-04-12 2006-08-29 Bytemobile, Inc. Data transport acceleration and management within a network communication system
US7305704B2 (en) 2002-03-16 2007-12-04 Trustedflow Systems, Inc. Management of trusted flow system
US7581019B1 (en) * 2002-06-05 2009-08-25 Israel Amir Active client buffer management method, system, and apparatus
US8230106B2 (en) * 2003-03-31 2012-07-24 Alcatel Lucent Methods and apparatus for improved transmission control protocol transmission over a wireless channel exhibiting rate and delay variations
US7974195B2 (en) * 2003-06-12 2011-07-05 California Institute Of Technology Method and apparatus for network congestion control
KR100933159B1 (en) * 2003-07-11 2009-12-21 삼성전자주식회사 Synchronization method and system for voice data transmission in mobile communication system
JP4343229B2 (en) * 2003-08-14 2009-10-14 テルコーディア テクノロジーズ インコーポレイテッド Automatic IP traffic optimization in mobile communication systems
US7675898B2 (en) * 2003-08-20 2010-03-09 Nec Corporation Session relay apparatus for relaying data, and a data relaying method
US20080037420A1 (en) * 2003-10-08 2008-02-14 Bob Tang Immediate ready implementation of virtually congestion free guaranteed service capable network: external internet nextgentcp (square waveform) TCP friendly san
KR100526187B1 (en) * 2003-10-18 2005-11-03 삼성전자주식회사 Method of obtaining the optimum rate control in mobile ad hoc network circumstance
US8125910B2 (en) * 2004-06-25 2012-02-28 Nec Corporation Communication system
US7656800B2 (en) * 2004-07-30 2010-02-02 Cisco Technology, Inc. Transmission control protocol (TCP)
WO2006023604A2 (en) * 2004-08-17 2006-03-02 California Institute Of Technology Method and apparatus for network congestion control using queue control and one-way delay measurements
US20060203730A1 (en) * 2005-03-14 2006-09-14 Zur Uri E Method and system for reducing end station latency in response to network congestion
US7760633B2 (en) * 2005-11-30 2010-07-20 Cisco Technology, Inc. Transmission control protocol (TCP) congestion control using transmission delay components

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015133669A (en) * 2014-01-15 2015-07-23 株式会社日立製作所 Communication device
WO2016147650A1 (en) * 2015-03-19 2016-09-22 日本電気株式会社 Transmitting apparatus and control method therefor, communication system, and recording medium storing communication control program
WO2021013260A1 (en) * 2019-07-25 2021-01-28 中兴通讯股份有限公司 Network transmission control method and apparatus
US12069508B2 (en) 2019-07-25 2024-08-20 Zte Corporation Network transmission control method and apparatus

Also Published As

Publication number Publication date
US8797871B2 (en) 2014-08-05
US20110013512A1 (en) 2011-01-20
US20070121511A1 (en) 2007-05-31
WO2007064712A3 (en) 2007-12-06
EP1955460A2 (en) 2008-08-13
US9160670B2 (en) 2015-10-13
EP1955460B1 (en) 2019-07-24
US20150049611A1 (en) 2015-02-19
EP1955460A4 (en) 2015-07-08
US7760633B2 (en) 2010-07-20

Similar Documents

Publication Publication Date Title
US9160670B2 (en) Transmission control protocol (TCP) congestion control using transmission delay components
US7656800B2 (en) Transmission control protocol (TCP)
US7859996B2 (en) Intelligent congestion feedback apparatus and method
US20070115814A1 (en) Method and apparatus for improved data transmission
KR20050085742A (en) Protecting real-time data in wireless networks
CN109714267B (en) Transmission control method and system for managing reverse queue
US20180176136A1 (en) TCP Bufferbloat Resolution
CA2940077C (en) Buffer bloat control
Le et al. SFC: Near-source congestion signaling and flow control
Lu et al. EQF: An explicit queue-length feedback for TCP congestion control in datacenter networks
Al-Saadi et al. Characterising LEDBAT performance through bottlenecks using PIE, FQ-CoDel and FQ-PIE active queue management
Vyakaranal et al. Performance evaluation of TCP using AQM schemes for congestion control
TWI308012B (en) Method for adaptive estimation of retransmission timeout in wireless communication systems
Das et al. A Dynamic Algorithm for Optimization of Network Traffic through Smart Network Switch Data Flow Management
Dangi et al. A new congestion control algorithm for high speed networks
Zhang et al. Second-order rate-control based transport protocols
Liu et al. TCP-CM: A transport protocol for TCP-friendly transmission of continuous media
Chan et al. A threshold controlled TCP for data center networks
Loula Congestion Control Supported Dual-Mode Video Transfer
Hung et al. Simple slow-start and a fair congestion avoidance for TCP communications
JP6450176B2 (en) Packet transmission equipment
Tong et al. Tcp fairness improvement of dccp flow control for bursty real-time applications
CN117896324A (en) Method and system for controlling congestion of RDMA (remote direct memory Access) network across data centers
Chi An enhanced version of SACKTCP
Sun et al. Adaptive drop-tail: A simple and efficient active queue management algorithm for internet flow control

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2006838585

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE