WO2001077851A1 - Tcp-friendly system and method for marking scalable better best-effort services on the internet - Google Patents

Tcp-friendly system and method for marking scalable better best-effort services on the internet

Info

Publication number
WO2001077851A1
WO2001077851A1 (PCT/US2001/040453)
Authority
WO
WIPO (PCT)
Prior art keywords
tcp
packets
packet
flow
flows
Prior art date
Application number
PCT/US2001/040453
Other languages
French (fr)
Inventor
Shivkumar Kalyanaraman
Azeem Feroz
Amit Rao
Giampiero Lo Monaco
Original Assignee
Rensselaer Polytechnic Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rensselaer Polytechnic Institute filed Critical Rensselaer Polytechnic Institute
Priority to AU2001293342A priority Critical patent/AU2001293342A1/en
Publication of WO2001077851A1 publication Critical patent/WO2001077851A1/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/163 In-band adaptation of TCP data exchange; In-band control procedures

Definitions

  • the present invention involves a method and apparatus for traffic management on the Internet.
  • the differentiated services (diffserv) architecture is an example of one such IETF standard that can be leveraged for scalable buffer management.
  • the diffserv architecture proposes two distinct components: the "edge" nodes and the "core" nodes, as part of a diffserv domain.
  • the edge nodes perform stateful control and data-plane functions like policy and traffic conditioning, whereas the core nodes concentrate on stateless (or aggregate-state) hardware-implementable data-plane functions such as forwarding, buffer management and scheduling (a.k.a. per-hop behaviors, PHBs).
  • the coordination of data-plane functions between the edge and the core nodes is achieved through the DS-field in the IP header, which the edge can "mark" and the core can read (and optionally "re-mark").
  • This architectural model allows "marking" at the edge nodes as a way to complement and enhance the buffer management capabilities of core nodes.
  • This marking is referred to as "network-based" marking (which has semantics only within the network) to differentiate it from end-to-end "ECN marking" (which is conveyed to end systems).
  • TCP/IP performance: For the last decade, researchers have studied and proposed improvements to TCP/IP performance. These improvements include changes to TCP host implementations (e.g. NewReno, SACK, FACK, etc.), better queue management algorithms (e.g. AQM, RED, SRED, FRED, FPQ, etc.), or schemes which require both end-system and bottleneck support (e.g. ECN, adaptive packet marking). Packeteer's rate control is a buffer management scheme which manipulates TCP headers and ack rates to perform explicit, transparent control of TCP flows in certain niche deployment scenarios.
  • Performance improvements of all these approaches have been measured in terms of timeout avoidance, better filtering of burst loss to determine congested periods and window reductions, optimization of retransmission, and improved fairness across competing TCP sessions (including small, web-like and large, ftp-like sessions).
  • the present invention bridges the above-noted performance gap.
  • the present invention implements buffer management in two parts.
  • the first part uses network-based, TCP-friendly marking which can work in combination with the second part, differential dropping in a diff-serv context.
  • the present invention is implemented in a network-based environment and is transparent to the end-systems (unlike ECN), though the marker could be supported by end-systems in the future.
  • Although the end-goals of the present invention are similar to the schemes mentioned above, it differs in at least two ways: it focuses on timeout minimization, and its deployment context is different since it requires a diff-serv context. Therefore, the present invention is complementary to, and not in conflict with, the efforts of the community mentioned above. Performance improvements are visible even with TCP SACK and with stateful dropping algorithms (e.g. FRED). From a deployment perspective, the marker of the present invention requires read-access to TCP headers and would typically be at the enterprise edge.
  • this approach can potentially affect a larger scope of operation because a) the marking ensures protection of critical packets from loss even at remote bottlenecks (unlike TCP rate control), and b) the marks are carried in network- or link-layer headers (e.g. DS-byte or MPLS) and do not require TCP-level visibility or processing beyond the marker.
  • FIG. 1 is a block diagram of the logical topology used in the present invention.
  • FIG. 2 is a pseudo-code flowchart of the TCP-FRIENDLY MARKER TOKEN ALLOCATION;
  • FIG. 3 is a pseudo-code flowchart of the TCP-FRIENDLY PACKET MARKING program.
  • FIG. 4 is a pseudo-code flowchart of the FRIO BUFFER MANAGEMENT program.
  • Figure 1 illustrates the functional and logical topology of the preferred embodiment of the invention. All components in Fig. 1 are conventional network building blocks for computer and communications systems. It is understood that those skilled in the art will know that such components can be realized as hardware or as software functional components.
  • The term "TCP-Friendly" has been used to characterize the behavior of non-TCP end-to-end congestion control schemes. This term, in contrast, is used here to characterize building blocks that support superior best-effort performance for TCP applications. The key question then becomes: "What building-block aspects matter for superior TCP performance?" It is well-known that timeouts are the leading source of TCP performance degradation from several perspectives (average goodput, transfer time and fairness/service variance). The goal of "TCP-Friendly" building blocks therefore is to reduce the probability of timeouts. From an operational perspective, TCP times out:
  • Active queue management schemes such as RED and FRED try to address some of these issues, and hence they are "TCP-Friendly" in the sense of the present inventor's definition.
  • the RED/FRED approach, however, is to "randomize" the dropping behavior, and therefore it cannot provide a high level of assurance that timeouts will be contained, especially during heavy loads and/or high degrees of TCP multiplexing.
  • the present invention is complementary to these efforts given the large deployed base of old TCP implementations.
  • a packet marker is one of the traffic conditioners in diff-serv.
  • the general problem in a marker is to optimally allocate an available pool of tokens to a set of incoming packets in a given interval of time.
  • the available pool of tokens may depend upon service parameters such as the contracted rate and a measure of burstiness. Packets which get a token are said to be marked "IN" and those which do not get tokens are said to be marked "OUT." This can be used as the basis of a simplified form of the "assured service".
  • the present invention can be generalized to the full assured service specification as well.
  • the present invention is also designed to leverage packet markers as follows:
  • Table 1 Effect of Retransmission Loss on Overall Loss Rate
  • the table shows an example of the overall loss percentage and retransmission loss percentage for 50 and 100 flows with and without TCP-Friendly marking.
  • the TCP-friendly marking virtually eliminates retransmission loss in all cases (see row #1).
  • the retransmission loss rate is almost twice that of the average packet loss rate (compare column #2 with column #1; e.g.: 12.08% vs 5.25%).
  • the elimination of retransmission loss has a large effect on the overall loss rate (e.g.: 2.67% in row #1 vs 5.25% in row #3), which suggests that a significant fraction of total packets lost are indeed retransmissions.
  • the present invention is expected to eliminate a large fraction of total timeout instances seen by TCP.
  • The allocation mechanism consists of two major steps. The major operations of each are as follows:
  • Step I (lines 11 through 18)
  • the fair allocation process is exclusively performed on either the tiny flows or the non-tiny flows.
  • Mark retransmissions as IN (lines 66 through 72): If the packet contains a retransmission, the system checks for the availability of tokens for that flow, and marks the packet as IN. If the packet does not contain a retransmission, it updates the highest sequence number for the flow, in order to identify future retransmissions.
  • the system marks packets as OUT until the number of bytes marked as OUT is at least equal to flow.out-count (lines 95 through 98), at which time the next packet is marked as IN, and the system recalculates flow.out-count (lines 89 through 94).
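The OUT-spacing rule described above can be sketched as a small function. This is an illustrative sketch only: `mark_flow` and its parameters are hypothetical names, and the recalculation of flow.out-count (lines 89 through 94 of the pseudo-code) is simplified here to a fixed value.

```python
def mark_flow(packet_sizes, out_count):
    """Mark a flow's packets so that between consecutive IN-marked
    packets at least `out_count` bytes are marked OUT.

    packet_sizes: payload sizes in bytes, in arrival order.
    out_count: bytes that must go OUT before the next IN mark
               (hypothetical simplification of flow.out-count).
    """
    marks = []
    out_bytes = out_count  # start eligible for an IN mark
    for size in packet_sizes:
        if out_bytes >= out_count:
            marks.append('IN')
            out_bytes = 0  # the full scheme recalculates out_count here
        else:
            marks.append('OUT')
            out_bytes += size
    return marks
```

With six 100-byte packets and `out_count = 200`, every third packet is marked IN, spreading the low-priority OUT marks evenly across the window instead of clustering them.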
  • the TCP friendly packet marker implements the objectives laid out previously.
  • the marker is expected to be implemented at the enterprise or ISP edge and operated in cooperation with an ISP which implements differential packet dropping. This scenario is suggested because the marker as presented here requires visibility into the TCP header for read operations only.
  • An alternative would be to split the marker into two pieces, one in the enterprise side and one on the ISP side.
  • the enterprise side piece uses the visibility of TCP headers and marks the DS-byte which can then be translated into an ISP-specific codepoint, (i.e. re-marked) by the ISP-side component.
  • a key per-flow variable required by the marker is an approximation of the TCP window size. This can be obtained by considering the difference between the last sequence number sent and the last byte acknowledged (seen in the latest acknowledgement). This allows classification of the packets as belonging to a "tiny" flow (W < 4 MSS) or otherwise. Tiny flows get absolute priority to tokens. The remaining tokens are fairly divided between the other flows and each flow's share is optimally spread between its packets. Retransmitted packets can be identified by comparing the sequence number of the packet with the largest sequence number sent by that flow. Retransmitted packets also get absolute priority in the token pool. Token pool mismatches, new flows, etc. are handled across update intervals (which are, for example, sized at 400 ms, but can be made any size).
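The per-flow bookkeeping described above can be sketched as follows. The class, field names, and the fixed MSS value are assumptions for illustration, not the patent's actual implementation.

```python
MSS = 1460  # assumed maximum segment size in bytes

class FlowState:
    """Per-flow state the marker tracks (illustrative field names)."""

    def __init__(self):
        self.highest_seq = 0  # largest sequence number seen from this flow
        self.last_ack = 0     # latest acknowledged byte seen in an ACK

    def window_estimate(self, seq):
        # Approximate the sender's window as bytes sent but not yet acked.
        return max(seq - self.last_ack, 0)

    def classify(self, seq):
        """Return (is_retransmission, is_tiny) for one data packet."""
        # A sequence number below the highest already sent is a retransmission.
        is_retx = seq < self.highest_seq
        # Flows with an estimated window under 4 MSS are "tiny".
        is_tiny = self.window_estimate(seq) < 4 * MSS
        if not is_retx:
            self.highest_seq = seq  # remember for future retransmission checks
        return is_retx, is_tiny
```

Both retransmissions and tiny-flow packets would then receive absolute priority when tokens are allocated, per the scheme above.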
  • Flow-RED can also be extended into a differential dropping scheme called FRIO in a manner similar to how RED was extended to RIO.
  • FRED Flow-RED
  • FRIO differential dropping scheme
  • a separate set of parameters and queue measures are used for packets marked "IN” and for packets marked "OUT” .
  • the original FRED algorithm was developed to improve fairness for fragile (small window or low rate) connections. FRED preferentially drops packets from a flow that has a large number of packets buffered at the congested gateway.
  • the implementation issues of the FRIO algorithm are detailed in the technical report.
  • the key modifications to basic FRED include:
  • This FRIO extension is optional because the TCP-friendly marking component can achieve baseline gains even without this extension. Also, since FRIO and RIO are fundamentally variants of the basic RED algorithm, they inherit some of its parameter-sensitivity issues, which may degrade performance under certain conditions. However, as discussed further below, the determinism in the marking process effectively masks such parameter sensitivities.
  • the complete algorithm for FRIO is presented in Figure 4.
  • the FRIO buffer management scheme performs operations on the enqueuing and dequeuing of packets into and from the buffer.
  • the major operations of the scheme are as follows:
  • the average queue length is set to the average length of the virtual queue of IN packets.
  • the average queue length is set to the sum of the average length of the virtual queue of IN packets and the average length of the virtual queue of OUT packets. This is in practice the same as considering the average queue length of the entire buffer.
  • Penalize unresponsive connections (lines 125 through 131): The system calculates the maximum number of packets that any flow is simultaneously allowed in the buffer, as a function of the RED minthresh for the type of packet (line 125). Any packets that attempt to violate this upper bound are dropped, and the system records the fact that an active connection (an active connection, in this context, is one that currently has at least one packet in the queue) has tried to break the established bound. The system also approximates the per-flow average number of packets in the buffer, and drops any packets that would make a penalized flow go beyond this average (lines 126 through 131).
  • Randomized dropping when the average queue size is between minthresh and maxthresh: For example, the system never drops a packet if it belongs to a flow with fewer than 5 packets in the queue. To that effect, the system does not even compute the drop probability for a packet in such circumstances (lines 141 through 143).
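The enqueue-time drop decision described in these bullets can be sketched roughly as follows. This is a simplified illustration: the parameter names and the linear RED drop-probability form are assumptions, and the full FRIO algorithm in Figure 4 handles more cases (per-flow caps, penalty records, queue averaging).

```python
import random

def frio_enqueue_decision(pkt_mark, flow_qlen, avg_in, avg_out,
                          params_in, params_out, rng=random.random):
    """Return 'drop' or 'enqueue' for one arriving packet.

    pkt_mark: 'IN' or 'OUT' (set earlier by the edge marker).
    flow_qlen: packets this flow currently has buffered.
    avg_in / avg_out: averaged lengths of the IN and OUT virtual queues.
    params_*: (minthresh, maxthresh, maxp) RED parameters per virtual queue.
    All names are illustrative, not the patent's pseudo-code.
    """
    if pkt_mark == 'IN':
        avg = avg_in                      # IN packets see only the IN queue
        minth, maxth, maxp = params_in
    else:
        avg = avg_in + avg_out            # OUT packets see the whole buffer
        minth, maxth, maxp = params_out

    if avg < minth:
        return 'enqueue'
    if avg >= maxth:
        return 'drop'
    # Random-drop region: flows with few buffered packets are spared outright.
    if flow_qlen < 5:
        return 'enqueue'
    p = maxp * (avg - minth) / (maxth - minth)
    return 'drop' if rng() < p else 'enqueue'
```

Because IN packets are judged against the shorter IN virtual queue, marked (protected) packets survive congestion episodes that drop OUT packets.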
  • FIG. 1 shows the logical topology of the system 100 used in the present invention.
  • the access links 104 and egress links 106 are 100 Mbit/sec.
  • the differential dropping algorithms are implemented at the interface 108 of Rl connecting to 110 of R2, and the marking at the entry to access links 104.
  • the present invention implements a delay component on the outgoing interface of the destination with the intention of emulating a (WAN) round trip delay of 200ms.
  • the metrics used in the present invention have been carefully chosen to characterize best-effort performance, both from a user and an operator perspective:
  • Timeouts: a proxy measure for the increase in transfer delays and retransmission delays as seen by the end user.
  • Average per-flow goodput: measures the average per-flow throughput minus retransmissions across the set of flows.
  • CoV: coefficient of variation of per-flow goodput.
  • Parameter mistuning risk is one of the leading reasons why RED is not widely deployed today and the present marking scheme can help mitigate this risk in a transparent manner.
  • the specific RED parameters for each of these cases are presented in Table 2.
  • the total buffer size in these experiments is about 500 Kbytes (or approx. 500 packets).
  • the four configurations are shown in Table 3. Further, if the present invention considers combinations with dropping schemes and markers, a set of cases is obtained as shown in Table 4.
  • the "old" TCP-friendly marker refers to a marker which does not protect retransmissions.
  • the token bucket and TCP-friendly markers deal with a pool of tokens refreshed every update interval. The number of (byte) tokens per interval is 296K and the update interval is 400 ms (based upon approximately 60% of the bottleneck capacity of 10 Mbps). In addition, each case was tested with 50 and 100 simultaneous TCP flows to get a moderate level of multiplexing.
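The quoted pool size can be sanity-checked with a one-line calculation: 60% of a 10 Mbps bottleneck over a 400 ms interval yields about 300,000 bytes, in line with the 296K figure cited above. The variable names below are illustrative.

```python
bottleneck_bps = 10_000_000   # 10 Mbps bottleneck capacity
fraction = 0.6                # provision roughly 60% of capacity as tokens
update_interval_s = 0.4       # 400 ms token-refresh interval

# bits/s * fraction * seconds / 8 bits-per-byte = token bytes per interval
tokens_bytes = bottleneck_bps * fraction * update_interval_s / 8
print(tokens_bytes)  # 300000.0
```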
  • Table 5 shows significant improvements achieved by the TCP-Friendly marker across all metrics.
  • TCP Transmission Control Protocol
  • a reduction of timeouts from 911 (TB+RLO case) to 41 (TCP+FRIO case) is found, a factor of 22 improvement.
  • timeouts are virtually eliminated; a reduction from 491 to 2 timeouts.
  • Table 6 reiterates the basic performance benefits noted in the earlier section with TCP Reno instead of TCP SACK.
  • the number of timeouts goes from 1127 (row #4) down to 62 (row #1) for 100 flows, and from 610 to 8 for 50 flows. This represents an orders-of-magnitude gain in the total number of timeouts, clearly decoupling the benefits of the marker of the present invention from the use of either Reno or SACK.
  • the other metrics show similar relative performance gains across the board.
  • the absolute magnitudes of the Reno metrics are not as good as that of the corresponding SACK metrics presented in the previous table, reaffirming the general performance benefits of SACK.
  • Table 7 Results for Config C: SACK and Mistuned RED Parameters [0040] Table 7 presents results with mistuned RED (or RED-equivalent) parameters. In comparison with Table 5, note that almost every timeout observation doubles (e.g. row #1, column #1 shows 92 timeouts in Table 7 vs 41 timeouts in Table 5), validating the negative effect of parameter mistuning. However, the relative performance benefits of TCP-friendly marking, in terms of all the metrics, are clearly preserved. Compare the first two rows with later rows in Tables 5 and 7: e.g. 92 or 152 timeouts with the TCPF marker (rows #1-2, column #1) vs 1200+ timeouts without the TCPF marker (rows #4-7, column #1).
  • a goal of the present invention is to leverage the tools provided by diff-serv: packet marking and differential dropping.
  • TCP-friendly marking is an interesting and powerful alternative, especially due to its capability to deterministically protect small-window flows and retransmitted packets.
  • the methodology of leveraging the marking and preferential dropping capabilities is clear: provision enough tokens, deterministically identify the packets which, when dropped, can disproportionately affect user performance and protect them, and then randomize or spread the remaining tokens over the rest of the packets.
  • This type of application-aware or transport-aware marking can be done by the end systems or by edge proxy devices.
  • the following benefits were found: • Reductions of up to two orders of magnitude in the number of timeouts.
  • Improvements irrespective of TCP implementation (TCP Reno or TCP SACK)
  • All stateful and TCP-aware activity can be done at the (enterprise) edge and not in the core.
  • this work is complementary with other edge-to-edge work and end-to-end work (mentioned previously) to attain superior and scalable traffic management.
  • the approach is consistent with the philosophy of complexity moving to edges; the marker at this point is best deployed at the enterprise-edge.
  • the reduction in timeouts can also be used as the basis of differentiated TCP services offered by ISPs on private networks (on a single ISP or limited multi-ISP networks which support differential dropping).
  • this approach masks parameter sensitivities of RED and provides an incentive to deploy differential dropping and active queue management techniques at key congestion points (e.g.: peering points, international links, access links etc).
  • the invention provides an apparatus and method for traffic management on the Internet.
  • the above description and drawings are only illustrative of preferred embodiments which achieve the objects, features and advantages of the present invention. It is not intended that the present invention be limited to the illustrated embodiments as modifications, substitutions and use of equivalent structures can be made. Accordingly, the invention is not to be considered as limited by the foregoing description, but is only limited by the scope of the appended claims.

Abstract

A system and method for network-based packet marking, in conjunction with regular active queue management and packet dropping, to scale the performance of buffer management for best-effort and differentiated services on the Internet. The present invention extends the notion of 'TCP-Friendly' packet marking to improve the performance of traditional best-effort services. The TCP-aware use of deterministic packet marking at the enterprise edge allows the protection of selected TCP packets from suffering loss, thus dramatically reducing the total number of timeouts and avoiding the resulting service degradation. The system protects TCP sessions with very small windows, disperses packet loss across a given window, and protects retransmitted packets from encountering loss. The present method and system use a comprehensive set of metrics to measure 'better' best-effort service.

Description

TCP-FRIENDLY SYSTEM AND METHOD FOR MARKING SCALABLE BETTER BEST-EFFORT SERVICES ON THE INTERNET
FIELD OF THE INVENTION [0001] The present invention involves a method and apparatus for traffic management on the Internet.
BACKGROUND OF THE INVENTION
[0002] As the Internet grows, both in terms of size and heterogeneity, it becomes necessary to have Internet traffic management schemes that keep up with this growth by leveraging the new architectural facilities available through IETF standards. The differentiated services (diffserv) architecture is an example of one such IETF standard that can be leveraged for scalable buffer management. The diffserv architecture proposes two distinct components: the "edge" nodes and the "core" nodes, as part of a diffserv domain. The edge nodes perform stateful control and data-plane functions like policy and traffic conditioning, whereas the core nodes concentrate on stateless (or aggregate-state) hardware-implementable data-plane functions such as forwarding, buffer management and scheduling (a.k.a. per-hop behaviors, PHBs). The coordination of data-plane functions between the edge and the core nodes is achieved through the DS-field in the IP header, which the edge can "mark" and the core can read (and optionally "re-mark"). This architectural model allows "marking" at the edge nodes as a way to complement and enhance the buffer management capabilities of core nodes. This marking is referred to as "network-based" marking (which has semantics only within the network) to differentiate it from end-to-end "ECN marking" (which is conveyed to end systems).
[0003] For the last decade, researchers have studied and proposed improvements to TCP/IP performance. These improvements include changes to TCP host implementations (e.g. NewReno, SACK, FACK, etc.), better queue management algorithms (e.g. AQM, RED, SRED, FRED, FPQ, etc.), or schemes which require both end-system and bottleneck support (e.g. ECN, adaptive packet marking). Packeteer's rate control is a buffer management scheme which manipulates TCP headers and ack rates to perform explicit, transparent control of TCP flows in certain niche deployment scenarios. Performance improvements of all these approaches have been measured in terms of timeout avoidance, better filtering of burst loss to determine congested periods and window reductions, optimization of retransmission, and improved fairness across competing TCP sessions (including small, web-like and large, ftp-like sessions).
[0004] However, Ibanez et al. in their work "Preliminary Simulation Evaluation of an Assured Service," Internet Draft, draft-Ibanez-diffserv-assured-eval-00.txt, August 1998, determined that these techniques still evidenced performance uncertainty when TCP was mapped to the "assured service", even with relatively low degrees of connection multiplexing. This uncertainty stems from the fact that the "assured service" introduces a new dimension in best-effort buffer management: differential marking and dropping of packets, which was not being leveraged properly by TCP/IP. Specifically, Ibanez et al. showed that the use of a simple token bucket marker for the above assured service results in TCP not realizing the minimum "assured rate." The authors attributed the cause of such behavior to TCP's complex response, primarily due to packet losses. In Ibanez et al., part of the TCP flow was marked "IN" and a part marked "OUT." The overall performance was affected by the bursty losses being experienced by the "OUT" packets. Ibanez et al. conclude that it was unclear under such circumstances how the "assured service" can be characterized quantitatively for TCP applications. Such problems can reappear in other types of better-than-best-effort services being designed by the diff-serv community.
[0005] Accordingly, there exists a need in the art to provide an Internet traffic management control method and apparatus that avoids the pitfalls noted above by Ibanez et al. SUMMARY AND OBJECTS OF THE INVENTION [0006] Briefly, the present invention bridges the above-noted performance gap. In contrast with the traditional approaches mentioned above, the present invention implements buffer management in two parts. The first part uses network-based, TCP-friendly marking which can work in combination with the second part, differential dropping in a diff-serv context. It should be noted that the present invention is implemented in a network-based environment and is transparent to the end-systems (unlike ECN), though the marker could be supported by end-systems in the future.
[0007] Although the end-goals of the present invention are similar to the schemes mentioned above, it differs in at least two ways: it focuses on timeout minimization, and its deployment context is different since it requires a diff-serv context. Therefore, the present invention is complementary to, and not in conflict with, the efforts of the community mentioned above. Performance improvements are visible even with TCP SACK and with stateful dropping algorithms (e.g. FRED). From a deployment perspective, the marker of the present invention requires read-access to TCP headers and would typically be at the enterprise edge. However, this approach can potentially affect a larger scope of operation because a) the marking ensures protection of critical packets from loss even at remote bottlenecks (unlike TCP rate control), and b) the marks are carried in network- or link-layer headers (e.g. DS-byte or MPLS) and do not require TCP-level visibility or processing beyond the marker.
BRIEF DESCRIPTION OF THE DRAWINGS [0008] The foregoing brief description as well as further objects, features and advantages of the present invention will be understood more completely from the following detailed description of the presently preferred, but nonetheless illustrative embodiments of the invention, with reference being had to the accompanying drawings, in which:
[0009] FIG. 1 is a block diagram of the logical topology used in the present invention. [00010] FIG. 2 is a pseudo-code flowchart of the TCP-FRIENDLY MARKER TOKEN ALLOCATION;
[00011] FIG. 3 is a pseudo-code flowchart of the TCP-FRIENDLY PACKET MARKING program; and
[00012] FIG. 4 is a pseudo-code flowchart of the FRIO BUFFER MANAGEMENT program.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS [0010] Figure 1 illustrates the functional and logical topology of the preferred embodiment of the invention. All components in Fig. 1 are conventional network building blocks for computer and communications systems. It is understood that those skilled in the art will know that such components can be realized as hardware or as software functional components.
[0011] The term "TCP-Friendly" has been used to characterize the behavior of non-TCP end-to-end congestion control schemes. This term, in contrast, is used here to characterize building blocks that support superior best-effort performance for TCP applications. The key question then becomes: "What building-block aspects matter for superior TCP performance?" It is well-known that timeouts are the leading source of TCP performance degradation from several perspectives (average goodput, transfer time and fairness/service variance). The goal of "TCP-Friendly" building blocks therefore is to reduce the probability of timeouts. From an operational perspective, TCP times out:
• When it loses all the packets or acks corresponding to a single window (all TCP versions)
• When it runs out of duplicate acks during a fast recovery phase (Reno and NewReno)
• When its window is very small (< 4 MSS) and it loses even one or two packets (all TCP versions)
• When a burst of at least three closely spaced packets within a window are lost (Reno)
• When retransmitted packets are lost (all TCP versions)
[0012] Active queue management schemes, such as RED and FRED, try to address some of these issues and hence they are "TCP-Friendly" in the sense of the present inventor's definition. However, it is noted that the RED/FRED approach is to "randomize" the dropping behavior, and therefore it cannot provide a high level of assurance that timeouts will be contained, especially during heavy loads and/or during high degrees of TCP multiplexing. There are also end-system proposals to reduce the TCP dupack threshold and make other TCP-level implementation improvements to address this issue. The present invention is complementary to these efforts given the large deployed base of old TCP implementations.
[0013] A packet marker is one of the traffic conditioners in diff-serv. The general problem in a marker is to optimally allocate an available pool of tokens to a set of incoming packets in a given interval of time. The available pool of tokens may depend upon service parameters such as the contracted rate and a measure of burstiness. Packets which get a token are said to be marked "IN" and those which do not get tokens are said to be marked "OUT." This can be used as the basis of a simplified form of the "assured service". The present invention can be generalized to the full assured service specification as well.
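A minimal token-bucket marker of the kind described here might look as follows. This is an illustrative sketch, not the patent's code: the refill-on-arrival structure, class name, and parameters are assumptions.

```python
class TokenBucketMarker:
    """Minimal token-bucket marker: a packet that finds enough tokens is
    marked IN (and consumes them); otherwise it is marked OUT."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s   # contracted rate (token refill rate)
        self.burst = burst_bytes       # burstiness measure (bucket depth)
        self.tokens = burst_bytes      # start with a full bucket
        self.last = 0.0                # timestamp of the previous packet

    def mark(self, now, size):
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return 'IN'
        return 'OUT'
```

With a 1000 B/s rate and a 1500-byte bucket, two back-to-back 1000-byte packets yield IN then OUT; after a one-second idle period the bucket refills and the next packet is IN again.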
[0014] The present invention is also designed to leverage packet markers as follows:
• Mark packets to protect small-window (e.g. window < 4 MSS) flows from packet loss.
• Mark packets to maintain maximum spacing between packets marked "OUT" (low priority), within limits of the total number of tokens, in order to disperse the effect of aggregate burst loss.
• Mark packets to protect retransmitted packets from encountering loss.
[0015] Of these, the idea of protecting retransmissions from loss results in a large performance improvement because it affects all TCP implementations and accounts for a large fraction of retransmission timeouts. Retransmissions are generally sent during episodes of congestion and experience a higher average loss probability. Moreover, in several cases, retransmission loss leads to a timeout in Reno/NewReno/SACK and hence accounts for a disproportionate share of the total number of timeouts. Therefore, protecting retransmissions results in remarkable reductions in the number of timeouts, especially if the window sizes are small. A quick look into the results illustrating an example of this specific aspect is presented in Table 1 (using the configuration and abbreviations explained later in Table 4).
Table 1: Effect of Retransmission Loss on Overall Loss Rate

[0016] The table shows an example of the overall loss percentage and retransmission loss percentage for 50 and 100 flows with and without TCP-Friendly marking. The TCP-friendly marking virtually eliminates retransmission loss in all cases (see row #1). Note that the retransmission loss rate is almost twice that of the average packet loss rate (compare column #2 with column #1; e.g.: 12.08% vs 5.25%). Also, the elimination of retransmission loss has a large effect on the overall loss rate (e.g.: 2.67% in row #1 vs 5.25% in row #3), which suggests that a significant fraction of total packets lost are indeed retransmissions.
[0017] This section will discuss the meaning of better best-effort service from a (TCP) performance and deployment perspective. Best-effort is the service commonly available on the Internet. Quantitative per-flow guarantees for best-effort performance are generally not given to customers. However, quantitative per-flow statistics over the population of users can be formulated for engineering purposes. In particular, user metrics (average per-flow goodput, timeouts and coefficient of variation of per-flow goodput) and network operator metrics (high utilization, low queue length, relatively low loss rates) can be used to measure best-effort TCP/IP performance. These two perspectives are necessary because otherwise, one could optimize user-perceived performance (per-flow goodput, number of timeouts) with aggressive control schemes at the expense of network stability and other well-behaved flows. Similarly, one could optimize the network operator perceived performance (high utilization, low queue length, relatively low loss rates) while inflicting a large service variation on users. In this sense, "better best-effort" means a superior tradeoff between these multiple metrics, or an improvement in all metrics (both user and operator). In particular, the present invention is expected to eliminate a large fraction of total timeout instances seen by TCP.
[0018] Architecturally, better best-effort service requires the deployment of new components in the network, or the upgrade of end-systems. In the present invention it is understood that the incremental deployment of differential dropping and transport-aware or application-aware marking components (e.g.: through diff-serv) can lead to such services. In this context, assuming that the network supports differential dropping mechanisms (e.g.: RIO), the present invention's TCP-friendly marker will provide better best-effort or enhanced assured services for the subset of users who deploy these markers. [0019] The complete program for the TCP-Friendly marker token allocation process is presented in Figure 2. Figure 3 shows the corresponding packet marking behavior program. It should be noted that either program can be implemented as computer software or as hardware components designed to implement logical functions.
A detailed explanation of each figure is provided below. Referring to Figure 2, the token allocation mechanism consists of two major steps. The major operations of each step are set forth below.
[0020] Step I (lines 11 through 18)
• Calculate the per-flow expected amount of bytes (line 12): The number of bytes that every flow is expected to send during the next time interval is calculated as a weighted average of the previous estimate (10% weight) and the number of bytes sent during the immediate interval preceding the updating process (90% weight).
• Reset the tokens allocated in the past (line 13): Take away any remaining tokens that the flows might still have from the interval that just ended.
• Identify flows as tiny or not tiny (lines 14 through 18): Append the flows to two different lists depending on whether or not they are tiny flows. A tiny flow is one that has a value of less than k for its sending congestion window at the time of update. The comments on line 13 mention that the default for k is 4. We approximate each flow's window size by considering the last acknowledgement seen from that flow. We subtract the value of the most recently acknowledged byte from the value of the last byte sent, and divide the resulting number by the segment size established at the start of the connection.
• Calculate the tokens needed by all tiny flows (line 15): We subtract, from the total amount of tokens, the amount needed to fulfill the estimates of all the tiny flows. This is to find out whether we would have any tokens left after protecting all the tiny flows, by giving them as many tokens as they need.
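The traffic estimate and window approximation of Step I can be sketched as follows (a minimal illustration with hypothetical function names; the 10%/90% weights and the default k = 4 follow the description above):

```python
def update_estimate(prev_estimate, bytes_last_interval):
    # Weighted average: 10% previous estimate, 90% bytes sent during the
    # interval that just ended.
    return 0.1 * prev_estimate + 0.9 * bytes_last_interval

def approx_window(last_byte_sent, last_byte_acked, mss):
    # Approximate the sending congestion window (in segments) from the
    # last byte sent and the most recently acknowledged byte.
    return (last_byte_sent - last_byte_acked) / mss

def is_tiny(last_byte_sent, last_byte_acked, mss, k=4):
    # A "tiny" flow has an approximate window smaller than k (default 4).
    return approx_window(last_byte_sent, last_byte_acked, mss) < k
```

For example, a flow that has sent up to byte 4500 with byte 1500 acknowledged, using 1500-byte segments, has an approximate window of 2 segments and is classified as tiny.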
[0021] Step II (lines 19 through 26)
• Determine whether we can fully serve the tiny flows (line 19): If there are not enough tokens for all the tiny flows, we do not even consider giving any tokens to the not-tiny flows.
• Serve not-tiny flows according to the availability of tokens (lines 21 and 26): If there are not (or just barely) enough tokens for the tiny flows, we do not give tokens to their not-tiny counterparts (line 21). If all tiny flows can be fully served, then we fairly divide the rest among the not-tiny ones (line 26).
• Serve tiny flows according to the availability of tokens (lines 22 and 25): If all tiny flows can be fully served, then we simply set their allocation of tokens to the amount they are expected to use (line 25). Otherwise, we fairly divide all the tokens exclusively among the tiny flows (line 22).
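The priority rule of Step II can be sketched as follows (hypothetical names; a single equal split capped at each flow's estimate stands in here for the iterative fair-allocation function described next, and tokens are treated as divisible byte counts):

```python
def allocate_tokens(total, tiny, not_tiny, expected):
    """Tiny flows have absolute priority: only if every tiny flow's
    expected usage can be met do the not-tiny flows get the remainder;
    otherwise the not-tiny flows get nothing and the tiny flows split
    the whole pool."""
    alloc = {f: 0 for f in tiny + not_tiny}
    tiny_need = sum(expected[f] for f in tiny)
    if total < tiny_need:
        share = total / len(tiny)            # equal split among tiny only
        for f in tiny:
            alloc[f] = min(share, expected[f])
        return alloc
    for f in tiny:
        alloc[f] = expected[f]               # fully serve every tiny flow
    remaining = total - tiny_need
    if not_tiny:
        share = remaining / len(not_tiny)    # not-tiny flows share the rest
        for f in not_tiny:
            alloc[f] = min(share, expected[f])
    return alloc
```

With 150 tokens and two tiny flows expecting 100 each, the tiny flows split the pool (75 each) and the not-tiny flow gets nothing; with 600 tokens, the tiny flows are fully served and the remainder goes to the not-tiny flow.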
The fair allocation process is performed exclusively on either the tiny flows or the not-tiny ones. Its purpose is to give all the flows on which it is called the same number of tokens. It must do so without using any more tokens than the ones given to it as an argument, and without allowing any flow to have more tokens than it is expected to use.
[0022] The major operations of the allocation are described below.
[0023] Fair Allocation Function (lines 27 through 42)
• Determine the number of flows to consider in the sharing (lines 28 and 29): We find out how many flows we need to allocate tokens to. It is initially assumed that all of the flows are empty and waiting to be given tokens.
• Do an iterative equal sharing among flows not full (lines 30 through 35): We set up an iterative loop (of rounds) that will keep on giving tokens for as long as there are any left, and there are flows that need them. At every round, the equal amount to give to each flow (not full) is calculated. A flow is considered full if it has as many tokens as it is expected to use.
• Subtract extra tokens from the flows and make them available for sharing (lines 36 through 39): The system checks whether the flows have been given more than they need, and subsequently discounts the extra amount from their allocations and adds that amount back to the total tokens available. We also mark these flows as full so as not to consider them in future rounds of the allocation.
• Update the byte spacing to be enforced in the marking (lines 40 and 41): As we give more tokens to the flows, we recalculate the marking byte spacing as a function of the estimate of their traffic and the amount of tokens they currently have.
• Record the number of tokens left after the complete allocation (line 42): This is done in order to use these tokens (if any) in the event of a flow running out of the number allocated to it.
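The iterative equal sharing described above amounts to a "water-filling" allocation, which might be sketched as follows (hypothetical names; the byte-spacing update is omitted and the leftover bookkeeping is reduced to a returned remainder; tokens are treated as divisible byte counts):

```python
def fair_allocate(tokens, flows, expected):
    """Iterative equal sharing: every round, flows that are not yet full
    split the remaining tokens equally; any excess over a flow's expected
    usage is returned to the pool for the next round.
    Returns (per-flow allocation, leftover tokens)."""
    alloc = {f: 0 for f in flows}
    not_full = set(flows)
    while tokens > 0 and not_full:
        share = tokens / len(not_full)
        tokens = 0
        for f in list(not_full):
            alloc[f] += share
            extra = alloc[f] - expected[f]
            if extra >= 0:
                alloc[f] = expected[f]   # cap at expected usage
                tokens += extra          # return the excess to the pool
                not_full.discard(f)      # full: skip in future rounds
    return alloc, tokens
```

For instance, 120 tokens over flows expecting 10, 50 and 100 bytes converge in three rounds to allocations of 10, 50 and 60: each round either exhausts the pool or fills at least one flow, so the loop terminates.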
Referring now to Figure 3, the marking process is done according to the allocation that was previously discussed. The following major operations comprise the marking procedure.
• Mark not-TCP packets as OUT (lines 47 through 49): When a packet comes into the marker, it is first checked whether it contains a TCP segment. If it does not, it is marked as OUT and put in the forwarding queue.
• Update token reallocation when the time of packet arrival indicates interval expiration (lines 50 through 53): On packet arrival, the time is recorded in order to determine whether it is the moment for reallocation of tokens. If it is time, we call up the updating process and then record the ending time of the update, for further comparison, to determine when to update next.
• Mark SYN packets as IN and process the start of a new connection (lines 54 through 59): If the packet contains a SYN segment, the new flow information is inserted into the flow-state data structure and the update routine is called up. We then mark the packet as IN and put it inside the queue for forwarding.
• Mark FIN packets as IN and process the end of an existing connection (lines 60 through 63): If the packet contains a FIN segment, the flow information is removed from the data structure, the packet is marked as IN, and the FIN segment is put in the forwarding queue.
• Update flow's congestion window and byte traffic for the interval (lines 64 and 65): The congestion window is updated for the flow and the number of bytes seen from that flow during this interval is also updated. This information will be used during the next reallocation of tokens for all the flows.
• Mark retransmissions as IN (lines 66 through 72): If the packet contains a retransmission, the system checks for the availability of tokens for that flow, and marks the packet as IN. If the packet does not contain a retransmission, it updates the highest sequence number for the flow, in order to identify future retransmissions.
• Mark packet as IN or OUT according to the availability of tokens, enforcing the corresponding byte spacing (lines 73 through 98): If the flow does not have enough tokens and neither does the common pool of possible remaining tokens (from the last updating process), the system simply marks the packet as OUT (lines 73 through 79). If there are tokens, and the spacing of bytes to enforce is zero, the packet is marked as IN (lines 80 through 83). If there is some spacing to enforce, the first packet from the flow (during this interval) is marked as IN and the system calculates the number of bytes (from that flow) that have to be marked as OUT before another of its packets can be marked as IN (lines 84 through 88). This last number is kept in flow.out-count (line 87). The system marks packets as OUT until the number of bytes marked as OUT is at least equal to flow.out-count (lines 95 through 98), at which time the next packet is marked as IN, and the system recalculates flow.out-count (lines 89 through 94).
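The byte-spacing behavior (flow.out-count) can be illustrated with the following simplified sketch (hypothetical class; the spacing formula shown is one plausible choice consistent with spreading IN marks evenly over the flow's expected traffic, and the common leftover pool is omitted):

```python
class ByteSpacer:
    """Spread a flow's IN marks across the interval: after each IN packet,
    mark packets OUT until an "out budget" of bytes has been sent OUT,
    where the budget derives from the ratio of expected traffic to tokens."""

    def __init__(self, expected_bytes, token_bytes):
        self.tokens = token_bytes
        # OUT bytes owed per IN byte so IN marks are evenly spread.
        self.out_per_in = max(expected_bytes - token_bytes, 0) / token_bytes
        self.out_budget = 0.0

    def mark(self, pkt_bytes):
        if self.tokens < pkt_bytes:
            return "OUT"                   # flow has run out of tokens
        if self.out_budget > 0:
            self.out_budget -= pkt_bytes   # still enforcing the spacing
            return "OUT"
        self.tokens -= pkt_bytes           # spend a token, mark IN
        self.out_budget = pkt_bytes * self.out_per_in
        return "IN"
```

A flow expecting 6000 bytes with 3000 byte tokens and 1500-byte packets is marked IN, OUT, IN, OUT: half its bytes get tokens, and the IN marks are spaced rather than bunched at the front of the interval.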
[0024] The TCP friendly packet marker implements the objectives laid out previously. The marker is expected to be implemented at the enterprise or ISP edge and operated in cooperation with an ISP which implements differential packet dropping. This scenario is suggested because the marker as presented here requires visibility into the TCP header for read operations only. An alternative would be to split the marker into two pieces, one on the enterprise side and one on the ISP side. The enterprise-side piece uses the visibility of TCP headers and marks the DS-byte, which can then be translated into an ISP-specific codepoint (i.e. re-marked) by the ISP-side component.
[0025] A key per-flow variable required by the marker is an approximation of the TCP window size. This can be obtained by considering the difference between the last sequence number sent and the last byte acknowledged (seen in the latest acknowledgement). This allows classification of the packets as belonging to a "tiny" flow (W < 4) or otherwise. Tiny flows get absolute priority to tokens. The remaining tokens are fairly divided between the other flows and each flow's share is optimally spread between its packets. Retransmitted packets can be identified by comparing the sequence number of the packet with the largest sequence number sent by that flow. Retransmitted packets also get absolute priority in the token pool. Token pool mismatches, new flows, etc. are handled across update intervals (which are, for example, sized at 400 ms, but can be made any size).
[0026] In yet a further embodiment of this invention, Flow-RED (FRED) can also be extended into a differential dropping scheme called FRIO in a manner similar to how RED was extended to RIO. In particular, a separate set of parameters and queue measures are used for packets marked "IN" and for packets marked "OUT" . The original FRED algorithm was developed to improve fairness for fragile (small window or low rate) connections. FRED preferentially drops packets from a flow that has a large number of packets buffered at the congested gateway. The implementation issues of the FRIO algorithm are detailed in the technical report. The key modifications to basic FRED include:
• Use of instantaneous per-flow queue size instead of average per-flow queue size to approximate the per- flow number of packets (avgcq) in the buffer. This improved the performance seen by slow TCP connections.
• Calculation of average queue length exclusively on arrival of packets. The original FRED recalculates average queue length both on arrival and departure of packets. In FRED, simply omitting the calculation during packet departure results in low link utilization due to unnecessary packet drops. It was found that the standard implementation of RED for Linux avoided this problem by "hand-modeling" the arrival of packets (for periods of no arrivals) and using this model in computing the average queue length, done exclusively upon packet arrival. This approach was investigated and it was determined that it did not recalculate average queue length on departure.
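The arrival-only average-queue computation can be sketched as follows (hypothetical function; `w` is the RED averaging weight and `s` a typical packet transmission time, following the classic RED idle-period model in which the average decays as if `m = idle_time / s` small packets had been serviced while the queue was empty):

```python
def red_avg_on_arrival(avg, qlen, w, idle_time, s):
    """Update RED's average queue length on packet arrival only.
    If the queue was idle, first decay the average as if m phantom
    packets had departed during the idle period -- the "hand-modeling"
    trick that avoids any departure-time computation."""
    if qlen == 0 and idle_time > 0:
        m = idle_time / s                 # modeled phantom arrivals
        avg *= (1 - w) ** m               # decay over the idle period
    return (1 - w) * avg + w * qlen       # usual RED EWMA step
```

A long idle period thus pulls the average down sharply on the next arrival, so low utilization is not mistaken for persistent congestion.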
[0027] This FRIO extension is optional because the TCP-friendly marking component can achieve baseline gains even without this extension. Also, since FRIO and RIO are fundamentally variants of the basic RED algorithm, they inherit some of its parameter-sensitivity issues, which may degrade performance under certain conditions. However, as discussed further below, the determinism in the marking process effectively masks such parameter-sensitivities.
[0028] The complete algorithm for FRIO is presented in Figure 4. The FRIO buffer management scheme performs operations on the enqueuing and dequeuing of packets into and from the buffer. The major operations of the scheme are as follows:
[0029] Enqueuing (lines 109 through 149)
• Determine priority of the packet and consider thresholds and average queue length accordingly (lines 109 through 116): If the packet is either IN or OUT, set the RED thresholds to be the respective ones for IN or OUT packets. Calculate the average queue length as in usual RED, but in the following manner:
If the packet is marked as IN, the average queue length is set to the average length of the virtual queue of IN packets.
If the packet is marked as OUT, the average queue length is set to the sum of the average length of the virtual queue of IN packets and the average length of the virtual queue of OUT packets. This is in practice the same as considering the average queue length of the entire buffer.
• Perform core of "Flow operations" exclusively on TCP packets (lines 117 through 131): It is understood that the inherited FRED functionality can only be applied to packets containing TCP segments. We give all other packets the usual RED treatment, with the distinction of using either IN or OUT parameters (depending on the nature of the packet) in determining whether to drop or enqueue.
• Penalize unresponsive connections (lines 125 through 131): The maximum number of packets that any flow is simultaneously allowed in the buffer is calculated as a function of the RED minthresh for the type of packet (line 125). Any packet that attempts to violate this upper bound is dropped, and the system records the fact that an active connection (an active connection, in this context, is one that currently has at least one packet in the queue) has tried to break the established bound. We also approximate the per-flow average number of packets in the buffer, and drop any packet that would make a penalized flow go beyond this average (lines 126 through 131).
• Protect fragile and slow TCP connections (lines 135 through 137 and 141 through 143): In the event of deterministic drop by RED (i.e. average queue size larger than maxthresh) the system introduces an exception to the rule that favors flows with a small number of packets in the queue. Specifically, if the packet belongs to a flow that has fewer than 2 packets in the queue, we do not drop the packet. Similarly, if it belongs to a flow with fewer than 4 packets that has not been penalized (since last becoming active), then the system also enqueues the packet (lines 135 through 137). In the same fashion, but a little more aggressively, the system protects flows when RED is probabilistically dropping packets (i.e. average queue size is between minthresh and maxthresh). For example, the system never drops the packet if it belongs to a flow with fewer than 5 packets in the queue. To that effect, the system does not even compute the probability of drop for a packet in such circumstances (lines 141 through 143).
• Maintain status of flow activity (lines 150 through 157): Every time the system chooses to enqueue a TCP packet, it records information about the flow the packet belongs to. Specifically, the system increments by 1 the number of packets in the buffer for that flow. If the packet belongs to a flow with no packets in the queue, the system inserts the flow into the flow data structure and updates the number of active flows accordingly.
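The threshold and average-queue selection at the top of the enqueue path can be sketched as follows (hypothetical names; `params_in`/`params_out` stand for the separate IN and OUT RED parameter sets, and the averages for the two virtual queues are assumed to be maintained elsewhere):

```python
def frio_red_inputs(mark, avg_in, avg_out, params_in, params_out):
    """Select the RED thresholds and the effective average queue length
    for a packet in FRIO: IN packets are judged against the IN virtual
    queue alone, while OUT packets are judged against the sum of both
    virtual queues -- in practice the whole buffer."""
    if mark == "IN":
        return params_in, avg_in
    return params_out, avg_in + avg_out
```

This is what makes the dropping differential: under load the OUT packets see a larger effective queue (and typically tighter thresholds) and are discarded first, shielding the IN traffic.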
[0030] Dequeuing (lines 160 through 167)
• Update flow status information (lines 160 through 167): On dequeuing of packets, the system simply decrements by 1 the count of the packets that the flow has inside the queue. Incidentally, if the flow has no more packets in the queue, the system removes its information from the tree flow data structure and marks the flow as inactive.
[0031] Figure 1 shows the logical topology of the system 100 used in the present invention. There are multiple TCP flows destined to the other end of the network 100 through a 10 Mbit/sec bottleneck 102. The access links 104 and egress links 106 are 100 Mbit/sec. The differential dropping algorithms are implemented at the interface 108 of R1 connecting to interface 110 of R2, and the marking at the entry to the access links 104. The present invention implements a delay component on the outgoing interface of the destination with the intention of emulating a (WAN) round trip delay of 200 ms. The metrics used in the present invention have been carefully chosen to characterize best-effort performance, both from a user and operator perspective:
• Timeouts: a proxy measure for the increase in transfer delays, and retransmission delays as seen by the end user.
• Byte Loss Probability: a byte-level measure of the loss probability at the bottleneck.
• Average per-flow Goodput: measures the average per-flow throughput minus retransmissions across the set of flows.
• Coefficient of Variation (CoV) of per-flow Goodput: this metric measures the "spread" of per-flow goodputs, i.e., the (un)predictability of resulting best-effort service.
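The CoV metric is simply the standard deviation of per-flow goodput divided by its mean; a minimal sketch using the Python standard library:

```python
from statistics import mean, pstdev

def goodput_cov(goodputs):
    """Coefficient of variation of per-flow goodput: population standard
    deviation over the mean. A lower CoV means a smaller spread, i.e. a
    more predictable best-effort service."""
    return pstdev(goodputs) / mean(goodputs)
```

Identical per-flow goodputs give a CoV of 0 (perfectly predictable service), while flows receiving 5 and 15 kB/s give a CoV of 0.5.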
[0032] The premise of these multiple metrics is that best-effort performance looked at from the perspective of all these metrics is very different from looking at it just from the perspective of average per-flow goodput. Four configurations are used (configA through configD) in the experiments to capture: (a) the two most commonly deployed and important TCP implementations (Reno and SACK), as well as (b) good or mistuned settings of RED-equivalent parameters (e.g.: in RIO/FRIO etc). In the present invention "good" RED parameters are those in which the max and min thresholds are relatively far apart. Mistuned parameters are those where the thresholds are set close to each other, which in effect defaults to a taildrop policy on a smaller buffer limit. Parameter mistuning risk is one of the leading reasons why RED is not widely deployed today and the present marking scheme can help mitigate this risk in a transparent manner. The specific RED parameters for each of these cases are presented in Table 2. The total buffer size in these experiments is about 500 Kbytes (or approx. 500 packets).
TABLE 2: RED PARAMETERS: GOOD AND MISTUNED
[0033] Based upon the above settings, the four configurations are shown in Table 3. Further, if the present invention considers combinations with dropping schemes and markers, a set of cases is obtained as shown in Table 4. The "old" TCP-friendly marker refers to a marker which does not protect retransmissions. The token bucket and TCP-friendly markers deal with a pool of tokens refreshed every update interval. The number of (byte) tokens per interval is 296K and the update interval is 400 ms (based upon approximately 60% of the 10 Mbps bottleneck capacity). In addition, each case was tested with 50 and 100 simultaneous TCP flows to get a moderate level of multiplexing.
TABLE 3: EXPERIMENTAL CONFIGURATIONS
TABLE 4: MARKER AND DROPPER CHOICES IN EACH CONFIGURATION

[0034] It is acknowledged that this experimental setup captures the behavior only at a canonical bottleneck with a simple workload. The present invention focuses on a variety of schemes (SACK, Reno, RED, FRED, RIO, FRIO, TB, TCPF, RED parameters etc.) rather than a variety of configurations and workloads. However, it is argued that the basic observations on the effects of timeout reductions are valid more generally, even though the particular performance gains may vary. For example, factors like heterogeneous RTTs and multiple bottlenecks are not essential in verifying basic timeout mitigation effects, even though they are important in determining issues like unfairness (because timeout is only one of the causes of the unfairness problem). Moreover, substantial performance gains have also been observed for short flows (e.g.: Web-like).
[0035] A variety of experiments have been run implementing the present invention. The results are presented as tables to allow selective row-by-row or column-by-column comparison, and to avoid graphs with too many curves. The following abbreviations are used in addition to the ones mentioned in Table 4: "Ploss:" (total) packet loss percentage; and "CoV:" coefficient of variation of per-flow goodput. Table entries which are referenced in the text are bold-faced.
[0036] Table 5 shows significant improvements achieved by the TCP-Friendly marker across all metrics. In the case of 100 flows (column 1) a reduction of timeouts from 911 (TB+RIO case) to 41 (TCPF+FRIO case) is found, a factor of 22 improvement. In the 50 flows case, timeouts are virtually eliminated; a reduction from 491 to 2 timeouts. These results show that SACK by itself does not eliminate timeout behavior (e.g.: row #7, column #1 has 1322 timeouts), and that the TCP-Friendly marker provides performance improvement even with TCP SACK.
TABLE 5: RESULTS FOR CONFIG A: TCP SACK + GOOD RED PARAMETERS
[0037] The performance improvement is also seen in metrics other than timeouts.
Observe a reduction of 30-50% in the packet loss rate (Ploss%) with the TCP-friendly marker. For example, in column #2, Ploss% reduces from 9% in the last row to 4% in the first row. The average per-flow goodput is always higher by about 10-15% when using the present invention; e.g.: in column #3, with 100 flows, the average per-flow goodput increases from 9 kB/s to 11.4 kB/s. The coefficient of variation (CoV or spread) is also better (e.g.: column #4, 0.06 vs 0.38), though this CoV analysis is incomplete in general because factors such as RTT and multiple bottlenecks have a non-trivial effect. Also note that the relative performance improvement in the first 4 columns (100 flows' case) in order-of-magnitude terms is similar to that seen in the last four columns (50 flows' case), i.e., an indication of the scalability of the performance gains using the present invention.
[0038] Though the use of FRIO provides better results than RIO, the relative improvement achieved by a TCP-Friendly marker with RIO itself is substantial. For example, compare results in row #2 (TCPF+RIO) with later rows; e.g.: 120 timeouts vs 911 timeouts. This means that, even though the present invention proposes and studies FRIO, it is not necessarily advocating complex per-flow schemes at the interior bottlenecks. In other words, the combination of per-flow, deterministic marking at the edge with stateless, random, differential dropping is a powerful one by itself.
• The performance improvement using FRED compared to RED [15]. Compare row #5 (NoMark+FRED) with row #7 (NoMark+RED); e.g.: 1030 timeouts with FRED vs 1322 timeouts with RED.
• The good performance of tail-drop compared to RED under reasonably high degrees of multiplexing [17]. Compare row #6 (NoMark+TailDrop) with row #7 (NoMark+RED); e.g.: 1140 timeouts achieved in TailDrop vs 1322 timeouts in RED.
• Ibanez et al.'s observation that assured services with simple token bucket marking do not lead to considerable performance improvement. Compare row #4 (TB+RIO) with later rows; e.g.: 911 timeouts with the token bucket, which is a marginal improvement over 1140 timeouts achieved with NoMark and TailDrop.
• Simulation work illustrating the performance benefits of the old TCP-friendly marker which just focused on small-window flows. Compare row #3 (TCP(O)+RIO) with later rows; e.g.: 390 timeouts achieved with the old marker is a good improvement over 911 timeouts achieved with the token bucket marker.
Table 6: Results for Config B: Reno and Good RED Parameters
[0039] Table 6 reiterates the basic performance benefits noted in the earlier section with TCP Reno instead of TCP SACK. The number of timeouts goes from 1127 (row #4) down to 62 (row #1) for 100 flows and from 610 to 8 for 50 flows. This represents an orders-of-magnitude gain in the total number of timeouts, clearly decoupling the benefits of the marker of the present invention from the use of either Reno or SACK. The other metrics show similar relative performance gains across the board. As an aside, note that the absolute magnitudes of the Reno metrics are not as good as those of the corresponding SACK metrics presented in the previous table, reaffirming the general performance benefits of SACK.
Table 7: Results for Config C: SACK and Mistuned RED Parameters

[0040] Table 7 presents results with mistuned RED (or RED-equivalent) parameters. In comparison with Table 5, note that every timeout observation almost doubles (e.g.: row #1, column #1 shows 92 timeouts in Table 7 vs 41 timeouts in Table 5), validating the negative effect of parameter mistuning. However, the relative performance benefits of TCP-friendly marking, in terms of all the metrics, are clearly preserved. Compare the first two rows with later rows in Tables 5 and 7; e.g.: 92 or 152 timeouts with the TCPF marker (rows #1-2, column #1) vs 1200+ timeouts without the TCPF marker (rows #4-7, column #1). In other words, the orders-of-magnitude performance gains, especially in timeouts, due to TCP-friendly marking overshadow the factor-of-2 performance degradation due to RED parameter mistuning. This provides some positive incentive to deploying AQM-based differential dropping schemes. [0041] Table 8 replicates the results of configuration C (mistuned parameters) with TCP Reno instead of TCP SACK. The observations regarding relative performance and compensation of mistuning effects made in the last section hold in this case too. Since TCP Reno is the dominant deployed protocol and since parameter mistuning is a common occurrence, this experiment is more in tune with reality. As an aside, note that the loss rates during persistent congestion with Taildrop or RED (last two rows) can be as large as 17-20% even with moderate levels of flow multiplexing (100 flows).
Table 8: Results for Conf. D: Reno and Mistuned RED Parameters

[0042] A goal of the present invention is to leverage the tools provided by diff-serv: packet marking and differential dropping. TCP-friendly marking is an interesting and powerful alternative especially due to its capability to deterministically protect small-window flows and retransmitted packets. The methodology of leveraging the marking and preferential dropping capabilities is clear: provision enough tokens, deterministically identify the packets which, when dropped, can disproportionately affect user performance and protect them, and then randomize or spread the remaining tokens over the rest of the packets. This type of application-aware or transport-aware marking can be done by the end systems or by edge proxy devices. In the particular case of TCP, the following benefits were found:
• Reductions of up to two orders of magnitude in the number of timeouts.
• Consistently larger average data goodput (10-15%) for all TCP flows.
• 30-50% reduction in the total loss probability across all instances of implementation.
• Considerable improvements to the predictability (variance) of best-effort service. Service variance of course depends upon factors such as heterogeneous RTTs and multiple bottlenecks.
• Masking of the negative performance effects of RED parameter misconfiguration.
• Improvements irrespective of TCP implementation (TCP Reno or TCP SACK).
• Relative improvements equal or better with respect to the number of TCP flows (100 flows vs 50 flows).
• All stateful and TCP-aware activity can be done at the (enterprise) edge and not in the core.
• Can be incrementally deployed on a customer-by-customer basis, provided bottlenecks are upgraded to support differential dropping.
• Could be applied to peering points and international links (major congestion points) in the longer term to alleviate the effects of packet loss on TCP flows.
[0043] In general, this work is complementary with other edge-to-edge work and end-to-end work (mentioned previously) to attain superior and scalable traffic management. The approach is consistent with the philosophy of moving complexity to the edges; the marker at this point is best deployed at the enterprise edge. The reduction in timeouts can also be used as the basis of differentiated TCP services offered by ISPs on private networks (on single-ISP or limited multi-ISP networks which support differential dropping). As mentioned earlier, this approach masks parameter sensitivities of RED and provides an incentive to deploy differential dropping and active queue management techniques at key congestion points (e.g.: peering points, international links, access links etc).
[0044] The invention provides an apparatus and method for traffic management on the Internet. The above description and drawings are only illustrative of preferred embodiments which achieve the objects, features and advantages of the present invention. It is not intended that the present invention be limited to the illustrated embodiments as modifications, substitutions and use of equivalent structures can be made. Accordingly, the invention is not to be considered as limited by the foregoing description, but is only limited by the scope of the appended claims.

Claims

We claim:
1. A method of leveraging the dimension of network based packet marking, said method comprising the steps of:
protecting small window flows from packet loss;
maintaining optimum spacing between packets marked with "IN" and "OUT" in order to disperse the effect of burst loss at interior bottlenecks; and
reducing losses encountered by retransmitted packets.
2. The method and apparatus hereunder described and defined.
PCT/US2001/040453 2000-04-06 2001-04-06 Tcp-friendly system and method for marking scalable better best-effort services on the internet WO2001077851A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001293342A AU2001293342A1 (en) 2000-04-06 2001-04-06 Tcp-friendly system and method for marking scalable better best-effort services on the internet

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US19551100P 2000-04-06 2000-04-06
US60/195,511 2000-04-06
US20490600P 2000-05-16 2000-05-16
US60/204,906 2000-05-16

Publications (1)

Publication Number Publication Date
WO2001077851A1 true WO2001077851A1 (en) 2001-10-18

Family

ID=26891036

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/040453 WO2001077851A1 (en) 2000-04-06 2001-04-06 Tcp-friendly system and method for marking scalable better best-effort services on the internet

Country Status (2)

Country Link
AU (1) AU2001293342A1 (en)
WO (1) WO2001077851A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7652988B2 (en) 2002-06-04 2010-01-26 Alcatel-Lucent Usa Inc. Hardware-based rate control for bursty traffic
US20140279554A1 (en) * 2013-03-12 2014-09-18 Seth Priebatsch Distributed authenticity verification for consumer payment transactions

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5944795A (en) * 1996-07-12 1999-08-31 At&T Corp. Client-server architecture using internet and guaranteed quality of service networks for accessing distributed media sources

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5944795A (en) * 1996-07-12 1999-08-31 At&T Corp. Client-server architecture using internet and guaranteed quality of service networks for accessing distributed media sources

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AZEEM FEROZ ET AL.: "TCP-friendly traffic conditioners for differentiated services", February 1999 (1999-02-01), pages 1 - 13, XP002943370 *
AZEEM FEROZ ET AL.: "TCP-friendly traffic marker for IP differentiated services", June 2000 (2000-06-01), pages 35 - 53, XP002943371 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7652988B2 (en) 2002-06-04 2010-01-26 Alcatel-Lucent Usa Inc. Hardware-based rate control for bursty traffic
US20140279554A1 (en) * 2013-03-12 2014-09-18 Seth Priebatsch Distributed authenticity verification for consumer payment transactions
US11587057B2 (en) * 2013-03-12 2023-02-21 Scvngr, Inc. Distributed authenticity verification for consumer payment transactions

Also Published As

Publication number Publication date
AU2001293342A1 (en) 2001-10-23

Similar Documents

Publication Publication Date Title
Ott et al. Sred: stabilized red
US9832131B2 (en) System and method for providing quality of service over dedicated local loop networks
US8107369B2 (en) Congestion and delay handling in a packet data network
Karandikar et al. TCP rate control
JP4080911B2 (en) Bandwidth monitoring device
Henderson et al. On improving the fairness of TCP congestion avoidance
AU750222B2 (en) Method and apparatus for selectively allocating and enforcing bandwidth usage requirements on network users
EP1240753A1 (en) Congestion control method for a packet-switched network
Feroz et al. A TCP-friendly traffic marker for IP differentiated services
Sagfors et al. Queue management for TCP traffic over 3G links
US20150036500A1 (en) System and method for a tcp mapper
Bagal et al. Comparative study of RED, ECN and TCP Rate Control
WO2001077851A1 (en) Tcp-friendly system and method for marking scalable better best-effort services on the internet
Monaco et al. TCP-friendly marking for scalable best-effort services on the Internet
Azeem et al. TCP-Friendly traffic conditioners for differentiated services
Kapadia et al. GREEN: a TCP equation-based approach to active queue management
Wang et al. Layer-4 service differentiation and resource isolation
Teunis et al. Sred: stabilized red
Wu et al. ACK filtering on bandwidth asymmetry networks
Barakat et al. On ACK filtering on a slow reverse channel
Monaco et al. TCP-Friendly Marking for Scalable Best-Effort Services on the Internet
Azeem et al. TCP-Friendly Traffic Conditioners for Differentiated Services
Tokuda et al. Analysis and Improvement of the Fairness between Long-lived and Short-lived TCP Connections
Savoric Improving congestion control in IP-based networks by information sharing
Anelli et al. TCP SYN Protection: An Evaluation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP