WO2008155542A1 - Method and apparatus for computer networks - Google Patents

Method and apparatus for computer networks

Info

Publication number
WO2008155542A1
Authority
WO
WIPO (PCT)
Prior art keywords
flow
data
queue
packets
demand
Prior art date
Application number
PCT/GB2008/002079
Other languages
French (fr)
Inventor
Christopher David Horton
Original Assignee
Clear-Q Limited
Priority date
Filing date
Publication date
Application filed by Clear-Q Limited filed Critical Clear-Q Limited
Publication of WO2008155542A1 publication Critical patent/WO2008155542A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/19: Flow control; Congestion control at layers above the network layer
    • H04L 47/193: Flow control; Congestion control at layers above the network layer at the transport layer, e.g. TCP related
    • H04L 47/29: Flow control; Congestion control using a combination of thresholds
    • H04L 47/30: Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
    • H04L 47/32: Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
    • H04L 47/326: Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames with random discard, e.g. random early discard [RED]

Definitions

  • the present invention relates to a method and apparatus for computer networks.
  • a means to restrict the impact on individual service experiences when their combined demand overloads the physical resource. This is especially necessary around the edge of a network, in the nodes that provide subscriber terminals with access and so support relatively few simultaneously active users compared with the core of the network.
  • TCP: Transmission Control Protocol
  • FIFO: First-In-First-Out
  • the congestion control of TCP works by the sender probing the network's available bandwidth; gradually increasing its rate until a data packet is lost.
  • the receiver signals any such packet loss back to the sender by a break in the sequence of acknowledgements; the sender, on the assumption that an overloaded queue has caused the loss, responds by halving its rate of sending. If TCP determines that several packets have been lost, a Retransmit Timeout briefly suspends sending before restarting from a low rate.
  • a growing proportion of network traffic consists of media streams that do not respond to packet loss as a sign of congestion, and indeed are often real-time services, which are badly affected by the packet loss and queue latency that characterises the interaction between TCP traffic and FIFO queues. Some of them, such as audio telephony streams, need little capacity and barely contribute to any congestion, but others, for example video-conference streams, demand a large, and perhaps unfair, share of capacity.
  • Active Queue Management (AQM) schemes can shorten queue lengths by warning the sources of network traffic about incipient congestion.
  • Such mechanisms include Random Early Detection (RED), with its flow-aware variants Flow RED (FRED) and RED with Preferential Dropping (RED-PD), and Adaptive Virtual Queue (AVQ). All of these methods track the occupancy level of real or virtual queues to decide the fraction of packets to be randomly marked or dropped. Since a queue fills when the demand exceeds the outgoing capacity, and gradually clears when demand falls within the capacity, the queue length usefully indicates the onset of overload. However, using this queue length as the basis of control action tends to stabilise the queue to a consistent and undesirable latency for every packet that passes through.
  • the virtual queue methods aim to detect incipient congestion while demand is still within capacity so that the actual buffer runs almost clear. But the leaky bucket type of virtual queue is slow to respond to the changes in demand that are typical of a packet network. In practice it may take several round-trip times to detect an increase in demand, signal the control action, and for that action to take effect. But since TCP flows are continually increasing their demand, this lag may allow a serious overload to develop, which then requires excessive control action.
  • Simple overflow and random AQM methods do not distinguish between the flows from which they discard packets. Certainly the large flows that contribute most to any overload also have most packets in the system and so are most likely to suffer a discard, and halving the rate of a larger flow more effectively reduces congestion. But the indiscriminate discard can also affect real-time streams and short-lived flows, spoiling their users' experience without much affecting any congestion. It might be necessary to drop several more packets before the demand is brought back within the limits of capacity. At other times the indiscriminate discard may remove multiple packets from a large flow, and the big reduction in demand leaves the resource under-utilised.
  • tail drop includes the queue latency in the delay around the congestion control loop, which undermines stability. But eliminating that delay by dropping packets emerging from the head of the queue would almost certainly miss the cause of the overload, and be ineffective in controlling congestion.
  • FIG. 1 illustrates the range of possible impacts that overload might have on users' experience of services.
  • the TCP congestion controls ensure a robust response, with only a few larger interactive transfers affected during periods of overload, whilst the majority of users remain unaware of the congestion.
  • the falling dotted line corresponds to most users suffering a breakdown in service with almost any overload, which is the fragile response that real-time services typically obtain from a packet network.
  • the ideal congestion management solution would obtain a robust overload response for whatever applications transfer their data over the network, whether interactive or real-time.
  • a preferred embodiment of the invention co-operates with TCP's end-end congestion control to minimise the storage of packets in network queues, and acts decisively against flows that would take an unfair share of the limited capacity.
  • the resulting consistent low delay dramatically improves the service experience for real-time applications, while interactive services obtain the full benefit of the established TCP congestion controls.
  • High utilisation of capacity is ensured, not merely by keeping the resource as full as possible consistent with low delay, but also by suppressing bad-put; the wasteful throughput of large streams that are worthless to the user because of delay, packet loss, or other undesirable effects.
  • a highly preferred embodiment of the invention is based on two interrelated approaches. Firstly, the methodology responds to the total demand approaching overload by selecting just one flow at a time from which to discard as a sign of congestion to the end points of that flow; the proportion of that flow which is discarded increasing progressively with total (measured) demand, until eventually, when demand reaches a predetermined point the entire flow is discarded. Secondly, what may be termed a shadow queue measures the demand separately from the (actual) queue that buffers the load. This allows the latter actual queue to be kept almost empty and so maintain a consistently low packet delay, while the shadow queue monitors the optimum temporal size of window for detecting incipient overload and selecting individual flows for discard.
  • Further embodiments of the invention include: automatically adapting to dynamically varying capacity; managing congestion with a composite shadow queue when multiple channels share a limited resource; selecting only lower-priority flows for discard in a system of policy-based QoS and when media streams are encoded in prioritised layers; and selecting flows for diversion in a multi-route network.
  • a network node that employs the present invention is preferably much more resilient than if it used a known AQM method, in that it can maintain a consistent quality of experience for the great majority of subscribers in the face of unexpected overload.
  • This conveys important benefits to a network operator: in the quality of their services; in the utilisation of the network; in the planning of new capacity; in controlling operating costs; in attracting and retaining customers.
  • the number of flows, which are subjected to discard as a result of the preferred methodology, is accordingly minimised so that the majority of users enjoy an acceptable level of service.
  • a method of managing a buffer for data packets in a network node comprising selecting a particular flow of data packets and determining to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.
  • an apparatus for managing a buffer for data packets in a network node comprising a data processor which is configured to select a particular flow of data packets and determine to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.
  • a method to manage a queue or buffer in a packet data network that, at a threshold related to the rate of total demand, selects one of the flows from which to discard; the discard proportion increasing progressively to an upper threshold, beyond which the entire flow is discarded, wherein: a) At the lower threshold the method discards just one packet from the flow, b) then from the next higher threshold discards every Nth packet of the flow, c) and above the upper threshold discards the selected flow completely.
  • the method preferably further includes the step that packets are discarded as they emerge from the head of the queue, as a send/drop decision.
  • the data rate of demand, against which the discard thresholds are set, is measured by a moving window on a shadow queue, separate from the actual queue that buffers the data.
  • Each incoming packet joins the actual queue, and the packet's parameters pertaining to congestion control decisions are placed in the shadow queue. Packets leave the actual queue in First-In-First-Out order, as the outgoing resource capacity allows. Shadow packets are removed from the shadow queue after they have been in the shadow queue for longer than the window interval.
  • the flow selected for discard is preferably the flow that is the largest in the shadow queue. Preferably when complete discard of the selected flow is insufficient to relieve overload, then the next largest flow is progressively discarded, and so on.
  • each packet of the selected flow is preferably discarded from the actual queue until there is none of that flow left in the shadow queue, even though the demand may drop below the thresholds.
  • the discard is desirably progressed further than indicated by the demand in relation to shadow queue thresholds.
  • the discard thresholds are dynamically adjusted according to the relation between actual and shadow queues, wherein; a) If the actual queue is growing as a symptom of overload but not the shadow queue, the thresholds are lowered, and b) If the shadow queue fills beyond its higher thresholds, but the actual queue is clear and so not actually overloaded, the thresholds are raised.
  • a single composite shadow queue desirably monitors the total demand of all of multiple channels that share the same resource.
  • the packet parameters in the shadow queue preferably include an indication of priority, by whatever means, including a DiffServ codepoint in the ToS field of the packet header, or otherwise associated with the flow identity. The discard decision is based on each flow's net size: the amount that the flow has in the shadow queue divided by a weighting factor proportional to its priority under QoS policy; if the policy requires absolute priority then the net size is set at zero.
  • the shadow queue preferably records, with each arriving packet, parameters for the power and data rate of the flow's most recently transmitted packets; the discard thresholds are then set in terms of the total resource needed to transmit the contents of the shadow queue, and the net largest flow selected for discard is the one that would take the largest share of resource to transmit that flow's content in the shadow queue.
  • the packet parameters in the shadow queue preferably include an indication of the sub-flow to which the packet belongs, and a packet is discarded only if it belongs to the sub-flow with the largest proportion of that flow's data in the shadow queue.
  • the next largest sub-flow in the largest flow is preferably progressively discarded, and so on, except that once there is only one sub-flow left in the largest flow, then its packets are sent, and the next largest flow is selected for discard.
  • the measurement means preferably proposes, back to the routing function, any flow selected for complete discard as a candidate for re-routing.
  • the packet order is preferably retained by the method of temporary queues to hold the packets of the diverted flow while the new route is found.
  • the present invention desirably finds particular utility in access networks, for example in router equipment, such as edge routers, and wireless hub or base station equipment.
  • Figure 2 is a schematic representation of the elements of a network node which is configured to manage congestion and control overload.
  • Figure 3 charts the progressive discard profile employed by the node of Figure 2.
  • Figure 4 is a flow diagram of the progressive discard process at the head of the actual queue in the buffer of the node in Figure 2.
  • Figure 5 is a schematic representation of the elements of the shadow queue by which the node of Figure 2 measures the demand and selects flows for packet discard.
  • Figure 6 is a flow diagram of the moving window flow measurements performed by the shadow queue.
  • Figure 7 is a flow diagram of the analysis of the measurements to detect incipient overload and to select a flow and the proportion of its packets to be discarded from the selected flow.
  • Figure 8 shows use of a composite shadow queue to manage congestion on multiple routes, by the example of a WLAN hub with incoming and outgoing routes sharing the same radio resource.
  • Figure 9 is a table showing how the selection and discard decision may account for multiple parameters: resource usage, policy-based priority, and layering of media flows.
  • Figure 10 shows an arrangement of queues to preserve packet order when redirecting a flow to a new route.
  • Figure 11 shows multiple graphs to contrast the distinctive overload response of the current invention with that of known AQM methods.
  • a node 100 in a packet data network receives packets from sources 111, 112 and 113, and sends them towards destinations 115, 116 and 117 through a resource 101, which has a (known) limited capacity for transmission of data thereby.
  • the node 100 further comprises a data processor and a memory.
  • the buffer 102 queues any excess until the resource becomes free.
  • four different packet flows are considered, these are: a web page 121 from source 111 to destination 115; e-mails 122 and 123 fetched from server 112 to destinations 116 and 117 respectively; and a media stream 124 from source 113 to destination 117 (which is also receiving flow 123).
  • IP: Internet Protocol
  • each flow is distinguished by its source IP address and destination IP address.
  • the base station distinguishes between flows by the mobile terminal's identity, such as a Media Access Control (MAC) address in WLAN, or by a link identity such as a PDP context in 3GPP standards.
  • MAC: Media Access Control
  • a monitoring means 103 is configured to measure the total demand on the resource, and select the packets that are to be discarded into the discard receptacle 105 by the discard means 104. Because the mechanism selects a specific flow, the discard may be from the head of the queue, which removes the latency of the actual queue from the feedback loop. This ensures that the (intended) recipient receives the earliest possible indication of loss, and the sender the earliest signal to reduce the rate of sending.
  • one flow is selected and its packets are discarded according to the stepped profile of Figure 3.
  • the discard proportion increases progressively as the measured demand in relation to the resource capacity passes thresholds 201, 203 and 205.
  • at the lower threshold 201 just one packet is discarded from the flow, which is a small percentage 202 of the flow's packets that pass through the node whilst it is selected for discard.
  • when demand exceeds the next higher threshold 203, every Nth packet of the flow is discarded, where N is typically around 4 to 7, giving an intermediate discard percentage 204 of about 14% to 25% of the flow.
  • the selected flow is discarded completely (as shown at 206).
  • the lower threshold is set somewhat less than the capacity of the actual outgoing link 101; the next higher threshold around its capacity; and the upper threshold somewhat above the actual capacity. By setting the lower shadow-queue threshold below the capacity, little or no queuing results in the actual queue.
  • the first step on entry 300 to the continuous loop is to wait until the resource indicates that it is free to send the next packet and also that a packet is waiting in the queue 102, in which case this packet is sent (as shown at step 302). If there is no subsequent packet in the queue 102 then the process returns to step 301 to await the arrival of the next packet. Otherwise the method identifies the flow to which the packet waiting at the head of the queue belongs, examines the so-called drop state that the measurement means 103 has set for the flow and at step 304 sets a temporary variable 'n' according to the drop state. The value of n is set so that one packet is dropped for every n packets of the flow that pass into the queue 102.
  • the discard means maintains a variable 'count' for each flow, which is set to 0 in each new flow record (at step 312, described below).
  • the test at step 306 keeps 'count' at 0 until a packet has been dropped from the flow, and thereafter counts the number of packets sent since the last one was dropped at step 308.
  • the test at step 307 ensures dropping of the first packet that passes on transition from the initial No Drop state, and the dropping of one packet in n in the Drop N state.
  • the method determines whether the next packet in the queue should be discarded. If the waiting packet is not to be dropped, as determined at steps 306 and 308, then the method returns to step 301 until the resource becomes free.
  • the discard means 104 locks the flow into a total discard state, so that, to the user application, it appears that the connection is broken, and within a few seconds either the user or the application will abort the stream and relieve the overload.
  • the responsive TCP flows can then resume their former sending rate. While the large media flow is brought under control the low-rate real-time services and short-lived flows remain unaware of any congestion.
  • details of the monitoring means 103 are shown in Figure 5.
  • the shadow queue 131 maintains a temporally moving window by removing packets as soon as they have been in the shadow queue for longer than a window interval 132.
  • the analysis (as shown in table 133) is updated to reflect the amount of data that each flow has in the shadow queue window.
  • the thresholds 201, 203 and 205 are set in relation to the total amount of data in the window (as shown at 134), which is proportional to the total demand on the outgoing resource 101.
  • Figure 5 gives the example of the total demand 134 exceeding the middle threshold 203, so that the largest flow 122 is set to Drop Nth (as highlighted at 135).
  • the measurement/monitoring process follows the flow of Figure 6, entering at point A (as shown at 310) when each new packet arrives.
  • the arrival of each new packet triggers the start of the process whereby the total amount of data in the shadow queue at that instant (less any data that is to be discounted or removed because it has been there for longer than the predetermined interval) is used to calculate a measure of demand on the resource.
  • the packet joins the actual queue (as shown at 311), and if the flow is not recognised (as shown at 312) a new flow record is created.
  • the new amount for the flow (as shown at 313) is used to move the flow record up to its correct place in the ranking of flows (at step 314).
  • the process checks if the window interval has moved on past any packets at the head of the shadow queue. If not, it proceeds via B 330 to select the largest flow and determine the discard percentage. If the test at 321 finds that the packet is too old for the window, then it is removed from the analysis (at step 322), and if that leaves a zero amount for the flow, meaning the flow has no packets in the window, its flow record is removed entirely (as shown at 323). Otherwise it is moved down to its correct place in the ranking (as shown at 324). This is repeated at step 321 until there are no more packets outside the window and the process moves on via point B (at step 330) to the entry point to Figure 7 that shows the detection and control of overload.
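  • by way of illustration, the moving-window bookkeeping of Figures 5 and 6 might be sketched as below; the use of byte counts as the measure of "amount", the window length and the data structures are assumptions for the sketch rather than details taken from the patent.

```python
# Sketch of a moving-window shadow queue: each arriving packet's congestion-control
# parameters (flow id, size, arrival time) join the shadow queue, per-flow totals are
# kept up to date, and entries older than the window interval are expired.
import time
from collections import deque

class ShadowQueue:
    def __init__(self, window_seconds: float = 0.1):
        self.window = window_seconds
        self.entries = deque()   # (arrival_time, flow_id, size_bytes)
        self.per_flow = {}       # flow_id -> bytes currently inside the window
        self.total = 0           # total bytes inside the window (the demand measure)

    def on_packet(self, flow_id, size_bytes: int, now: float | None = None) -> None:
        """Record an arriving packet and expire anything outside the window."""
        now = time.monotonic() if now is None else now
        self.entries.append((now, flow_id, size_bytes))
        self.per_flow[flow_id] = self.per_flow.get(flow_id, 0) + size_bytes
        self.total += size_bytes
        self.expire(now)

    def expire(self, now: float) -> None:
        """Remove shadow packets that have been in the window longer than the interval."""
        while self.entries and now - self.entries[0][0] > self.window:
            _, flow_id, size = self.entries.popleft()
            self.per_flow[flow_id] -= size
            self.total -= size
            if self.per_flow[flow_id] <= 0:
                del self.per_flow[flow_id]   # flow record removed when nothing left in window

    def largest_flow(self):
        """Flow with the largest amount in the window, or None when the window is empty."""
        return max(self.per_flow, key=self.per_flow.get, default=None)
```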
  • Figure 7 shows the iterative process initialised (at step 331) by selecting the flow with the largest amount and setting a temporary variable 'rate' to the total demand (ie the total amount of data) of packets in the instantaneous window of the shadow queue.
  • the rate is adjusted (at step 332) to remove the effect of flows that are being completely discarded, but usually the process passes directly to step 333 to compare the thresholds and set the drop state of the largest flow that is not yet being 100% dropped.
  • the next smaller flow in the ranking is set to No Drop, except that once any flow is set to Drop 100% it stays in that drop state until all of its packets have left the shadow queue window.
  • a similar step at 324 (for clarity Figure 6 does not show this), setting the drop state to No Drop whenever the next higher flow is also No Drop, would ensure that only one flow is ever set to an intermediate drop state of either Drop 1 or Drop Nth.
  • at step 340 the analysis is complete until re-entry at point A when the next packet arrives.
  • the window-based rate measurement detects an increase in demand much more quickly than actual-queue or leaky-bucket rate measurements.
  • the window-based measurement can detect an increase in demand and signal control action within one round-trip time. This, when combined with a discard means at the head of the queue 104, keeps the delay around the control loop within about two round-trip times, which greatly contributes to stability.
  • the predetermined 'time out' window period is set such that items are removed from the shadow queue some time after they are removed from the actual queue in the buffer.
  • the primary way of monitoring the shadow queue will usually have the result of maintaining a short actual queue for most usual traffic.
  • an unusual burst might suddenly overload the actual queue, and so as a safeguard, in addition to the shadow queue thresholds, a threshold is set on the actual queue 102, lower than the tail-drop capacity would be for a basic FIFO queue.
  • the comparisons 333 of Figure 7 are extended so that when the actual queue exceeds this threshold, the selected flow is set to Drop 100% to quickly eliminate the delay.
  • Many network technologies provide dynamic link capacity, affected by factors that cannot be predicted or controlled.
  • 802.11 WLAN operating in ad-hoc mode, where peer-to-peer transfers between terminals occupy some of the resource.
  • W-CDMA, in which the interference floor, and hence available capacity, is affected by the load on neighbouring cells; known as cell breathing.
  • if the capacity expands, then the shadow threshold (ie the lowest threshold 202) would trigger unnecessary discard, and although the extra resource allows the actual queue to clear, the resource thereafter remains under-used. If the capacity contracts, then the actual queue may build to cause significant delay, while the amount in the shadow queue remains below any threshold.
  • the notional capacity, on which the lower, middle and upper discard thresholds are based, is dynamically adjusted according to the relation between the actual and shadow queues: if the actual queue is growing as a symptom of overload but the shadow queue is not, the thresholds are lowered; if the shadow queue indicates overload but the actual queue is clear, the thresholds are raised.
  • interaction of demand and capacity with the actual and shadow queues may be used to track the dynamics and adjust the thresholds in response.
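  • as an illustration of the adjustment rule above, the sketch below adjusts a notional capacity; the 5% step and the simple boolean overload tests are assumptions, and only the direction of the adjustment comes from the description.

```python
# Sketch of the dynamic threshold adjustment: lower the notional capacity (and hence the
# thresholds derived from it) when the actual queue grows without the shadow queue
# indicating overload, and raise it when only the shadow queue indicates overload.

def adjust_notional_capacity(capacity: float,
                             actual_queue_overloaded: bool,
                             shadow_queue_overloaded: bool,
                             step: float = 0.05) -> float:
    """Return the new notional capacity on which the discard thresholds are based."""
    if actual_queue_overloaded and not shadow_queue_overloaded:
        return capacity * (1.0 - step)   # real queuing but shadow below threshold: lower thresholds
    if shadow_queue_overloaded and not actual_queue_overloaded:
        return capacity * (1.0 + step)   # shadow indicates overload but the link is coping: raise thresholds
    return capacity
```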
  • Figure 8 shows the same sources 111 and 112 of flows 121, 122 and 123 to destinations 115, 116 and 117, buffered by an actual queue 102 and controlling congestion with discard at 104.
  • Figure 8 shows that source 113 is now a terminal beyond the shared resource 101 so that flow 124 is both incoming and outgoing, passing the link 142 within the hub to be buffered 102 for access to the resource 101 to its destination 117.
  • Sources 113 and 114 each contain a buffer, 153 and 154, for packets waiting for access to the shared resource 101.
  • the composite shadow queue 131 monitors the total traffic through the resource, with the methods of Figures 6 and 7 triggered by each packet passing through the incoming link, as well as those joining the queue 102 for the outgoing link.
  • the contents are drawn to distinguish between the incoming flows 125 and 126 and the outgoing flows 121, 122 and 123, and it is seen that the flow 124 that has packets in both incoming and outgoing links is doubly recorded.
  • the steps of Figures 5 and 6 are followed to select the largest flow and determine the amount, if any, to be discarded, and the steps of Figure 4 are executed at discard means on the incoming link 144 in parallel to that on the outgoing link 104.
  • Figure 9 illustrates how the analysis of the shadow queue 133 can adjust the net size of each flow 136 for the selection decision according to multiple parameters besides the amount of data.
  • the adjustment factor is the Energy per unit 141, defined by:
  • Energy_per_unit = Link_Power / Transmit_Rate
  • the resource to send a large amount 121 to a mobile terminal operating at low power close to the base station may be less than to send a small amount 124 to another terminal needing high power to reach it at the fringes of coverage.
  • the overload thresholds 202, 204 and 206 shown in Figure 9 relate to the total Energy to send a total 143 of flows waiting to be transmitted.
  • the above embodiments may be adapted for use as a Policy Enforcement Point in a QoS architecture, wherein the total data amount or size of a particular flow is divided by a factor proportional to that flow's priority.
  • the flow 124 has the highest Energy to Send but a higher Priority factor 137 puts it second in the ranking of Net size; and although flow 121 contributes most of the total amount its low power and high priority put it at net smallest.
  • Those flows whose policy determines that they obtain absolute priority may have their Net size set to zero, so ensuring that such flows are never selected for discard.
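  • the net-size ranking of Figure 9 might be sketched as below; the field names, units and the byte-to-energy conversion are illustrative assumptions, while the division by a priority weighting and the zero net size under absolute priority follow the description above.

```python
# Sketch of a "net size" ranking: each flow's demand is expressed as the energy needed to
# transmit its shadow-queue contents (Energy_per_unit = Link_Power / Transmit_Rate, scaled
# by the queued amount), then divided by a priority weighting; flows with absolute
# priority get net size zero so they are never selected for discard.
from dataclasses import dataclass

@dataclass
class FlowDemand:
    queued_bytes: int           # amount of the flow's data in the shadow-queue window
    link_power_w: float         # transmit power used for this flow's recent packets
    transmit_rate_bps: float    # data rate of this flow's recent packets
    priority_weight: float = 1.0
    absolute_priority: bool = False

def energy_to_send(f: FlowDemand) -> float:
    """Queued amount (in bits) multiplied by Link_Power / Transmit_Rate, giving joules."""
    return f.queued_bytes * 8 * f.link_power_w / f.transmit_rate_bps

def net_size(f: FlowDemand) -> float:
    """Energy to send divided by the policy weighting; zero under absolute priority."""
    if f.absolute_priority:
        return 0.0
    return energy_to_send(f) / f.priority_weight

def select_for_discard(flows: dict):
    """Pick the flow id with the largest net size (never one whose net size is zero)."""
    candidates = {fid: net_size(f) for fid, f in flows.items() if net_size(f) > 0}
    return max(candidates, key=candidates.get, default=None)
```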
  • the invention achieves the goal of low delay and loss by moving contending traffic out of the way. It becomes possible for admission control to risk higher utilisation, since the consequences of occasional overload are less damaging than with conventional solutions based on multiple queues.
  • Some media flows differentiate between packet priorities within the flow, and may mark different values in the ToS bits of the IP header, or by other means which will be understood by those skilled in the art.
  • such "sub-flows" within a flow are identified by the extension of steps 312 and 313 of Figure 6.
  • one flow 124 selected for discard contains such sub-flows 138, and so only the largest sub-flow within it is set to Drop Nth, and smaller sub-flows left as No Drop.
  • the methods described above are applied, so that when complete discard of the selected sub-flow is insufficient to relieve overload, the next largest sub-flow in the largest flow is progressively discarded, and so on.
  • This method of managing congestion, when flows contain prioritised sub-flows, is useful when, for example, a video stream consists of a base layer and one or more enhancement layers: the base layer is essential to reconstruction and takes the highest priority, while the other layers merely improve the presentation and have lower priority.
  • a large flow may enclose a virtual private network, containing multiple flows that are being tunnelled between parts of a corporate intranet, with the packets labelled with different drop priorities to ensure that the few really important flows within the tunnel are preserved, while allowing the discard, if necessary, of relatively ordinary flows. Passing all the packets through the single actual queue 102 keeps them in the order they were sent, which is important for many media flows. Policy-based QoS solutions typically queue the priorities separately, and risk changing the packet sequence.
  • the extension of steps 312 and 313 of Figure 6 preferably sets a limit of 3 to 5 sub-flows, or requires that the largest sub-flow is at least 30% to 50% of the complete flow. Violating these limits causes the flow to be treated as one complete flow.
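  • as an illustration, the sub-flow selection rules might be sketched as below; the limit of 5 sub-flows and the 30% dominance figure are example values taken from the stated 3 to 5 and 30% to 50% ranges, and the dictionary-based representation is an assumption for the sketch.

```python
# Sketch of the sub-flow rules: within the flow selected for discard, only the largest
# sub-flow is dropped, unless the flow has too many sub-flows or no sub-flow is dominant,
# in which case the whole flow is treated as one complete flow.

def choose_discard_target(sub_flow_bytes: dict, max_sub_flows: int = 5,
                          dominance: float = 0.30):
    """Return the sub-flow id to discard within the selected flow,
    or None to treat the flow as a single unit."""
    total = sum(sub_flow_bytes.values())
    if not sub_flow_bytes or total == 0:
        return None
    if len(sub_flow_bytes) > max_sub_flows:
        return None                                  # too many sub-flows: whole-flow treatment
    largest = max(sub_flow_bytes, key=sub_flow_bytes.get)
    if sub_flow_bytes[largest] / total < dominance:
        return None                                  # no dominant sub-flow: whole-flow treatment
    return largest
```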
  • the shadow queue may warn the routing function which flow should be re-routed to avoid incipient overload
  • the focus on the few causes of congestion on any particular link minimises unnecessary re-routing, and so optimises efficiency and stability of the network.
  • Figure 10 shows how a routing function 161 might divert a flow from the original route 101 to a new route 162.
  • the packet order is retained by the method of temporary queues to hold the packets of the diverted flow: the first queue holds packets emerging from the head of the actual queue 163; and another queue takes new packets that would otherwise have joined the tail of the actual queue 164.
  • a third queue 165 holds packets to be sent towards the new route, while packets from the temporary head queue 163 are sent first, then those from the temporary tail queue 164. Once these queues have cleared, then any subsequent packets for the flow, which have been buffered in the third queue 165, are sent. Once all the queues are empty, packets pass directly from the routing function to the new outgoing resource. If no alternative route is found, then the contents of the temporary queues are discarded.
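  • as an illustration, the three temporary queues and their draining order might be sketched as below; the class structure is an assumption, and only the ordering (head queue, then tail queue, then the queue for the new route) comes from the description of Figure 10.

```python
# Sketch of the temporary queues that preserve packet order while a flow is diverted:
# one for packets emerging from the head of the actual queue, one for packets that would
# have joined its tail, and one for packets arriving once the new route is chosen.
from collections import deque

class DiversionBuffer:
    def __init__(self):
        self.head_q = deque()   # packets emerging from the head of the actual queue (163)
        self.tail_q = deque()   # packets that would otherwise have joined the tail (164)
        self.new_q = deque()    # packets arriving for the flow after the diversion (165)

    def drain_in_order(self):
        """Yield the diverted flow's packets towards the new route in their original order."""
        for q in (self.head_q, self.tail_q, self.new_q):
            while q:
                yield q.popleft()

    def abandon(self) -> None:
        """No alternative route found: discard the temporary contents."""
        self.head_q.clear()
        self.tail_q.clear()
        self.new_q.clear()
```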
  • the graphs 501 to 504 (which relate to known AQM methods) and 511 to 514 (which relate to embodiments of the present invention) in Figure 11 compare the response of known AQM methods with that of embodiments of the invention, in a network node carrying a broad mix of traffic: mainly interactive with some real-time; a few large flows and many smaller ones.
  • the first six graphs show the flow-by-flow response at the network level, with the horizontal X axis denoting the demand in terms of interactive flow size or real-time streaming rate as a proportion of resource capacity, and the vertical Y axis denoting the actual throughput.
  • for interactive transfers this throughput is the flow's size divided by the time to complete the transfer, and for real-time streams 525 it is plotted as the input rate less the proportion of packets lost or excessively delayed.
  • Each chart shows a solid line denoting the usual light load performance of interactive 522 and real-time 526 flows, and a dotted line where it is assumed that users would notice the degradation. It may be noted that interactive services tolerate quite a large reduction in performance 523, and real-time flows comparatively little 527.
  • AQM methods 503 generally revert to the basic overflow discard, which disrupts the larger interactive flows 535 and all but one of the real-time streams 536.
  • the above described embodiment 513 maintains its active management and selective discard, to substantially block the largest real-time stream 538, and to noticeably degrade only one interactive flow 537, leaving the majority of flows barely touched by the congestion.
  • the lowermost pair of graphs, 504 and 514 summarise the performance in the form of Figure 1.
  • generic AQM 504 is successful in providing a robust overload response 551, but the response is distinctly fragile for real-time services, since almost any overload results in most experiences 552 becoming unacceptable.
  • the present invention 514 achieves a robust overload response to real-time streams 554, and is even more robust for interactive traffic 553.
  • a network node that employs the present invention is much more resilient than if it used a known AQM method, meaning that it can maintain a consistent quality of experience for the great majority of subscribers in the face of unexpected overload.

Abstract

A method of managing a buffer (102) for data packets in a network node (101), the method comprising selecting a particular flow of data packets and determining to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.

Description

METHOD AND APPARATUS FOR COMPUTER NETWORKS
Field of the invention
The present invention relates to a method and apparatus for computer networks.
In a preferred embodiment of the invention there is provided a means to restrict the impact on individual service experiences when their combined demand overloads the physical resource. This is especially necessary around the edge of a network, in the nodes that provide subscriber terminals with access and so support relatively few simultaneously active users compared with the core of the network.
Background
Congestion control features of the Transmission Control Protocol (TCP) have proved successful in obtaining high utilisation and reasonably fair sharing of network capacity, while preventing congestion collapse. TCP works at the end points of data transfer and assumes that the network connecting them employs First-In-First-Out (FIFO) queues to handle temporary overload, discarding any packets that overflow a queue. However, whilst the basic mechanism is generally effective, it does create erratic delays and a variety of other undesirable effects.
The congestion control of TCP works by the sender probing the network's available bandwidth; gradually increasing its rate until a data packet is lost. The receiver signals any such packet loss back to the sender by a break in the sequence of acknowledgements; the sender, on the assumption that an overloaded queue has caused the loss, responds by halving its rate of sending. If TCP determines that several packets have been lost, a Retransmit Timeout briefly suspends sending before restarting from a low rate. However, a growing proportion of network traffic consists of media streams that do not respond to packet loss as a sign of congestion, and indeed are often real-time services, which are badly affected by the packet loss and queue latency that characterises the interaction between TCP traffic and FIFO queues. Some of them, such as audio telephony streams, need little capacity and barely contribute to any congestion, but others, for example video-conference streams, demand a large, and perhaps unfair, share of capacity.
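To make the end-to-end feedback loop concrete, the following is a minimal sketch of the sender-side behaviour described above. It is illustrative only: the function names and the slow-start/congestion-avoidance split reflect generic TCP behaviour rather than text taken from the patent.

```python
# Minimal sketch of the TCP sender response described above: additive growth while
# acknowledgements arrive, halving of the rate on a single loss, and a restart from a
# low rate after a retransmit timeout. All values are in units of segments.

def on_ack(cwnd: float, ssthresh: float) -> float:
    """Grow the congestion window: exponentially below ssthresh, linearly above."""
    if cwnd < ssthresh:
        return cwnd + 1.0            # slow start: roughly +1 segment per ACK
    return cwnd + 1.0 / cwnd         # congestion avoidance: roughly +1 segment per RTT

def on_single_loss(cwnd: float) -> tuple[float, float]:
    """A break in the acknowledgement sequence: halve the sending rate.
    Returns (new cwnd, new ssthresh)."""
    ssthresh = max(cwnd / 2.0, 2.0)
    return ssthresh, ssthresh

def on_retransmit_timeout(cwnd: float) -> tuple[float, float]:
    """Several packets lost: suspend and restart from a low rate.
    Returns (new cwnd, new ssthresh)."""
    return 1.0, max(cwnd / 2.0, 2.0)
```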
Active Queue Management (AQM) schemes can shorten queue lengths by warning the sources of network traffic about incipient congestion. Such mechanisms include Random Early Detection (RED), with its flow-aware variants Flow RED (FRED) and RED with Preferential Dropping (RED-PD), and Adaptive Virtual Queue (AVQ). All of these methods track the occupancy level of real or virtual queues to decide the fraction of packets to be randomly marked or dropped. Since a queue fills when the demand exceeds the outgoing capacity, and gradually clears when demand falls within the capacity, the queue length usefully indicates the onset of overload. However, using this queue length as the basis of control action tends to stabilise the queue to a consistent and undesirable latency for every packet that passes through. The virtual queue methods aim to detect incipient congestion while demand is still within capacity so that the actual buffer runs almost clear. But the leaky bucket type of virtual queue is slow to respond to the changes in demand that are typical of a packet network. In practice it may take several round-trip times to detect an increase in demand, signal the control action, and for that action to take effect. But since TCP flows are continually increasing their demand, this lag may allow a serious overload to develop, which then requires excessive control action.
Although the proposed AQM schemes remove synchronisation effects and reduce delay, they still fail to serve the needs of real-time media flows.
Simple overflow and random AQM methods do not distinguish between the flows from which they discard packets. Certainly the large flows that contribute most to any overload also have most packets in the system and so are most likely to suffer a discard, and halving the rate of a larger flow more effectively reduces congestion. But the indiscriminate discard can also affect real-time streams and short-lived flows, spoiling their users' experience without much affecting any congestion. It might be necessary to drop several more packets before the demand is brought back within the limits of capacity. At other times the indiscriminate discard may remove multiple packets from a large flow, and the big reduction in demand leaves the resource under-utilised.
Most AQM solutions evolved from simple queue overflow and so discard or mark packets joining the tail of the queue, since tail drop is most likely to drop packets from a burst that is causing any overload. Unfortunately tail drop includes the queue latency in the delay around the congestion control loop, which undermines stability. But eliminating that delay by dropping packets emerging from the head of the queue would almost certainly miss the cause of the overload, and be ineffective in controlling congestion.
Among the hundreds of flows in a core network node, statistical smoothing allows the random actions to consistently achieve the desired outcomes. However in an edge node with relatively few active flows the various unintended consequences occur often enough to promote erratic behaviour and instability in the control loop; ultimately the entire load surges between uncontrolled overload and sudden shut-off.
Finally, while the existing AQM methodologies are effective around the threshold of congestion, as overload increases their action generally reverts to the undesirable FIFO overflow, and so they lose much of their advantage. Figure 1 illustrates the range of possible impacts that overload might have on users' experience of services. The TCP congestion controls ensure a robust response, with only a few larger interactive transfers affected during periods of overload, whilst the majority of users remain unaware of the congestion. The falling dotted line corresponds to most users suffering a breakdown in service with almost any overload, which is the fragile response that real-time services typically obtain from a packet network. The ideal congestion management solution would obtain a robust overload response for whatever applications transfer their data over the network, whether interactive or real-time.
Summary of the invention
We have realised that this ideal response could be achieved with a new type of AQM, which may be deployed in any type of network node that may be prone to overload. A preferred embodiment of the invention co-operates with TCP's end-end congestion control to minimise the storage of packets in network queues, and acts decisively against flows that would take an unfair share of the limited capacity. The resulting consistent low delay dramatically improves the service experience for real-time applications, while interactive services obtain the full benefit of the established TCP congestion controls. High utilisation of capacity is ensured, not merely by keeping the resource as full as possible consistent with low delay, but also by suppressing bad-put; the wasteful throughput of large streams that are worthless to the user because of delay, packet loss, or other undesirable effects.
A highly preferred embodiment of the invention is based on two interrelated approaches. Firstly, the methodology responds to the total demand approaching overload by selecting just one flow at a time from which to discard as a sign of congestion to the end points of that flow; the proportion of that flow which is discarded increasing progressively with total (measured) demand, until eventually, when demand reaches a predetermined point the entire flow is discarded. Secondly, what may be termed a shadow queue measures the demand separately from the (actual) queue that buffers the load. This allows the latter actual queue to be kept almost empty and so maintain a consistently low packet delay, while the shadow queue monitors the optimum temporal size of window for detecting incipient overload and selecting individual flows for discard. These principles continue to operate to the most extreme and unlikely levels of overload, when the demand exceeds the capacity many times over. This contrasts with known AQM methods, which generally revert to the basic overflow response, with all its limitations, when subjected to overloads perhaps as little as 10% to 30%.
Further embodiments of the invention include: automatically adapting to dynamically varying capacity; managing congestion with a composite shadow queue when multiple channels share a limited resource; selecting only lower-priority flows for discard in a system of policy-based QoS and when media streams are encoded in prioritised layers; and selecting flows for diversion in a multi-route network.
A network node that employs the present invention is preferably much more resilient than if it used a known AQM method, in that it can maintain a consistent quality of experience for the great majority of subscribers in the face of unexpected overload. This conveys important benefits to a network operator: in the quality of their services; in the utilisation of the network; in the planning of new capacity; in controlling operating costs; in attracting and retaining customers. The number of flows, which are subjected to discard as a result of the preferred methodology, is accordingly minimised so that the majority of users enjoy an acceptable level of service.
According to a first aspect of the invention there is provided a method of managing a buffer for data packets in a network node, the method comprising selecting a particular flow of data packets and determining to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.
According to a second aspect of the invention there is provided an apparatus for managing a buffer for data packets in a network node, the apparatus comprising a data processor which is configured to select a particular flow of data packets and determine to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.
Further embodiments of the invention relate to machine-readable instructions, whether stored on a data carrier or encoded as a signal, which, when executed by a data processor realise the method of the first aspect of the invention.
In a highly preferred embodiment of the invention there is provided a method to manage a queue or buffer in a packet data network that, at a threshold related to the rate of total demand, selects one of the flows from which to discard; the discard proportion increasing progressively to an upper threshold, beyond which the entire flow is discarded, wherein: a) At the lower threshold the method discards just one packet from the flow, b) then from the next higher threshold discards every Nth packet of the flow, c) and above the upper threshold discards the selected flow completely.
The method preferably further includes the step that packets are discarded as they emerge from the head of the queue, as a send/drop decision.
Preferably the data rate of demand, against which the discard thresholds are set, is measured by a moving window on a shadow queue, separate from the actual queue that buffers the data. Each incoming packet joins the actual queue, and the packet's parameters pertaining to congestion control decisions are placed in the shadow queue. Packets leave the actual queue in First-In-First-Out order, as the outgoing resource capacity allows. Shadow packets are removed from the shadow queue after they have been in the shadow queue for longer than the window interval.
The flow selected for discard is preferably the flow that is the largest in the shadow queue. Preferably when complete discard of the selected flow is insufficient to relieve overload, then the next largest flow is progressively discarded, and so on.
After starting the complete discard, each packet of the selected flow is preferably discarded from the actual queue until there is none of that flow left in the shadow queue, even though the demand may drop below the thresholds.
If the actual queue exceeds a discard threshold, then the discard is desirably progressed further than indicated by the demand in relation to shadow queue thresholds.
Preferably the discard thresholds are dynamically adjusted according to the relation between actual and shadow queues, wherein; a) If the actual queue is growing as a symptom of overload but not the shadow queue, the thresholds are lowered, and b) If the shadow queue fills beyond its higher thresholds, but the actual queue is clear and so not actually overloaded, the thresholds are raised.
A single composite shadow queue desirably monitors the total demand of all of multiple channels that share the same resource.
In a Policy Enforcement Point of a QoS architecture, the packet parameters in the shadow queue preferably include an indication of priority, by whatever means, including a DiffServ codepoint in the ToS field of the packet header, or otherwise associated with the flow identity. The discard decision is based on each flow's net size: the amount that the flow has in the shadow queue divided by a weighting factor proportional to its priority under QoS policy; if the policy requires absolute priority then the net size is set at zero.
Where the resource is constrained other than by bit-rate, for instance by radio power, the shadow queue preferably records, with each arriving packet, parameters for the power and data rate of the flow's most recently transmitted packets. The discard thresholds are then set in terms of the total resource needed to transmit the contents of the shadow queue, and the net largest flow selected for discard is the one that would take the largest share of resource to transmit that flow's content in the shadow queue.
When the flow selected for discard contains sub-flows distinguished by whatever means, including the ToS field of the packet header, the packet parameters in the shadow queue preferably include an indication of the sub-flow to which the packet belongs, and a packet is discarded only if it belongs to the sub-flow with the largest proportion of that flow's data in the shadow queue. When complete discard of the selected sub-flow is insufficient to relieve overload, then the next largest sub-flow in the largest flow is preferably progressively discarded, and so on, except that once there is only one sub-flow left in the largest flow, then its packets are sent, and the next largest flow is selected for discard.
In a multi-route network or wireless mesh, the measurement means preferably proposes, back to the routing function, any flow selected for complete discard as a candidate for re-routing. The packet order is preferably retained by the method of temporary queues to hold the packets of the diverted flow while the new route is found. The present invention desirably finds particular utility in access networks, for example in router equipment, such as edge routers, and wireless hub or base station equipment.
Brief description of the drawings
Various embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Figure 2 is a schematic representation of the elements of a network node which is configured to manage congestion and control overload,
Figure 3 charts the progressive discard profile employed by the node of Figure 2,
Figure 4 is a flow diagram of the progressive discard process at the head of the actual queue in the buffer of the node in Figure 2,
Figure 5 is a schematic representation of the elements of the shadow queue by which the node of Figure 2 measures the demand and selects flows for packet discard,
Figure 6 is a flow diagram of the moving window flow measurements performed by the shadow queue,
Figure 7 is a flow diagram of the analysis of the measurements to detect incipient overload and to select a flow and the proportion of its packets to be discarded from the selected flow,
Figure 8 shows use of a composite shadow queue to manage congestion on multiple routes, by the example of a WLAN hub with incoming and outgoing routes sharing the same radio resource,
Figure 9 is a table showing how the selection and discard decision may account for multiple parameters: resource usage, policy-based priority, and layering of media flows,
Figure 10 shows an arrangement of queues to preserve packet order when redirecting a flow to a new route, and
Figure 11 shows multiple graphs to contrast the distinctive overload response of the current invention with that of known AQM methods.
Detailed description of exemplary embodiments of the invention
Reference is made initially to Figure 2, in which a node 100 in a packet data network receives packets from sources 111, 112 and 113, and sends them towards destinations 115, 116 and 117 through a resource 101, which has a (known) limited capacity for transmission of data thereby. The node 100 further comprises a data processor and a memory. When packets arrive simultaneously the buffer 102 queues any excess until the resource becomes free. For the purposes of the explanations that follow, four different packet flows are considered; these are: a web page 121 from source 111 to destination 115; e-mails 122 and 123 fetched from server 112 to destinations 116 and 117 respectively; and a media stream 124 from source 113 to destination 117 (which is also receiving flow 123).
In an Internet Protocol (IP) network, each flow is distinguished by its source IP address and destination IP address. In a wireless network, the base station distinguishes between flows by the mobile terminal's identity, such as a Media Access Control (MAC) address in WLAN, or by a link identity such as a PDP context in 3GPP standards. Those skilled in the art will be able to determine the appropriate characterising distinctions between different flows in any particular network technology. A monitoring means 103 is configured to measure the total demand on the resource, and select the packets that are to be discarded into the discard receptacle 105 by the discard means 104. Because the mechanism selects a specific flow, the discard may be from the head of the queue, which removes the latency of the actual queue from the feedback loop. This ensures that the (intended) recipient receives the earliest possible indication of loss, and the sender the earliest signal to reduce the rate of sending.
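As an illustration of how a flow identity might be derived in an IP network, the sketch below keys flows on source and destination address as described above; the optional port fields and the dictionary-based packet representation are assumptions for the sketch, not requirements of the patent.

```python
# Illustrative flow key for an IP network, matching the source/destination distinction
# described above. Port numbers are shown only as an optional refinement.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    src_ip: str
    dst_ip: str
    src_port: int | None = None   # optional refinement, not required by the text
    dst_port: int | None = None

def flow_key_of(packet: dict) -> FlowKey:
    """Derive the flow identity from parsed packet headers (the dict fields are assumptions)."""
    return FlowKey(packet["src_ip"], packet["dst_ip"],
                   packet.get("src_port"), packet.get("dst_port"))
```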
In operation, one flow is selected and its packets are discarded according to the stepped profile of Figure 3. The discard proportion increases progressively as the measured demand in relation to the resource capacity passes thresholds 201, 203 and 205. At the lower threshold 201 just one packet is discarded from the flow, which is a small percentage 202 of the flow's packets that pass through the node whilst it is selected for discard. Then when demand exceeds the next higher threshold 203 every Nth packet of the flow is discarded, where N is typically around 4 to 7, giving an intermediate discard percentage 204 of about 14% to 25% of the flow. Above the upper threshold 205 the selected flow is discarded completely (as shown at 206). Typically the lower threshold is set somewhat less than the capacity of the actual outgoing link 101; the next higher threshold around its capacity; and the upper threshold somewhat above the actual capacity. By setting the lower shadow-queue threshold below the capacity, little or no queuing results in the actual queue.
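By way of illustration, the stepped profile can be expressed as a simple mapping from measured demand to a drop state for the selected flow. This is a sketch only: the threshold fractions (0.9, 1.0 and 1.2 of capacity) and N = 5 are assumed example values consistent with the ranges given above, not values fixed by the patent.

```python
# Illustrative mapping from measured demand to a drop state for the selected flow,
# following the stepped profile of Figure 3.

NO_DROP, DROP_ONE, DROP_NTH, DROP_ALL = "NoDrop", "Drop1", "DropNth", "Drop100"

def drop_state(total_demand: float, capacity: float,
               lower: float = 0.9, middle: float = 1.0, upper: float = 1.2,
               n: int = 5) -> tuple[str, int]:
    """Return (drop state, n) for the flow currently selected for discard."""
    if total_demand >= upper * capacity:
        return DROP_ALL, 1          # discard the selected flow completely
    if total_demand >= middle * capacity:
        return DROP_NTH, n          # discard every Nth packet (~14% to 25% for N = 4..7)
    if total_demand >= lower * capacity:
        return DROP_ONE, 0          # discard just one packet from the flow
    return NO_DROP, 0
```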
Detailed operation of the discard means 104 is now provided with reference to Figure 4. The first step on entry 300 to the continuous loop is to wait until the resource indicates that it is free to send the next packet and also that a packet is waiting in the queue 102, in which case this packet is sent (as shown at step 302). If there is no subsequent packet in the queue 102 then the process returns to step 301 to await the arrival of the next packet. Otherwise the method identifies the flow to which the packet waiting at the head of the queue belongs, examines the so-called drop state that the measurement means 103 has set for the flow and at step 304 sets a temporary variable 'n' according to the drop state. The value of n is set so that one packet is dropped for every n packets of the flow that pass into the queue 102. The discard means maintains a variable 'count' for each flow, which is set to 0 in each new flow record (at step 312, described below). The test at step 306 keeps 'count' at 0 until a packet has been dropped from the flow, and thereafter counts the number of packets sent since the last one was dropped at step 308. The test at step 307 ensures dropping of the first packet that passes on transition from the initial No Drop state, and the dropping of one packet in n in the Drop N state. After discarding a packet and setting count to 1 at step 309 the method determines whether the next packet in the queue should be discarded. If the waiting packet is not to be dropped, as determined at steps 306 and 308, then the method returns to step 301 until the resource becomes free.
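The send/drop decision at the head of the queue can be sketched as follows. The class and function names are illustrative, the queue is assumed to hold (flow identity, packet) pairs, and the steps of Figure 4 are only paraphrased in the comments rather than reproduced exactly.

```python
# Sketch of the send/drop decision at the head of the actual queue, following the
# count/n logic described above for Figure 4.
from collections import deque
from dataclasses import dataclass

@dataclass
class FlowState:
    drop_state: str = "NoDrop"   # set by the monitoring means (shadow queue analysis)
    n: int = 0                   # drop one packet in n when in the Drop Nth state
    count: int = 0               # packets sent since the last drop (0 = none dropped yet)

def serve_head_of_queue(queue: deque, flows: dict, resource_free: bool) -> None:
    """When the resource is free, send or discard the packet at the head of the queue.
    `queue` holds (flow_id, packet) pairs; `flows` maps flow_id -> FlowState."""
    if not resource_free or not queue:
        return                              # wait for a free resource and a waiting packet
    flow_id, packet = queue[0]
    st = flows.setdefault(flow_id, FlowState())
    drop = False
    if st.drop_state == "Drop100":
        drop = True                         # selected flow is discarded completely
    elif st.drop_state in ("Drop1", "DropNth"):
        if st.count == 0:
            drop = True                     # drop the first packet after entering a drop state
        elif st.drop_state == "DropNth" and st.count >= st.n:
            drop = True                     # thereafter drop roughly one packet in n
    queue.popleft()
    if drop:
        st.count = 1                        # restart the sent-since-last-drop count
    else:
        send(packet)                        # transmit towards the outgoing resource
        if st.count > 0:
            st.count += 1                   # count packets sent since the last drop

def send(packet) -> None:
    pass  # placeholder for handing the packet to the outgoing link
```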
Assuming that the demand made on the node is caused mainly by TCP flows that gradually increase the rate of data entering the queue 102, discarding one packet will cause the largest flow to pause briefly and then resume sending at half the rate. Another flow then becomes the "largest" until a discard causes that one to slow down, and so on. This action on successive peak flows is frequent and limited, which tends to ensure stability and continuous high utilisation. Zero packet loss is assured for any real-time services that send at a low rate and for the short-lived TCP flows that complete before ever becoming the largest, while the remaining capacity is shared evenly among the longer-lived TCP flows.
During severe overload, there may be so many flows that halving the rate of successive flows is insufficient, causing the demand to exceed the middle or upper thresholds. Those familiar with the TCP congestion response will see how this action reduces the rate substantially, or may trigger a Retransmit Timeout that suspends sending for several round-trip times. Advantageously, still only a few flows are affected, while the majority of short-lived TCP flows and low-rate media streams remain untouched.
If the main reason for the overload is one large media flow that does not respond to packet loss, then the successive division of capacity between a growing set of TCP flows tends to leave that media flow as the largest, and it will soon be subject to complete discard. The discard means 104 then locks the flow into a total discard state, so that, to the user application, it appears that the connection is broken, and within a few seconds either the user or the application will abort the stream and relieve the overload. The responsive TCP flows can then resume their former sending rate. While the large media flow is brought under control, the low-rate real-time services and short-lived flows remain unaware of any congestion.
Details of the monitoring means 103 are shown in Figure 5. As each incoming packet joins the actual queue 102, its parameters relevant to congestion control decisions are recorded in what may be termed a notional shadow queue 131. The shadow queue 131 maintains a temporally moving window by removing packets as soon as they have been in the shadow queue for longer than a window interval 132. For each packet added to or removed from the shadow queue, the analysis (as shown in table 133) is updated to reflect the amount of data that each flow has in the shadow queue window. The thresholds 201, 203 and 205 are set in relation to the total amount of data in the window (as shown at 134), which is proportional to the total demand on the outgoing resource 101. Figure 5 gives the example of the total demand 134 exceeding the middle threshold 203, so that the largest flow 122 is set to Drop Nth (as highlighted at 135).
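A minimal Python sketch of such a notional shadow queue is given below; the window length and the data structures are assumptions for the example and are not taken from the patent figures.

# Illustrative sketch of a shadow queue with a moving time window.
from collections import deque, defaultdict

class ShadowQueue:
    """Notional record of recent demand, independent of the actual queue."""
    def __init__(self, window_seconds=0.2):
        self.window = window_seconds
        self.entries = deque()              # (arrival_time, flow_key, size_bytes)
        self.per_flow = defaultdict(int)    # bytes per flow currently in the window
        self.total = 0                      # total bytes currently in the window

    def add(self, now, flow_key, size):
        """Record a packet as it joins the actual queue."""
        self.entries.append((now, flow_key, size))
        self.per_flow[flow_key] += size
        self.total += size

    def expire(self, now):
        """Remove packets older than the window interval from the analysis."""
        while self.entries and now - self.entries[0][0] > self.window:
            _, flow_key, size = self.entries.popleft()
            self.per_flow[flow_key] -= size
            self.total -= size
            if self.per_flow[flow_key] == 0:
                del self.per_flow[flow_key]     # flow record removed entirely

    def largest_flow(self):
        """Return (flow_key, bytes) for the flow with most data in the window."""
        return max(self.per_flow.items(), key=lambda kv: kv[1], default=(None, 0))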
If, through monitoring the total amount of data in the window of the shadow queue 131, the overload is determined to be increasing, the methodology continues its action to successively encompass as many flows as necessary to control congestion and maintain low delay. Figure 9 gives an example of such successive selection (the same principle applies when the thresholds are set on the total Energy to send (as shown at 143) rather than on the Queued amount 134): the total demand 143 far exceeds the upper threshold 206, and even discarding the whole of the largest flow and removing its contribution from the demand leaves the remainder above the middle threshold 204, requiring the next largest flow to be set to Drop Nth. The process continues until the measured total demand falls below the lower threshold 202.
The measurement/monitoring process follows the flow of Figure 6, entering at point A (as shown at 310) when each new packet arrives. The arrival of each new packet triggers the start of the process whereby the total amount of data in the shadow queue at that instant (less any data that is to be discounted or removed because it has been there for longer than the predetermined interval) is used to calculate a measure of demand on the resource. The packet joins the actual queue (as shown at 311), and if the flow is not recognised (as shown at 312) a new flow record is created. The new amount for the flow (as shown at 313) is used to move the flow record up to its correct place in the ranking of flows (at step 314). Then, at 321, the process checks whether the window interval has moved on past any packets at the head of the shadow queue. If not, it proceeds via B 330 to select the largest flow and determine the discard percentage. If the test at 321 finds that the packet is too old for the window, then it is removed from the analysis (at step 322), and if that leaves a zero amount for the flow, meaning the flow has no packets in the window, its flow record is removed entirely (as shown at 323); otherwise it is moved down to its correct place in the ranking (as shown at 324). This is repeated at step 321 until there are no more packets outside the window, and the process moves on via point B (at step 330) to the entry point of Figure 7, which shows the detection and control of overload.

Figure 7 shows the iterative process initialised (at step 331) by selecting the flow with the largest amount and setting a temporary variable 'rate' to the total demand (ie the total amount of data) of packets in the instantaneous window of the shadow queue. First the rate is adjusted (at step 332) to remove the effect of flows that are being completely discarded, but usually the process passes directly to step 333 to compare the thresholds and set the drop state of the largest flow that is not yet being 100% dropped. Finally the next smaller flow in the ranking is set to No Drop, except that once any flow is set to Drop 100% it stays in that drop state until all of its packets have left the shadow queue window. A similar step at 324 (for clarity Figure 6 does not show this), setting the drop state to No Drop whenever the next higher flow is also No Drop, would ensure that only one flow is ever set to an intermediate drop state of either Drop 1 or Drop Nth. At point C (at step 340) the analysis is complete until re-entry at point A when the next packet arrives.
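A minimal Python sketch of the threshold comparison and successive selection described above is given below. It assumes the per-flow amounts and total demand have already been computed (for example by the shadow-queue sketch above); the threshold ratios and drop-state names are assumptions for the example.

# Illustrative sketch of the successive-selection step of Figure 7.
def set_drop_states(per_flow, total, capacity, drop_states,
                    lower=0.9, middle=1.0, upper=1.1):
    """Assign drop states to the largest flows until the measured demand is relieved."""
    rate = total
    ranked = sorted(per_flow.items(), key=lambda kv: kv[1], reverse=True)
    for flow, amount in ranked:
        if drop_states.get(flow) == "DROP_ALL":
            rate -= amount                  # already fully discarded: remove from the demand
            continue
        if rate >= upper * capacity:
            drop_states[flow] = "DROP_ALL"  # complete discard of this flow
            rate -= amount
        elif rate >= middle * capacity:
            drop_states[flow] = "DROP_NTH"  # every Nth packet discarded
            break                           # only one flow in an intermediate state
        elif rate >= lower * capacity:
            drop_states[flow] = "DROP_ONE"  # a single packet discarded
            break
        else:
            drop_states[flow] = "NO_DROP"   # next smaller flow reverts to No Drop
            break
    return drop_states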
Advantageously the window-based rate measurement detects an increase in demand much more quickly than an actual queue or a leaky-bucket rate measurement. In practice the window-based measurement can detect an increase in demand and signal control action within one round-trip time. This, when combined with the discard means 104 at the head of the queue, keeps the delay around the control loop within about two round-trip times, which greatly contributes to stability. The predetermined 'time out' window period is set such that items are removed from the shadow queue some time after they are removed from the actual queue in the buffer.
No state information is held outside the analysis 133 of the shadow queue, so that a flow which currently has nothing queued is deemed not to exist.
Monitoring the shadow queue as the primary mechanism will usually maintain a short actual queue for most usual traffic. However, an unusual burst might suddenly overload the actual queue, and so as a safeguard, in addition to the shadow queue thresholds, a threshold is set on the actual queue 102, lower than the tail-drop capacity would be for a basic FIFO queue. The comparisons 333 of Figure 7 are extended so that when the actual queue exceeds this threshold, the selected flow is set to Drop 100% to eliminate the delay quickly.
This also prevents excessive delay from accumulating when the largest flow is real-time and does not respond to packet drop, or when the actual resource capacity varies, for example as a result of fading on a radio link, to below the level assumed by the thresholds set on the shadow queue.
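A minimal Python sketch of this safeguard is given below; the byte-count representation of the actual queue and the threshold value are assumptions for the example.

# Illustrative sketch of the safeguard threshold on the actual queue.
def apply_queue_safeguard(actual_queue_bytes, safeguard_bytes, selected_flow, drop_states):
    """Force complete discard of the selected flow when the actual queue exceeds the safeguard threshold."""
    if selected_flow is not None and actual_queue_bytes > safeguard_bytes:
        drop_states[selected_flow] = "DROP_ALL"   # quickly eliminate the accumulated delay
    return drop_states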
Many network technologies provide dynamic link capacity, affected by factors that cannot be predicted or controlled. One example is an 802.11 WLAN operating in ad-hoc mode, where peer-to-peer transfers between terminals occupy some of the resource. Another is W-CDMA, in which the interference floor, and hence the available capacity, is affected by the load on neighbouring cells, an effect known as cell breathing.
If the capacity increases, then the shadow queue threshold (ie the lowest threshold 202) would trigger unnecessary discard, and although the extra resource allows the actual queue to clear, the resource thereafter remains under-used. If the capacity contracts, then the actual queue may build up and cause significant delay, while the amount in the shadow queue remains below any threshold. Preferably in such an environment the notional capacity, on which the lower, middle and upper discard thresholds are based, is dynamically adjusted according to the relation between the actual and shadow queues:
• If the actual queue indicates overload but not the shadow queue, the thresholds are lowered.
• If the shadow queue indicates overload but the actual queue is clear, the thresholds are raised.
Thus interaction of demand and capacity with the actual and shadow queues may be used to track the dynamics and adjust the thresholds in response.
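A minimal Python sketch of such an adjustment is given below; the multiplicative step size and the overload tests are assumptions for the example, not values taken from the description.

# Illustrative sketch of adjusting the notional capacity from the relation
# between the actual and shadow queues.
def adjust_capacity(notional_capacity, actual_overloaded, shadow_overloaded, step=0.05):
    """Raise or lower the notional capacity on which the discard thresholds are based."""
    if actual_overloaded and not shadow_overloaded:
        return notional_capacity * (1.0 - step)   # real link slower than assumed: lower the thresholds
    if shadow_overloaded and not actual_overloaded:
        return notional_capacity * (1.0 + step)   # real link faster than assumed: raise the thresholds
    return notional_capacity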
There is a class of wireless technologies, the most widely used being 802.11 WLAN, in which outgoing and incoming flows share the same spectral resource. Each node uses a Carrier Sense Multiple Access (CSMA) protocol to determine when it can transmit. When the hub operates in infrastructure mode, remote terminals always communicate through the hub, never peer-to-peer. Figure 8 gives the example of such a hub managing congestion of the resource by removing the main cause of congestion, whether that cause is an outgoing flow, an incoming flow, or both.
By comparison with Figures 2 and 5 it may be seen that Figure 8 shows the same sources 111 and 112 of flows 121, 122 and 123 to destinations 115, 116 and 117, buffered by an actual queue 102 and controlling congestion with discard at 104. In addition, Figure 8 shows that source 113 is now a terminal beyond the shared resource 101, so that flow 124 is both incoming and outgoing, passing over the link 142 within the hub to be buffered in the queue 102 for access to the resource 101 on the way to its destination 117. There is an additional source 114 of two flows 125, 126 to destinations 118, 119. Sources 113 and 114 each contain a buffer, 153 and 154 respectively, for packets waiting for access to the shared resource 101.
The composite shadow queue 131 monitors the total traffic through the resource, with the methods of Figures 6 and 7 triggered by each packet passing through the incoming link, as well as by those joining the queue 102 for the outgoing link. In the composite shadow queue 131 of Figure 8 the contents are drawn to distinguish between the incoming flows 125 and 126 and the outgoing flows 121, 122 and 123, and it is seen that the flow 124 that has packets in both incoming and outgoing links is doubly recorded. The steps of Figures 5 and 6 are followed to select the largest flow and determine the amount, if any, to be discarded, and the steps of Figure 4 are executed at a discard means on the incoming link 144 in parallel to that on the outgoing link 104. When the net largest flow is incoming, its packets are discarded at 144 as they arrive from the incoming receiver; when the net largest flow is outgoing, its packets are discarded at 104 from the head of the actual queue. When the net largest flow has packets in both incoming and outgoing shadow queues, as for flow 124 in this example, its packets are discarded at both 104 and 144, from both the outgoing queue and the incoming receiver.
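A minimal Python sketch of the composite accounting and of the choice of discard point is given below; the data structures and direction flags are assumptions for the example, and window expiry would follow the shadow-queue sketch given earlier.

# Illustrative sketch of a composite record of demand on a shared resource.
from collections import defaultdict

class CompositeShadowRecord:
    """Per-flow record of demand on a shared wireless resource, counting both directions."""
    def __init__(self):
        self.per_flow = defaultdict(int)      # bytes recorded per flow in the window

    def record(self, flow_key, size, incoming, outgoing):
        """Record a packet once for each direction in which it uses the shared resource."""
        if incoming:
            self.per_flow[flow_key] += size
        if outgoing:
            self.per_flow[flow_key] += size   # relayed flows are doubly recorded

def discard_points(flow_key, incoming_flows, outgoing_flows):
    """Where to discard the selected flow: incoming receiver (144), head of actual queue (104), or both."""
    points = []
    if flow_key in incoming_flows:
        points.append("incoming receiver (144)")
    if flow_key in outgoing_flows:
        points.append("head of actual queue (104)")
    return points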
It may be noted that, when congestion control occurs downstream of the resource, some resource is wasted by discarding packets that have already used it. However, the progressive discard ensures that the minimum is wasted in controlling TCP congestion, and unresponsive real-time flows will soon abort. In practice the method of the composite shadow queue serves to keep the indirectly managed queues 153 and 154 as clear as the directly managed queue 102.
Figure 9 illustrates how the analysis of the shadow queue 133 can adjust the net size of each flow 136 for the selection decision according to multiple parameters besides the amount of data.
If the outgoing link resource 101 is constrained by something other than the amount of data, then feedback from the resource control functionality is used to determine flow sizes and thresholds in terms of the actual resource that is utilised in sending the queued data. If it is the link power that is controlled, for instance to minimise interference in the 3GPP technologies GPRS and W-CDMA, then the adjustment factor is the Energy per unit 141, defined by:
Energy_per_unit = Link_Power / Transmit_Rate

and the flows are ranked by the Energy to send at 142:

Energy_to_send = Queued_amount x Energy_per_unit

So in the example of Figure 9, the resource needed to send a large amount 121 to a mobile terminal operating at low power close to the base station may be less than that needed to send a small amount 124 to another terminal requiring high power to reach it at the fringes of coverage. Since the output link capacity is limited by the transmitter power, the overload thresholds 202, 204 and 206 shown in Figure 9 relate to the total Energy to send 143 of the flows waiting to be transmitted.
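A minimal Python sketch of this energy-based ranking is given below; the power and rate figures are invented for the example, and the conversion of queued bytes to bits is an assumption about units.

# Illustrative sketch of ranking flows by the energy needed to send their queued data.
def energy_to_send(queued_bytes, link_power_w, transmit_rate_bps):
    """Energy_to_send = Queued_amount x Energy_per_unit, with Energy_per_unit = Link_Power / Transmit_Rate."""
    energy_per_unit = link_power_w / transmit_rate_bps    # joules per bit
    return queued_bytes * 8 * energy_per_unit             # joules to send the queued data

# Invented example figures: a large amount to a nearby low-power terminal may
# need less energy than a small amount to a terminal at the fringe of coverage.
near = energy_to_send(500_000, link_power_w=0.1, transmit_rate_bps=2_000_000)   # 0.2 J
edge = energy_to_send(50_000, link_power_w=2.0, transmit_rate_bps=500_000)      # 1.6 J
largest = "edge" if edge > near else "near"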
The above embodiments may be adapted for use as a Policy Enforcement Point in a QoS architecture, wherein the total data amount or size is divided by a factor proportional to the priority of a particular flow. In the example of Figure 9, the flow 124 has the highest Energy to send, but a higher Priority factor 137 puts it second in the ranking of Net size; and although flow 121 contributes most of the total amount, its low power and high priority make it the net smallest. Those flows whose policy determines that they obtain absolute priority may have their Net size set to zero, ensuring that such flows are never selected for discard.
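A minimal Python sketch of this priority adjustment is given below; the zero net size for absolute priority follows the description, while the function signature and the factor values a caller would pass are assumptions for the example.

# Illustrative sketch of a priority-adjusted net size.
def net_size(size_or_energy, priority_factor, absolute_priority=False):
    """Rank flows for discard by their size (or energy) divided by a priority factor."""
    if absolute_priority:
        return 0.0                    # a zero net size means never selected for discard
    return size_or_energy / priority_factor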
Since the methods described above result in consistently low delay for all flows, this completes the QoS requirements by avoiding packet loss from those flows that should obtain enhanced service. In contrast to most conventional QoS scheduling mechanisms, which shepherd priority packets through the network, the invention achieves the goal of low delay and loss by removing contending traffic from the way. It becomes possible for admission control to risk higher utilisation, since the consequences of occasional overload are less damaging than with conventional solutions that use multiple queues.
Some media flows differentiate between packet priorities within the flow, and may mark different values in the ToS bits of the IP header, or use other means which will be understood by those skilled in the art. Such 'sub-flows' within a flow are identified by the extension of steps 312 and 313 of Figure 6. In the example of Figure 9, one flow 124 selected for discard contains such sub-flows 138, and so only the largest sub-flow within it is set to Drop Nth, with the smaller sub-flows left as No Drop. Under conditions of severe congestion the methods described above are applied, so that when complete discard of the selected sub-flow is insufficient to relieve the overload, the next largest sub-flow in the largest flow is progressively discarded, and so on. However it is preferred to allow the smallest sub-flow to continue, rather than deleting the flow completely. So once there is only one sub-flow left in the largest flow, that sub-flow remains in the No Drop state, and the next largest flow is selected for discard.
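A minimal Python sketch of this sub-flow selection is given below; the representation of sub-flow sizes as a dictionary keyed by a marking such as the ToS value is an assumption for the example.

# Illustrative sketch of selecting a sub-flow for discard within the selected flow.
from typing import Optional

def select_subflow(subflow_sizes: dict) -> Optional[str]:
    """Return the largest sub-flow to set to Drop Nth, or None to spare the last one."""
    if len(subflow_sizes) <= 1:
        return None                   # only one sub-flow remains: it stays No Drop
    return max(subflow_sizes, key=subflow_sizes.get)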
This method of managing congestion when flows contain prioritised sub-flows is useful when, for example, a video stream consists of a base layer and one or more enhancement layers: the base layer is essential to reconstruction and takes the highest priority, while the other layers merely improve the presentation and have lower priority. A large flow may also enclose a virtual private network containing multiple flows that are being tunnelled between parts of a corporate intranet, with the packets labelled with different drop priorities to ensure that the few really important flows within the tunnel are preserved, while allowing the discard, if necessary, of the relatively ordinary flows. Passing all the packets through the single actual queue 102 keeps them in the order they were sent, which is important for many media flows. Policy-based QoS solutions typically queue the priorities separately, and risk changing the packet sequence.
The ranking of sub-flows is on size alone, and takes no account of the actual value in the ToS field. It is therefore essential that the base layer and the intermediate enhancement layers are evenly distributed as occasional packets among the more numerous and less important enhancement packets. To guard against a denial-of-service attack sending so many different sub-flows that only a small portion is ever discarded, the extensions to handle sub-flows from steps 312 and 313 of Figure 6 preferably set a limit of 3 to 5 sub-flows, or require that the largest sub-flow is at least 30% to 50% of the complete flow. Violating these limits causes the flow to be treated as one complete flow.
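A minimal Python sketch of this guard is given below; the particular limit values used (5 sub-flows, 30% share) are taken from the ranges above, while the function signature is an assumption for the example.

# Illustrative sketch of the guard against a flow fragmenting into many sub-flows.
def treat_as_subflows(subflow_sizes, max_subflows=5, min_largest_share=0.3):
    """Return True only if the flow may be managed sub-flow by sub-flow."""
    total = sum(subflow_sizes.values())
    if total == 0 or len(subflow_sizes) > max_subflows:
        return False                  # violating the limit: treat as one complete flow
    return max(subflow_sizes.values()) / total >= min_largest_share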
In a multi-route network or wireless mesh, where the shadow queue may warn the routing function which flow should be re-routed to avoid incipient overload, the focus on the few causes of congestion on any particular link minimises unnecessary re-routing, and so optimises efficiency and stability of the network.
Figure 10 shows how a routing function 161 might divert a flow from the original route 101 to a new route 162. The packet order is retained by means of temporary queues that hold the packets of the diverted flow: the first queue 163 holds packets emerging from the head of the actual queue, and another queue 164 takes new packets that would otherwise have joined the tail of the actual queue. Once the new route is established, a third queue 165 holds the first packets to be sent towards the new route, while packets from the temporary head queue 163 are sent first, then those from the temporary tail queue 164. Once these queues have cleared, any subsequent packets for the flow, which have been buffered in the third queue 165, are sent. Once all the queues are empty, packets pass directly from the routing function to the new outgoing resource. If no alternative route is found, then the contents of the temporary queues are discarded.
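A minimal Python sketch of the draining order is given below; the deque representation of the temporary queues and the send function are assumptions for the example.

# Illustrative sketch of preserving packet order while a flow is diverted to a new route.
from collections import deque

def drain_diverted_flow(head_q, tail_q, new_q, send_on_new_route):
    """Send diverted packets in their original order: head queue, then tail queue, then new arrivals."""
    for q in (head_q, tail_q, new_q):
        while q:
            send_on_new_route(q.popleft())

# Example: packets already taken from the head and tail of the actual queue,
# plus later arrivals buffered while the new route was being established.
drain_diverted_flow(deque(["p1", "p2"]), deque(["p3"]), deque(["p4"]), print)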
The above exemplary methods of the present invention result in an overload response that is robust, in the terms discussed in relation to Figure 1, and clearly distinguishable from known AQM methods by, amongst other things, its consistently low delay even with severe overload, its robust response for real-time media streams, and its firm suppression of large and unresponsive causes of congestion.
The graphs 501 to 504 (which relate to known AQM methods) and 511 to 514 (which relate to embodiments of the present invention) in Figure 11 compare the response of known AQM methods with that of embodiments of the invention, in a network node carrying a broad mix of traffic: mainly interactive with some real-time; a few large flows and many smaller ones. The first six graphs show the flow-by-flow response at the network level, with the horizontal X axis denoting the demand in terms of interactive flow size or real-time streaming rate as a proportion of resource capacity, and the vertical Y axis denoting the actual throughput. For interactive flows 521 this throughput is the flow size divided by the time to complete the transfer, and for real-time streams 525 it is plotted as the input rate less the proportion of packets lost or excessively delayed. Each chart shows a solid line denoting the usual light-load performance of interactive 522 and real-time 526 flows, and a dotted line at the level where it is assumed that users would notice the degradation. It may be noted that interactive services tolerate quite a large reduction in performance 523, and real-time flows comparatively little 527.
From the top pair of graphs it is seen that, when the demand-to-capacity ratio is 0.9, just short of overload, the generic AQM response 501 is the same as that of the present invention 511. For the next row of graphs, the demand overloads capacity by 20%, and the random discard of generic AQM noticeably disrupts half of the real-time flows 531, while the present invention discards only from the largest real-time flow 532, leaving the others unaffected. The suppression of this real-time flow frees capacity for faster completion of the interactive flow 534 than the equivalent flow 533 obtains from AQM. At the 50% overload shown in the third row of charts, AQM methods 503 generally revert to basic overflow discard, which disrupts the larger interactive flows 535 and all but one of the real-time streams 536. The above described embodiment 513 maintains its active management and selective discard, substantially blocking the largest real-time stream 538 and noticeably degrading only one interactive flow 537, leaving the majority of flows barely touched by the congestion. The lowermost pair of graphs, 504 and 514, summarise the performance in the form of Figure 1. For interactive flows using TCP congestion controls, generic AQM 504 is successful in providing a robust overload response 551, but the response is distinctly fragile for real-time services, since almost any overload results in most experiences 552 becoming unacceptable. In contrast the present invention 514 achieves a robust overload response for real-time streams 554, and is even more robust for interactive traffic 553.
A network node that employs the present invention is much more resilient than if it used a known AQM method, meaning that it can maintain a consistent quality of experience for the great majority of subscribers in the face of unexpected overload.
Whilst the above embodiments of the present invention have been discussed in terms of controlling network flows by discarding packets, one of ordinary skill in the art will recognise that the method and system control the rate of packets arriving in a queue, and that a signal sent to the source of a flow to reduce or halt the rate of sending would be effective.

Claims

1. A method of managing a buffer for data packets in a network node, the method comprising selecting a particular flow of data packets and determining to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.
2. A method as claimed in claim 1 in which the extent to which data is removed from the flow progressively increases in relation to the measure of demand until a point at which the demand is such that all packets of said flow are removed.
3. A method as claimed in any preceding claim in which the proportion of data which is removed from the flow is determined by multiple thresholds, wherein when the measure of the demand exceeds each progressively higher threshold a greater proportion of the selected flow is removed.
4. A method as claimed in claim 3 in which at least three thresholds are used, in which:
(a) above a lower threshold one packet is removed from the selected flow;
(b) above an intermediate threshold every Nth packet of the selected flow is removed;
(c) above an upper threshold all packets of the selected flow are removed.
5. A method as claimed in claim 4 in which the lower threshold is less than the capacity of the resource for transmission from the node, the intermediate threshold is around the capacity of the resource, and the upper threshold is above the capacity of the resource.
6. A method as claimed in any preceding claim in which the proportion of a flow which is removed for a given measure of demand is varied over time.
7. A method as claimed in claim 6 in which the proportion is varied in relation to the measure of demand on the node.
8. A method as claimed in any preceding claim comprising, for each flow received at the node, creating a log of a measure of the size of the received flow, and removing from the log the respective part of the recorded information relating to packets which have been present in the record for longer than a predetermined time.
9. A method as claimed in claim 8 in which the recorded size of a flow is determined by the amount of data of that flow.
10. A method as claimed in claim 8 in which the recorded size of a flow is determined by the energy required by the node to send the received flow.
11. The method as claimed in any of claims 8 to 10 in which the cumulative size of all recorded flows is compared to a threshold value.
12. The method as claimed in claim 11 in which if the cumulative size is greater than the threshold value, the largest flow in the record is selected.
13. The method as claimed in claim 12 in which the amount of data to be removed from the flow in the buffer is related to the cumulative size.
14. A method as claimed in claim 13 in which the aforementioned data is also removed from the log.
15. A method as claimed in claim 14 in which if the updated cumulative size recorded in the log remains above the threshold value, then the largest flow in the log at that time is then selected, and it is determined to what extent data is removed from that flow.
16. A method as claimed in any preceding claim in which an indication of the flow's priority status is taken into consideration when determining the flow's net size.
17. A method as claimed in any preceding claim in which the packet at the head of the queue in the buffer is analysed to determine whether the packet is to be removed.
18. A method as claimed in any of claims 8 to 17, when appended to claim 6, in which the proportion is dynamically adjusted in relation to relative cumulative data sizes in the log and in the buffer.
19. A method as claimed in any preceding claim in which only one flow is selected at a time for consideration as to whether data is to be removed from that flow.
20. A method as claimed in any preceding claim in which the measure of demand, and the need to remove any data from a flow, are determined on arrival of each new packet at the node.
21. Apparatus for managing a buffer for data packets in a network node, the apparatus comprising a data processor which is configured to select a particular flow of data packets and determine to what extent data from the flow should be removed from the flow, the extent to which data is removed being determined, at least in part, in relation to a measure of demand imposed on the node by data packets arriving at the buffer.
PCT/GB2008/002079 2007-06-19 2008-06-18 Method and apparatus for computer networks WO2008155542A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0711797.1A GB0711797D0 (en) 2007-06-19 2007-06-19 Method and apparatus for computer networks
GB0711797.1 2007-06-19

Publications (1)

Publication Number Publication Date
WO2008155542A1 true WO2008155542A1 (en) 2008-12-24

Family

ID=38332327

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/002079 WO2008155542A1 (en) 2007-06-19 2008-06-18 Method and apparatus for computer networks

Country Status (2)

Country Link
GB (1) GB0711797D0 (en)
WO (1) WO2008155542A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150338A (en) * 2021-03-29 2022-10-04 华为技术有限公司 Message flow control method, device, equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RACZ A ET AL: "Weighted fair early packet discard at an ATM switch output port", INFOCOM '99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Proceedings, Piscataway, NJ, USA, IEEE, vol. 3, 21 March 1999, pages 1160-1168, XP010323858, ISBN: 978-0-7803-5417-3 *
YAMAGAKI N ET AL: "RED method with dual-fairness metrics cooperating with TCP congestion control", ICC 2003, IEEE International Conference on Communications, Anchorage, AK, 11-15 May 2003, New York, NY, IEEE, vol. 1, pages 652-656, XP010642829, ISBN: 978-0-7803-7802-5 *

Also Published As

Publication number Publication date
GB0711797D0 (en) 2007-07-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08775758

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08775758

Country of ref document: EP

Kind code of ref document: A1