WO2007047864A2 - Traffic shaping and metering scheme for oversubscribed and/or cascaded communication devices - Google Patents


Info

Publication number
WO2007047864A2
WO2007047864A2 (PCT/US2006/040926)
Authority
WO
WIPO (PCT)
Prior art keywords
traffic
queue
frame
block
oversubscribed
Prior art date
Application number
PCT/US2006/040926
Other languages
French (fr)
Other versions
WO2007047864A3 (en)
Inventor
Edward Ellebrecht
Marek Thalka
Poly Palamuttam
Original Assignee
Ample Communications Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US72816605P priority Critical
Priority to US60/728,166 priority
Application filed by Ample Communications Inc. filed Critical Ample Communications Inc.
Publication of WO2007047864A2 publication Critical patent/WO2007047864A2/en
Publication of WO2007047864A3 publication Critical patent/WO2007047864A3/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic regulation in packet switching networks
    • H04L47/10 Flow control or congestion control
    • H04L47/20 Policing
    • H04L47/22 Traffic shaping
    • H04L47/32 Packet discarding or delaying
    • H04L47/326 With random discard, e.g. random early discard [RED]

Abstract

A method for shaping traffic in an oversubscribed environment, supporting multiple queues and classes of service, and extending this approach to cascaded devices is disclosed. The invention permits the use of oversubscribed line-interface devices without adversely affecting high-priority traffic and allows the use of low-cost networking devices such as network processor units.

Description

Traffic Shaping and Metering Scheme for Oversubscribed and/or Cascaded Communication Devices

Edward T. Ellebrecht

This application claims the benefit of U.S. Provisional Patent Application No. 60/728,166, filed on October 18, 2005.

FIELD OF THE INVENTION

The invention relates to the field of data transmission from multiple sources and, more specifically, to managing data when an Ethernet network is oversubscribed. The purpose of the invention is to permit the use of oversubscribed line-interface devices without adversely affecting essential (control-plane) and other high-priority traffic while still permitting lower-cost data processing devices (typically network processor units, or NPUs) to be used.

BACKGROUND OF THE INVENTION

It is quite common for a data network to operate at less than its full capacity, typically around 50% utilization. This is due primarily to the "bursty" nature of the data being transmitted. This approach is quite costly, and methods have been developed to better utilize the available capacity. One approach oversubscribes the system by some factor and exploits the fact that many users will not be utilizing the system at the same time, so that sufficient capacity remains available under most conditions. This approach allows the designer to play the "averages" by assigning to a port an information processing rate that is greater than the speed of the port. The approach is attractive because it saves the cost of the port connections, typically a significant portion of the total system cost. However, to implement such an approach successfully, one requires historical information about network usage, such as that obtained from actual system measurements. Additionally, the approach requires continuous decision making as to which traffic to allow and which traffic to drop.

SUMMARY OF THE INVENTION

The invention addresses two related but distinct problems:

1. Traffic shaping in an oversubscribed environment, supporting multiple queues and Classes of Service (CoS).
2. Traffic shaping for cascaded devices, supporting multiple queues and CoS.

This solves the problem of determining which traffic to drop when operating in an oversubscribed environment, i.e., the ingress traffic bandwidth exceeds the egress traffic bandwidth.

In both cases, the invention allows high-priority traffic to be serviced preferentially, including the ability to provide per-port Service Level Agreements (SLAs) that precisely control the traffic such that a minimum bandwidth is allowed access while simultaneously maintaining a maximum bandwidth for the port traffic. In addition, the traffic from multiple ports is combined such that any excess bandwidth is allocated in programmable proportions.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the basic arrangement of cascaded devices.
Figure 2 shows the arrangement of per-port functional blocks.
Figure 3 is the RED graph.
Figure 4 shows an arrangement of common functional blocks.
Figure 5 is the rate limiting block diagram.

DETAILED DESCRIPTION

When multiple devices are cascaded, as shown in Figure 1, the invention combines the traffic from the ports serviced by the second device so that they receive the same level of traffic management as the ports directly interfaced to the first (primary) device.

The invention consists of a set of functional blocks provided for each interface port, organized as shown in Figure 2, and a separate set of common functional blocks. Traffic enters from the line-side interface into the per-port blocks. From there it is sent to the common functional blocks, where it is acted upon and prepared for transmission.

Each block and its functionality are described below, following the flow of traffic from point A to point G:

1) The line-side interface receives the data from an external device, translating the external format to the device-internal format and verifying that it is received error-free. This data, presented as a series of Ethernet 802.3-formatted frames, is sent to the Broadcast/Multicast Limiter block (point A). The Limiter block is used to throttle multicast and/or broadcast traffic. This is typically used to limit the amount of control-plane traffic following network disruptions or reconfigurations. The Limiter block has independent broadcast and multicast controls. The multicast traffic has an additional control mechanism, in that four (4) mask and match exception registers are provided. The mask and match function is a two-step process. First, the ingress multicast address is "masked" (a logical bitwise AND) with the contents of the mask register. If the resulting value matches that of the corresponding match register, then this frame is considered an exception and does not undergo the limiting function. Since each mask and match register set acts independently, all four sets are applied in parallel.
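The two-step mask-and-match exception check above can be sketched as follows. This is an illustrative model only: the function name, register widths, and example register values are assumptions, not taken from the patent.

```python
# Hypothetical sketch of the four-register mask-and-match exception check.
# Each (mask, match) pair models one independent exception register set.

def is_exception(multicast_addr: int, mask_match_pairs) -> bool:
    """Return True if the ingress multicast address matches any of the
    mask/match register pairs and therefore bypasses the multicast limiter."""
    for mask, match in mask_match_pairs:
        # Step 1: bitwise-AND the ingress address with the mask register.
        masked = multicast_addr & mask
        # Step 2: compare against the corresponding match register.
        if masked == match:
            return True  # exception: frame is not rate-limited
    return False

# Example (assumed values): exempt the 01:80:C2:xx:xx:xx control-protocol block.
pairs = [(0xFFFFFF000000, 0x0180C2000000)] + [(0, 1)] * 3  # unused sets never match
print(is_exception(0x0180C2000001, pairs))  # True  — bypasses the limiter
print(is_exception(0x0180C3000001, pairs))  # False — subject to limiting
```

Because all four register sets act independently, checking them in a simple loop is equivalent to the parallel hardware comparison the text describes.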

The multicast traffic limiter works by decrementing a reloadable register by the number of bytes in each frame. As long as the register is greater than zero, the frame is allowed to pass. Once the register reaches zero, ingress frames are dropped. The register is reloaded automatically every x seconds (or fractional seconds) with the drop threshold register value. The value of x is programmable.
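A minimal sketch of this byte-decrement limiter, under the assumptions that a periodic timer calls the reload method and that a frame arriving while the counter is still positive always passes (possibly driving the counter negative):

```python
# Illustrative model of the reloadable byte-budget limiter described above.
# Class and method names are assumptions for the sketch.

class ByteBudgetLimiter:
    def __init__(self, drop_threshold_bytes: int):
        self.threshold = drop_threshold_bytes
        self.counter = drop_threshold_bytes

    def reload(self):
        """Called by a timer every x (possibly fractional) seconds."""
        self.counter = self.threshold

    def admit(self, frame_len: int) -> bool:
        """Pass the frame while the counter is above zero, else drop it."""
        if self.counter > 0:
            self.counter -= frame_len  # a passing frame may drive it negative
            return True
        return False

lim = ByteBudgetLimiter(drop_threshold_bytes=1500)
print(lim.admit(1000))  # True  (counter 1500 -> 500)
print(lim.admit(1000))  # True  (counter was still positive; now -500)
print(lim.admit(64))    # False (counter no longer above zero)
lim.reload()
print(lim.admit(64))    # True  after the periodic reload
```

The broadcast limiter described next would simply be a second, independent instance of the same mechanism.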

The broadcast limiter works identically and independently in parallel, limiting only broadcast traffic. The timestamp functionality is present in the Classification Engine. The Classification Engine allows different classes of ingress data traffic to be segregated into the different queues shown in Figure 2. As part of this classification, the engine may be programmed to attach a timestamp to each packet of ingress traffic. The device supports two timestamp blocks, each of which operates independently. The timestamp is checked when a frame is dequeued by the port-based MDRR block. If the timestamp shows that the packet has taken too long to pass through the device, i.e., the latency is too high, then the packet can optionally be dropped. By providing two independent timestamp engines, the two most common types of latency-sensitive traffic, namely Voice over Internet Protocol (VoIP) and streaming media, can be serviced simultaneously in the device, with each type checked against a different latency. Typically VoIP traffic must have very low latency, while streaming media may tolerate roughly ten times the latency of VoIP traffic.

Depending upon the instantaneous congestion being experienced by the device, the latency of other, lower-priority traffic may be significantly higher. The job of the Classification Engine is to place the different types of traffic into the different CoS queues. As data enters each queue, an independent Weighted Random Early Discard (WRED) block may act upon it. WRED is used to selectively discard frames, with increasing probability as the fullness of the queue increases. This mechanism is used to cause traffic sources that use traffic loss as a method of controlling their output rate (such as TCP/IP) to send less data. Because each queue has a WRED block in front of it, higher-priority traffic can be given a lesser degree of inhibition than low-priority traffic. Use of the WRED block is optional. Control-plane traffic typically does not make use of WRED, because its sources will not adjust their transmission rate based on frame loss. This is also true of some other transmission protocols, which may have been segregated into a particular queue based on the actions of the Classification Engine. In these cases the queue will tail-drop any frames that attempt to enter when no room is available for them.

After a frame passes through the WRED block without being dropped, it enters the queue associated with that block. Each queue is serviced in First- In, First-Out (FIFO) order.

The invention allows strict priority servicing to (optionally) be used on one or two queues, those queues being the two highest-priority queues. Strict priority servicing refers to how packets are dequeued (removed) from each queue: each frame is sent to the system-side interface as soon as it is available, ahead of traffic from other queues. This allows both control-plane traffic and latency-sensitive traffic, such as VoIP, to be queued separately, yet still have the low latency provided by strict scheduling.

All of the queues are serviced using a Modified Deficit Round Robin (MDRR) scheme. In this scheme each queue is periodically given a user-programmable amount of credits, which can be different for each queue. The highest-priority queue with credits is allowed to send the next frame. Credits are consumed based upon the size of the frame. A queue may use up more credits than it has been allocated in order to finish sending a large frame of data. However, the deficit credits are tracked, so that the queue nets fewer credits the next time they are allocated. Once the highest-priority queue has used up its credits, the next highest-priority queue is serviced. This continues until either all of the credits are used up or new credits are given out.

The device supports both a full-packet mode and a multiplexed mode. If multiplexed mode is selected, then, after the per-port traffic shaping is completed, the data stream for each port enters the rate limiter associated with that port. Each rate limiter operates independently and consists of a minimum and a maximum rate limiter. The minimum rate limiter is used to make sure that no port is starved of traffic when the data from the different ports is commingled at the system-side interface. The maximum rate limiter is typically used to enforce maximum-bandwidth customer contract agreements. The sum of the minimum rates should be less than or equal to the rate at which the system-side interface can send the frames. Either (or both) rate limiters may be disabled. If full-packet mode is selected, then an Inter-Port MDRR block is used instead of the rate limiters. This MDRR block is used to segment the system-side bandwidth amongst the different ports in proportions specified by the user. The bandwidth division is based on the Inter-Port MDRR credit allocations.

After a frame has passed through the Classification Engine (point C of Figure 2) it will have a Class of Service (CoS) assigned to it. This is a number, ranging from zero to seven. Seven is the highest (most important) class. The frame will enter the Weighted Random Early Discard (WRED) block associated with the class it has been assigned as shown in Figure 3. WRED is an Active Queue Management (AQM) technique used for handling network queues that may be subject to congestion.

The WRED block works by dropping frames with a probability dependent upon the depth of the queue. This is depicted graphically in Figure 3. As shown in the graph, the x-axis shows the averaged queue size, i.e., the amount of data waiting to be sent on this particular queue. The y-axis provides the drop probability, which varies from 0 (never drop) to 1 (always drop). The dotted line indicates the desired WRED operation. In this case, no frames are to be dropped until the queue size reaches approximately five (5) frames. As the queue size grows from five (5) to 20 frames, the drop probability grows linearly from 0 to 0.1 (one in ten). Since this queue is configured to use the "gentle RED", the drop probability increases linearly to 1 at twice the maximum threshold.

The device maintains a table of 1,024 entries (128 when external memory is not available). Each element of the table consists of a queue size entry and a drop probability. The drop probability is expressed as a 10-bit binary number. These elements are used to provide a piecewise-linear estimation of the desired drop functionality. When a frame enters the WRED block, it is assigned a number generated using a Linear Feedback Shift Register (LFSR), arranged to generate a Pseudo Random Bit Sequence (PRBS). The current queue size is then compared to the entries in the table until an entry is found that corresponds to the current queue depth (starting from the smallest queue depth entry, the entry is selected that has the greatest value not exceeding the current queue depth). The 10-bit drop probability associated with this entry is then compared with the pseudo-random number assigned to the current frame. If the assigned number is less than the drop probability taken from the table, the frame is dropped and the relevant statistics counters are updated to reflect this. Otherwise the frame is placed in the queue and the current queue size is updated.
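The table-driven decision can be sketched as below. The table contents are illustrative assumptions (chosen to mimic the Figure 3 example of a 0.1 drop probability at 20 frames), `random.randrange` stands in for the 10-bit LFSR/PRBS output, and the comparison follows the standard RED convention that a frame is dropped when the pseudo-random draw falls below the tabulated probability.

```python
import random

# Piecewise-linear WRED table: (queue_depth, drop_probability out of 1024),
# sorted by depth. Values are illustrative assumptions.
wred_table = [(0, 0), (5, 0), (20, 102), (40, 1024)]  # ~0.1 at depth 20

def drop_prob(queue_depth: int) -> int:
    """Select the entry with the greatest depth not exceeding the current
    queue depth and return its 10-bit drop probability."""
    prob = 0
    for depth, p in wred_table:
        if depth <= queue_depth:
            prob = p
        else:
            break
    return prob

def wred_admit(queue_depth: int, rng=random) -> bool:
    """Admit the frame unless the PRBS draw falls below the tabulated
    drop probability for the current queue depth."""
    draw = rng.randrange(1024)  # stand-in for the 10-bit LFSR output
    return draw >= drop_prob(queue_depth)

print(drop_prob(3))    # 0    — below the minimum threshold, never drop
print(drop_prob(25))   # 102  — roughly a 10% drop probability
print(drop_prob(100))  # 1024 — always drop
```

A real implementation would also update the averaged queue size and the drop statistics counters on each decision, as the text describes.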

The queuing of the kept frames takes place at point D of Figure 2.

2) As frames pass from point D to point E, they are stored in First-In First-Out (FIFO) queues. The device does not use fixed memories for this storage. Rather, the memory is provided using a dynamic mechanism that makes use of a memory manager block (not shown in Figure 2). This block assigns memory buffers as required by the current traffic patterns incident to the device and then reclaims the used memory blocks as they are freed up by the dequeuing process. The maximum size of each queue is controlled via user programming of the device.

Each FIFO works independently and consists of a write (queuing) process and a read (dequeuing) process. Frames are always read out in the same order that they were written into each queue. Note that the dequeuing mechanism has the ability to change the order in which frames are sent onward, since frames stored in one queue may be read out before frames stored in another queue that were received on the line-side interface first. The converse case also applies.
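The dynamic buffering described above can be sketched as a shared buffer pool feeding per-queue FIFOs. This is a hypothetical model: pool size, queue caps, and the tail-drop-on-exhaustion behavior are illustrative assumptions standing in for the memory manager block.

```python
from collections import deque

class BufferPool:
    """Stand-in for the memory manager: hands out buffers on enqueue and
    reclaims them on dequeue."""
    def __init__(self, total_buffers: int):
        self.free = total_buffers

    def alloc(self) -> bool:
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def release(self):
        self.free += 1

class Fifo:
    def __init__(self, pool: BufferPool, max_depth: int):
        self.pool, self.max_depth = pool, max_depth
        self.frames = deque()

    def enqueue(self, frame) -> bool:
        # Tail-drop when the user-programmed queue cap is hit
        # or no buffers remain in the shared pool.
        if len(self.frames) >= self.max_depth or not self.pool.alloc():
            return False
        self.frames.append(frame)
        return True

    def dequeue(self):
        frame = self.frames.popleft()   # strict FIFO order within a queue
        self.pool.release()             # buffer reclaimed for reuse
        return frame

pool = BufferPool(total_buffers=2)
q = Fifo(pool, max_depth=8)
print(q.enqueue("a"), q.enqueue("b"))  # True True
print(q.enqueue("c"))                  # False — pool exhausted
print(q.dequeue())                     # a — first in, first out
print(q.enqueue("c"))                  # True — the freed buffer is reused
```

Cross-queue reordering follows naturally: two `Fifo` instances sharing one pool each preserve internal order, but nothing constrains which queue is read first.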

3) As frames pass from point E to point F in Figure 2, they pass through a timestamp check block. The device's Classification Engine permits each frame to individually have a timestamp assigned to it. If the timestamp functionality of the device is enabled, then, as frames are dequeued, the timestamp is checked and frames with a buffering latency exceeding a programmable value are dropped. Frames dropped due to excess latency are counted, allowing the user to monitor the number of latency drops occurring.

The timestamp is applied to the frame before it is placed into the CoS buffers. The timestamp itself is a copy of the current time, as determined by a user-programmable counter. When the frame is dequeued, the difference between the (new) current time and the time at which the frame was enqueued is calculated. This value is compared to the user-programmable excess latency value. If the buffering delay exceeds the excess latency value, the frame is dropped.

Timestamps are assigned on a frame-by-frame basis. Each port of the device maintains two separate timestamp check mechanisms, with the selected mechanism determined when the frame is enqueued. This is typically used to police separate maximum latencies for Voice over Internet Protocol (VoIP) and streaming video frames.
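The dequeue-time check can be sketched in a few lines. The latency limits and tick units are illustrative assumptions; the two dictionary entries model the two independent per-port timestamp mechanisms (e.g., one tuned for VoIP, one for streaming media).

```python
# Hypothetical excess-latency values per timestamp mechanism, in counter ticks.
LATENCY_LIMIT = {0: 10, 1: 100}  # mechanism 0 ~ VoIP, mechanism 1 ~ streaming

def check_latency(enqueue_time: int, now: int, mechanism: int) -> bool:
    """Return True if the frame may be forwarded, False if it must be
    dropped (and counted) for exceeding its excess-latency value.
    `enqueue_time` is the copy of the free-running counter taken when the
    frame was placed into its CoS buffer."""
    age = now - enqueue_time
    return age <= LATENCY_LIMIT[mechanism]

print(check_latency(enqueue_time=100, now=105, mechanism=0))  # True
print(check_latency(enqueue_time=100, now=150, mechanism=0))  # False — too old for VoIP
print(check_latency(enqueue_time=100, now=150, mechanism=1))  # True  — within streaming limit
```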

4) The Modified Deficit Round Robin (MDRR) block controls the flow of traffic as it passes from point F to point G of Figure 2. The MDRR block allows the user to program preferential treatment of the ingress queues. The device can be configured to provide from one (1) to eight (8) queues. If only one queue is being used, this block is effectively disabled. In cases where more than one queue is being used, the user can make use of the MDRR block.

The MDRR block is based on a round-robin approach to queue servicing. Round-robin means that each queue is serviced in order, starting with the highest-priority queue and progressing through the queues to the lowest-priority queue. In basic round-robin servicing, a queue is serviced by reading all of the frames contained within it until the queue is empty; round-robin queue servicing therefore provides equal service to all queues. The device enhances this approach by allowing the user to assign a variable number of credits to each queue. As a queue is serviced, credits are consumed in proportion to the size of each frame being dequeued. Because the MDRR block works on a frame basis, the credit total is allowed to become negative as the last frame for the currently serviced queue is dequeued. Once the credit total is no longer greater than zero for a given queue, no more frames are read from it until the credits have been reallocated. Each time the credits are reallocated, servicing begins anew with the highest-priority queue. In this manner queues can be given higher priority by servicing them first in a service interval and allocating a number of credits commensurate with the priority. Note that this approach can lead to a low-priority queue never being serviced, because new credits may be apportioned before that queue's servicing starts.
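The deficit-credit bookkeeping above can be sketched as follows. Class and method names, and the credit values, are illustrative assumptions; frames are represented simply by their byte length.

```python
from collections import deque

class MdrrScheduler:
    """Sketch of Modified Deficit Round Robin: queues are visited from
    highest to lowest priority; dequeuing consumes credits equal to the
    frame size, and the total may go negative to finish a large frame,
    carrying the deficit into the next allocation."""
    def __init__(self, credits_per_queue):
        # queues[0] is the highest priority
        self.alloc = list(credits_per_queue)
        self.credits = [0] * len(self.alloc)
        self.queues = [deque() for _ in self.alloc]

    def enqueue(self, qid, frame_len):
        self.queues[qid].append(frame_len)

    def reallocate(self):
        # Deficits carry over: a queue that went negative nets fewer credits.
        self.credits = [c + a for c, a in zip(self.credits, self.alloc)]

    def next_frame(self):
        # Highest-priority queue that still has frames and positive credits.
        for qid, q in enumerate(self.queues):
            if q and self.credits[qid] > 0:
                frame_len = q.popleft()
                self.credits[qid] -= frame_len  # may go negative
                return qid, frame_len
        return None

sched = MdrrScheduler(credits_per_queue=[1000, 500])
sched.enqueue(0, 1500)  # a large high-priority frame
sched.enqueue(1, 400)
sched.reallocate()
print(sched.next_frame())  # (0, 1500) — queue 0's credits go to -500
print(sched.next_frame())  # (1, 400)  — queue 0 is empty and in deficit
sched.reallocate()
print(sched.credits)       # [500, 600] — queue 0 nets fewer this round
```

Strict priority servicing, described next, would simply bypass this loop whenever a designated high-priority queue holds a complete frame.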

The device also allows for the optional use of one or two strict priority queues. Strict priority queues are typically used for control-plane traffic, i.e., traffic used to control the network itself. Dropping control-plane traffic can lead to a breakdown of the network itself and is to be avoided. If the highest-priority queue is assigned strict priority, then every time it contains a complete frame, that frame will be the next frame to be transmitted. Thus the normal round-robin servicing can be repeatedly interrupted by these frames. After the strict priority frame has been sent, the round-robin mechanism continues where it left off. Referring to Figure 4, the per-port traffic shaping described above is enclosed within the Per-Port Traffic Shaper blocks; therefore point G of Figure 2 is also point G of Figure 4.

The processing of traffic as it passes from point G to point J is described below:

1) As traffic passes from point G to point I, it passes through the Min/Max Rate Limiter block. While the figure shows four separate blocks, their operation is actually coordinated, in that the current mode of each block is affected by the configuration and status of the other rate-limiting blocks. A detailed block diagram of the Rate Limiter block is shown in Figure 5.

As shown in Figure 5, the minimum and maximum rate blocks both make use of a user-programmable counter and a user-programmable interval timer. Since rate is defined as bits per unit time, programming the counter and timer as bytes over a time interval provides an estimate of the rate. Varying these values allows the user to tune the rate measurement to the desired resolution.
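The counter/timer rate estimate can be sketched as below. The class name, budget, and interval values are illustrative assumptions; a real min/max limiter pair would coordinate two such meters per port, as the surrounding text describes.

```python
class RateMeter:
    """Sketch of the counter/interval rate check: bytes accounted within a
    user-programmable interval are compared to a programmed byte budget,
    approximating a rate in bits per unit time."""
    def __init__(self, byte_budget: int, interval_s: float):
        self.byte_budget = byte_budget
        self.interval_s = interval_s
        self.counted = 0

    def on_interval(self):
        """Interval timer expiry: restart the byte counter."""
        self.counted = 0

    def would_exceed(self, frame_len: int) -> bool:
        """True if sending this frame would exceed the programmed rate."""
        return self.counted + frame_len > self.byte_budget

    def account(self, frame_len: int):
        self.counted += frame_len

    @property
    def rate_bps(self) -> float:
        """The programmed rate, expressed in bits per second."""
        return self.byte_budget * 8 / self.interval_s

# 125,000 bytes every 10 ms is equivalent to 100 Mbit/s.
meter = RateMeter(byte_budget=125_000, interval_s=0.01)
print(meter.rate_bps)            # 100000000.0
print(meter.would_exceed(1500))  # False
```

Shortening the interval while scaling the byte budget keeps the same average rate but tightens the burst tolerance, which is the tuning trade-off the text alludes to.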

The description provided above discusses the traffic shaping functionality provided for each ingress port of the device. The traffic from the different ports of the device is combined as shown in Figure 4. Note that Figure 4 also shows how the traffic from a cascaded device is handled. If no cascaded device is present, then the Inter-Port MDRR block is programmed to provide zero credits for these two interfaces, effectively disabling their functionality.

Claims

What is claimed is:
1. A method for traffic shaping and metering comprising:
receiving traffic from an external device;
translating the external data format into the device-internal data format;
verifying that the traffic is error free;
sending the data to a broadcast or multicast limiter;
matching the value of the ingress address with the contents of the register;
decrementing the register by the number of bytes in the traffic frame;
sending the traffic to a classification engine;
segregating the traffic by priority class;
attaching a time stamp to the traffic; and
placing the traffic into the proper classification.

Priority Applications (2)

Application Number Priority Date Filing Date Title
US72816605P true 2005-10-18 2005-10-18
US60/728,166 2005-10-18

Publications (2)

Publication Number Publication Date
WO2007047864A2 true WO2007047864A2 (en) 2007-04-26
WO2007047864A3 WO2007047864A3 (en) 2007-06-07

Family

ID=37963288

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/040926 WO2007047864A2 (en) 2005-10-18 2006-10-18 Traffic shaping and metering scheme for oversubscribed and/or cascaded communication devices

Country Status (1)

Country Link
WO (1) WO2007047864A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9647904B2 (en) 2013-11-25 2017-05-09 Amazon Technologies, Inc. Customer-directed networking limits in distributed systems
US9674042B2 (en) 2013-11-25 2017-06-06 Amazon Technologies, Inc. Centralized resource usage visualization service for large-scale network topologies
US9712390B2 (en) 2013-11-04 2017-07-18 Amazon Technologies, Inc. Encoding traffic classification information for networking configuration
US10002011B2 (en) 2013-11-04 2018-06-19 Amazon Technologies, Inc. Centralized networking configuration in distributed systems
US10027559B1 (en) 2015-06-24 2018-07-17 Amazon Technologies, Inc. Customer defined bandwidth limitations in distributed systems

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5850399A (en) * 1997-04-04 1998-12-15 Ascend Communications, Inc. Hierarchical packet scheduling method and apparatus
US7177276B1 (en) * 2000-02-14 2007-02-13 Cisco Technology, Inc. Pipelined packet switching and queuing architecture



Also Published As

Publication number Publication date
WO2007047864A3 (en) 2007-06-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC DATED 24.06.2008 (EPO FORM 1205A)

122 Ep: pct application non-entry in european phase

Ref document number: 06826298

Country of ref document: EP

Kind code of ref document: A2