SYSTEM AND METHOD FOR DISTRIBUTING TRAFFIC IN A NETWORK
Cross-Reference To Related Applications
[0001] This application claims priority to U.S. Provisional Patent Application Serial
Number 60/359,491, entitled "PRACTICAL TRAFFIC ENGINEERING SOLUTIONS FOR OSPF/IS-IS NETWORKS", filed February 22, 2002, and is hereby incorporated by reference in its entirety.
Field Of The Invention
[0002] The present invention generally relates to traffic engineering in data networks, and more specifically relates to efficient management of destination based routing protocols.
Background Of The Invention
[0003] In current IP networks, packets of data reach their respective destinations via paths determined by routers in the network. These networks can become quite large and have an extensive geographical distribution. Thus, efficient routing of traffic on such networks is a primary concern. Routers typically utilize standard routing protocols. A protocol is a set of rules for communication defining the format of traffic transferred on the network, and the procedure for such transfers. For example, many autonomous systems (e.g. an Internet Service Provider's own network), utilize the well known OSPF (Open Shortest Path First) and IS-IS (Intermediate System to Intermediate System) routing protocols. These types of routing protocols incorporate specific constraints.
[0004] The OSPF and IS-IS routing protocols determine paths using a shortest path routing protocol. In the shortest path routing protocol, each link, also referred to as a hop, in the network is assigned a cost (e.g., the more bandwidth on a hop, the smaller its cost). A router (or node) will then use the shortest cost paths, (a path cost is defined to be the sum of hop costs on that path) to forward packets to other nodes in the network. Also, OSPF and IS-IS type routers split traffic evenly across paths having equal costs. For example, if a router, X, has two or more
paths of equal cost to another node in the network, Y, the traffic arriving at router X with destination Y is split equally over all the equal cost paths.
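As an illustration only, the shortest path and equal splitting behavior described above can be sketched in Python. The function names (`equal_cost_next_hops`, `equal_split`), the small topology, and the assumption of positive integer link costs are hypothetical choices made for this sketch and are not taken from any OSPF or IS-IS implementation.

```python
import heapq
from collections import defaultdict


def equal_cost_next_hops(links, source):
    """Return {destination: sorted first hops lying on some least-cost path}.

    links maps a directed link (u, v) to its cost. Costs are assumed to be
    positive integers (an assumption of this sketch).
    """
    adj = defaultdict(list)
    for (u, v), cost in links.items():
        adj[u].append((v, cost))

    dist = {source: 0}
    next_hops = {source: set()}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in adj[u]:
            nd = d + cost
            # First hop(s) used when reaching v through u.
            via = {v} if u == source else next_hops[u]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                next_hops[v] = set(via)
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                next_hops[v] |= via  # another path of equal cost
    return {n: sorted(h) for n, h in next_hops.items() if n != source}


def equal_split(traffic, hops):
    """Split traffic evenly over the given next hops."""
    return {h: traffic / len(hops) for h in hops}


# Hypothetical topology: router X reaches Y via A or B at cost 2, via C at cost 4.
links = {("X", "A"): 1, ("X", "B"): 1, ("A", "Y"): 1,
         ("B", "Y"): 1, ("X", "C"): 3, ("C", "Y"): 1}
hops = equal_cost_next_hops(links, "X")
shares = equal_split(10, hops["Y"])  # 10 units of traffic destined to Y
print(hops["Y"], shares)
```

In this sketch the higher-cost path through C receives no traffic, and the two equal cost next hops each carry half of the traffic destined to Y.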
[0005] These constraints (shortest path constraint and equal splitting constraint) can significantly and detrimentally affect how traffic is routed in a network and hence affect the performance of the network. A system and method for distributing traffic on a network, which overcomes the above constraints is desired.
Summary Of The Invention
[0006] The present invention includes a method for distributing traffic in a network having predetermined traffic distribution constraints. The method includes the steps of selecting at least one next hop from candidate next hops in accordance with the predetermined constraints. The next hop(s) are selected also in accordance with (1) the residual throughput capacity of the next hop(s) compared to the residual throughput capacity of the candidate next hops, and/or (2) the traffic load on the next hop(s) compared to traffic load on the candidate next hops. The method also includes assigning an incoming route to the selected next hop(s). A network router that receives traffic data via at least one incoming route and provides traffic data via at least one next hop within a network having predetermined constraints implements the method of the invention using a data receiving portion, an assignment portion, and a data distribution portion. The assignment portion selects at least one next hop from candidate next hops in accordance with the predetermined constraints and in accordance with at least one of (1) residual throughput capacity of the next hop(s) compared to residual throughput capacity of the candidate next hops, and (2) traffic load on the next hop(s) compared to traffic load on the candidate next hops. The assignment portion also assigns an incoming route to the selected next hop(s). The data distribution portion distributes data from the incoming route to the selected next hop(s).
Brief Description Of The Drawings
[0007] The features and advantages of the invention will be best understood when considering the following description in conjunction with the accompanying drawings, of which:
[0008] Figure 1 is an illustration of a graph comprising a node, incoming routes, and candidate next hops in accordance with an exemplary embodiment of the present invention;
[0009] Figure 2 is a flow diagram of a process for distributing traffic in a network utilizing the max-min residual throughput capacity method in accordance with an exemplary embodiment of the present invention;
[0010] Figure 3 is a flow diagram of a process for distributing traffic in a network utilizing the min-max residual gap method in accordance with an exemplary embodiment of the present invention;
[0011] Figure 4 is a flow diagram of a process for distributing traffic in a network utilizing the min-max traffic load method in accordance with an exemplary embodiment of the present invention;
[0012] Figure 5 is a functional block diagram of a network router for distributing traffic in a network in accordance with an exemplary embodiment of the present invention;
[0013] Figure 6 is a graph of the evolution of an arbitrary cost function as a function of link load;
[0014] Figure 7 is a graph of the performance of three traffic distribution embodiments in accordance with the present invention tested on nodes having an average of 26500 routing prefixes per node;
[0015] Figure 8 is a graph of the deviation from optimal of three traffic distribution embodiments in accordance with the present invention tested on nodes having an average of 26500 routing prefixes per node;
[0016] Figure 9 is a graph of the deviation from optimal of three traffic distribution embodiments in accordance with the present invention tested on nodes having an average of 17000 routing prefixes per node;
[0017] Figure 10 is a graph of the performance of three traffic distribution embodiments in accordance with the present invention tested on nodes having an average of 17000 routing prefixes per node;
[0018] Figure 11 is a graph of the performance of three traffic distribution embodiments in accordance with the present invention tested on an ISP topology;
[0019] Figure 12 is a graph of the deviation from optimal of three traffic distribution embodiments in accordance with the present invention tested on an ISP topology;
[0020] Figure 13 is a graph of the cumulative contribution of routing prefixes at an ISP topology router sorted in decreasing order of intensity;
[0021] Figure 14 is a graph of the performance of three traffic distribution embodiments in accordance with the present invention tested as a function of configuration overhead on a GT-ITM 30 node, 238 edge network;
[0022] Figure 15 is a graph of the performance of three traffic distribution embodiments in accordance with the present invention tested as a function of configuration overhead on a BRITE 50 node, 200 edge network; and
[0023] Figure 16 is a graph of the performance of three traffic distribution embodiments in accordance with the present invention tested as a function of configuration overhead on an ISP topology.
Detailed Description Of Illustrative Embodiments
[0024] A traffic distribution system and method in accordance with the present invention selects a subset of next hops from a set of candidate next hops at a router. Incoming routes (also referred to as prefix routes) are assigned to selected next hops in accordance with specific
criteria. These criteria include minimizing the maximum residual throughput capacity on a next hop, maximizing the minimum residual throughput capacity on a next hop, and minimizing the maximum traffic load on a next hop. The techniques utilized are greedy in nature, that is, for each route they try to allocate the set of next hops which best satisfies some chosen criteria. Two types of implementations are described below. One incorporates determining the "gap" between the actual load and the throughput capacity (also referred to as the residual capacity), and the other incorporates determining the ratio of the actual to desired traffic load.
[0025] Two of the most widely used Interior Gateway routing protocols are OSPF (Open
Shortest Path First) and IS-IS (Intermediate System to Intermediate System). Advantageously, various embodiments of a system and method for distributing traffic in accordance with the present invention are capable of being implemented in conjunction with these protocols, thus allowing leveraging of their widespread deployment. However, there are two main difficulties in doing so. The first is that these protocols use shortest path routing with destination based forwarding. The second is that the protocols generate multiple equal cost paths for a given destination routing prefix, where the underlying forwarding mechanism performs load balancing across those paths by equally splitting traffic on the corresponding set of next hops.
[0026] In an attempt to overcome these difficulties, an embodiment of the traffic distribution system and method in accordance with the present invention utilizes shortest paths to achieve optimal link loads and for each prefix (incoming route) a set of allowable next hops is carefully selected from all candidate next hops. Underlying these solutions is the fact that current routers have thousands of route entries (destination routing prefixes) in their routing table. Instead of changing the forwarding mechanism responsible for distributing traffic across equal cost paths, actual (sub)sets of shortest paths (next hops) assigned to routing prefix entries in the forwarding table(s) of a router are controlled. This provides controlled traffic distribution without modifying existing routing protocols such as OSPF or IS-IS, and without requiring changes to the data path of current routers, i.e., their forwarding mechanism.
[0027] Because the problem is NP-hard (nondeterministic polynomial-time hard), heuristics are described.
It is to be understood that even though a system and method for traffic distribution in accordance
with the present invention is presented herein in the context of a routing problem, they are not to be limited thereto and are applicable to other load balancing scenarios. As described in more detail below, performance achieved by this approach is essentially indistinguishable from the optimal. In an attempt to mitigate the configuration overhead associated with "hand-crafting" the set of next hops that are to be installed for each routing prefix in a router's forwarding table, various embodiments are described that limit the number of routing prefixes for which the proposed selective installment of next hops is performed.
[0028] Described below is an introduction to the linear program formulation used to generate an optimal set of shortest paths, and modifications to provide compatibility with existing IP routers. For a more detailed description of linear programming in this context, please refer to R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows, Chapter 17, Section 17.2, which is hereby incorporated by reference in its entirety as if presented herein. As explained therein, a network is modeled as a directed graph $G = (N, E)$ with $m = \|N\|$ nodes and $n = \|E\|$ directed links. Also, the existence of a traffic matrix, $T$, is assumed, where entry $T(s_r, t_r) = d_r$ denotes the average intensity of traffic from ingress node $s_r$ to egress node $t_r$ for commodity $r \in R$. For more details describing the construction of such a traffic matrix see A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True, "Deriving Traffic Demands for Operational IP Networks: Methodology and Experience," IEEE/ACM Transactions on Networking, vol. 9, 2001, which is hereby incorporated by reference in its entirety as if presented herein. Assuming that an optimal allocation based on some network wide cost function yields a set of paths $P_r$ for each commodity $r$, the total bandwidth consumed by these paths on link $(i, j)$ is $c_{ij}$. The bandwidth consumed on a link is assumed to be the sum of the traffic on all paths that use the link. It can be shown that the same performance, in terms of the bandwidth consumed on each link, can be achieved with a set of shortest paths by formulating and solving a linear program and its dual. The linear program can be formulated as:
$$\sum_{j:(i,j)\in E} X^r_{ij} - \sum_{j:(j,i)\in E} X^r_{ji} = 0, \quad i \neq s_r, t_r,\ r \in R$$
$$\sum_{r \in R} d_r X^r_{ij} \leq c_{ij}, \quad (i,j) \in E$$
$$0 \leq X^r_{ij} \leq 1, \quad (i,j) \in E,\ r \in R \qquad (1)$$
where $X^r_{ij}$ is the fraction of traffic for commodity $r$ that flows through link $(i, j)$. Solving the linear program gives a traffic allocation $\{X^r_{ij}\}$ that consumes no more than $c_{ij}$ amount of bandwidth on any link $(i, j)$. In order to obtain link weights for shortest path routing, the dual of the linear program as formulated needs to be solved:
$$\max \sum_{r \in R} d_r U^r_{t_r} - \sum_{(i,j) \in E} c_{ij} W_{ij}$$
subject to
$$U^r_j - U^r_i \leq W_{ij} + 1, \quad (i,j) \in E,\ r \in R$$
$$W_{ij} \geq 0, \quad (i,j) \in E$$
$$U^r_{s_r} = 0, \quad r \in R \qquad (2)$$
Solving this dual results in a set of link weights $\{1 + W_{ij}\}$, from which a set of shortest paths can be constructed that are consistent with the traffic allocation variables $\{X^r_{ij}\}$.
[0029] It is, however, important to understand that although routing can now be done over shortest paths, this is still quite different from the forwarding paradigm currently deployed in existing OSPF and IS-IS networks. The reasons are two-fold and both can be traced to the output of the primal linear program, namely, the traffic allocation $\{X^r_{ij}\}$. Firstly, observe that the traffic allocation is for each commodity or source-destination pair. This means that the routing protocol could possibly generate different sets of next hops for each source-destination pair on which traffic is to be forwarded. This impacts the forwarding mechanism on the data path, as it now needs to make decisions on the basis of both source and destination addresses.
[0030] A simple example can help illustrate this difference. Consider two sources
158.130.68.41 and 158.130.68.48, both sending traffic to 192.168.60.100. The optimal traffic allocation obtained from solving the linear program would require treatment of the flows (158.130.68.41, 192.168.60.100) and (158.130.68.48, 192.168.60.100) as distinct entities, and make independent forwarding decisions for each of them, regardless of the paths they take. In contrast, in OSPF and IS-IS, if the two flows were to meet at a common node, the routing decisions for their packets would henceforth be indistinguishable since they both share the same destination address 192.168.60.100.
[0031] The second problem relates to the fact that current forwarding mechanisms support only equal splitting of traffic on the set of equal cost next hops. The linear program yields a traffic allocation that is not guaranteed to obey this constraint. Modifying the forwarding engine to support unequal splitting of traffic would involve significant and expensive changes. The function used to select the next hop on which to send a packet would have to be modified, and additional information stored in the forwarding entries in order to achieve the desired split ratios. This change is all the more difficult since it impacts the data path.
[0032] The first problem, translating a traffic allocation that distinguishes between source-destination pairs into one that depends only on destinations, can be solved simply by transforming the individual splitting ratios of source-destination pairs that share a common destination into a possibly different splitting ratio for the aggregate traffic associated with the common destination. This is possible because all routes are shortest paths. Shortest
paths have the property that segments of shortest paths are also shortest paths, so that once two flows headed to the same destination meet at a common node they will subsequently follow the same set of shortest paths. This means that these packets need not be distinguished based on their source address, and splitting and forwarding decisions can be made simply based on their destination address. The new splitting ratios that are to be used on the aggregate traffic in order to achieve the same traffic distribution are computed as follows.
[0033] Let the traffic heading for destination $t$ at node $i \neq t$ be denoted as:
$$f^t_i = \sum_{j:(j,i)\in E} \ \sum_{s:(s,t)\in R} d_r X^r_{ji}$$
It can be readily seen that using $\alpha^t_{ij}$ as the fraction of the overall traffic headed toward destination $t$ and sent on link $(i, j)$ will maintain the optimal traffic profile.
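The aggregation of per-commodity splitting ratios into per-destination ratios can be sketched as follows. This is a minimal sketch under stated assumptions: the ratio for link (i, j) is taken to be the traffic toward the destination on that link divided by the total traffic toward the destination leaving node i, which is one natural reading of the text rather than a formula quoted from it. The function name `destination_split_ratios`, the data layout, and the example numbers are hypothetical.

```python
from collections import defaultdict


def destination_split_ratios(demands, alloc, dest):
    """Aggregate per-commodity allocations into per-destination split ratios.

    demands: {r: (s_r, t_r, d_r)} for each commodity r
    alloc:   {r: {(i, j): fraction of commodity r carried on link (i, j)}}
    Returns {i: {j: share of the traffic toward dest sent from i to j}}.
    """
    out = defaultdict(dict)  # node i -> {j: traffic toward dest on (i, j)}
    for r, (_s, t, d) in demands.items():
        if t != dest:
            continue
        for (i, j), frac in alloc[r].items():
            out[i][j] = out[i].get(j, 0.0) + d * frac

    ratios = {}
    for i, per_link in out.items():
        total = sum(per_link.values())
        if total > 0:
            ratios[i] = {j: v / total for j, v in per_link.items()}
    return ratios


# Hypothetical example: two sources send to T and meet at node I.
demands = {"r1": ("S1", "T", 6.0), "r2": ("S2", "T", 2.0)}
alloc = {
    "r1": {("S1", "I"): 1.0, ("I", "J"): 0.5, ("I", "K"): 0.5,
           ("J", "T"): 0.5, ("K", "T"): 0.5},
    "r2": {("S2", "I"): 1.0, ("I", "J"): 1.0, ("J", "T"): 1.0},
}
ratios = destination_split_ratios(demands, alloc, "T")
print(ratios["I"])  # {'J': 0.625, 'K': 0.375}
```

At node I the two commodities have different individual splits (one half each for r1, everything on J for r2), yet a single destination-based ratio of 5/8 and 3/8 reproduces the same link loads.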
[0034] To overcome the second problem, advantage is taken of the fact that current routing tables are relatively large, with multiple routing prefixes associated with the same egress router. By controlling the (sub)set of next hops that each routing prefix is allowed to use, one can control the traffic headed toward a particular egress router (destination). In other words, instead of the current operation that has all routing prefixes use the full set of next hops, the invention selectively controls this choice based on the amount of traffic associated with each routing prefix and the desired link loads for an optimal traffic allocation.
[0035] An advantage of the above approach is that the forwarding mechanism on the data path remains unchanged, as packets are still distributed evenly over the set of next hops assigned to a routing prefix. This means that a close approximation of an optimal traffic engineering solution is obtainable even in the context of existing routing and forwarding technologies. There are, however, a number of issues to consider. The first is the need for traffic information at the granularity of a routing prefix entry instead of a destination (egress router). This in itself is not an insurmountable task, as most of the techniques currently used to gather traffic data (e.g., router mechanisms such as Cisco's NetFlow or Juniper's cflowd) can be readily adapted to yield information at the granularity of a routing prefix. The second issue concerns the configuration overhead involved in communicating to each router the subset of next hops to be used for each routing prefix. This can clearly represent a substantial amount of configuration data, as routing tables are large and the information that needs to be conveyed is typically different for each router. A traffic distribution system and method in accordance with the present invention identifies a small set of prefixes for which careful allocation of next hops is done and relies on default behavior for the remaining prefixes. The third issue relates to actually formulating a technique for determining which subset of next hops to choose for each routing prefix in order to approximate an optimal allocation. In one exemplary embodiment, a goal is to minimize some metric that measures discrepancy between the optimal traffic allocation and the one achieved under equal-splitting constraints on any hop. Two metrics are described: the maximum gap between the optimal traffic and the allocated traffic on any hop, and the maximum load on any hop, where the load on a hop is the ratio of the allocated traffic to the optimal traffic.
[0036] In one embodiment of a traffic distribution system and method in accordance with the present invention, selective next-hop allocation is achieved at the global level, that is, a concurrent optimal assignment of next hops for each routing prefix is performed at each node. In other embodiments, in an attempt to mitigate possible computational difficulty, independent computations for each routing prefix at each node are performed. Computations are based only on the incoming traffic at the node and the desired outgoing traffic profile. A potential problem with this approach is that the traffic arriving at a node may not match the optimal profile due to the heuristic decisions at some upstream node. Consequently, the profile of the outgoing traffic from the node in question could further deviate from the desired one. However, as described in detail below, the techniques perform excellently and hence incoming traffic seen at any node and the resultant outgoing traffic have a near-optimal profile. The techniques are greedy in nature, trying to minimize one of the two metrics previously described.
[0037] Figure 1 is an illustration of a graph depicting a node (router) I, incoming routes (egress routes) A, B, C, and D, and candidate next hops J, K, and L. As described herein and illustrated in Figure 1, node I is considered an egress point (or egress router), and the word "stream" and phrase "traffic intensity of a routing prefix associated with the egress point" are used interchangeably. A next hop is the outgoing path connected between the current node and a next node. As shown in the upper left corner of Figure 1, the traffic intensity ($x_i = x_A$) of routing prefix A is 5 (arbitrary units), the traffic intensity ($x_i = x_B$) of routing prefix B is 1, the traffic intensity ($x_i = x_C$) of routing prefix C is 8, and the traffic intensity ($x_i = x_D$) of routing prefix D is 10. As indicated by the boxed number next to each next hop, the optimal (desired) traffic load on next hop J ($f_k = f_J$) is 9, the optimal traffic load on next hop K ($f_k = f_K$) is 3, and the optimal traffic load on next hop L ($f_k = f_L$) is 12.
[0038] Generally, a traffic distribution system and method in accordance with the present invention performs the following: (1) For an arbitrary node (router), order routing prefixes (e.g., prefixes A, B, C, and D) destined to a particular egress router in decreasing order of traffic intensity, and (2) sequentially assign each routing prefix to a subset of next hops so as to minimize a given metric.
[0039] The following notation is used herein. At an arbitrary node n, when assigning routing prefixes associated with an arbitrary egress router destination m to next hops:
1. Denote the set of next hops to egress router m by $\mathcal{K} = \{1, 2, \ldots, K\}$, $\|\mathcal{K}\| = K$.
2. Denote the desired (optimal) traffic load (for egress router m) on hop $k \in \mathcal{K}$ by $f_k$.
3. Denote the traffic intensity of routing prefix i by $x_i$. Denote the collective set of the routing prefixes (at n for m) that need to be assigned to next hops by $X_{n,m}$.
4. Denote the traffic load on hop k after heuristic H has assigned i routing prefixes by $l_k^i$. Assume $l_k^i = 0$ for i < 0.
[0040] Max-Min Residual Capacity In one embodiment, referred to as the max-min residual capacity heuristic, each routing prefix is assigned such that the minimum gap between the optimal and assigned traffic on any hop is maximized. Although this may seem to be at odds with the goal of matching the optimal profile, the intuition behind such an assignment is to always keep enough residual capacity (difference between optimal and assigned traffic) so as to
be able to accommodate subsequent routing prefixes. Since all routing prefixes must be allocated a set of next hops, by keeping enough residual capacity we try to ensure that an allocation does not "overflow". The max-min residual capacity technique is performed in accordance with the following.
1. Sort the set of prefixes $X_{n,m}$ in descending order of traffic intensity.
2. For each prefix $i \in X_{n,m}$ choose a subset of next hops $M \subseteq \mathcal{K}$, with cardinality $\|M\|$, which maximizes
$$\min_{k \in \mathcal{K}} \left[ f_k - \left( l_k^{i-1} + \frac{x_i}{\|M\|} \right) \right]$$
where the share $x_i / \|M\|$ is added only for hops $k \in M$.
Note that Step 2 is easily achieved by simply sorting all the next hops in decreasing order of their residual capacity $f_k - l_k^{i-1}$, indexing them in that order, going through an increasing sequence of $M = \{q_1, \ldots, q_k\}$, $k = 1, 2, \ldots, K$, assignments over $\|M\|$ hops, and choosing the best one (i.e., the one with maximum min gap).
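A single assignment step of the max-min residual capacity technique might be sketched as below. The function name `max_min_residual_step` and the dictionary layout are hypothetical. The sketch assumes that the minimum is taken over all candidate hops, with unassigned hops keeping their residual capacity, and that ties are resolved in favor of the smaller subset; neither tie rule is specified in the text.

```python
def max_min_residual_step(residual, intensity):
    """Choose next hops for one prefix under the max-min residual rule.

    residual:  {hop: desired load minus load assigned so far}
    intensity: traffic intensity of the prefix being assigned
    Returns (chosen_hops, updated_residual).
    """
    # Rank hops by decreasing residual capacity; only the top-k prefixes
    # of this ranking need to be tested.
    ranked = sorted(residual, key=lambda h: residual[h], reverse=True)
    best = None
    for k in range(1, len(ranked) + 1):
        chosen = ranked[:k]
        share = intensity / k  # equal split over the k chosen hops
        trial = {h: residual[h] - (share if h in chosen else 0.0)
                 for h in residual}
        score = min(trial.values())  # minimum residual capacity
        if best is None or score > best[0]:
            best = (score, chosen, trial)
    return best[1], best[2]


# First step on the Figure 1 values: route D (intensity 10) with desired
# loads 9, 3, and 12 on hops J, K, and L.
hops_d, residual_d = max_min_residual_step({"J": 9, "K": 3, "L": 12}, 10)
print(hops_d, residual_d)  # ['L', 'J'] {'J': 4.0, 'K': 3.0, 'L': 7.0}
```

Testing only the k hops with the largest residual capacity keeps the step to one sort plus K candidate evaluations, instead of examining every subset of next hops.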
[0041] Referring again to Figure 1, the first step of sorting the prefixes is the same for all three of the techniques. Routes A, B, C, and D are first sorted in decreasing order of intensity, thus resulting in the following order.
1) Route D, intensity = 10
2) Route C, intensity = 8
3) Route A, intensity = 5
4) Route B, intensity = 1
[0042] An object of this technique is to maximize the minimum residual capacity while assigning each stream (route or prefix) utilizing two rules. The first rule involves testing the assignment over all hop assignments, and the second rule is to determine the minimum residual capacity for each such hop combination (1 hop, 2 hop, 3 hop) and choose the one with the largest minimum residual capacity. For example, if there are 3 next hops, test the assignment over 1 hop, 2 hops, and 3 hops. Thus if the assignment is to be tested over "k" hops, simply assign the route to the "k" hops with the largest residual throughput capacities. Note that the assignment over any other combination of "k" hops need not be tested because if a hop with smaller residual capacity is chosen, it can always be improved upon by removing this hop and putting in its place a hop with larger capacity. Remember that the criterion of each such assignment is the minimum residual capacity.
[0043] Applying this to the scenario of Figure 1, the 1 hop assignment starts with route D
(because it has the largest value of intensity), and determines the residual capacity for each candidate next hop (hops J, K, and L). However, as described above, assigning route D to hop L is sufficient because hop L will have the largest residual capacity (because it has the largest optimal throughput value, 12). The resulting residual capacities are hop J = 9 (obtained by subtracting 9 - 0), hop K = 3 (obtained by subtracting 3 - 0), and hop L = 2 (obtained by subtracting 12 - 10). Thus, the minimum residual capacity for a 1 hop assignment is 2 (hop L). Next, the 2 hop combinations are analyzed. This involves assigning to the 2 hops with largest residual capacity, J and L. Because route D, having an intensity of 10, is split between hops J and L, hops J and L each receive half of this intensity (i.e., 5). The resulting residual capacities are hop J = 4 (obtained by subtracting 9 - 5), hop K = 3 (obtained by subtracting 3 - 0), and hop L = 7 (obtained by subtracting 12 - 5). Thus, the minimum residual capacity for a 2 hop assignment is 3 (hop K) (from 4, 3 and 7). Next, the 3 hop combination is analyzed by assigning route D to the 3 hops having the largest residual capacity (all three hops J, K, L). Because route D, having an intensity of 10, is split between hops J, K, and L, hops J, K, and L each receive one third of this intensity (i.e., approximately 3.3). The resulting residual capacities are hop J = 5.7 (obtained by subtracting 9 - 3.3), hop K = -0.3 (obtained by subtracting 3 - 3.3) and hop L = 8.7 (obtained by subtracting 12 - 3.3). Thus, the minimum residual capacity for a 3 hop assignment is -0.3 (hop K) (from 5.7, -0.3, 8.7). Finally, the largest minimum residual throughput capacity over all the hop assignments is selected. The 1 hop assignment has a minimum residual capacity of 2, the 2 hop assignment has a minimum residual capacity of 3, and the 3 hop assignment has a minimum residual capacity of -0.3; the largest is that of the 2 hop assignment, which corresponds to next hops J and L. Therefore, route D is assigned to next hops J and L. The updated optimal throughput (load) values are 4 for hop J (9 - 5), 3 for hop K (3 - 0), and 7 for hop L (12 - 5).
[0044] The process is repeated for all other streams (i.e., C, A, and B). For route C (the next largest intensity), 1 hop assignments are tested using hop L (largest updated optimal load, and thus the largest residual throughput capacity). The resulting residual capacities are 4 for hop J, 3 for hop K, and -1 for hop L (7 - 8). Thus, the minimum residual capacity for a 1 hop assignment is -1 corresponding to hop L (from 4,3 and -1). Next, for the 2 hop combination, route C is assigned to the 2 hops having the largest residual capacity, i.e., J and L. The resulting residual capacities are 0 for hop J (4-4), 3 for hop K, and 3 for hop L (7 - 4). The minimum residual capacity for a 2 hop combination is 0, corresponding to hop J (from 0, 3 and 3). The 3 hop combination assigns to the 3 hops with the largest residual capacity (in this case, all 3 hops). The resulting residual capacities are 0.7 for hop J (4 - 3.3), -0.3 for hop K (3 - 3.3) and 3.7 for hop L (7 - 3.3). Thus, the minimum residual capacity for a 3 hop assignment is -0.3, corresponding to hop K (from 0.7, -0.3 and 3.7). Finally, the largest minimum residual capacity over all hop assignments is selected. Selecting from values of -1, 0, and -.3, the largest is 0, which corresponds to hops J and L. Therefore, route C is assigned to next hops J and L. The updated values of optimal hop load are 0 for hop J (4-4), 3 for hop K, and 3 for hop L (7 - 4).
[0045] For route A, hops L and K are the largest, each having an optimal load value of 3.
Arbitrarily choosing hop L, the resulting residual capacities are then 0 for hop J, 3 for hop K, and -2 for hop L (3 - 5). The minimum residual capacity for a 1 hop assignment is then -2 (hop L). For the 2 hop combination, the two hops having the largest residual capacity are K and L. The resulting residual capacities are 0 for hop J, 0.5 for hop K (3 - 2.5), and 0.5 for hop L (3 - 2.5). Thus, the minimum residual capacity for a 2 hop combination is 0, corresponding to hop J. For the 3 hop combination, all three hops are analyzed, each receiving one third of the intensity (approximately 1.7). The resulting residual capacities are -1.7 for hop J (0 - 1.7), 1.3 for hop K (3 - 1.7), and 1.3 for hop L (3 - 1.7). Thus, the minimum residual capacity for a 3 hop assignment is -1.7, corresponding to hop J. Finally, the largest minimum residual capacity is selected, resulting in the value of 0, corresponding to hops K and L. Therefore, route A is assigned to hops K and L. The updated values of optimal load are 0 for hop J, 0.5 for hop K (3 - 2.5), and 0.5 for hop L (3 - 2.5).
[0046] For route B, either hops L or K may be chosen (each has an optimal load value of
0.5). Arbitrarily choosing hop L, the resulting residual capacities are then 0 for hop J, 0.5 for hop K, and -0.5 for hop L (0.5 - 1). Thus, the minimum residual capacity for a 1 hop assignment is then -0.5, corresponding to hop L. For the 2 hop combination, hops K and L are chosen (they have the largest residual capacity). The resulting residual capacities are then 0 for hop J, 0 for hop K (0.5 - 0.5), and 0 for hop L (0.5 - 0.5). Thus, the minimum residual capacity for a 2 hop assignment is 0 (hop J or K or L). For the 3 hop combination, the 3 hops with the largest residual capacity are chosen (in this case, all 3 hops). The resulting residual capacities are then -0.3 for hop J (0 - 0.3), 0.2 for hop K (0.5 - 0.3) and 0.2 for hop L (0.5 - 0.3). Thus, the minimum residual capacity for a 3 hop assignment is -0.3, corresponding to hop J. Finally, the largest minimum residual capacity is chosen from values of -0.5, 0, and -0.3. The largest minimum value is 0, corresponding to a 2 hop combination of hops K and L. Therefore, route B is assigned to hops K and L. The updated residual capacities are 0 for hop J, 0 for hop K (0.5 - 0.5), and 0 for hop L (0.5 - 0.5). Hence, as depicted in the block in the lower left side of Figure 1, the final assignments for each route are:
Route D: Hops J & L
Route C: Hops J & L
Route A: Hops K & L
Route B: Hops K & L
[0047] Note that the traffic load on each hop (J, K, and L) is equal to the sum of the respective portions of the incoming routes (A, B, C, and D) that have been assigned thereto. As shown in the boxed number on the right side of Figure 1, next to each respective hop, the cumulative load for hop J is 9, for hop K is 3, and for hop L is 12, which is the same distribution as the desired (optimal) distribution.
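The cumulative loads stated above can be checked with a short calculation. The following sketch, with the hypothetical helper name `hop_loads`, splits each route's intensity equally over its assigned hops and sums the shares per hop, using the intensities and final assignments given for Figure 1.

```python
def hop_loads(intensity, assignment):
    """Sum, per hop, the equal shares of every route assigned to it."""
    loads = {}
    for route, hops in assignment.items():
        share = intensity[route] / len(hops)  # equal split over assigned hops
        for hop in hops:
            loads[hop] = loads.get(hop, 0.0) + share
    return loads


intensity = {"A": 5, "B": 1, "C": 8, "D": 10}
assignment = {"D": ["J", "L"], "C": ["J", "L"],
              "A": ["K", "L"], "B": ["K", "L"]}
loads = hop_loads(intensity, assignment)
print(loads)  # {'J': 9.0, 'L': 12.0, 'K': 3.0}
```

Hop J carries 5 + 4, hop K carries 2.5 + 0.5, and hop L carries 5 + 4 + 2.5 + 0.5, which matches the desired loads of 9, 3, and 12.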
[0048] Min-Max Residual Gap In another embodiment, referred to as the min-max residual gap heuristic, each routing prefix is assigned such that the maximum gap between the optimal and assigned traffic on any hop is minimized. Observe that even though the metric used by this heuristic is the opposite of that used by heuristic max-min residual capacity, both
essentially try to achieve the same goal. This is because both heuristics must obey the conservation constraint of assigning all routing prefixes. The min-max residual gap technique is performed in accordance with the following.
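1. Sort the set of prefixes $X_{n,m}$ in descending order of traffic intensity.
2. For each prefix $i \in X_{n,m}$ choose a subset of next hops $M \subseteq \mathcal{K}$, with cardinality $\|M\|$, which minimizes
$$\max_{k \in \mathcal{K}} \left[ f_k - \left( l_k^{i-1} + \frac{x_i}{\|M\|} \right) \right]$$
where the share $x_i / \|M\|$ is added only for hops $k \in M$.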
[0049] Again, note that this step is executed in the same fashion as for the max-min residual capacity heuristic. Here the attempt is to minimize the maximum "gap" between the desired and assigned load. This process is very similar to the previous process (max-min residual capacity). The same steps are followed, except that the maximum residual capacity is determined for each candidate combination, and the combination with the minimum of these maximums is selected. The desire is to choose a hop assignment which minimizes the maximum gap between the desired and assigned loads. Note that the two processes (min-max and max-min) are philosophically different (one is trying to save capacity, the other is trying to close the gap); however, they achieve the same results.
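One reading of the min-max residual gap step can be sketched as follows. The sketch assumes that, as in the max-min residual capacity process, the p hops with the largest remaining gap are the candidates for a p hop assignment, and that the maximum is taken over all candidate hops; the name `min_max_gap` and the data layout are illustrative assumptions.

```python
def min_max_gap(gap, intensity):
    """Choose next hops for one route under the min-max residual gap rule.

    gap: dict hop -> desired load minus load assigned so far
    intensity: traffic intensity of the route being assigned
    Returns (set of chosen hops, maximum gap after the assignment).
    """
    # Candidates for a p hop assignment are the p hops with the largest gap.
    ranked = sorted(gap, key=lambda h: -gap[h])
    best = None
    for p in range(1, len(ranked) + 1):
        chosen = ranked[:p]
        share = intensity / p
        after = {h: gap[h] - (share if h in chosen else 0.0) for h in gap}
        largest = max(after.values())    # maximum gap for this p hop assignment
        if best is None or largest < best[1] - 1e-12:
            best = (chosen, largest)     # keep the smallest of the maximums
    return set(best[0]), best[1]


# Route B from Figure 1: gaps 0, 0.5, 0.5 and an intensity of 1.
hops, largest = min_max_gap({"J": 0.0, "K": 0.5, "L": 0.5}, 1.0)
print(sorted(hops), largest)
```

Under these assumptions the sketch assigns route B to hops K and L and route D (intensity 10) to hops J and L, the same assignments obtained with the max-min residual capacity process.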
[0050] Min-Max Load In yet another embodiment, referred to as the min-max load heuristic, a work-conserving scheduling technique is utilized, which attempts to minimize the maximum load on any processor. The min-max load heuristic attempts to minimize the maximum ratio of assigned traffic to the optimal traffic load over all hops. The difference from classical multiprocessor scheduling is that each task (stream) can be split equally among multiple processors (next hops) and the processors (next hops) can have different speeds (optimal traffic loads). The min-max load technique is performed in accordance with the following.
1. Sort the set of prefixes $X_{n,m}$ in descending order of traffic intensity.
2. For each prefix $i \in X_{n,m}$, choose a subset of next hops $M \subseteq K$, with cardinality $|M|$, which minimizes

$$\max_{k \in M} \frac{l_k + x_i/|M|}{f_k}$$

where $f_k$ is the desired load on hop $k$, $l_k$ is the load already assigned to hop $k$, and $x_i$ is the intensity of prefix $i$.
Again, note that this step is executed in the same fashion as for the max-min residual capacity heuristic. Step 2 is achievable in two stages. First, for each index $p = 1, 2, \ldots, |K|$, do a virtual assignment of routing prefix $i$ to a set of $p$ hops which yields the smallest maximum. This is accomplished by simply sorting the set $\{(l_k + x_i/p)/f_k,\ k \in K\}$ in increasing order, re-indexing the hops, and virtually assigning $i$ only to the first $p$ hops. Second, from all the $|K|$ such possible assignments, choose the one with the smallest maximum for an actual assignment. In case of a tie, choose a lexicographically smaller assignment.
[0051] Similar to the previous two heuristics (min-max residual gap and max-min residual capacity), this is also a greedy heuristic. For every route, it tries to find a hop assignment that minimizes the maximum load ratio. The load ratio of a hop is defined to be the ratio of the achieved load to the desired load, i.e., $l/f$, where $l$ is the load achieved on the hop and $f$ is its desired load. Unlike the other two heuristics, in order to find the assignment for each route that minimizes the maximum load ratio, the streams cannot simply be placed on the hop(s) with the lowest load ratio(s). Instead, for example, in attempting a k hop assignment, x/k is assigned to every hop and then the k best hops are selected.
[0052] Referring again to Figure 1, starting with route D (the route having the largest intensity), a 1 hop combination assignment is conducted. An intensity of 10 units is allocated to each of the hops and the one with the smallest load ratio is selected. In this case, it is hop L, with a load ratio of 10/12 = 0.833. Since this is a one hop combination, and there is only one ratio from which to choose, this is also the maximum ratio. Thus, the maximum load ratio for a 1 hop assignment is 0.833, corresponding to hop L. For the 2 hop combination assignment, an intensity of 10/2 = 5 is allocated to all hops. In this case the 2 best hops are hop J, with a load ratio of 5/9 = 0.55, and hop L, with a load ratio of 5/12 = 0.416. The maximum load ratio for a 2 hop assignment is then 0.55. For the 3 hop combination, 10/3 = 3.33 units are allocated to all hops and the 3 best (in this case, it happens to be all 3) are chosen. The resulting load ratios are hop J: 3.3/9 = 0.36, hop K: 3.3/3 = 1.1, and hop L: 3.3/12 = 0.275. The maximum load ratio for a 3 hop assignment is then 1.1. Next, the assignment that minimizes the maximum load ratio is selected from the following.
1 Hop assignment : max ratio = 0.833
2 Hop assignment: max ratio = 0.55
3 Hop assignment: max ratio = 1.1
From the above data, the heuristic chooses a 2 hop assignment (hops J and L) for route D. Assignments for other routes proceed in a similar fashion, except that now hops J and L carry the load of 5 units (from D) which must be accounted for when computing load ratios. E.g., if we assign 3 units to hop L, then its load ratio would be (5 + 3)/12.
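The route D computation above can be reproduced with a short sketch of the min-max load step. The function name `min_max_load` and the dictionary layout are illustrative assumptions; the logic follows the two stages described: a virtual assignment of x/p to every hop, selection of the p hops with the smallest ratios, and a final choice of the p with the smallest maximum ratio.

```python
def min_max_load(load, desired, intensity):
    """Choose next hops for one route under the min-max load rule.

    load: dict hop -> load already assigned to the hop
    desired: dict hop -> desired (optimal) load of the hop
    intensity: traffic intensity of the route being assigned
    Returns (set of chosen hops, maximum load ratio over the chosen hops).
    """
    best = None
    for p in range(1, len(desired) + 1):
        share = intensity / p
        # Virtually add the share to every hop and compute the resulting load ratios.
        ratio = {h: (load[h] + share) / desired[h] for h in desired}
        chosen = sorted(ratio, key=ratio.get)[:p]   # the p hops with the smallest ratios
        worst = ratio[chosen[-1]]                   # maximum ratio among the chosen hops
        if best is None or worst < best[1] - 1e-12:
            best = (chosen, worst)
    return set(best[0]), best[1]


desired = {"J": 9.0, "K": 3.0, "L": 12.0}
hops, worst = min_max_load({"J": 0.0, "K": 0.0, "L": 0.0}, desired, 10.0)
print(sorted(hops), round(worst, 3))
```

For subsequent routes, the `load` argument would carry the 5 units already placed on hops J and L, so that assigning 3 units to hop L yields a ratio of (5 + 3)/12.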
The process for implementing the above heuristics is summarized below in pseudo-code:
Input = (link weights {W_ij}, optimal traffic allocation {f_ij}, traffic matrix T)
For each destination node m do
    Run Dijkstra's algorithm with weights W_ij
    For each node n ≠ m in order of decreasing distance from m do
        Apply the heuristic to the set of routing prefixes X_n,m to determine, for each routing prefix i, the set of next hops K_i
        For each routing prefix i ∈ X_n,m do
            Update the intensity of the corresponding routing prefix at each node j ∈ K_i
        done
    done
done
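The outer loop of the pseudo-code can be sketched in runnable form as follows. This is a simplified illustration under stated assumptions: the names `distribute`, `choose_hops`, and `ecmp` are hypothetical, the heuristic is passed in as a callable, link weights are assumed positive, and the default chooser shown is plain equal-cost splitting rather than any of the three heuristics.

```python
import heapq

def distribute(links, demands, choose_hops):
    """Accumulate link loads, one destination at a time.

    links: dict mapping a directed link (u, v) to its positive weight
    demands: list of (prefix, ingress node, destination node, intensity)
    choose_hops: callable(node, candidate_hops, intensities) returning a dict
        prefix -> non-empty list of next hops taken from candidate_hops
    """
    nodes = {n for link in links for n in link}
    load = {link: 0.0 for link in links}
    for m in sorted({dst for _, _, dst, _ in demands}):
        # Dijkstra over reversed links: dist[n] is the cost of the shortest n -> m path.
        dist, heap = {m: 0.0}, [(0.0, m)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue
            for (u, w), weight in links.items():
                if w == v and d + weight < dist.get(u, float("inf")):
                    dist[u] = d + weight
                    heapq.heappush(heap, (dist[u], u))
        # Intensity of each prefix bound for m, as seen at each node.
        seen = {n: {} for n in nodes}
        for prefix, src, dst, x in demands:
            if dst == m:
                seen[src][prefix] = seen[src].get(prefix, 0.0) + x
        # Farthest nodes first, so upstream traffic has arrived before a node is processed.
        for n in sorted((n for n in dist if n != m), key=lambda n: -dist[n]):
            hops = [v for (u, v), weight in links.items()
                    if u == n and v in dist and abs(dist[v] + weight - dist[n]) < 1e-9]
            if not seen[n] or not hops:
                continue
            chosen = choose_hops(n, hops, seen[n])
            for prefix, x in seen[n].items():
                share = x / len(chosen[prefix])
                for v in chosen[prefix]:
                    load[(n, v)] += share
                    # Update the intensity of the prefix at the next hop node.
                    seen[v][prefix] = seen[v].get(prefix, 0.0) + share
    return load

def ecmp(node, hops, intensities):
    """Default behavior: every prefix uses all shortest-path next hops."""
    return {prefix: hops for prefix in intensities}

links = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
print(distribute(links, [("p1", "a", "d", 4.0)], ecmp))
```

Passing one of the heuristics in place of `ecmp` would restrict each prefix to a subset of the shortest-path next hops while leaving the forwarding rule (equal splitting over the chosen hops) unchanged.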
[0053] These processes are also shown in flow diagram form in Figure 2, Figure 3, and Figure 4.
[0054] Figure 2 is a flow diagram of a process for distributing traffic in a network utilizing the max-min residual throughput capacity method, in accordance with an exemplary embodiment of the present invention. The incoming routes are ordered in decreasing intensity (also referred to as traffic load) at step 12. The process starts with the incoming route having the largest value of intensity (step 12). Subsets are formed from the candidate hops at step 14. The formation of these subsets is as described above. For example, a subset is formed for each individual candidate next hop, resulting in M subsets. Subsets are also formed having all combinations of M-1 candidate next hops. The M-1 candidate next hops are selected from candidate next hops having the largest values of residual throughput capacity. Incrementally smaller subsets are formed (described in more detail below with respect to Figure 4), including all combinations of candidate next hops within a respective incrementally smaller subset. The candidate next hops for a respective incrementally smaller subset are selected from candidate next hops having the largest values of residual throughput capacity. For each subset, the residual throughput capacity of each hop within that subset is determined at step 16. At step 18, the minimum value of residual throughput capacity for each subset is determined/selected. The incoming route is assigned to the hop(s) of the subset having the maximum value of minimum residual throughput capacity at step 20. At step 21, the traffic load for each assigned hop is updated to reflect the respective hop's portion of the incoming route's traffic load that is assigned to that hop. At step 22 it is determined if more incoming routes are to be assigned to next hops. If not, the process is ended at step 24. If more incoming routes are to be assigned, the next incoming route in decreasing order of intensity is selected at step 26, and the process is repeated starting at step 14.
[0055] Figure 3 is a flow diagram of a process for distributing traffic in a network utilizing the min-max residual gap method, in accordance with an exemplary embodiment of the present invention. The incoming routes are ordered in decreasing intensity (traffic load) at step 28. The process starts with the incoming route having the largest value of intensity (step 28). Subsets are formed from the candidate hops at step 30. The formation of these subsets is as described above. For example, a subset is formed for each individual candidate next hop, resulting in M subsets. Subsets are also formed having all combinations of M-1 candidate next hops. The M-1 candidate next hops are selected from candidate next hops having the smallest values of residual throughput capacity. Incrementally smaller subsets are formed, including all combinations of candidate next hops within a respective incrementally smaller subset. The candidate next hops for a respective incrementally smaller subset are selected from candidate next hops having the largest values of residual throughput capacity. For each subset, the residual throughput capacity of each hop within that subset is determined at step 32. At step 34, the maximum value of residual throughput capacity for each subset is determined/selected. The incoming route is assigned to the hop(s) of the subset having the minimum value of maximum residual throughput capacity at step 36. At step 37, the traffic load for each assigned hop is updated to reflect the respective hop's portion of the incoming route's traffic load that is assigned to that hop. At step 38 it is determined if more incoming routes are to be assigned to next hops. If not, the process is ended at step 40. If more incoming routes are to be assigned, the next incoming route in decreasing order of intensity is selected at step 42, and the process is repeated starting at step 30.
[0056] Figure 4 is a flow diagram of a process for distributing traffic in a network utilizing the min-max traffic load method, in accordance with an exemplary embodiment of the present invention. The incoming routes are ordered in decreasing intensity (traffic load) at step 44. The process starts with the incoming route having the largest value of intensity (step 44). Subsets are formed from the candidate hops at step 48. The formation of these subsets is as described above. Thus, a number, M, of subsets is formed, where M is equal to the total number of candidate next hops. Each subset includes an individual candidate next hop. Subsets are also formed for all combinations of M-1 and incrementally smaller (e.g., M-2, M-3,...) subsets of candidate next hops. For each subset, the traffic load of each hop within that subset is determined at step 50. The traffic load ratio is also determined at step 50. At step 52, the maximum traffic load values for all subsets are compared. The subset having a minimum value of traffic load/load ratio is determined/selected at step 54. The incoming route is assigned to the hop(s) of the subset having the minimum value of maximum traffic load/load ratio at step 56. At step 21, the traffic load for each assigned hop is updated to reflect the respective hop's portion of the incoming route's traffic load that is assigned to that hop. At step 58 it is determined if more incoming routes are to be assigned to next hops. If not, the process is ended at step 60. If more incoming routes are to be assigned, the next incoming route in decreasing order of intensity is selected at step 62, and the process is repeated starting at step 48.
[0057] Figure 5 is a functional block diagram of a network router 64 for distributing traffic in a network in accordance with an embodiment of the present invention. The router 64 distributes traffic in accordance with the processes described above. Specifically, the router 64 performs the functions described in the minimum-maximum residual throughput capacity heuristic, the maximum-minimum residual throughput capacity heuristic, and/or the minimum-maximum traffic load heuristic. Accordingly, the router 64 comprises an assignment portion 66, a sorting portion 68, a residual throughput capacity portion 70, a subset forming portion 72, a data receiving portion 71, a data distribution portion 73, and a traffic load portion 74. The data receiving portion 71 receives traffic data from the incoming routes. The data distribution portion 73 provides traffic data to the selected next hops. The assignment portion 66 selects next hops from candidate next hops, assigns an incoming route to the selected next hops, and equally distributes traffic received from the incoming route among the assigned next hops in accordance with a cost function. The cost function may include any appropriate cost function. In one embodiment of the present invention, the cost function comprises the shortest path cost estimate previously described. The assignment portion 66 assigns each incoming route to at least one next hop selected from candidate next hops. The assignment portion 66 makes this assignment in accordance with the residual throughput capacity of a next hop (complying with the minimum-maximum residual throughput capacity process, the maximum-minimum residual throughput capacity process, or both) and/or the traffic load on a next hop (complying with the minimum-maximum traffic load process). The sorting portion 68 sorts incoming routes in descending order of respective traffic load values.
The residual throughput capacity portion 70 determines the residual throughput capacities for each hop within subsets of the candidate next hops. The residual throughput capacity portion 70 also determines the minimum value of residual throughput capacity for each subset formed by the subset forming portion 72. The subset forming portion 72 forms a number, M, of subsets equal to the total number of candidate next hops. Each of these subsets includes one individual candidate next hop. The subset forming portion 72 also forms subsets for all combinations of M-1 and incrementally smaller (e.g., M-2, M-3,...) subsets of candidate next hops. For processes involving residual throughput capacity, the subset forming portion 72 selects candidate next hops having the largest (for min-max process) or the smallest (for max-min process) values of residual throughput capacity. The traffic load portion 74 determines traffic loads of each hop within subsets of candidate next hops and determines a maximum value of traffic load for each subset.
[0058] A method for distributing network traffic as described herein may be embodied in the form of computer-implemented processes and system for practicing those processes. A method for distributing network traffic as described herein may also be embodied in the form of computer program code embodied in tangible media, such as floppy diskettes, read only memories (ROMs), CD-ROMs, hard drives, high density disk, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes a system for practicing the invention. The method for distributing network traffic as described herein may also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over the electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes a system for practicing the invention. When implemented on a general-purpose processor, the computer program code segments configure the processor to create specific logic circuits.
[0059] Experiments: Two sets of experiments were conducted on artificially generated topologies as well as on an actual ISP topology. In the first set, the performance of the heuristics was compared against optimal routing. In the second set of experiments, the tradeoff between performance and configuration overhead was studied by varying the number of routing prefixes for which the assigned set of next hops was controlled.
[0060] For purposes of comparison, a linear multi-commodity flow routing problem with the same piecewise linear cost function was solved. The only constraint in the routing problem is flow conservation and consequently it provides a lower bound on the performance of any routing scheme, for the same metric. Henceforth, this problem shall be referred to as the "optimal routing problem" and its solution as the "optimal routing solution". The solution to this problem is a set of paths (traffic allocation) for each commodity which yields $f_{i,j}$, the bandwidth consumed on each link.
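[0061] The optimal allocation problem with regard to this cost function is reproduced below for completeness. Let the flow of commodity r on link (i, j) be denoted by $X^{r}_{i,j}$. The total flow on link (i, j) is

$$f_{i,j} = \sum_{r} X^{r}_{i,j}$$

and the capacity is $C_{i,j}$. Denote the cost of link (i, j) by $\Phi_{i,j}(f_{i,j}, C_{i,j})$, which is a piecewise linear function that approximates an exponentially growing curve. The cost grows as the traffic on the link increases and the rate of growth accelerates with increasing utilization $u_{i,j} = f_{i,j}/C_{i,j}$. Evolution of the cost function with link load is shown in Figure 6. The problem may then be formulated as:

$$\min \sum_{(i,j) \in E} \Phi_{i,j} \qquad (2)$$

subject to

$$\sum_{j:(i,j) \in E} X^{r}_{i,j} - \sum_{j:(j,i) \in E} X^{r}_{j,i} = \begin{cases} d_r & \text{if } i = s_r \\ -d_r & \text{if } i = t_r \\ 0 & \text{otherwise} \end{cases} \qquad \forall i \in V,\ r \in R \qquad (3)$$

$$\Phi_{i,j} \ge f_{i,j}, \qquad 0 \le u_{i,j} < 1/3 \qquad (4)$$

$$\Phi_{i,j} \ge 3 f_{i,j} - \tfrac{2}{3} C_{i,j}, \qquad 1/3 \le u_{i,j} < 2/3 \qquad (5)$$

$$\Phi_{i,j} \ge 10 f_{i,j} - \tfrac{16}{3} C_{i,j}, \qquad 2/3 \le u_{i,j} < 9/10 \qquad (6)$$

$$\Phi_{i,j} \ge 70 f_{i,j} - \tfrac{178}{3} C_{i,j}, \qquad 9/10 \le u_{i,j} < 1 \qquad (7)$$

$$\Phi_{i,j} \ge 500 f_{i,j} - \tfrac{1468}{3} C_{i,j}, \qquad 1 \le u_{i,j} < 11/10 \qquad (8)$$

$$\Phi_{i,j} \ge 5000 f_{i,j} - \tfrac{16318}{3} C_{i,j}, \qquad 11/10 \le u_{i,j} \qquad (9)$$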
Equation 3 imposes flow conservation constraints. Note that this approach is not limited to any particular cost function. This cost function is merely exemplary. Also note that the cost function in the Linear Program tries to avoid long paths while trying to meet bandwidth constraints. The experimental set up and observations regarding performance and complexity trade-off are now presented.
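The piecewise linear link cost of constraints (4) to (9) can be sketched as the maximum of six affine segments. The slopes and intercepts below are those of the Fortz and Thorup cost function (slopes 1, 3, 10, 70, 500, and 5000); treating them as the coefficients of (4) to (9) is an assumption, as are the names `SEGMENTS` and `link_cost`. Exact fractions are used so that the continuity of the curve at the breakpoints can be checked without rounding error.

```python
from fractions import Fraction as F

# (slope applied to the flow, intercept applied to the capacity) for each segment.
SEGMENTS = [
    (1, F(0)),
    (3, F(2, 3)),
    (10, F(16, 3)),
    (70, F(178, 3)),
    (500, F(1468, 3)),
    (5000, F(16318, 3)),
]

def link_cost(flow, capacity):
    """Cost of one link: the largest of the six affine lower bounds.

    Because the curve is convex, taking the maximum over all segments gives the
    same value as picking the segment whose utilization range contains flow/capacity.
    """
    return max(slope * flow - intercept * capacity for slope, intercept in SEGMENTS)

# Cost of a unit-capacity link at 50% and at 100% utilization.
print(link_cost(F(1, 2), 1), link_cost(1, 1))
```

In a linear program, each segment would appear as a separate inequality on the link cost variable, and minimizing the sum of those variables drives each one down to this maximum.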
A. Experimental Set Up
[0062] For our experiments, the artificial topologies were generated using the Georgia Tech and BRITE topology generators. Please refer to E. W. Zegura, "GT-ITM: Georgia Tech internet topology models (software)," Georgia Tech, 1996, and A. Medina, A. Lakhina, I. Matta, and J. Byers, "BRITE: Boston university representative internet topology generator," Boston University, April 2001, for more detail concerning the Georgia Tech topology and the BRITE topology, which references are incorporated herein by reference in their entireties as if presented herein. BRITE allows several options for generating topologies: AS level, hierarchical, and router level. The router level option was chosen. The topologies generated using both generators were random graphs constructed by choosing points uniformly on a grid. In all instances of simulated topologies, the link capacities were set to 500 Mbps. Actual physical link capacities were used for the topology based on the ISP topology.
[0063] For the artificially generated topologies, random traffic matrices were generated by picking the traffic intensity of each routing prefix from a Pareto distribution. The choice of a Pareto distribution was motivated by measurements taken from several routers on the ISP topology. Experiments with other distributions (uniform, bimodal, Gaussian, and exponential) were also tried but are not included here because the results were similar to those obtained with the Pareto distribution. The ISP topology traffic matrix was based on actual traffic traces downloaded from access links to two of the ISP topology routers. The traces were measured at the granularity of the routing table entries, giving two rows of the traffic matrix. The routing prefix intensities were averaged over 10 hours. The routing prefix intensities in the remaining rows were generated artificially using a Pareto distribution. The other parameter of importance is the number of routing prefixes associated with each egress router. For this, both a uniform and a Pareto distribution were used, as this gives reasonable coverage of the possible differences in the number of routing prefixes available to a given egress router.
Each experiment was conducted in the following fashion.
1) For each network topology, generate random traffic matrices, varying both the total number of routing prefixes and distribution (except in the case of the ISP topology traffic matrix) from which the ingress traffic intensity of each routing prefix was picked.
2) Hot spots were introduced in the traffic matrix by randomly selecting elements from the traffic matrix and scaling them to create several instances of the traffic matrix. Cases where only some of the traffic elements were chosen and also cases where all entries were chosen were tested. In the latter case, this involves scaling the entire traffic matrix.
3) The "optimal routing problem" (10) was then solved for each such instance (topology and traffic matrix).
4) The linear program (1), with the optimal link bandwidths from the "optimal routing solution" as input, was solved to obtain the traffic allocation (which was aggregated based on destination, ref. Section II-A) and the set of link weights.
5) Finally, the three heuristics were run over the network with the link weights and traffic flows from the previous step (please refer to pseudo-code). In most trials, the link weights turned out to be integers in the range 1-20. In a few experiments however, the weights were not integers. In such cases, the link weights were rounded to within 5 digit accuracy, which was found to be sufficient in all cases. ILOG CPLEX was used to solve the optimal routing problem and the linear program (1). On a Dell 2500 1 GHz machine it took about 2 hours to solve the optimal routing problem and 30 minutes and less than 10 minutes for the LP (1) and the heuristics respectively, on the largest networks.
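Steps 1 and 2 of the procedure above can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the function names, the Pareto shape parameter of 1.5, the fixed number of prefixes per ingress/egress pair, and the treatment of a traffic matrix element as one (ingress, egress) entry. The source does not state the parameters actually used.

```python
import random

def pareto_traffic_matrix(num_nodes, prefixes_per_pair, shape=1.5, seed=0):
    """Random traffic matrix: one list of prefix intensities per (ingress, egress) pair."""
    rng = random.Random(seed)
    return {
        (s, d): [rng.paretovariate(shape) for _ in range(prefixes_per_pair)]
        for s in range(num_nodes)
        for d in range(num_nodes)
        if s != d
    }

def add_hot_spots(matrix, fraction, scale, seed=0):
    """Scale a randomly selected fraction of the matrix elements by the given factor."""
    rng = random.Random(seed)
    pairs = sorted(matrix)
    hot = set(rng.sample(pairs, round(fraction * len(pairs))))
    return {
        pair: [x * scale if pair in hot else x for x in xs]
        for pair, xs in matrix.items()
    }

base = pareto_traffic_matrix(num_nodes=4, prefixes_per_pair=3)
scaled = add_hot_spots(base, fraction=0.5, scale=2.0)
print(len(base), sum(1 for pair in base if scaled[pair] != base[pair]))
```

Setting `fraction=1.0` corresponds to the case in which the entire traffic matrix is scaled.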
B. Performance Comparison against Optimal Routing
[0064] The results of the experiments are now presented and discussed. Figure 7 plots cost versus total traffic demand for all 3 heuristics and optimal routing on a 50 Node 200 Edge graph with a granularity of 26500 routing prefixes per node. This number was chosen simply as an approximation of the number of routing prefixes in a backbone router. Experiments were conducted with up to 100,000 routing prefixes and as few as 500 routing prefixes without any significant change in performance. The graph was generated using the BRITE generator. The horizontal lines represent various levels of maximum average link utilization over all links for optimal routing. The traffic matrix for this experiment was scaled by selecting 70% of the traffic elements as hotspots. From Figure 7, it can be seen that in all the cases, the heuristics are very near the optimal solution, indicating that they are able to match the optimal traffic split very closely. Moreover, all three heuristics perform equally well in all instances. For comparison, the performance of standard OSPF routing with weights computed using the implementation of the heuristic described in B. Fortz and M. Thorup, "Internet traffic engineering by optimizing OSPF weights," in Proceedings of INFOCOM '2000, Tel Aviv, Israel, March 2000, which is hereby incorporated by reference in its entirety as if presented herein, (denoted by "F&T Heuristic" in the graph of Figure 7) is shown. Figure 8 plots the percent (%) deviation from the optimal for the three heuristics. The low percentage deviation (0.2%-1%) from the optimal value highlights how effective the heuristics are.
[0065] Results for a 30 node topology generated using the Georgia Tech topology generator are shown in Figures 9 and 10. The results show that the heuristics closely track the optimal solution even at high loads.
[0066] Figures 11 and 12 plot the performance of the heuristics on the ISP topology. The entire traffic matrix was scaled for the experiments involving the ISP topology. It can be seen that the heuristics perform very well (well within 1% of the optimal). This performance was also observed in a number of other experiments that were conducted but are not shown here due to their similar nature.
C. Lowering Configuration Overhead
[0067] A goal of this experimentation was to investigate the trade-off between configuration overhead and performance. Recall that in the original approach the heuristics decide the subset of next hops assigned to every routing prefix. However, it has been observed that in practice, a large fraction of the traffic is distributed over a relatively small number of routing prefixes. The analysis of the backbone traces obtained from the ISP topology router shows that 95% of the total traffic was accounted for by only 10% of the routing prefixes. Figure 13 highlights this observation by plotting the cumulative traffic intensity as a function of the number of routing prefixes sorted in decreasing order of their traffic intensities. Potentially, this phenomenon can be exploited by configuring the set of next hops for only a few selected routing prefixes that carry most of the traffic and allowing the default assignment of all next hops for the remaining routing prefixes. This has the advantage of lowering configuration overhead, but raises the question of how it impacts performance.
[0068] A systematic study of such a trade-off on all the previous topologies was conducted. In each instance, the set of next hops at each node was configured for only a certain set of routing prefixes that were selected based on the amount of traffic they carried. The remaining routing prefixes were split equally over the entire set of next hops as would happen with default OSPF/IS-IS behavior. The set of configured routing prefixes was then progressively increased in each experiment to determine the evolution of the impact on performance. In all cases the Min-Max load heuristic was used when configuring the set of next hops.
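The selection of the routing prefixes to configure can be sketched as follows: sort the prefixes by traffic intensity and keep the heaviest ones until they account for a target share of the total traffic. The function name `prefixes_to_configure` and the example intensities are hypothetical, chosen only to illustrate the selection rule.

```python
def prefixes_to_configure(intensity, traffic_share):
    """Return the heaviest prefixes that together carry at least traffic_share of the traffic.

    intensity: dict prefix -> traffic intensity
    traffic_share: target fraction of the total traffic, between 0 and 1
    """
    total = sum(intensity.values())
    chosen, covered = [], 0.0
    for prefix in sorted(intensity, key=intensity.get, reverse=True):
        if covered >= traffic_share * total:
            break
        chosen.append(prefix)
        covered += intensity[prefix]
    return chosen

# Hypothetical intensities: two prefixes carry 80% of the traffic.
example = {"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}
print(prefixes_to_configure(example, 0.75))
```

The prefixes returned would have their next hops set by the heuristic, while all remaining prefixes keep the default equal split over every shortest-path next hop.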
[0069] The resulting performance curves for the 50 Node 200 Edge graph are shown in Figure 14 and the number of configured routing prefixes is shown in Table I below. Each curve on the plot is referenced by the amount of traffic that was accounted for by the configured routing prefixes. This can be cross-referenced from the table against the number of routing prefixes that were configured. We observe that on average, by configuring about 165 routing prefixes per router we get good performance until about 50% maximum link utilization. If we configure next hops for about 17% of all routing prefixes, or 4500 entries, at a router, we account for approximately 75% of the traffic and the resulting performance is quite close to that of optimal routing.
TABLE I
[0070] The observations also hold for the 30 Node 238 Edge graph, whose configuration overhead savings are shown in Table II below and the resultant performance in Figure 15.
TABLE II
[0071] Experiments conducted on the ISP topology (Figure 16, Table III below) yield similarly encouraging results. We get good performance up to approximately 50-60% maximum link utilization, by configuring only 200 routing prefixes per router and up to more than 70% link utilization if we configure 600 routing prefixes per router.
TABLE III
[0072] A system and method for distributing network traffic in accordance with the present invention has the potential for providing the benefits of traffic engineering to existing IP networks, without requiring changes to either the routing protocols or the forwarding mechanisms. Several advantages are evident. First, optimal link loads can be closely approximated without changing current forwarding mechanisms, namely, by carefully controlling the set of next hops for each prefix. Second, a heuristic is presented having a provable performance bound as well as two other simple heuristics. All three are experimentally shown to give excellent and similar performance. These heuristics are generalized enough to be potentially useful in their own right. Finally, it is shown, using actual traffic traces, that configuration overhead can be vastly reduced without significant loss of performance. Specifically, by only configuring next hops for a small set of prefixes, near-optimal performance for link loads of up to 70% is obtainable. This is an important aspect for the practical deployment of the traffic engineering solution.
[0073] Although illustrated and described herein with reference to certain specific embodiments, the system for distributing network traffic and the methods for implementing same as described herein are nevertheless not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the spirit of the invention.