US9270487B1 - Full bisection bandwidth network - Google Patents


Info

Publication number
US9270487B1
Authority
US
United States
Prior art keywords
path
vlan
paths
vlans
layer switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/577,979
Inventor
Chinh Kim Nguyen
Curtis Hall Stehley
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Teradata US Inc
Original Assignee
Teradata US Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Teradata US Inc filed Critical Teradata US Inc
Priority to US12/577,979 priority Critical patent/US9270487B1/en
Assigned to TERADATA US, INC reassignment TERADATA US, INC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: STEHLEY, CURTIS H, NGUYEN, CHINH K
Application granted granted Critical
Publication of US9270487B1 publication Critical patent/US9270487B1/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/28Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46Interconnection of networks
    • H04L12/4641Virtual LANs, VLANs, e.g. virtual private networks [VPN]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/28Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46Interconnection of networks
    • H04L12/4604LAN interconnection over a backbone network, e.g. Internet, Frame Relay
    • H04L12/462LAN interconnection over a bridge based backbone
    • H04L12/5689
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/48Routing tree calculation

Definitions

  • a device connected to a network typically connects to a port on a network switch or hub.
  • Network switches and hubs have a limited number of ports. Expanding the network to include a number of devices beyond the number of ports typically requires linking two or more switches or hubs. Redundant paths in the network are typically disabled by network protocol to prevent broadcast storms and loops in the topology. Making efficient use of such multiple-switch networks is a challenge.
  • the invention features a method.
  • a full bisection bandwidth network having a plurality of nodes and a plurality of paths among the nodes, is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN.
  • the full bisection bandwidth network may include a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths.
  • the method may further include aggregating Path A and Path B into a single trunk group so that Path A and Path B are active.
  • the method may further include constructing the full bisection bandwidth network to have a fat tree topology.
  • the method may further include constructing the full bisection bandwidth network to have a fully connected mesh topology.
  • the plurality of nodes may include a root layer of N Ethernet switches and a branch layer of M Ethernet switches. M may be greater than N.
  • the plurality of paths may include a path from each root layer switch to each branch layer switch.
  • Assigning paths to the VLANs may include assigning paths from a first root layer switch to a first set of VLANs, assigning paths from a second root layer switch to a second set of VLANs, the first set of VLANs not containing any VLANs belonging to the second set of VLANs, and the second set of VLANs not containing any VLANs belonging to the first set of VLANs.
  • M may equal 2N.
  • Assigning paths to the VLANs may include assigning a path from branch layer switch BLS 1 to root layer switch RLS 1 to a first VLAN and assigning a path from branch layer switch BLS 1 to root layer switch RLS 2 to a second VLAN.
  • a plurality of servers may be coupled to the full bisection bandwidth network. The method may further include providing redundant paths from one of the plurality of servers to another of the plurality of servers.
  • the method may further include providing redundant paths from each of the plurality of servers to the others of the plurality of servers.
  • Assigning paths to the VLANs may include assigning a first path from branch layer switch BLS 1 to root layer switch RLS 1 to a first VLAN and assigning a second path redundant to the first path to the first VLAN.
  • the plurality of paths may include a path from each root layer switch to each branch layer switch.
  • a plurality of servers is coupled to the branch layer of Ethernet switches.
  • Assigning paths to the VLANs may include assigning a first path from a first server to a second server and assigning a second path redundant to the first path from the first server to the second server.
  • Assigning paths to the VLANs may include assigning redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
  • the invention features a system.
  • the system includes a full bisection bandwidth network.
  • the full bisection bandwidth network includes a plurality of nodes.
  • the full bisection bandwidth network includes a plurality of paths among the nodes.
  • the full bisection bandwidth network includes a plurality of Virtual Local Area Networks (“VLANs”) incorporating the plurality of nodes and the plurality of paths. Each VLAN satisfies a spanning tree protocol. All paths are active in at least one VLAN.
  • the full bisection bandwidth network may include a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths.
  • Path A and Path B may be aggregated into a single trunk group such that Path A and Path B are active.
  • the full bisection bandwidth network may have a fat tree topology.
  • the full bisection bandwidth network may have a fully connected mesh topology.
  • the plurality of nodes may include a root layer of N Ethernet switches.
  • the plurality of nodes may include a branch layer of M Ethernet switches. M may be greater than N.
  • the plurality of paths among the nodes may include a path from each root layer switch to each branch layer switch.
  • Paths from a first root layer switch may be assigned to a first set of VLANs.
  • Paths from a second root layer switch may be assigned to a second set of VLANs.
  • the first set of VLANs may not contain any VLANs belonging to the second set of VLANs.
  • the second set of VLANs may not contain any VLANs belonging to the first set of VLANs.
  • M may equal N.
  • the plurality of paths among the nodes may include a path from branch layer switch BLS 1 to root layer switch RLS 1 assigned to a first VLAN and a path from branch layer switch BLS 1 to root layer switch RLS 2 assigned to a second VLAN.
  • the plurality of paths among the nodes may include a path from branch layer switch BLS 1 to root layer switch RLS 1 assigned to a first VLAN and a path from branch layer switch BLS 2 to root layer switch RLS 1 assigned to a second VLAN.
  • the plurality of paths among the nodes may include a first path PATH 1 from a first branch layer switch BLS 1 to a first root layer switch RLS 1 and a second path PATH 2 from the first branch layer switch BLS 1 to the first root layer switch RLS 1 that are aggregated into a single trunk group.
  • the system may further include a plurality of servers coupled to the full bisection bandwidth network. The plurality of paths among the nodes may include a plurality of redundant paths from one of the plurality of servers to another of the plurality of servers.
  • the plurality of paths among the nodes may include a plurality of redundant paths from each of the plurality of servers to the others of the plurality of servers.
  • the plurality of paths among the nodes may include a first path from branch layer switch BLS 1 to root layer switch RLS 1 assigned to a first VLAN and a second path redundant to the first path assigned to the first VLAN.
  • the plurality of paths among the nodes may include a first path from a first server to a second server and a second path redundant to the first path from the first server to the second server.
  • the plurality of paths among the nodes may include redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
  • the invention features a method.
  • the method includes providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN.
  • the full bisection bandwidth network carries a traffic load.
  • the method includes balancing the traffic load among the paths.
  • the invention features a method.
  • the method includes providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN.
  • the method further includes adding a node.
  • the method further includes adding paths to connect the added node to the full bisection bandwidth network and adjusting the assignments of paths and the added paths to VLANs such that each VLAN satisfies a spanning tree protocol, all paths are active in at least one VLAN, and the network remains a full bisection bandwidth network.
  • Adjusting the assignments may include adding a new VLAN. Adjusting the assignments may include adding the added paths to the existing VLANs.
  • FIG. 1 is a block diagram of a node of a parallel processing database system.
  • FIG. 2 is a block diagram of a parsing engine.
  • FIG. 3 is a block diagram of a parser.
  • FIG. 4 is an illustration of a full bisection bandwidth network.
  • FIGS. 5 and 6 are illustrations of the effects of the spanning tree protocol.
  • FIGS. 7-9 are illustrations of the effects of the elimination of redundant links and the application of link aggregation.
  • FIG. 10 is an illustration of a network using three 8-port switch elements to create a network of 12 ports.
  • FIG. 11 is an illustration of a network using six 8-port switch elements to create a network of 16 ports in a fat tree topology.
  • FIG. 12 is an illustration of a network using twelve 8-port switch elements to create a network of 32 ports.
  • FIG. 13 is an illustration of a fully connected mesh network.
  • FIG. 14 is an illustration of a fat tree network.
  • FIG. 15 is a graphical representation of the traffic activities of the ports in an unconfigured network.
  • FIG. 16 is a graphical representation of the traffic activities of the ports in a configured network.
  • FIG. 17 is a chart showing the improvement in throughput from an unconfigured network to a fully configured network.
  • FIG. 18 is a chart showing the improvement in number of drops per second from an unconfigured network to a fully configured network.
  • FIG. 19 is a chart showing the network activity when traffic is injected into a single Virtual Local Area Network (“VLAN”).
  • FIG. 20 is a representation of a network illustrating traffic flowing in only a single VLAN.
  • FIGS. 21-23 are flow charts.
  • FIG. 1 shows a sample architecture for one subsystem 105 1 of the DBS 100 .
  • the DBS subsystem 105 1 includes one or more processing modules 110 1 . . . N , connected by a network 115 , that manage the storage and retrieval of data in data-storage facilities 120 1 . . . N .
  • Each of the processing modules 110 1 . . . N may be one or more physical processors or each may be a virtual processor, with one or more virtual processors running on one or more physical processors.
  • the single physical processor swaps between the set of N virtual processors.
  • the subsystem's operating system schedules the N virtual processors to run on its set of M physical processors. If there are 4 virtual processors and 4 physical processors, then typically each virtual processor would run on its own physical processor. If there are 8 virtual processors and 4 physical processors, the operating system would schedule the 8 virtual processors against the 4 physical processors, in which case swapping of the virtual processors would occur.
  • Each of the processing modules 110 1 . . . N manages a portion of a database that is stored in a corresponding one of the data-storage facilities 120 1 . . . N .
  • Each of the data-storage facilities 120 1 . . . N includes one or more disk drives.
  • the DBS may include multiple subsystems 105 2 . . . N in addition to the illustrated subsystem 105 1 , connected by extending the network 115 .
  • the system stores data in one or more tables in the data-storage facilities 120 1 . . . N .
  • the rows 125 1 . . . Z of the tables are stored across multiple data-storage facilities 120 1 . . . N to ensure that the system workload is distributed evenly across the processing modules 110 1 . . . N .
  • a parsing engine 130 organizes the storage of data and the distribution of table rows 125 1 . . . Z among the processing modules 110 1 . . . N .
  • the parsing engine 130 also coordinates the retrieval of data from the data-storage facilities 120 1 . . . N in response to queries received from a user at a mainframe 135 or a client computer 140 .
  • the DBS 100 usually receives queries and commands to build tables in a standard format, such as SQL.
  • the rows 125 1 . . . Z are distributed across the data-storage facilities 120 1 . . . N by the parsing engine 130 in accordance with their primary index.
  • the primary index defines the columns of the rows that are used for calculating a hash value.
  • the function that produces the hash value from the values in the columns specified by the primary index is called the hash function.
  • Some portion, possibly the entirety, of the hash value is designated a “hash bucket”.
  • the hash buckets are assigned to data-storage facilities 120 1 . . . N and associated processing modules 110 1 . . . N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
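  • As a minimal sketch of the distribution scheme described above, the following Python maps a row to a processing module by hashing its primary index columns. The CRC32 hash, the bucket count of 16, the round-robin bucket map, and all function and column names are assumptions made for illustration; the description does not specify them.

```python
import zlib

NUM_BUCKETS = 16  # hypothetical bucket count, chosen only for the example


def hash_bucket(row, primary_index):
    """Hash the primary index columns of a row and reduce to a bucket number."""
    key = "|".join(str(row[column]) for column in primary_index)
    # CRC32 stands in for the hash function; the source does not name one.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def build_bucket_map(num_buckets, num_modules):
    """Assign buckets to processing modules round-robin (an assumed policy)."""
    return {bucket: bucket % num_modules for bucket in range(num_buckets)}


def module_for_row(row, primary_index, bucket_map):
    """Return the index of the processing module that would store the row."""
    return bucket_map[hash_bucket(row, primary_index)]


bucket_map = build_bucket_map(NUM_BUCKETS, num_modules=4)
row = {"order_id": 1001, "customer": "acme", "total": 25.0}
print(module_for_row(row, ["order_id"], bucket_map))
```

  • In this sketch, a primary index whose column values are nearly unique spreads rows across all buckets, while a column with few distinct values concentrates rows on a few modules, which is the effect the last sentence above describes.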
  • each storage facility is also logically organized.
  • One implementation divides the storage facilities into logical blocks of storage space.
  • Other implementations can divide the available storage space into different units of storage.
  • the logical units of storage can ignore or match the physical divisions of the storage facilities.
  • the parsing engine 130 is made up of three components: a session control 200 , a parser 205 , and a dispatcher 210 , as shown in FIG. 2 .
  • the session control 200 provides the logon and logoff function. It accepts a request for authorization to access the database, verifies it, and then either allows or disallows the access.
  • a user may submit a SQL query, which is routed to the parser 205 .
  • the parser 205 interprets the SQL query (block 300 ), checks it for proper SQL syntax (block 305 ), evaluates it semantically (block 310 ), and consults a data dictionary to ensure that all of the objects specified in the SQL query actually exist and that the user has the authority to perform the request (block 315 ).
  • the parser 205 runs an optimizer (block 320 ), which develops the least expensive plan to perform the request and produces executable steps to execute the plan.
  • a dispatcher 210 issues commands to the processing modules 110 1 . . . N to implement the executable steps.
  • the network 115 will continue to be described in the context of the system illustrated in FIG. 1 but it will be clear to persons of ordinary skill in the art that the network described herein is not limited to that context but can be used in any networking context.
  • the network 115 includes a network 405 , such as that illustrated in FIG. 4 .
  • the network 405 includes R switch elements, each having S ports, with each switch element having a connection to the other switch elements, leaving R(S − R + 1) ports to which devices, such as the processing modules 110 1 . . . N in FIG. 1 , can connect.
  • the network 405 includes six 8-port switch elements 410 (only one is labeled).
  • An end point device 415 (“device” or “end point”: only one is labeled), such as one of the processing modules 110 1 . . . N in FIG. 1 , represented in FIG. 4 by an asterisk (*), can be coupled to one of the ports 420 (only one is labeled).
  • the network 405 illustrated in FIG. 4 can connect up to 18 devices.
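  • The port arithmetic above can be sketched in a few lines of Python. The function name and the `links_per_pair` parameter are assumptions added for illustration; with one link per switch pair the expression reduces to the R(S − R + 1) count given above.

```python
def mesh_endpoint_ports(num_switches, ports_per_switch, links_per_pair=1):
    """Ports left for end points in a fully connected mesh of switches."""
    # Each switch spends links_per_pair ports on every other switch.
    uplinks = links_per_pair * (num_switches - 1)
    if uplinks > ports_per_switch:
        raise ValueError("not enough ports to reach every other switch")
    return num_switches * (ports_per_switch - uplinks)


print(mesh_endpoint_ports(6, 8))  # six 8-port switch elements, as in FIG. 4: 18
```

  • Setting `links_per_pair=2` with three 8-port switch elements gives 12 ports, which matches the 12-port network described for FIG. 10.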
  • the network 405 illustrated in FIG. 4 is a full bisection bandwidth network.
  • a device connected to one port on the network can communicate with another device connected to another port on the network at full speed, even when the network is fully populated and all ports are operating at full speed.
  • For example, if source end point A wants to transmit to destination end point B, source end point B (which may be the same as destination end point B) wants to transmit to destination end point C, and source end point C (which may be the same as destination end point C) wants to transmit to destination end point A (which may be the same as source end point A) at the same time, the network provides paths for all of them. In a network that has less than full bisection bandwidth, that may not be true.
  • a full bisection bandwidth network is realized when the network can be arbitrarily cut in half, such as by line 425 , and the number of cut links is equal to the number of end points in each half.
  • the number of cross links cut by the line 425 ( 9 ) is equal to the number of end points (i.e., the asterisks) ( 9 ) on either side of the line 425 .
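  • The bisection condition can be checked by brute force for a small mesh. The sketch below assumes one link per switch pair and three end points per switch (the six 8-port case of FIG. 4), and it only considers cuts that keep each switch together with its own end points; the function name is hypothetical.

```python
from itertools import combinations


def check_bisection(num_switches, endpoints_per_switch):
    """Return (cut links, end points per half) for every balanced split."""
    switches = range(num_switches)
    links = list(combinations(switches, 2))  # one link per switch pair
    half = num_switches // 2
    results = []
    for left in combinations(switches, half):
        left = set(left)
        # A link is cut when its two switches fall on opposite sides.
        cut = sum(1 for a, b in links if (a in left) != (b in left))
        results.append((cut, half * endpoints_per_switch))
    return results


results = check_bisection(6, 3)
print(all(cut == end_points for cut, end_points in results))  # True
```

  • Every balanced split of the six switches cuts 9 links and leaves 9 end points on each side, which agrees with the count given for line 425.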
  • Meeting the requirement of a full bisection bandwidth network typically requires that all available links in the network remain active.
  • the network 405 illustrated in FIG. 4 would not be a full bisection bandwidth network because the Ethernet spanning tree protocol among switch elements disables redundant paths to avoid broadcast storms. By disabling redundant paths, however, the network loses valuable connections between end points.
  • FIG. 5 illustrates a typical scenario in which a redundant path is disabled by the spanning tree protocol.
  • nodes 405 , 410 , and 415 are connected by paths 420 , 425 , and 430 .
  • multiple redundant paths are available between any two nodes.
  • nodes 405 and 410 are connected by (a) path 420 , and (b) paths 425 and 430 through node 415 .
  • such redundant paths create the possibility of loops and broadcast storms.
  • one of the paths shown in FIG. 5 would be disabled under the spanning tree protocol.
  • path 430 is disabled, as indicated by the dashed line representing path 430 on the right side of FIG. 5 , thereby eliminating redundant paths between nodes 405 , 410 , and 415 .
  • the IEEE 802.1q protocol is applied to divide a network subject to the spanning tree protocol into VLANs in order to keep all network paths active and available for traffic.
  • the network in FIG. 5 is configured to have three VLANs.
  • the first VLAN, illustrated by tree 1 in FIG. 6 has path 430 disabled.
  • the second VLAN, illustrated by tree 2 in FIG. 6 has path 425 disabled.
  • the third VLAN, illustrated by tree 3 in FIG. 6 has path 420 disabled.
  • all three of the paths 420 , 425 , and 430 are active and each is active in two VLANs (e.g., path 420 is active in VLANs tree 1 and tree 2 ).
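  • The three-VLAN arrangement of FIG. 6 can be verified with a short Python sketch. Which two nodes each of paths 425 and 430 joins is an assumption here (the text only says both run through node 415), and the function names are hypothetical.

```python
NODES = {405, 410, 415}
# Path endpoints: 420 is stated in the text; 425 and 430 are assumed.
PATHS = {420: (405, 410), 425: (405, 415), 430: (410, 415)}
# Active paths per VLAN: tree 1 disables 430, tree 2 disables 425, tree 3 disables 420.
VLANS = {"tree 1": {420, 425}, "tree 2": {420, 430}, "tree 3": {425, 430}}


def is_loop_free_spanning_tree(nodes, edges):
    """True if the edges connect all nodes without forming a loop."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this edge would close a loop
        parent[root_a] = root_b
    return len(edges) == len(nodes) - 1


def vlans_using(path_id):
    """Names of the VLANs in which a path is active."""
    return sorted(name for name, active in VLANS.items() if path_id in active)


for name, active in VLANS.items():
    edges = [PATHS[p] for p in sorted(active)]
    print(name, is_loop_free_spanning_tree(NODES, edges))
for path_id in PATHS:
    print(path_id, vlans_using(path_id))
```

  • Each VLAN passes the loop-free check, and each path is reported as active in exactly two VLANs, whereas the full set of three paths taken as a single VLAN fails the check.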
  • Ethernet protocol disables redundant links to prevent loops in the network. For example, consider the nodes 705 and 710 connected by redundant paths 715 , 720 , and 725 in FIG. 7 . Typical Ethernet protocol will disable two of the paths (e.g., paths 715 and 720 ), as shown in FIG. 8 .
  • the IEEE 802.3ad protocol is applied, as shown in FIG. 9 , to aggregate multiple links into a single trunk group (represented by wrapper 905 ), such that all paths remain active during normal operation.
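  • As a rough illustration of link aggregation, the Python sketch below groups parallel paths between the same two nodes into one trunk group and then picks a member link per flow. The CRC32-based member selection and all names are assumptions; real switches use their own configured hashing mode.

```python
import zlib
from collections import defaultdict


def build_trunk_groups(paths):
    """Group path IDs by the unordered pair of nodes they connect."""
    groups = defaultdict(list)
    for path_id, (node_a, node_b) in sorted(paths.items()):
        groups[frozenset((node_a, node_b))].append(path_id)
    return dict(groups)


def pick_member(members, flow_key):
    """Choose one member link of a trunk group for a flow (assumed policy)."""
    return members[zlib.crc32(flow_key.encode("utf-8")) % len(members)]


# The three redundant paths between nodes 705 and 710 from FIG. 7.
paths = {715: (705, 710), 720: (705, 710), 725: (710, 705)}
trunks = build_trunk_groups(paths)
members = trunks[frozenset((705, 710))]
print(members)  # [715, 720, 725]
print(pick_member(members, "hostA->hostB") in members)  # True
```

  • Because the spanning tree sees the trunk group as one logical link, none of the three member paths has to be disabled to avoid a loop.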
  • FIG. 10 shows one embodiment of a full configuration of a network 1005 using 3 switch elements 1010 , 1015 , and 1020 of 8 ports each to expand from a single 8-port switch to a full bisection bandwidth network of 12 ports.
  • the embodiment shown in FIG. 10 provides two paths 1025 and 1030 between switch element 1010 and switch element 1015 , two paths 1035 and 1040 between switch element 1010 and switch element 1020 , and two paths 1045 and 1050 between switch element 1015 and switch element 1020 .
  • each switch element provides connections to four end points 1055 (such as, for example, the processing modules 110 1 . . . N in FIG. 1 )(only one is labeled).
  • Each source end point has 2 paths available to reach any destination end point.
  • a source end point attached to switch element 1010 can reach a destination end point attached to switch element 1015 through path 1025 or through path 1030 .
  • Suppose, for example, that network 1005 has been divided into 3 VLANs, as described above in connection with the description of FIG. 6, that a source end point attached to switch element 1010 desires to communicate with a destination end point attached to switch element 1015, and that paths 1025 and 1030 are disabled in the VLAN being used for the communication.
  • In that case, multiple paths are still available through paths 1035, 1040, 1045 and 1050.
  • Alternatively, such communication could be accomplished via a VLAN in which paths 1025 and 1030 are enabled.
  • FIG. 11 shows one embodiment of a fully configured network with 6 switch elements of 8 ports each to expand from a single 8-port switch to a network of 16 ports in a full bisection bandwidth network having a fat tree topology.
  • Two of the switch elements 1110 and 1115 are at a root layer of the fat tree topology.
  • the other switches 1120 , 1125 , 1130 , and 1135 are at a branch layer of the fat tree topology.
  • the end points 1140 are at a leaf level of the fat tree topology.
  • the network in FIG. 11 has the appearance of an inverted tree with the root at the top and the leaves at the bottom.
  • The topology in FIG. 11 is known as a “fat tree” because there are more paths between the root layer and the branch layer (those paths are labeled generally as 1145 ) than between the branch layer and the leaf layer (those branches are labeled generally as 1150 ).
  • In the network 1105 shown in FIG. 11, there are 16 paths between the root layer and the branch layer and 16 paths between the branch layer and the leaf layer. Both protocols 802.1q and 802.3ad are used to achieve full bisection bandwidth.
  • Each source end point has 2 paths to reach any destination end point. Any source end point can reach any destination end point through one of the dashed paths or through one of the solid paths. This multiplicity of paths provides the ability to load balance traffic across paths.
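  • The port counts for the fat tree can be sketched as follows. The function name is hypothetical, and the value of two links per root/branch pair for FIG. 11 is inferred from the 16 root-to-branch paths stated above (2 root switches × 4 branch switches × 2), not given explicitly.

```python
def fat_tree_ports(num_root, num_branch, ports_per_switch, links_per_pair=1):
    """Count root-to-branch paths and end-point ports in a two-layer fat tree."""
    root_ports_used = links_per_pair * num_branch
    uplinks_per_branch = links_per_pair * num_root
    if root_ports_used > ports_per_switch:
        raise ValueError("root layer switch has too few ports")
    if uplinks_per_branch >= ports_per_switch:
        raise ValueError("branch layer switch has no ports left for end points")
    end_points_per_branch = ports_per_switch - uplinks_per_branch
    return {
        "root_to_branch_paths": num_root * num_branch * links_per_pair,
        "end_point_ports": num_branch * end_points_per_branch,
    }


# Six 8-port switch elements: 2 at the root layer, 4 at the branch layer.
print(fat_tree_ports(2, 4, 8, links_per_pair=2))
# {'root_to_branch_paths': 16, 'end_point_ports': 16}
```

  • Equal counts on both sides of the branch layer are what let every end point send at full speed; with four root and eight branch 8-port switches and one link per pair, the same function gives 32 and 32.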
  • FIG. 12 shows a fat tree topology network 1205 with four 8-port switch elements 1210 , 1215 , 1220 , and 1225 at the root layer and eight 8-port switch elements 1230 , 1235 , 1240 , 1245 , 1250 , 1255 , 1260 , and 1265 at the branch layer, which produces a 32-port network.
  • the Link Aggregation Protocol 802.3ad is not needed to achieve full bisection bandwidth.
  • There are four VLANs in the network, a first represented by the solid paths, a second represented by the dashed (i.e., “----”) paths, a third represented by the dash-dot paths (“-•-•”), and a fourth represented by the long-dash/short-dash (“- — - —”) paths.
  • Any source end point (represented by the asterisks) in the topology shown in FIG. 12 can reach any destination end point (also represented by the asterisks) through one of the four path sets.
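  • One way to picture the four path sets is to give each root layer switch its own VLAN, so that the paths from different root switches never share a VLAN. The Python sketch below does this for the switch elements of FIG. 12; the VLAN numbers 2 through 5 and the function names are assumptions for illustration.

```python
ROOT_SWITCHES = [1210, 1215, 1220, 1225]
BRANCH_SWITCHES = [1230, 1235, 1240, 1245, 1250, 1255, 1260, 1265]


def assign_paths_to_vlans(roots, branches, first_vlan_id=2):
    """Put all paths from one root layer switch into that switch's own VLAN."""
    return {
        first_vlan_id + index: [(root, branch) for branch in branches]
        for index, root in enumerate(roots)
    }


def is_star_tree(links):
    """True if all links share one root and reach each branch exactly once."""
    roots = {root for root, _ in links}
    branches = [branch for _, branch in links]
    return len(roots) == 1 and len(branches) == len(set(branches))


vlans = assign_paths_to_vlans(ROOT_SWITCHES, BRANCH_SWITCHES)
all_links = {(r, b) for r in ROOT_SWITCHES for b in BRANCH_SWITCHES}
assigned = [link for links in vlans.values() for link in links]

print(sorted(vlans))  # [2, 3, 4, 5]
print(all(is_star_tree(links) for links in vlans.values()))  # True
print(set(assigned) == all_links and len(assigned) == len(all_links))  # True
```

  • Each VLAN is a loop-free star over the switches that carry it, and every one of the 32 root-to-branch paths is active in exactly one VLAN, so no path has to be disabled.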
  • FIG. 13 shows one embodiment of a fully connected mesh network 1305 using 3 Dell 6248 48-port Gigabit Ethernet switches 1310 , 1315 , 1320 .
  • the switches in the network are cabled together as shown:
  • the switches are configured using Multiple Spanning Tree Protocol (802.1q) and Link Aggregation Protocol (802.3ad) to provide a 60-port network with full bisection bandwidth. All paths in the network are active at all times to carry traffic and there are 6 distinct paths for each source end point to inject traffic into the network to reach any destination end point.
  • FIG. 14 shows a fat tree network using six Dell 6248 48-port Gigabit Ethernet switches. The switches are cabled together as shown:
  • the network is configured using Multiple Spanning Tree Protocol (802.1q) and Link Aggregation Protocol (802.3ad) to provide a 92-port network with full bisection bandwidth. All links in the network are active at all times to carry traffic and there are 4 distinct paths for each source node to inject traffic into the network to reach any destination node. Two ports in each switch are dedicated for system management.
  • Switch element S10
      configure
      vlan database
      ! All VLANs are declared
      vlan 2-5
      exit
      interface range ethernet 1/g1-1/g2
      spanning-tree disable
      spanning-tree portfast
      exit
      interface range ethernet 1/g3-1/g48
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      interface range ethernet 1/g3-1/g25
      spanning-tree disable
      spanning-tree portfast
      exit
      ! Link aggregation setting
      interface range ethernet 1/g26-1/g30
      channel-group 5 mode auto
      exit
      interface range ethernet 1/g31-1/g36
      channel-group 6 mode auto
      exit
      interface range ethernet 1/g37-1/g42
      channel-group 7 mode auto
      exit
      interface range ethernet 1/g43-1/g48
      channel-group 8 mode auto
      exit
      ! Attach VLAN to LAG group
      interface port-channel 5
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      interface port
  • ! Attach VLAN to LAG group
      interface port-channel 1
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      !
      interface port-channel 2
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      interface port-channel 3
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      interface port-channel 4
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      interface port-channel 5
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
      exit
      interface port-channel 5
      hashing-mode 5
      switchport mode general
      no switchport general acceptable-frame-type tagged-only
      switchport general allowed vlan add 2-5 tagged
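  • Because the port-channel stanzas in the listing above repeat the same five commands, they lend themselves to being generated. The Python sketch below only reproduces command text that appears in the listing; the function names and default arguments are hypothetical, and the output has not been validated against a real switch.

```python
def channel_group_stanza(first_port, last_port, channel):
    """Commands that place a range of ports into one link aggregation group."""
    return [
        f"interface range ethernet 1/g{first_port}-1/g{last_port}",
        f"channel-group {channel} mode auto",
        "exit",
    ]


def port_channel_stanza(channel, vlan_range="2-5", hashing_mode=5):
    """Commands that attach the VLANs to one link aggregation group."""
    return [
        f"interface port-channel {channel}",
        f"hashing-mode {hashing_mode}",
        "switchport mode general",
        "no switchport general acceptable-frame-type tagged-only",
        f"switchport general allowed vlan add {vlan_range} tagged",
        "exit",
    ]


# Port ranges and group numbers copied from the switch element S10 listing.
lag_ranges = [(26, 30, 5), (31, 36, 6), (37, 42, 7), (43, 48, 8)]
lines = []
for first_port, last_port, channel in lag_ranges:
    lines += channel_group_stanza(first_port, last_port, channel)
for _, _, channel in lag_ranges:
    lines += port_channel_stanza(channel)
print("\n".join(lines))
```

  • Generating the stanzas from one table of port ranges keeps the group numbers in the `channel-group` and `port-channel` commands consistent with each other.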
  • the network in FIG. 14 was used to collect relevant statistics for the invention. Fifteen server end points were used to emulate a 90-end-point fully connected, fully configured network. A diagnostic driver was designed to selectively inject traffic for one, many, or all VLANs into the network. The network was characterized using the diagnostic driver.
  • Table 1 contains statistics collected when the network 1405 is not configured as described above. That is, Table 1 shows the number of bytes transmitted through one of the end points and the number of dropped packets when the network is configured with a single VLAN and is allowed to self-configure what it considers to be its best topology:
  • FIG. 15 shows the graphical representation of the traffic activities of all ports in the network. The figure shows that uplink activities are essentially reduced to one path and only one switch of the root layer switches (S 20 ) is actively carrying traffic.
  • Table 2 shows network statistics collected when the network is configured into a full bisection bandwidth topology, as described herein.
  • FIG. 16 shows the graphical representation of the traffic activities of all ports in the network. The figure shows that all paths in the network are actively carrying traffic and that all root level switches are actively carrying traffic.
  • FIGS. 17 and 18 show the difference in bandwidth and the number of packet drops per end point depending on whether the invention is used (“Fully configured network”) or not (“Unconfigured network”).
  • FIG. 19 shows that when only VLAN 2 traffic is injected into the network, only relevant ports that were configured for VLAN 2 carry the traffic.
  • the point of this experiment is to show that there is a distinct path for any source to any destination for a particular VLAN. Traffic for each VLAN is completely isolated from all other traffic. Consequently, each source end point can decide how best to load balance traffic based on VLAN. For instance, one VLAN could be used for high priority traffic, one VLAN could be used as a broadcast only path, etc.
  • FIG. 20 depicts the same concept in a different way.
  • the bold lines are paths that are dedicated to VLAN 2 . Injecting packets in different VLANs improves isolation and increases network throughput by avoiding congestion.
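  • A source end point's choice of VLAN could look something like the following Python sketch. The VLAN numbers, the traffic class names, and the CRC32-based spreading of general traffic are all assumptions made for illustration; the description only says that each end point can decide how to balance load by VLAN.

```python
import zlib

HIGH_PRIORITY_VLAN = 2   # hypothetical: reserved for high priority traffic
BROADCAST_VLAN = 3       # hypothetical: used as a broadcast-only path
GENERAL_VLANS = [4, 5]   # hypothetical: shared by all other traffic


def choose_vlan(source, destination, traffic_class="general"):
    """Pick the VLAN in which a source end point injects a packet."""
    if traffic_class == "high-priority":
        return HIGH_PRIORITY_VLAN
    if traffic_class == "broadcast":
        return BROADCAST_VLAN
    # Spread remaining flows over the general VLANs with a stable hash,
    # so that packets of one flow always follow the same path.
    key = f"{source}->{destination}".encode("utf-8")
    return GENERAL_VLANS[zlib.crc32(key) % len(GENERAL_VLANS)]


print(choose_vlan("server-1", "server-7", "high-priority"))  # 2
print(choose_vlan("server-1", "server-7") in GENERAL_VLANS)  # True
```

  • Since traffic in each VLAN is isolated from the others, a policy like this keeps general traffic from competing with high priority traffic for the same paths.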
  • In order to achieve a topology with full bisection bandwidth and predictable routes between sources and destinations, the network is cabled with strict connectivity for full bisection bandwidth, and each switch element is configured to achieve a complete network.
  • Each switch element is configured to meet Multiple Spanning Tree Protocol (802.1q) (“MSTP”) and Link Aggregation Protocol (802.3ad) (“LAG”) requirements.
  • The configuration allows the MSTP and LAG protocols to automatically produce the desired network.
  • Network configuration of a fat tree network begins by cabling root layer switches and branch layer switches into a full bisection bandwidth topology (block 2105), such as that shown in FIG. 14.
  • The branch layer switches are configured (block 2110) and the root layer switches are configured (block 2115).
  • Configuration of the branch layer switches begins by determining if the last branch layer switch has been configured (block 2205 ). If it has (“Y” branch out of block 2205 ), then branch layer configuration is complete. If it has not (“N” branch out of block 2205 ), then the next branch layer switch is configured (block 2210 ).
  • The management ports are set up (block 2215).
  • The spanning tree protocol is disabled on the communication ports (i.e., the non-management ports) that are or can be connected to end points (i.e., not ports used to connect two switches) (block 2220). Redundant communication ports are aggregated (block 2225). The aggregated communication ports are configured (block 2230). Aggregated communication ports are assigned to VLANs (block 2235).
  • Configuration of the root layer switches begins by determining if the last root layer switch has been configured (block 2305 ). If it has (“Y” branch out of block 2305 ), then root layer configuration is complete. If it has not (“N” branch out of block 2305 ), then the next root layer switch is configured (block 2310 ).
  • The management ports are set up (block 2315). Spanning tree is disabled on the communication ports (i.e., the non-management ports) that are or can be connected to end points (i.e., not ports used to connect two switches) (block 2320). Redundant communication ports are aggregated (block 2325). The aggregated communication ports are configured (block 2330). Aggregated communication ports are assigned to VLANs (block 2335). Root level switches are established as root nodes for selected VLANs (block 2340).
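The flow described in blocks 2205-2235 and 2305-2340 can be sketched in Python as an ordered plan of steps per switch. This is only an illustration: the function name, the step labels, and the assumption that the branch layer is visited before the root layer (following the block numbering) are not taken from any switch vendor's tooling.

```python
# Hypothetical sketch of the configuration flow (blocks 2205-2235 and 2305-2340).
# Step names are illustrative labels, not commands of any real switch CLI.

COMMON_STEPS = [
    "set up management ports",                         # blocks 2215 / 2315
    "disable spanning tree on end-point ports",        # blocks 2220 / 2320
    "aggregate redundant communication ports",         # blocks 2225 / 2325
    "configure aggregated communication ports",        # blocks 2230 / 2330
    "assign aggregated communication ports to VLANs",  # blocks 2235 / 2335
]
ROOT_ONLY_STEP = "establish switch as root node for selected VLANs"  # block 2340


def configuration_plan(branch_switches, root_switches):
    """Return (switch, step) pairs in the order the flow charts visit them."""
    plan = []
    for switch in branch_switches:      # block 2110: branch layer (assumed first)
        plan.extend((switch, step) for step in COMMON_STEPS)
    for switch in root_switches:        # block 2115: root layer
        plan.extend((switch, step) for step in COMMON_STEPS)
        plan.append((switch, ROOT_ONLY_STEP))
    return plan


# Switch numbers follow FIG. 14: four branch layer and two root layer switches.
plan = configuration_plan([1420, 1425, 1430, 1435], [1410, 1415])
```

The only difference between the two layers is the final root-only step, which mirrors block 2340.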

Abstract

A full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN.

Description

BACKGROUND
A device connected to a network, e.g., an Ethernet network, typically connects to a port on a network switch or hub. Network switches and hubs have a limited number of ports. Expanding the network to include a number of devices beyond the number of ports typically requires linking two or more switches or hubs. Redundant paths in the network are typically disabled by network protocol to prevent broadcast storms and loops in the topology. Making efficient use of such multiple-switch networks is a challenge.
SUMMARY
In general, in one aspect, the invention features a method. A full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN.
Implementations of the invention may include one or more of the following. The full bisection bandwidth network may include a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths. The method may further include aggregating Path A and Path B into a single trunk group so that Path A and Path B are active. The method may further include constructing the full bisection bandwidth network to have a fat tree topology. The method may further include constructing the full bisection bandwidth network to have a fully connected mesh topology. The plurality of nodes may include a root layer of N Ethernet switches and a branch layer of M Ethernet switches. M may be greater than N. The plurality of paths may include a path from each root layer switch to each branch layer switch. Assigning paths to the VLANs may include assigning paths from a first root layer switch to a first set of VLANs, assigning paths from a second root layer switch to a second set of VLANs, the first set of VLANs not containing any VLANs belonging to the second set of VLANs, and the second set of VLANs not containing any VLANs belonging to the first set of VLANs. M may equal 2N. Assigning paths to the VLANs may include assigning a path from branch layer switch BLS1 to root layer switch RLS1 to a first VLAN and assigning a path from branch layer switch BLS1 to root layer switch RLS2 to a second VLAN. Assigning paths to the VLANs may include assigning a path from branch layer switch BLS1 to root layer switch RLS1 to a first VLAN and assigning a path from branch layer switch BLS2 to root layer switch RLS1 to a second VLAN.
Assigning paths to the VLANs may include providing a first path PATH1 from a first branch layer switch BLS1 to a first root layer switch RLS1, providing a second path PATH2 from the first branch layer switch BLS1 to the first root layer switch RLS1, and aggregating PATH1 and PATH2 into a single trunk group. A plurality of servers may be coupled to the full bisection bandwidth network. The method may further include providing redundant paths from one of the plurality of servers to another of the plurality of servers. The method may further include providing redundant paths from each of the plurality of servers to the others of the plurality of servers. Assigning paths to the VLANs may include assigning a first path from branch layer switch BLS1 to root layer switch RLS1 to a first VLAN and assigning a second path redundant to the first path to the first VLAN. The plurality of paths may include a path from each root layer switch to each branch layer switch. A plurality of servers is coupled to the branch layer of Ethernet switches. Assigning paths to the VLANs may include assigning a first path from a first server to a second server and assigning a second path redundant to the first path from the first server to the second server. Assigning paths to the VLANs may include assigning redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
In general, in another aspect, the invention features a system. The system includes a full bisection bandwidth network. The full bisection bandwidth network includes a plurality of nodes. The full bisection bandwidth network includes a plurality of paths among the nodes. The full bisection bandwidth network includes a plurality of Virtual Local Area Networks (“VLANs”) incorporating the plurality of nodes and the plurality of paths. Each VLAN satisfies a spanning tree protocol. All paths are active in at least one VLAN.
Implementations of the invention include one or more of the following. The full bisection bandwidth network may include a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths. Path A and Path B may be aggregated into a single trunk group such that Path A and Path B are active. The full bisection bandwidth network may have a fat tree topology. The full bisection bandwidth network may have a fully connected mesh topology. The plurality of nodes may include a root layer of N Ethernet switches. The plurality of nodes may include a branch layer of M Ethernet switches. M may be greater than N. The plurality of paths among the nodes may include a path from each root layer switch to each branch layer switch. Paths from a first root layer switch may be assigned to a first set of VLANs. Paths from a second root layer switch may be assigned to a second set of VLANs. The first set of VLANs may not contain any VLANs belonging to the second set of VLANs. The second set of VLANs may not contain any VLANs belonging to the first set of VLANs. M may equal N. The plurality of paths among the nodes may include a path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN and a path from branch layer switch BLS1 to root layer switch RLS2 assigned to a second VLAN. The plurality of paths among the nodes may include a path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN and a path from branch layer switch BLS2 to root layer switch RLS1 assigned to a second VLAN. The plurality of paths among the nodes may include a first path PATH1 from a first branch layer switch BLS1 to a first root layer switch RLS1 and a second path PATH2 from the first branch layer switch BLS1 to the first root layer switch RLS1 that are aggregated into a single trunk group.
The system may further include a plurality of servers coupled to the full bisection bandwidth network. The plurality of paths among the nodes may include a plurality of redundant paths from one of the plurality of servers to another of the plurality of servers. The plurality of paths among the nodes may include a plurality of redundant paths from each of the plurality of servers to the others of the plurality of servers. The plurality of paths among the nodes may include a first path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN and a second path redundant to the first path assigned to the first VLAN. The plurality of paths among the nodes may include a first path from a first server to a second server and a second path redundant to the first path from the first server to the second server. The plurality of paths among the nodes may include redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
In general, in another aspect, the invention features a method. The method includes providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN. The full bisection bandwidth network carries a traffic load. The method includes balancing the traffic load among the paths.
In general, in another aspect, the invention features a method. The method includes providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”) by assigning paths to the VLANs such that each VLAN satisfies a spanning tree protocol and all paths are active in at least one VLAN. The method further includes adding a node. The method further includes adding paths to connect the added node to the full bisection bandwidth network and adjusting the assignments of paths and the added paths to VLANs such that each VLAN satisfies a spanning tree protocol, all paths are active in at least one VLAN, and the network remains a full bisection bandwidth network.
Implementations of the invention may include one or more of the following. Adjusting the assignments may include adding a new VLAN. Adjusting the assignments may include adding the added paths to the existing VLANs.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a node of a parallel processing database system.
FIG. 2 is a block diagram of a parsing engine.
FIG. 3 is a block diagram of a parser.
FIG. 4 is an illustration of a full bisection bandwidth network.
FIGS. 5 and 6 are illustrations of the effects of the spanning tree protocol.
FIGS. 7-9 are illustrations of the effects of the elimination of redundant links and the application of link aggregation.
FIG. 10 is an illustration of a network using three 8-port switch elements to create a network of 12 ports.
FIG. 11 is an illustration of a network using six 8-port switch elements to create a network of 16 ports in a fat tree topology.
FIG. 12 is an illustration of a network using twelve 8-port switch elements to create a network of 32 ports.
FIG. 13 is an illustration of a fully connected mesh network.
FIG. 14 is an illustration of a fat tree network.
FIG. 15 is a graphical representation of the traffic activities of the ports in an unconfigured network.
FIG. 16 is a graphical representation of the traffic activities of the ports in a configured network.
FIG. 17 is a chart showing the improvement in throughput from an unconfigured network to a fully configured network.
FIG. 18 is a chart showing the improvement in number of drops per second from an unconfigured network to a fully configured network.
FIG. 19 is a chart showing the network activity when traffic is injected into a single Virtual Local Area Network (“VLAN”).
FIG. 20 is a representation of a network illustrating traffic flowing in only a single VLAN.
FIGS. 21-23 are flow charts.
DETAILED DESCRIPTION
The full bisection bandwidth network technique disclosed herein has particular application, but is not limited, to large databases that might contain many millions or billions of records managed by a database system (“DBS”) 100, such as a Teradata Active Data Warehousing System available from the assignee hereof. FIG. 1 shows a sample architecture for one subsystem 105 1 of the DBS 100. The DBS subsystem 105 1 includes one or more processing modules 110 1 . . . N, connected by a network 115, that manage the storage and retrieval of data in data-storage facilities 120 1 . . . N. Each of the processing modules 110 1 . . . N may be one or more physical processors or each may be a virtual processor, with one or more virtual processors running on one or more physical processors.
For the case in which one or more virtual processors are running on a single physical processor, the single physical processor swaps between the set of N virtual processors.
For the case in which N virtual processors are running on an M-processor subsystem, the subsystem's operating system schedules the N virtual processors to run on its set of M physical processors. If there are 4 virtual processors and 4 physical processors, then typically each virtual processor would run on its own physical processor. If there are 8 virtual processors and 4 physical processors, the operating system would schedule the 8 virtual processors against the 4 physical processors, in which case swapping of the virtual processors would occur.
Each of the processing modules 110 1 . . . N manages a portion of a database that is stored in a corresponding one of the data-storage facilities 120 1 . . . N. Each of the data-storage facilities 120 1 . . . N includes one or more disk drives. The DBS may include multiple subsystems 105 2 . . . N in addition to the illustrated subsystem 105 1, connected by extending the network 115.
The system stores data in one or more tables in the data-storage facilities 120 1 . . . N. The rows 125 1 . . . Z of the tables are stored across multiple data-storage facilities 120 1 . . . N to ensure that the system workload is distributed evenly across the processing modules 110 1 . . . N. A parsing engine 130 organizes the storage of data and the distribution of table rows 125 1 . . . Z among the processing modules 110 1 . . . N. The parsing engine 130 also coordinates the retrieval of data from the data-storage facilities 120 1 . . . N in response to queries received from a user at a mainframe 135 or a client computer 140. The DBS 100 usually receives queries and commands to build tables in a standard format, such as SQL.
In one implementation, the rows 125 1 . . . Z are distributed across the data-storage facilities 120 1 . . . N by the parsing engine 130 in accordance with their primary index. The primary index defines the columns of the rows that are used for calculating a hash value. The function that produces the hash value from the values in the columns specified by the primary index is called the hash function. Some portion, possibly the entirety, of the hash value is designated a “hash bucket”. The hash buckets are assigned to data-storage facilities 120 1 . . . N and associated processing modules 110 1 . . . N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
In addition to the physical division of storage among the storage facilities illustrated in FIG. 1, each storage facility is also logically organized. One implementation divides the storage facilities into logical blocks of storage space. Other implementations can divide the available storage space into different units of storage. The logical units of storage can ignore or match the physical divisions of the storage facilities.
In one example system, the parsing engine 130 is made up of three components: a session control 200, a parser 205, and a dispatcher 210, as shown in FIG. 2. The session control 200 provides the logon and logoff function. It accepts a request for authorization to access the database, verifies it, and then either allows or disallows the access.
Once the session control 200 allows a session to begin, a user may submit a SQL query, which is routed to the parser 205. As illustrated in FIG. 3, the parser 205 interprets the SQL query (block 300), checks it for proper SQL syntax (block 305), evaluates it semantically (block 310), and consults a data dictionary to ensure that all of the objects specified in the SQL query actually exist and that the user has the authority to perform the request (block 315). Finally, the parser 205 runs an optimizer (block 320), which develops the least expensive plan to perform the request and produces executable steps to execute the plan. A dispatcher 210 issues commands to the processing modules 110 1 . . . N to implement the executable steps.
The network 115 will continue to be described in the context of the system illustrated in FIG. 1 but it will be clear to persons of ordinary skill in the art that the network described herein is not limited to that context but can be used in any networking context.
In one embodiment, the network 115 includes a network 405, such as that illustrated in FIG. 4. In one embodiment, the network 405 includes R switch elements, each having S ports, with each switch element having a connection to the other switch elements, leaving R(S−R+1) ports to which devices, such as the processing modules 110 1 . . . N in FIG. 1, can connect. In the embodiment shown in FIG. 4, the network 405 includes six 8-port switch elements 410 (only one is labeled). Thus, R=6 and S=8, meaning that the resulting network will have 6(8−6+1)=18 ports. An end point device 415 (“device” or “end point”: only one is labeled), such as one of the processing modules 110 1 . . . N in FIG. 1, represented in FIG. 4 by an asterisk (*), can be coupled to one of the ports 420 (only one is labeled). The network 405 illustrated in FIG. 4 can connect up to 18 devices.
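The port count R(S−R+1) can be checked with a few lines of Python. This is a minimal sketch: the function name and the `links_per_pair` parameter are assumptions added for illustration, and with one link per switch pair the result reduces to the formula in the text.

```python
def mesh_end_point_ports(switches, ports_per_switch, links_per_pair=1):
    """End-point ports left over in a fully connected mesh of identical switches."""
    # Each switch spends links_per_pair ports on every other switch.
    inter_switch_ports = links_per_pair * (switches - 1)
    if inter_switch_ports > ports_per_switch:
        raise ValueError("not enough ports to reach every other switch")
    return switches * (ports_per_switch - inter_switch_ports)


# FIG. 4: R = 6 switch elements with S = 8 ports each, one link per pair.
fig4_ports = mesh_end_point_ports(switches=6, ports_per_switch=8)
```

With R=6 and S=8 each switch spends 5 ports on the other switches and keeps 3 for end points, giving the 18 ports stated above.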
The network 405 illustrated in FIG. 4 is a full bisection bandwidth network. In a full bisection bandwidth network, a device connected to one port on the network can communicate with another device connected to another port on the network at full speed, even when the network is fully populated and all ports are operating at full speed. In such a network, if every source end point wants to transmit to a different destination end point, and all source end points want to transmit at the same time, then a path exists for each one of the source end points to transmit. That is, there is no conflict or contention between source end points for paths. For example, if source end point A, shown in FIG. 4, wants to transmit to destination end point B, source end point B (which may be the same as destination end point B) wants to transmit to destination end point C, and source end point C (which may be the same as destination end point C) wants to transmit to destination end point A (which may be the same as source end point A) at the same time, the network provides paths for all of them. In a network that has less than full bisection bandwidth, that may not be true.
A full bisection bandwidth network is realized when the network can be arbitrarily cut in half, such as by line 425, and the number of cut links is equal to the number of end points in each half. In FIG. 4, the number of cross links cut by the line 425 (9) is equal to the number of end points (i.e., the asterisks) (9) on either side of the line 425. Meeting this requirement typically requires that all available links in the network remain active.
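The cut-link count for FIG. 4 can be reproduced with a short Python sketch. It assumes the cut runs between switches (never through one) and that the mesh is built from identical switches; the function name and parameters are illustrative.

```python
def bisection_counts(switches, end_points_per_switch, links_per_pair=1):
    """Return (cut links, end points per half) for an even switch-level split."""
    if switches % 2:
        raise ValueError("an even split needs an even number of switches")
    half = switches // 2
    # In a full mesh, every switch in one half links to every switch in the other.
    cut_links = half * half * links_per_pair
    end_points = half * end_points_per_switch
    return cut_links, end_points


# FIG. 4: six 8-port switches, each with 5 mesh links and 3 end points.
cut_links, end_points = bisection_counts(switches=6, end_points_per_switch=3)
is_full_bisection = cut_links == end_points
```

Because all switches are identical and fully meshed, every even split gives the same two counts, so a single calculation covers the "arbitrary" cut.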
In a typical Ethernet configuration, the network 405 illustrated in FIG. 4 would not be a full bisection bandwidth network because the Ethernet spanning tree protocol among switch elements disables redundant paths to avoid broadcast storms. By disabling redundant paths, however, the network loses valuable connections between end points.
FIG. 5 illustrates a typical scenario in which a redundant path is disabled by the spanning tree protocol. Before the spanning tree protocol is applied, nodes 405, 410, and 415 are connected by paths 420, 425, and 430. In this configuration, multiple redundant paths are available between any two nodes. For example, nodes 405 and 410 are connected by (a) path 420, and (b) paths 425 and 430 through node 415. Under the typical implementation of Ethernet networks, such redundant paths create the possibility of loops and broadcast storms.
In the typical Ethernet network, one of the paths shown in FIG. 5 would be disabled under the spanning tree protocol. For example, after the spanning tree protocol is applied, path 430 is disabled, as indicated by the dashed line representing path 430 on the right side of FIG. 5, thereby eliminating redundant paths between nodes 405, 410, and 415.
In one embodiment of a network 405, the IEEE 802.1q protocol is applied to divide a network subject to the spanning tree protocol into VLANs in order to keep all network paths active and available for traffic. For example, as shown in FIG. 6, the network in FIG. 5 is configured to have three VLANs. The first VLAN, illustrated by tree 1 in FIG. 6, has path 430 disabled. The second VLAN, illustrated by tree 2 in FIG. 6, has path 425 disabled. The third VLAN, illustrated by tree 3 in FIG. 6, has path 420 disabled. Thus, as can be seen, in the configuration illustrated in FIG. 6, all three of the paths 420, 425, and 430 are active and each is active in two VLANs (e.g., path 420 is active in VLANs tree 1 and tree 2).
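The three-VLAN arrangement of FIG. 6 can be verified with a small Python sketch that checks each VLAN is loop-free and connected, and counts how many VLANs keep each path active. The text only states that path 420 joins nodes 405 and 410; which of paths 425 and 430 attaches to which node is assumed here for illustration.

```python
from collections import Counter

NODES = {405, 410, 415}
# Path number -> the two nodes it joins. The end points of 425 and 430 are assumed.
PATHS = {420: (405, 410), 425: (405, 415), 430: (410, 415)}
# VLAN name -> path disabled in that VLAN's spanning tree (from FIG. 6).
DISABLED = {"tree 1": 430, "tree 2": 425, "tree 3": 420}


def is_spanning_tree(active_paths):
    """True if the active paths connect all nodes without forming a loop."""
    parent = {n: n for n in NODES}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for path in active_paths:
        a, b = (find(n) for n in PATHS[path])
        if a == b:  # joining two already-connected nodes would close a loop
            return False
        parent[a] = b
    return len({find(n) for n in NODES}) == 1


active = {vlan: set(PATHS) - {off} for vlan, off in DISABLED.items()}
usage = Counter(path for paths in active.values() for path in paths)
```

Here `usage` records the number of VLANs in which each path is active, and the full triangle (no path disabled) fails the spanning tree check, which is the loop the protocol removes.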
In some embodiments, multiple point-to-point paths between nodes are used to achieve full bisection bandwidth. Typically, Ethernet protocol disables redundant links to prevent loops in the network. For example, consider the nodes 705 and 710 connected by redundant paths 715, 720, and 725 in FIG. 7. Typical Ethernet protocol will disable two of the paths (e.g., paths 715 and 720), as shown in FIG. 8.
In one embodiment of a network 405, the IEEE 802.3ad protocol is applied, as shown in FIG. 9, to aggregate multiple links into a single trunk group (represented by wrapper 905), such that all paths remain active during normal operation.
FIG. 10 shows one embodiment of a full configuration of a network 1005 using 3 switch elements 1010, 1015, and 1020 of 8 ports each to expand from a single 8-port switch to a full bisection bandwidth network of 12 ports. As can be seen, the embodiment shown in FIG. 10 provides two paths 1025 and 1030 between switch element 1010 and switch element 1015, two paths 1035 and 1040 between switch element 1010 and switch element 1020, and two paths 1045 and 1050 between switch element 1015 and switch element 1020. In addition, each switch element provides connections to four end points 1055 (such as, for example, the processing modules 110 1 . . . N in FIG. 1) (only one is labeled). Both protocols 802.1q and 802.3ad are used in this example to achieve full bisection bandwidth. Each source end point has 2 paths available to reach any destination end point. For example, a source end point attached to switch element 1010 can reach a destination end point attached to switch element 1015 through path 1025 or through path 1030. Further, if (a) network 1005 has been divided into 3 VLANs, as described above in connection with the description of FIG. 6, (b) a source end point attached to switch element 1010 desires to communicate to a destination end point attached to switch element 1015, and (c) paths 1025 and 1030 are disabled in the VLAN being used for the communication, multiple paths are still available through paths 1035, 1040, 1045 and 1050. Alternatively, such communication could be accomplished via a VLAN in which paths 1025 and 1030 are enabled.
FIG. 11 shows one embodiment of a fully configured network with 6 switch elements of 8 ports each to expand from a single 8-port switch to a network of 16 ports in a full bisection bandwidth network having a fat tree topology. Two of the switch elements 1110 and 1115 are at a root layer of the fat tree topology. The other switches 1120, 1125, 1130, and 1135 are at a branch layer of the fat tree topology. The end points 1140 are at a leaf level of the fat tree topology. Thus, the network in FIG. 11 has the appearance of an inverted tree with the root at the top and the leaves at the bottom. The network illustrated in FIG. 11 is known as a “fat tree” because there are more paths between the root layer and the branch layer (those paths are labeled generally as 1145) than between the branch layer and the leaf layer (those branches are labeled generally as 1150). In the network 1105 shown in FIG. 11, there are 16 paths between the root layer and the branch layer and 16 paths between the branch layer and the leaf layer. Both protocols 802.1q and 802.3ad are used to achieve full bisection bandwidth. Each source end point has 2 paths to reach any destination end point. Any source end point can reach any destination end point through one of the dashed paths or through one of the solid paths. This multiplicity of paths provides the ability to load balance traffic across paths.
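One way a source end point might spread traffic over the available path sets is a simple rotation, optionally reserving one VLAN for high priority traffic. The sketch below is an assumption for illustration only: the class, its method names, and the round-robin policy are not described in the text, which leaves the balancing policy to each source end point.

```python
from itertools import cycle


class VlanBalancer:
    """Rotate ordinary traffic over VLANs, keeping one VLAN for priority traffic."""

    def __init__(self, vlans, priority_vlan=None):
        shared = [v for v in vlans if v != priority_vlan]
        if not shared:
            raise ValueError("at least one VLAN must remain for ordinary traffic")
        self.priority_vlan = priority_vlan
        self._rotation = cycle(shared)

    def next_vlan(self, high_priority=False):
        """Return the VLAN to use for the next flow or packet."""
        if high_priority and self.priority_vlan is not None:
            return self.priority_vlan
        return next(self._rotation)


# FIG. 11 has two path sets; "solid" and "dashed" stand in for their VLAN IDs.
balancer = VlanBalancer(["solid", "dashed"])
choices = [balancer.next_vlan() for _ in range(4)]
```

Round-robin is the simplest policy that uses every path set equally; a per-flow policy would be needed if packet ordering within a flow matters.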
FIG. 12 shows a fat tree topology network 1205 with four 8-port switch elements 1210, 1215, 1220, and 1225 at the root layer and eight 8-port switch elements 1230, 1235, 1240, 1245, 1250, 1255, 1260, and 1265 at the branch layer, which produces a 32-port network. In this configuration, the Link Aggregation Protocol (802.3ad) is not needed to achieve full bisection bandwidth. There are four VLANs in the network, a first represented by the solid paths, a second represented by the dashed (i.e., “----”) paths, a third represented by the dash-dot paths (“-•-•”), and a fourth represented by the long-dash/short-dash (“- — - —”) paths. Any source end point (represented by the asterisks) in the topology shown in FIG. 12 can reach any destination end point (also represented by the asterisks) through one of the four path sets.
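The port counts of the fat tree examples can be reproduced with a short Python sketch. It assumes that each branch layer switch splits its usable ports evenly between end points and uplinks, which matches the figures described but is not stated as a general rule; the function and parameter names are illustrative.

```python
def fat_tree_end_point_ports(root_switches, branch_switches, ports_per_switch,
                             management_ports=0):
    """End-point ports of a two-layer fat tree built from identical switches."""
    usable = ports_per_switch - management_ports
    down = usable // 2          # ports per branch switch facing end points
    uplinks = down              # one uplink per end-point port (full bisection)
    if root_switches * usable < branch_switches * uplinks:
        raise ValueError("root layer has too few ports for the uplinks")
    return branch_switches * down


fig11_ports = fat_tree_end_point_ports(root_switches=2, branch_switches=4,
                                       ports_per_switch=8)
fig12_ports = fat_tree_end_point_ports(root_switches=4, branch_switches=8,
                                       ports_per_switch=8)
```

In both figures the root layer is exactly filled: the number of root layer ports equals the number of uplinks from the branch layer.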
FIG. 13 shows one embodiment of a fully connected mesh network 1305 using 3 Dell 6248 48-port Gigabit Ethernet switches 1310, 1315, 1320. The switches in the network are cabled together as shown:
    • (a) ports 1-20 of switch 1310 are connected to end points 1325 and 1330;
    • (b) ports 1-20 of switch 1315 are connected to end points 1335 and 1340;
    • (c) ports 1-20 of switch 1320 are connected to end points 1345 and 1350;
    • (d) ports 42-48 of switch 1310 are connected to ports 42-48 of switch 1320;
    • (e) ports 35-41 of switch 1310 are connected to ports 35-41 of switch 1320;
    • (g) ports 21-27 of switch 1310 are connected to ports 35-41 of switch 1315;
    • (h) ports 28-34 of switch 1310 are connected to ports 42-48 of switch 1315;
    • (i) ports 28-34 of switch 1315 are connected to ports 28-34 of switch 1320;
    • (j) ports 21-27 of switch 1315 are connected to ports 21-27 of switch 1320;
    • (k) switch 1310 is configured to be the root of VLANs 4 and 7;
    • (l) switch 1315 is configured to be the root of VLANs 3 and 6; and
    • (m) switch 1320 is configured to be the root of VLANs 2 and 5.
The switches are configured using Multiple Spanning Tree Protocol (802.1q) and Link Aggregation Protocol (802.3ad) to provide a 60-port network with full bisection bandwidth. All paths in the network are active at all times to carry traffic and there are 6 distinct paths for each source end point to inject traffic into the network to reach any destination end point.
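The cabling list for FIG. 13 can be sanity-checked with a short Python sketch that confirms every port of each 48-port switch is used exactly once and that 60 ports remain for end points. The data below is transcribed from the cabling items above; the variable names are illustrative.

```python
from collections import Counter

SWITCHES = (1310, 1315, 1320)
END_POINT_PORTS = range(1, 21)  # ports 1-20 on every switch
# Each cable group joins a (switch, first port, last port) range to another.
CABLES = [
    ((1310, 42, 48), (1320, 42, 48)),
    ((1310, 35, 41), (1320, 35, 41)),
    ((1310, 21, 27), (1315, 35, 41)),
    ((1310, 28, 34), (1315, 42, 48)),
    ((1315, 28, 34), (1320, 28, 34)),
    ((1315, 21, 27), (1320, 21, 27)),
]

usage = Counter()
for switch in SWITCHES:
    usage.update((switch, port) for port in END_POINT_PORTS)
for ends in CABLES:
    for switch, first, last in ends:
        usage.update((switch, port) for port in range(first, last + 1))

every_port_used_once = (
    len(usage) == len(SWITCHES) * 48
    and all(count == 1 for count in usage.values())
)
end_point_ports = len(SWITCHES) * len(END_POINT_PORTS)
```

Each pair of switches is joined by two 7-port groups, so 28 ports per switch carry inter-switch traffic and 20 remain for end points.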
FIG. 14 shows a fat tree network using six Dell 6248 48-port Gigabit Ethernet switches. The switches are cabled together as shown:
    • (a) ports 1 and 2 of each switch 1410, 1415, 1420, 1425, 1430, 1435 are used for system management by a monitor element (not shown);
    • (b) ports 26-30 of switch 1410 are aggregated together as trunk channel 5 and are connected to ports 26-30 of switch 1420, which are aggregated together as trunk channel 5;
    • (c) ports 31-36 of switch 1410 are aggregated together as trunk channel 6 and are connected to ports 31-36 of switch 1420, which are aggregated together as trunk channel 6;
    • (d) ports 3-7 of switch 1410 are aggregated together as trunk channel 1 and are connected to ports 3-7 of switch 1425, which are aggregated together as trunk channel 1;
    • (e) ports 8-13 of switch 1410 are aggregated together as trunk channel 2 and are connected to ports 8-13 of switch 1425, which are aggregated together as trunk channel 2;
    • (f) ports 14-19 of switch 1410 are aggregated together as trunk channel 3 and are connected to ports 14-19 of switch 1430, which are aggregated together as trunk channel 3;
    • (g) ports 20-25 of switch 1410 are aggregated together as trunk channel 4 and are connected to ports 20-25 of switch 1430, which are aggregated together as trunk channel 4;
    • (h) ports 37-42 of switch 1410 are aggregated together as trunk channel 7 and are connected to ports 37-42 of switch 1435, which are aggregated together as trunk channel 7;
    • (i) ports 43-48 of switch 1410 are aggregated together as trunk channel 8 and are connected to ports 43-48 of switch 1435, which are aggregated together as trunk channel 8;
    • (j) ports 37-42 of switch 1415 are aggregated together as trunk channel 7 and are connected to ports 37-42 of switch 1420, which are aggregated together as trunk channel 7;
    • (k) ports 43-48 of switch 1415 are aggregated together as trunk channel 8 and are connected to ports 43-48 of switch 1420, which are aggregated together as trunk channel 8;
    • (l) ports 14-19 of switch 1415 are aggregated together as trunk channel 3 and are connected to ports 14-19 of switch 1425, which are aggregated together as trunk channel 3;
    • (m) ports 20-25 of switch 1415 are aggregated together as trunk channel 4 and are connected to ports 20-25 of switch 1425, which are aggregated together as trunk channel 4;
    • (n) ports 26-30 of switch 1415 are aggregated together as trunk channel 5 and are connected to ports 26-30 of switch 1430, which are aggregated together as trunk channel 5;
    • (o) ports 31-36 of switch 1415 are aggregated together as trunk channel 6 and are connected to ports 31-36 of switch 1430, which are aggregated together as trunk channel 6;
    • (p) ports 3-7 of switch 1415 are aggregated together as trunk channel 1 and are connected to ports 3-7 of switch 1435, which are aggregated together as trunk channel 1;
    • (q) ports 8-13 of switch 1415 are aggregated together as trunk channel 2 and are connected to ports 8-13 of switch 1435, which are aggregated together as trunk channel 2;
    • (r) ports 3-25 of switch 1420 are available for connection to end points;
    • (s) ports 26-48 of switch 1425 are available for connection to end points;
    • (t) ports 3-13 and 37-48 of switch 1430 are available for connection to end points;
    • (u) ports 14-36 of switch 1435 are available for connection to end points;
    • (v) switch 1410 is configured to be the root of VLANs 2 and 3;
    • (w) switch 1415 is configured to be the root of VLANs 4 and 5;
    • (x) channels 1, 3, 5, and 7 on switch 1410, channel 5 on switch 1420, channel 1 on switch 1425, channel 3 on switch 1430, and channel 7 on switch 1435 are configured to be paths in VLAN 2;
    • (y) channels 2, 4, 6, and 8 on switch 1410, channel 6 on switch 1420, channel 2 on switch 1425, channel 4 on switch 1430, and channel 8 on switch 1435 are configured to be paths in VLAN 3;
    • (z) channels 1, 3, 5, and 7 on switch 1415, channel 7 on switch 1420, channel 3 on switch 1425, channel 5 on switch 1430, and channel 1 on switch 1435 are configured to be paths in VLAN 4; and
    • (aa) channels 2, 4, 6, and 8 on switch 1415, channel 8 on switch 1420, channel 4 on switch 1425, channel 6 on switch 1430, and channel 2 on switch 1435 are configured to be paths in VLAN 5.
The network is configured using Multiple Spanning Tree Protocol (802.1q) and Link Aggregation Protocol (802.3ad) to provide a 92-port network with full bisection bandwidth. All links in the network are active at all times to carry traffic, and there are 4 distinct paths by which each source node can inject traffic into the network to reach any destination node. Two ports in each switch are dedicated to system management.
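The port budget behind the 92-port figure can be checked with a short sketch. The table below transcribes the end-point port ranges from items (r) through (u). The names `END_POINT_PORTS` and `count_ports` are illustrative, and the sketch assumes 48 ports per switch (the highest port number used in the listing), two of which are reserved for management.

```python
# Illustrative sketch: tally end-point and uplink ports on the branch layer.
# Port ranges are transcribed from items (r)-(u); all names are hypothetical.

PORTS_PER_SWITCH = 48   # assumption: highest port number used in the listing
MANAGEMENT_PORTS = 2    # two ports per switch are reserved for management

# Inclusive (first, last) port ranges available for end points.
END_POINT_PORTS = {
    1420: [(3, 25)],
    1425: [(26, 48)],
    1430: [(3, 13), (37, 48)],
    1435: [(14, 36)],
}

def count_ports(ranges):
    """Return the number of ports covered by inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)

end_points = {sw: count_ports(r) for sw, r in END_POINT_PORTS.items()}
uplinks = {
    sw: PORTS_PER_SWITCH - MANAGEMENT_PORTS - n for sw, n in end_points.items()
}

total_end_points = sum(end_points.values())
print(total_end_points)  # 92
print(uplinks)           # 23 uplink ports on every branch switch
```

Under these assumptions each branch layer switch has 23 end-point ports and 23 uplink ports, so uplink capacity matches end-point capacity when all links run at the same speed.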
The scripts used to accomplish this configuration with the network 1405 shown in FIG. 14 are reproduced below (using Dell script language; explanatory comments accompany some commands):
Switch element S10:
configure
vlan database
! All VLANs are declared
vlan 2-5
exit
interface range ethernet 1/g1-1/g2
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g48
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface range ethernet 1/g3-1/g25
spanning-tree disable
spanning-tree portfast
exit
! Link aggregation setting
interface range ethernet 1/g26-1/g30
channel-group 5 mode auto
exit
interface range ethernet 1/g31-1/g36
channel-group 6 mode auto
exit
interface range ethernet 1/g37-1/g42
channel-group 7 mode auto
exit
interface range ethernet 1/g43-1/g48
channel-group 8 mode auto
exit
! Attach VLAN to LAG group
interface port-channel 5
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 6
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 7
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 8
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
spanning-tree mode mstp
spanning-tree mst configuration
! Assign unique MSTP instance
instance 1 add vlan 1
instance 2 add vlan 2
instance 3 add vlan 3
instance 4 add vlan 4
instance 5 add vlan 5
exit
interface port-channel 5
! Set up priority to guide routes
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 6
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 7
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 8
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
! Declare a single MSTP region
spanning-tree mst configuration
name "teradata"
exit
exit
exit
Switch element S11:
configure
vlan database
vlan 2-5
exit
interface range ethernet 1/g1-1/g2
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g48
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface range ethernet 1/g26-1/g48
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g7
channel-group 1 mode auto
exit
interface range ethernet 1/g8-1/g13
channel-group 2 mode auto
exit
interface range ethernet 1/g14-1/g19
channel-group 3 mode auto
exit
interface range ethernet 1/g20-1/g25
channel-group 4 mode auto
exit
!
interface port-channel 1
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 2
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 3
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 4
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
spanning-tree mode mstp
spanning-tree mst configuration
instance 1 add vlan 1
instance 2 add vlan 2
instance 3 add vlan 3
instance 4 add vlan 4
instance 5 add vlan 5
exit
interface port-channel 1
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 2
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 3
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 4
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
spanning-tree mst configuration
name "teradata"
exit
exit
exit
Switch element S12:
configure
vlan database
vlan 2-5
exit
interface range ethernet 1/g1-1/g2
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g48
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface range ethernet 1/g3-1/g13
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g37-1/g48
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g14-1/g19
channel-group 3 mode auto
exit
interface range ethernet 1/g20-1/g25
channel-group 4 mode auto
exit
interface range ethernet 1/g26-1/g30
channel-group 5 mode auto
exit
interface range ethernet 1/g31-1/g36
channel-group 6 mode auto
exit
interface port-channel 3
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
!
interface port-channel 4
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 5
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 6
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
spanning-tree mode mstp
spanning-tree mst configuration
instance 1 add vlan 1
instance 2 add vlan 2
instance 3 add vlan 3
instance 4 add vlan 4
instance 5 add vlan 5
exit
interface port-channel 3
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 4
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 5
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 6
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
spanning-tree mst configuration
name "teradata"
exit
exit
exit
Switch element S13:
configure
vlan database
vlan 2-5
exit
interface range ethernet 1/g1-1/g2
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g48
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface range ethernet 1/g14-1/g36
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g37-1/g42
channel-group 7 mode auto
exit
interface range ethernet 1/g43-1/g48
channel-group 8 mode auto
exit
interface range ethernet 1/g3-1/g7
channel-group 1 mode auto
exit
interface range ethernet 1/g8-1/g13
channel-group 2 mode auto
exit
!
interface port-channel 7
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 8
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 1
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 2
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
spanning-tree mode mstp
spanning-tree mst configuration
instance 1 add vlan 1
instance 2 add vlan 2
instance 3 add vlan 3
instance 4 add vlan 4
instance 5 add vlan 5
exit
interface port-channel 7
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 8
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 1
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 2
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
spanning-tree mst configuration
name "teradata"
exit
exit
exit
Switch element S20:
configure
vlan database
! All VLANs are declared
vlan 2-5
exit
!
interface range ethernet 1/g1-1/g2
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g48
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
! Link aggregation setting
interface range ethernet 1/g3-1/g7
channel-group 1 mode auto
exit
interface range ethernet 1/g8-1/g13
channel-group 2 mode auto
exit
interface range ethernet 1/g14-1/g19
channel-group 3 mode auto
exit
interface range ethernet 1/g20-1/g25
channel-group 4 mode auto
exit
interface range ethernet 1/g26-1/g30
channel-group 5 mode auto
exit
interface range ethernet 1/g31-1/g36
channel-group 6 mode auto
exit
interface range ethernet 1/g37-1/g42
channel-group 7 mode auto
exit
interface range ethernet 1/g43-1/g48
channel-group 8 mode auto
exit
!
! Attach VLAN to LAG group
interface port-channel 1
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
!
interface port-channel 2
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 3
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 4
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 5
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 6
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 7
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 8
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
spanning-tree mode mstp
! Assign unique MSTP instance
spanning-tree mst configuration
instance 1 add vlan 1
instance 2 add vlan 2
instance 3 add vlan 3
instance 4 add vlan 4
instance 5 add vlan 5
exit
! Set up priority to guide routes
interface port-channel 1
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 2
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 3
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 4
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 5
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 6
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
interface port-channel 7
spanning-tree mst 2 port-priority 0
spanning-tree mst 3 port-priority 16
exit
interface port-channel 8
spanning-tree mst 2 port-priority 16
spanning-tree mst 3 port-priority 0
exit
! MSTP instance priority for VLAN root
spanning-tree mst 1 priority 0
spanning-tree mst 2 priority 0
spanning-tree mst 3 priority 0
spanning-tree mst 4 priority 16384
spanning-tree mst 5 priority 16384
! Assign MSTP region for network
spanning-tree mst configuration
name "teradata"
exit
exit
exit
Switch element S21:
configure
vlan database
vlan 2-5
exit
interface range ethernet 1/g1-1/g2
spanning-tree disable
spanning-tree portfast
exit
interface range ethernet 1/g3-1/g48
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface range ethernet 1/g3-1/g7
channel-group 1 mode auto
exit
interface range ethernet 1/g8-1/g13
channel-group 2 mode auto
exit
interface range ethernet 1/g14-1/g19
channel-group 3 mode auto
exit
interface range ethernet 1/g20-1/g25
channel-group 4 mode auto
exit
interface range ethernet 1/g26-1/g30
channel-group 5 mode auto
exit
interface range ethernet 1/g31-1/g36
channel-group 6 mode auto
exit
interface range ethernet 1/g37-1/g42
channel-group 7 mode auto
exit
interface range ethernet 1/g43-1/g48
channel-group 8 mode auto
exit
interface port-channel 1
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
!
interface port-channel 2
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 3
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 4
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 5
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 6
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 7
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
interface port-channel 8
hashing-mode 5
switchport mode general
no switchport general acceptable-frame-type tagged-only
switchport general allowed vlan add 2-5 tagged
exit
spanning-tree mode mstp
spanning-tree mst configuration
instance 1 add vlan 1
instance 2 add vlan 2
instance 3 add vlan 3
instance 4 add vlan 4
instance 5 add vlan 5
exit
interface port-channel 1
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 2
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
interface port-channel 3
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 4
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
interface port-channel 5
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 6
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
interface port-channel 7
spanning-tree mst 4 port-priority 0
spanning-tree mst 5 port-priority 16
exit
interface port-channel 8
spanning-tree mst 4 port-priority 16
spanning-tree mst 5 port-priority 0
exit
spanning-tree mst 1 priority 16384
spanning-tree mst 2 priority 16384
spanning-tree mst 3 priority 16384
spanning-tree mst 4 priority 0
spanning-tree mst 5 priority 0
spanning-tree mst configuration
name "teradata"
exit
exit
exit
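The effect of the instance priorities in the S20 and S21 scripts can be illustrated with a small sketch. It relies on the general spanning tree rule that the bridge with the numerically lowest priority becomes root; ties are broken by bridge MAC address, which is not modeled here. The branch layer switches are left out because their scripts set no instance priority, so they are assumed to keep a higher default value. The names `BRIDGE_PRIORITY` and `elect_roots` are hypothetical.

```python
# Illustrative sketch: which root layer switch wins each MSTP instance.
# Priorities are transcribed from the S20 and S21 scripts above.

BRIDGE_PRIORITY = {
    "S20": {1: 0, 2: 0, 3: 0, 4: 16384, 5: 16384},
    "S21": {1: 16384, 2: 16384, 3: 16384, 4: 0, 5: 0},
}

def elect_roots(priorities):
    """Map each MSTP instance to the switch with the lowest priority value."""
    instances = sorted(next(iter(priorities.values())))
    return {
        inst: min(priorities, key=lambda sw: priorities[sw][inst])
        for inst in instances
    }

roots = elect_roots(BRIDGE_PRIORITY)
print(roots)  # {1: 'S20', 2: 'S20', 3: 'S20', 4: 'S21', 5: 'S21'}
```

Since each instance carries the VLAN of the same number, this is consistent with items (v) and (w) if S20 and S21 are taken to be switches 1410 and 1415, respectively.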
The network in FIG. 14 was used to collect relevant statistics for the invention. Fifteen server end points were used to emulate a 90-end-point fully connected, fully configured network. A diagnostic driver was designed to selectively inject traffic for one, several, or all VLANs into the network. The network was characterized using this diagnostic driver.
Tables 1 and 2 contain statistics collected when the network 1405 is not configured and is configured as described above, respectively. Table 1 shows the number of bytes transmitted through one of the end points and the number of dropped packets when the network is configured with a single VLAN and is allowed to self-configure what it considers to be its best topology:
TABLE 1
Throughput (bytes)    Drops
100,669,496           4,711
101,644,964           5,197
96,521,132            4,561
93,187,456            4,557
97,508,640            4,812
FIG. 15 shows a graphical representation of the traffic activity on all ports in the network. The figure shows that uplink activity is essentially reduced to one path and that only one of the root layer switches (S20) is actively carrying traffic.
Table 2 shows network statistics collected when the network is configured into a full bisection bandwidth topology, as described herein.
TABLE 2
Throughput (bytes)    Drops
527,064,660           6
566,409,564           6
524,742,348           9
539,036,720           12
522,375,176           8
FIG. 16 shows the graphical representation of the traffic activities of all ports in the network. The figure shows that all paths in the network are actively carrying traffic and that all root level switches are actively carrying traffic.
FIGS. 17 and 18 show the difference in bandwidth and the number of packet drops per end point depending on whether the invention is used (“Fully configured network”) or not (“Unconfigured network”).
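The size of the difference can be estimated directly from the figures reported in Tables 1 and 2. The following sketch only sums the reported rows and compares the totals; the variable and function names are illustrative.

```python
# Illustrative arithmetic on the figures reported in Tables 1 and 2.
# Each row is (throughput in bytes, dropped packets).

unconfigured = [  # Table 1
    (100_669_496, 4_711),
    (101_644_964, 5_197),
    (96_521_132, 4_561),
    (93_187_456, 4_557),
    (97_508_640, 4_812),
]
configured = [  # Table 2
    (527_064_660, 6),
    (566_409_564, 6),
    (524_742_348, 9),
    (539_036_720, 12),
    (522_375_176, 8),
]

def totals(rows):
    """Sum throughput and drops over all rows of a table."""
    return sum(t for t, _ in rows), sum(d for _, d in rows)

t1, d1 = totals(unconfigured)
t2, d2 = totals(configured)

print(f"throughput gain: {t2 / t1:.2f}x")  # throughput gain: 5.47x
print(f"drop reduction: {d1 / d2:.0f}x")   # drop reduction: 581x
```

On these samples the configured network carries roughly five and a half times the bytes of the unconfigured network while dropping several hundred times fewer packets.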
FIG. 19 shows that when only VLAN 2 traffic is injected into the network, only the ports configured for VLAN 2 carry the traffic. The point of this experiment is to show that there is a distinct path from any source to any destination for a particular VLAN. Traffic for each VLAN is completely isolated from all other traffic. Consequently, each source end point can decide how best to load balance traffic based on VLAN. For instance, one VLAN could be used for high priority traffic, one VLAN could be used as a broadcast only path, etc.
FIG. 20 depicts the same concept in a different way. In FIG. 20, the bold lines are paths that are dedicated to VLAN 2. Injecting packets in different VLANs improves isolation and increases network throughput by avoiding congestion.
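The per-VLAN paths can be made concrete with a lookup sketch built from the channel assignments in items (x) through (aa). The data structure and the `path` function are hypothetical; the sketch relies on each trunk carrying the same channel number at both ends, as in the trunk listing.

```python
# Illustrative sketch: the dedicated path between two branch layer switches
# for a given VLAN, using the channel assignments from items (x)-(aa).

VLAN_PATHS = {
    2: {"root": 1410, "branch": {1420: 5, 1425: 1, 1430: 3, 1435: 7}},
    3: {"root": 1410, "branch": {1420: 6, 1425: 2, 1430: 4, 1435: 8}},
    4: {"root": 1415, "branch": {1420: 7, 1425: 3, 1430: 5, 1435: 1}},
    5: {"root": 1415, "branch": {1420: 8, 1425: 4, 1430: 6, 1435: 2}},
}

def path(vlan, src, dst):
    """Return the hops from branch switch src to branch switch dst in a VLAN."""
    entry = VLAN_PATHS[vlan]
    up = entry["branch"][src]     # channel from the source switch to the root
    down = entry["branch"][dst]   # channel from the root to the destination
    return [f"{src} ch{up}", str(entry["root"]), f"{dst} ch{down}"]

for vlan in sorted(VLAN_PATHS):
    print(vlan, path(vlan, 1420, 1435))

# Every (branch switch, channel) pair is dedicated to exactly one VLAN.
used = [(sw, ch) for e in VLAN_PATHS.values() for sw, ch in e["branch"].items()]
print(len(used) == len(set(used)))  # True
```

For any pair of branch layer switches the four VLANs yield four paths that share no trunk, which is why an end point can balance load simply by choosing the VLAN in which it injects a packet.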
In order to achieve a topology with full bisection bandwidth and predictable routes between sources and destinations, the network is cabled with strict connectivity for full bisection bandwidth, and each switch element is configured to achieve a complete network. Each switch element is configured to meet Multiple Spanning Tree Protocol (802.1q) (“MSTP”) and Link Aggregation Protocol (802.3ad) (“LAG”) requirements. The configuration allows the MSTP and LAG protocols to automatically produce the desired network. The following are the principal configuration settings used to meet MSTP and LAG requirements:
    • All switches in the network declare the same set of VLANs and accept traffic from all declared VLANs.
    • All links that connect two switch elements have Link Aggregation groups declared for their corresponding connections. All LAG groups are attached to all of the VLAN tags.
    • Each VLAN is given a unique MSTP instance.
    • Each root level switch element is set up as the root switch for one or more MSTP instances.
    • All switch elements in the network are declared to be in a single MSTP region.
    • Priority is set up to guide the MSTP and traffic into the desired paths.
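The last setting in the list above can be sketched as a small generator that emits the per-channel port-priority stanzas. The command syntax is copied from the scripts in this description, and the values 0 and 16 are the ones those scripts use for the favored and non-favored instance; the function name and input format are hypothetical.

```python
# Illustrative sketch: emit per-channel MSTP port-priority stanzas.
# A lower port priority value marks the favored instance on a channel.

PREFERRED = 0   # value used in the scripts for the favored instance
FALLBACK = 16   # value used in the scripts for the other instance

def port_priority_lines(assignments):
    """Build CLI lines from (channel, favored_instance, other_instance) tuples."""
    lines = []
    for channel, favored, other in assignments:
        lines.append(f"interface port-channel {channel}")
        for instance in sorted((favored, other)):
            priority = PREFERRED if instance == favored else FALLBACK
            lines.append(f"spanning-tree mst {instance} port-priority {priority}")
        lines.append("exit")
    return lines

# Branch switch S10: channels 5 and 6 lead to the root of instances 2 and 3,
# channels 7 and 8 lead to the root of instances 4 and 5.
s10 = port_priority_lines([(5, 2, 3), (6, 3, 2), (7, 4, 5), (8, 5, 4)])
print("\n".join(s10))
```

The output reproduces the port-priority block of the S10 script; the other branch layer switches differ only in the channel numbers passed in.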
The resulting configuration:
    • allows expansion of Ethernet networks beyond one switch element without loss of full bisection bandwidth;
    • allows multiple low cost switches to be cascaded to achieve high port count networks at a lower cost than big iron networks;
    • provides multiple redundant paths in the network, resulting in greater network resiliency in the face of link failures (resiliency can be further enhanced by connecting more than one port per end point to more than one switch element, which also helps prevent loss of connectivity when switch elements fail);
    • can be incrementally grown as needed;
    • allows load balancing through multiple paths to reduce congestion and packet loss.
As illustrated in FIG. 21, network configuration of a fat tree network according to the principles described herein begins by cabling root layer switches and branch layer switches into a full bisection bandwidth topology (block 2105), such as that shown in FIG. 14. The branch layer switches are configured (block 2110) and the root layer switches are configured (block 2115).
Configuration of the branch layer switches, as illustrated in FIG. 22, begins by determining if the last branch layer switch has been configured (block 2205). If it has (“Y” branch out of block 2205), then branch layer configuration is complete. If it has not (“N” branch out of block 2205), then the next branch layer switch is configured (block 2210). The management ports are set up (block 2215). The spanning tree protocol is disabled on the communication ports (i.e., the non-management ports) that are or can be connected to end points (i.e., not ports used to connect two switches) (block 2220). Redundant communication ports are aggregated (block 2225). The aggregated communication ports are configured (block 2230). Aggregated communication ports are assigned to VLANs (block 2235).
Configuration of the root layer switches, as illustrated in FIG. 23, begins by determining if the last root layer switch has been configured (block 2305). If it has (“Y” branch out of block 2305), then root layer configuration is complete. If it has not (“N” branch out of block 2305), then the next root layer switch is configured (block 2310). The management ports are set up (block 2315). Spanning tree is disabled on the communication ports (i.e., the non-management ports) that are or can be connected to end points (i.e., not ports used to connect two switches) (block 2320). Redundant communication ports are aggregated (block 2325). The aggregated communication ports are configured (block 2330). Aggregated communication ports are assigned to VLANs (block 2335). Root level switches are established as root nodes for selected VLANs (block 2340).
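The two configuration loops of FIGS. 22 and 23 can be summarized as an ordered plan. This is only a sketch of the control flow: the step strings paraphrase the flowchart blocks, the switch labels follow the scripts above, and `configuration_plan` is a hypothetical helper that lists steps rather than performing them.

```python
# Illustrative sketch: ordered configuration steps per switch, following the
# flowcharts of FIGS. 22 (branch layer) and 23 (root layer).

BRANCH_STEPS = [
    "set up management ports",                   # block 2215
    "disable spanning tree on end-point ports",  # block 2220
    "aggregate redundant communication ports",   # block 2225
    "configure aggregated communication ports",  # block 2230
    "assign aggregated ports to VLANs",          # block 2235
]

# The root layer repeats the same steps (blocks 2315-2335) and adds one more.
ROOT_STEPS = BRANCH_STEPS + [
    "establish switch as root node for selected VLANs",  # block 2340
]

def configuration_plan(branch_switches, root_switches):
    """Return (switch, step) pairs in order: branch layer, then root layer."""
    plan = []
    for switch in branch_switches:
        plan.extend((switch, step) for step in BRANCH_STEPS)
    for switch in root_switches:
        plan.extend((switch, step) for step in ROOT_STEPS)
    return plan

plan = configuration_plan(["S10", "S11", "S12", "S13"], ["S20", "S21"])
for switch, step in plan:
    print(f"{switch}: {step}")
```

The only structural difference between the two layers is the final root-election step, which matches the instance priority commands that appear only in the S20 and S21 scripts.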
The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.

Claims (27)

What is claimed is:
1. A system comprising:
a full bisection bandwidth network comprising:
a plurality of nodes;
a plurality of paths among the nodes;
a plurality of Virtual Local Area Networks (“VLANs”) incorporating the plurality of nodes and the plurality of paths, wherein the plurality of VLANs comprises a first VLAN and a second VLAN and the plurality of paths comprises a first path, wherein the first path is between two nodes and does not pass through any intervening nodes;
wherein the first path is assigned to the first VLAN and the second VLAN;
wherein the first VLAN would not satisfy a spanning tree protocol if the first path is enabled in the first VLAN and the first VLAN satisfies the spanning tree protocol with the first path disabled in the first VLAN; and
wherein the first path is enabled in the second VLAN and the second VLAN satisfies the spanning tree protocol with the first path enabled in the second VLAN;
adding a node;
adding paths to connect the added node to the full bisection bandwidth network;
adjusting the assignments of the paths and the added paths to VLANs such that:
each VLAN satisfies a spanning tree protocol;
each of the plurality of paths is active in the full bisection bandwidth network; and
the network remains a full bisection bandwidth network;
wherein:
the plurality of nodes comprises:
a root layer of N Ethernet switches;
a branch layer of M Ethernet switches, M>N;
the plurality of paths among the nodes comprises:
a path from branch layer switch BLS1 to root layer switch RLS1 assigned to a first VLAN; and
a path from branch layer switch BLS2 to root layer switch RLS1 assigned to a second VLAN.
2. The system of claim 1 wherein:
the full bisection bandwidth network comprises a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths; and
Path A and Path B are aggregated into a single trunk group such that Path A and Path B are active.
3. The system of claim 1 wherein:
the full bisection bandwidth network has a fat tree topology.
4. The system of claim 1 further comprising:
the full bisection bandwidth network has a fully connected mesh topology.
5. The system of claim 1 wherein:
the plurality of paths among the nodes comprises:
a path from each root layer switch to each branch layer switch;
paths from a first root layer switch are assigned to a first set of VLANs;
paths from a second root layer switch are assigned to a second set of VLANs;
the first set of VLANs does not contain any VLANS belonging to the second set of VLANS; and
the second set of VLANs does not contain any VLANs belonging to the first set of VLANs.
6. The system of claim 5 wherein M=2N.
7. The system of claim 1 wherein:
the path from branch layer switch BLS1 to root layer switch RLS1 and a second path from branch layer switch BLS1 to root layer switch RLS1 are aggregated into a single trunk group.
8. The system of claim 1 further comprising:
a plurality of servers coupled to the full bisection bandwidth network; and
the plurality of paths among the nodes comprises a plurality of redundant paths from one of the plurality of servers to another of the plurality of servers.
9. The system of claim 1 further comprising:
a plurality of servers coupled to the full bisection bandwidth network; and
the plurality of paths among the nodes comprises a plurality of redundant paths from each of the plurality of servers to the others of the plurality of servers.
10. The system of claim 1 wherein:
the plurality of paths among the nodes comprises:
a second path redundant to the path from branch layer switch BLS1 to root layer switch RLS1 assigned to the first VLAN.
11. The system of claim 1 wherein:
a plurality of servers is coupled to the branch layer of Ethernet servers;
the plurality of paths among the nodes comprises:
a first path from a first server to a second server; and
a second path redundant to the first path from the first server to the second server.
12. The system of claim 1 wherein:
a plurality of servers coupled to the branch layer of Ethernet servers;
the plurality of paths among the nodes comprises:
redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
13. A method comprising:
providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”), wherein the plurality of VLANs comprises a first VLAN and a second VLAN and the plurality of paths comprises a first path, wherein the first path is between two nodes and does not pass through any intervening nodes, by assigning the first path to the first VLAN and to the second VLAN;
disabling the first path in the first VLAN, wherein the first VLAN with the first path enabled in the first VLAN would not satisfy a spanning tree protocol and the first VLAN with the first path disabled in the first VLAN satisfies the spanning tree protocol;
enabling the first path in the second VLAN, wherein the second VLAN satisfies the spanning tree protocol with the first path enabled in the second VLAN;
the full bisection bandwidth network carrying a traffic load;
balancing the traffic load among the paths;
adding a node;
adding paths to connect the added node to the full bisection bandwidth network;
adjusting the assignments of the paths and the added paths to VLANs such that:
each VLAN satisfies a spanning tree protocol;
each of the plurality of paths is active in the full bisection bandwidth network; and
the network remains a full bisection bandwidth network;
wherein the plurality of nodes comprises a root layer of N Ethernet switches and a branch layer of M Ethernet switches, M>N, the plurality of paths comprises a path from each root layer switch to each branch layer switch, and wherein assigning the first path to the first VLAN and to the second VLAN comprises:
assigning a path from branch layer switch BLS1 to root layer switch RLS1 to the first VLAN; and
assigning a path from branch layer switch BLS2 to root layer switch RLS1 to the second VLAN.
14. A method comprising:
providing a full bisection bandwidth network, having a plurality of nodes and a plurality of paths among the nodes, that is divided into a plurality of Virtual Local Area Networks (“VLANs”), wherein the plurality of VLANs comprises a first VLAN and a second VLAN and the plurality of paths comprises a first path, wherein the first path is between two nodes and does not pass through any intervening nodes, by assigning the first path to the first VLAN and to the second VLAN;
disabling the first path in the first VLAN, wherein the first VLAN with the first path enabled in the first VLAN would not satisfy a spanning tree protocol and the first VLAN with the first path disabled in the first VLAN satisfies the spanning tree protocol; and
enabling the first path in the second VLAN, wherein the second VLAN satisfies the spanning tree protocol with the first path enabled in the second VLAN;
adding a node;
adding added paths to connect the added node to the full bisection bandwidth network;
adjusting the assignments of the paths and the added paths to VLANs such that:
each VLAN satisfies a spanning tree protocol;
each of the plurality of paths is active in the full bisection bandwidth network; and
the network remains a full bisection bandwidth network;
wherein the plurality of nodes comprises a root layer of N Ethernet switches and a branch layer of M Ethernet switches, M>N, the plurality of paths comprises a path from each root layer switch to each branch layer switch, and wherein assigning the first path to the first VLAN and to the second VLAN comprises:
assigning a path from branch layer switch BLS1 to root layer switch RLS1 to the first VLAN; and
assigning a path from branch layer switch BLS2 to root layer switch RLS1 to the second VLAN.
15. The method of claim 14 wherein adjusting the assignments comprises:
adding a new VLAN.
16. The method of claim 14 wherein adjusting the assignments comprises:
adding the added paths to the existing VLANs.
17. The method of claim 14 wherein the full bisection bandwidth network comprises a path A connecting node X and node Y and a path B connecting node X and node Y, such that standard Ethernet protocol would treat path A and path B as redundant paths, the method further comprising:
aggregating Path A and Path B into a single trunk group so that Path A and Path B are active.
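The aggregation recited in claim 17 can be pictured as grouping parallel paths by their endpoint pair, so that paths sharing the same two nodes become members of one logical trunk rather than redundant links. The sketch below is a hypothetical data-structure illustration only; the function name `aggregate_trunks` and the tuple layout are assumptions, and no switch configuration or link aggregation protocol is modeled.

```python
from collections import defaultdict


def aggregate_trunks(links):
    """Group links by unordered endpoint pair.

    links: iterable of (path_name, node_a, node_b) tuples (assumed layout).
    Returns a dict mapping a sorted (node, node) pair to the sorted list of
    path names that would be placed in the same trunk group.
    """
    groups = defaultdict(list)
    for name, a, b in links:
        groups[frozenset((a, b))].append(name)  # direction does not matter
    return {tuple(sorted(pair)): sorted(names) for pair, names in groups.items()}


links = [("A", "X", "Y"), ("B", "Y", "X"), ("C", "X", "Z")]
print(aggregate_trunks(links))
# {('X', 'Y'): ['A', 'B'], ('X', 'Z'): ['C']}
```

Path A and path B both connect node X and node Y, so they land in one group and can both stay active as members of a single trunk.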
18. The method of claim 14 further comprising:
constructing the full bisection bandwidth network to have a fat tree topology.
19. The method of claim 14 further comprising:
constructing the full bisection bandwidth network to have a fully connected mesh topology.
20. The method of claim 14 further comprising:
assigning paths from a first root layer switch to a first set of VLANs;
assigning paths from a second root layer switch to a second set of VLANs;
the first set of VLANs not containing any VLANs belonging to the second set of VLANs; and
the second set of VLANs not containing any VLANs belonging to the first set of VLANs.
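The disjointness condition of claim 20 can be sketched as allocating a separate, contiguous block of VLAN IDs to each root layer switch and then checking that no two blocks overlap. The helper names, the `vlans_per_root` parameter, and the choice to start numbering at 2 are assumptions for illustration. The 1 to 4094 bound reflects the usable IEEE 802.1Q VLAN ID range.

```python
from itertools import combinations


def allocate_vlan_sets(n_root, vlans_per_root, first_id=2):
    """Give each root switch RLS1..RLSn its own contiguous block of VLAN IDs."""
    last_id = first_id + n_root * vlans_per_root - 1
    if first_id < 1 or last_id > 4094:
        raise ValueError("VLAN IDs must fall in 1..4094")
    sets = {}
    next_id = first_id
    for i in range(1, n_root + 1):
        sets[f"RLS{i}"] = set(range(next_id, next_id + vlans_per_root))
        next_id += vlans_per_root
    return sets


def pairwise_disjoint(sets):
    """True if no VLAN ID belongs to the sets of two different root switches."""
    return all(a.isdisjoint(b) for a, b in combinations(sets.values(), 2))


vlan_sets = allocate_vlan_sets(n_root=2, vlans_per_root=2)
print(vlan_sets)                     # {'RLS1': {2, 3}, 'RLS2': {4, 5}}
print(pairwise_disjoint(vlan_sets))  # True
```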
21. The method of claim 20 wherein M=2N.
22. The method of claim 14 further comprising:
aggregating the path from the first branch layer switch BLS1 to a first root layer switch RLS1 with the path from the first branch layer switch BLS1 to the first root layer switch RLS1 into a single trunk group.
23. The method of claim 14 wherein a plurality of servers is coupled to the full bisection bandwidth network and the method further comprises:
providing redundant paths from one of the plurality of servers to another of the plurality of servers.
24. The method of claim 14 wherein a plurality of servers is coupled to the full bisection bandwidth network and the method further comprises:
providing redundant paths from each of the plurality of servers to the others of the plurality of servers.
25. The method of claim 14 further comprising:
assigning a path redundant to the path from branch layer switch BLS1 to root layer switch RLS1 to the first VLAN.
26. The method of claim 14 further comprising:
assigning a first path from a first server to a second server; and
assigning a second path redundant to the first path from the first server to the second server.
27. The method of claim 14 further comprising:
assigning redundant paths from each of the plurality of servers through the branch layer of Ethernet switches and the root layer of Ethernet switches to the others of the plurality of servers.
US12/577,979 2009-10-13 2009-10-13 Full bisection bandwidth network Active 2032-07-12 US9270487B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/577,979 US9270487B1 (en) 2009-10-13 2009-10-13 Full bisection bandwidth network

Publications (1)

Publication Number Publication Date
US9270487B1 (en) 2016-02-23

Family

ID=55314780

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/577,979 Active 2032-07-12 US9270487B1 (en) 2009-10-13 2009-10-13 Full bisection bandwidth network

Country Status (1)

Country Link
US (1) US9270487B1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040190454A1 (en) * 2002-12-19 2004-09-30 Anritsu Corporation Mesh network bridges making operable spanning tree protocol and line fault backup protocol in optimized forwarding environment
US7606178B2 (en) * 2005-05-31 2009-10-20 Cisco Technology, Inc. Multiple wireless spanning tree protocol for use in a wireless mesh network
US20090274153A1 (en) * 2002-10-01 2009-11-05 Andrew Tai-Chin Kuo System and method for implementation of layer 2 redundancy protocols across multiple networks
US20110302346A1 (en) * 2009-01-20 2011-12-08 The Regents Of The University Of California Reducing cabling complexity in large-scale networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"IEEE 802.1Q", http://en.wikipedia.org/wiki/IEEE 802.1Q.
Davis, David "Preventing network loops with Spanning-Tree Protocol (STP)", http://www.petri.co.il/csc-preventing-network-loops-with-stp-8021d.htm, (Jan. 7, 2009).
Dell Inc., "Dell PowerConnect 6200 Systems CLI Reference Guide", (Oct. 2006).
Leiserson, Charles E., "Fat-Trees: Universal Networks for Hardware-Efficient Supercomputing", IEEE Transactions on Computers, vol. C-34, No. 10, (Oct. 1, 1985), 892-901.

Legal Events

Date Code Title Description
AS Assignment

Owner name: TERADATA US, INC, OHIO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NGUYEN, CHINH K;STEHLEY, CURTIS H;SIGNING DATES FROM 20091005 TO 20091012;REEL/FRAME:023362/0231

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

REFU Refund

Free format text: REFUND - PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: R1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8