US20230077717A1 - Method and system for overlay routing with vxlan - Google Patents
Method and system for overlay routing with VXLAN
- Publication number
- US20230077717A1 US18/057,558
- Authority
- US
- United States
- Prior art keywords
- vxlan
- address
- frame
- mac
- network device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/64—Routing or path finding of packets in data switching networks using an overlay routing layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4641—Virtual LANs, VLANs, e.g. virtual private networks [VPN]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/46—Interconnection of networks
- H04L12/4633—Interconnection of networks using encapsulation techniques, e.g. tunneling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/58—Association of routers
- H04L45/586—Association of routers of virtual routers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/66—Layer 2 routing, e.g. in Ethernet based MAN's
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74—Address processing for routing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74—Address processing for routing
- H04L45/745—Address table lookup; Address filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/70—Virtual switches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/22—Parsing or analysis of headers
Definitions
- Data centers typically include multiple hosts where the hosts, in turn, each execute multiple virtual machines.
- the virtual machines may belong to virtual layer 2 segments that span across a physical layer 3 data center network using an overlay technology.
- when using an overlay technology, virtual machines in different layer 2 segments are unable to communicate.
- the invention in general, in one aspect, relates to a method for routing.
- the method includes receiving, by a first Top of Rack (ToR) switch, a first VXLAN frame comprising a first server media access control (MAC) address, a first ToR switch MAC address, a first server Internet Protocol (IP) address, a VARP VTEP IP address, a first VNI, and a MAC frame, wherein the MAC frame comprises a VARP MAC address, a first virtual machine (VM) IP address associated with a first VM, and a second VM IP address associated with a second VM, wherein the first VM is executing on the first server, decapsulating, by the first ToR switch, the first VXLAN frame to obtain the MAC frame, processing, on the first ToR switch, the MAC frame to obtain a rewritten MAC frame, wherein the rewritten MAC frame comprises a second VM MAC address associated with the second VM and the VARP MAC address, generating, by the first ToR
- the invention in general, in one aspect, relates to a method for routing.
- the method includes receiving, by a first Top of Rack (ToR) switch, a first VXLAN frame comprising a first media access control (MAC) address, a first ToR switch MAC address, a first server Internet Protocol (IP) address, a first VARP VTEP IP address, a first VNI, and a MAC frame, wherein the MAC frame comprises a first VARP MAC address, a first virtual machine (VM) IP address associated with the first VM, and a second VM IP address associated with a second VM, decapsulating, by the first ToR switch, the first VXLAN frame to obtain the MAC frame, processing, on the first ToR switch, the MAC frame to obtain a rewritten MAC frame, wherein the rewritten MAC frame comprises the first ToR switch MAC address and a second MAC address associated with a second ToR switch, generating, by the first ToR switch, a second VXLAN frame comprising the
- the invention in general, in one aspect, relates to a method for routing.
- the method includes receiving, by a first Top of Rack (ToR) switch, a first VXLAN frame comprising a first server media access control (MAC) address, a first ToR switch MAC address, a first server Internet Protocol (IP) address, a first VARP VTEP IP address, a first VNI, and a first MAC frame, wherein the first MAC frame comprises a first VARP MAC address, and an inner IP header, wherein the inner header comprises a first virtual machine (VM) IP address associated with the first VM, and a second VM IP address associated with a second VM, decapsulating, by the first ToR switch, the first VXLAN frame to obtain the MAC frame, processing, on the first ToR switch, the MAC frame to obtain a second MAC frame, wherein the second MAC frame comprises a second MAC address associated with a second ToR switch and the first ToR switch MAC address, routing the second MAC frame to the second
- FIG. 1 shows a system in accordance with one or more embodiments of the invention.
- FIG. 2 shows a VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 3 shows an exemplary system in accordance with one or more embodiments of the invention.
- FIG. 4 A shows a method for generating a VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 4 B shows a method for direct overlay routing in accordance with one or more embodiments of the invention.
- FIG. 5 A shows an exemplary path of a payload transmitted using direct overlay routing in accordance with one or more embodiments of the invention.
- FIG. 5 B shows an exemplary MAC frame in accordance with one or more embodiments of the invention.
- FIG. 5 C shows an exemplary VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 5 D shows an exemplary MAC frame in accordance with one or more embodiments of the invention.
- FIG. 5 E shows an exemplary VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 6 shows a method for indirect overlay routing in accordance with one or more embodiments of the invention.
- FIG. 7 shows an exemplary path of a payload transmitted using indirect overlay routing in accordance with one or more embodiments of the invention.
- FIG. 8 shows a method for naked overlay routing in accordance with one or more embodiments of the invention.
- FIG. 9 shows an exemplary path of a payload transmitted using naked overlay routing in accordance with one or more embodiments of the invention.
- any component described with regard to a figure in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure.
- descriptions of these components will not be repeated with regard to each figure.
- each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components.
- any description of the components of a figure is to be interpreted as an optional embodiment which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
- embodiments of the invention relate to routing packets between hosts or virtual machines in different layer 2 domains. More specifically, embodiments of the invention relate to using overlay routing mechanisms in an Internet Protocol (IP) fabric to enable hosts or virtual machines in different layer 2 domains to communicate.
- the overlay routing mechanisms may include direct routing (see e.g., FIGS. 4 and 5 A- 5 E ), indirect routing (see e.g., FIGS. 6 and 7 ), naked routing (see e.g., FIGS. 8 and 9 ), or a combination thereof (e.g., hybrid routing).
- the overlay routing mechanisms use, at least in part, the VXLAN protocol.
- One version of the VXLAN protocol is defined in the document entitled “VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks” version 09 dated April 2014. The VXLAN protocol is hereby incorporated by reference in its entirety. The invention is not limited to a particular version of VXLAN.
- a layer 2 domain is defined as the set of virtual machines and/or hosts (also referred to as servers) that communicate using the same virtual network identifier (VNI), where the VNI is defined by the VXLAN protocol (see e.g., FIG. 2 ).
- the VNI scopes the MAC frame originated by the virtual machine (or host) such that the MAC frame may only be received by destinations (hosts or virtual machines) associated with the same VNI.
- IP addresses e.g., SS IP address
- a MAC address associated with a specific component in system e.g., a virtual machine, a server, a ToR Switch, a Spine switch, etc.
- a component e.g., a virtual machine, a server, a ToR Switch, a Spine switch, etc.
- IP address e.g., a virtual machine, a server, a ToR Switch, a Spine switch, etc.
- one or more of the aforementioned components may be associated with multiple IP addresses.
- FIG. 1 shows a system in accordance with one or more embodiments of the invention.
- the system includes one or more servers ( 100 A- 100 I), a leaf tier ( 108 ), a spine tier ( 116 ), and one or more routers ( 118 , 120 ).
- the leaf tier and the spine tier may be collectively referred to as the IP Fabric.
- all of the aforementioned components may be co-located in the same physical location. Alternatively, the aforementioned components may not all be co-located. Additional details regarding each of the aforementioned components are provided below.
- a server (also referred to as a host) ( 100 A- 100 I) is a computer system.
- a computer system may include any type of physical system that is configured to generate, send, receive, and/or process MAC frames (see e.g., FIGS. 4 A- 9 ).
- each of the servers may include or be configured to execute one or more virtual tunnel end points (VTEPs) (see FIG. 3 ).
- the computer system may also include functionality to execute one or more virtual machines, where each virtual machine may be configured to generate, send, receive, and/or process MAC frames.
- each virtual machine corresponds to an execution environment that is distinct from the execution environment provided by the server upon which it is executing.
- examples of virtual machines include, but are not limited to, Oracle® VM and VMware® Virtual Server. (Oracle is a registered trademark of Oracle International Corporation and VMware is a registered trademark of VMware, Inc.).
- the computer system may include a processor, memory, and one or more physical network interfaces.
- Each server is directly connected to at least one Top of Rack (ToR) switch ( 102 , 104 , 106 ) in the leaf tier ( 108 ).
- each server is only directly connected to a single ToR switch in the leaf tier ( 108 ).
- the ToR switches in leaf tier ( 108 ) are not directly connected to each other.
- alternatively, if the ToR switches implement Multichassis Link Aggregation (MLAG), a given ToR switch may be directly connected to one other ToR switch in the leaf tier and a given server may be connected to each of the ToR switches in the MLAG domain.
- Each of the ToR switches may include or be configured to execute one or more virtual tunnel end points (VTEPs) (see FIG. 3 ).
- Each ToR switch in the leaf tier ( 108 ) is connected to at least one spine switch ( 110 , 112 , 114 ) in the spine tier ( 116 ).
- each ToR switch is connected to every spine switch in the spine tier.
- the spine switches in the spine tier ( 116 ) are not directly connected to each other.
- MLAG Multichassis Link Aggregation
- each leaf switch and each spine switch is a physical device that includes persistent storage, memory (e.g., Random Access Memory), one or more processors, and two or more physical ports.
- Each port may be connected to either: (i) a computer system (described above), or (ii) a network device (i.e., any device that is part of the network infrastructure such as a leaf switch, a spine switch or a router).
- Each switch (leaf switch and spine switch) is configured to receive VXLAN frames and/or MAC frames via the ports and determine whether to process the VXLAN and/or MAC frames in accordance with the methods described below in FIGS. 4 , 6 , and 8 .
- the spine switches may be directly connected to one or more routers ( 118 , 120 ) or may be indirectly connected to one or more routers (see FIG. 3 ). In the latter scenario, the spine switches may be connected to one or more edge switches (not shown in FIG. 1 ) that, in turn, are directly connected to one or more routers ( 118 , 120 ).
- the routers are configured to receive MAC frames from other networks (e.g., the Internet) and route the MAC frames towards the appropriate server ( 100 A- 100 I).
- each router includes a number of physical ports (hereafter ports) and is configured to receive MAC frames via the ports and determine whether to (i) drop the MAC frame, or (ii) send the MAC frame out over another one of the ports on the router.
- the router uses the destination internet protocol (IP) address in the received MAC frame along with a routing table to determine out of which port to send the MAC frame.
- FIG. 2 shows a VXLAN frame in accordance with one or more embodiments of the invention.
- the VXLAN frame ( 200 ) includes: (i) a MAC frame ( 208 ), (ii) a VXLAN header ( 206 ), (iii) an outer IP header ( 204 ), and (iv) an outer Ethernet header ( 202 ).
- the MAC frame ( 208 ) is generated by a source host or virtual machine and may include an inner header ( 234 ) and a payload ( 222 ).
- the payload ( 222 ) may include the content that the source host or virtual machine is attempting to transmit to the destination host or virtual machine.
- the inner header ( 234 ) includes an inner Ethernet header ( 218 ) and an inner IP header ( 220 ).
- the inner Ethernet header ( 218 ) includes a source MAC address ( 224 ) and a destination MAC address ( 226 ).
- the inner IP header ( 220 ) includes a source IP address ( 228 ) and a destination IP address ( 230 ).
- the MAC frame may include other information/content without departing from the invention.
- the VXLAN header ( 206 ) may include, but is not limited to, a virtual network identifier (VNI).
- the VNI scopes the MAC frame ( 208 ) originated by the host or virtual machine such that the MAC frame ( 208 ) may only be received by destination servers or virtual machines associated (via a VTEP) with the same VNI.
- the VXLAN header may include other information/content without departing from the invention.
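- as a non-limiting illustration, the VNI carried in the VXLAN header may be packed and parsed as sketched below. The 8-byte layout (an 8-bit flags field whose 0x08 bit marks the VNI as valid, 24 reserved bits, a 24-bit VNI, and 8 reserved bits) is taken from the published VXLAN specification rather than from this description, and the function names and the sample VNI value are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def pack_vxlan_header(vni: int) -> bytes:
    """Build an 8-byte VXLAN header carrying a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI, rejecting headers whose VNI-valid flag is not set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = pack_vxlan_header(10001)  # 10001 is an arbitrary example VNI
```

A VTEP that receives such a header may compare the extracted VNI against the VNIs it serves before delivering the encapsulated MAC frame.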
- the outer Ethernet header ( 202 ) and the outer IP header ( 204 ) are used to route the VXLAN frame from the source VTEP to the destination VTEP.
- the outer Ethernet header ( 202 ) includes the source MAC address ( 210 ) and the next hop MAC address ( 212 ) and the outer IP header ( 204 ) includes the source VTEP IP address ( 214 ) and the destination VTEP IP address ( 216 ).
- the aforementioned components may include other information/content without departing from the invention.
- the outer Ethernet header ( 202 ), the outer IP header ( 204 ), and the VXLAN header ( 206 ) may be collectively referred to as an outer header ( 232 ).
- the VXLAN frame may include other components without departing from the invention.
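- for illustration only, the nesting of the VXLAN frame described with regard to FIG. 2 may be modeled as the following data structure. The class and field names are hypothetical, the reference numerals in the comments follow the description above, and all address values are made-up examples.

```python
from dataclasses import dataclass


@dataclass
class InnerEthernetHeader:       # 218
    source_mac: str              # 224
    destination_mac: str         # 226


@dataclass
class InnerIPHeader:             # 220
    source_ip: str               # 228
    destination_ip: str          # 230


@dataclass
class MACFrame:                  # 208: inner header (234) plus payload
    ethernet: InnerEthernetHeader
    ip: InnerIPHeader
    payload: bytes               # 222


@dataclass
class OuterEthernetHeader:       # 202
    source_mac: str              # 210
    next_hop_mac: str            # 212


@dataclass
class OuterIPHeader:             # 204
    source_vtep_ip: str          # 214
    destination_vtep_ip: str     # 216


@dataclass
class VXLANHeader:               # 206
    vni: int


@dataclass
class VXLANFrame:                # 200: outer header (232) plus MAC frame
    outer_ethernet: OuterEthernetHeader
    outer_ip: OuterIPHeader
    vxlan: VXLANHeader
    mac_frame: MACFrame


# Example instance with invented addresses.
example = VXLANFrame(
    OuterEthernetHeader("02:00:00:00:51:01", "02:00:00:00:70:01"),
    OuterIPHeader("10.0.1.11", "10.255.0.1"),
    VXLANHeader(vni=10001),
    MACFrame(
        InnerEthernetHeader("02:00:00:00:a1:01", "02:00:00:00:00:aa"),
        InnerIPHeader("192.168.1.10", "192.168.2.20"),
        payload=b"hello",
    ),
)
```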
- FIG. 3 shows an exemplary system in accordance with one or more embodiments of the invention.
- the system includes two servers (Server S 1 and Server S 2 ), where each of the servers includes two virtual machines and a VTEP.
- Server S 1 includes virtual machine A 1 , virtual machine B 1 , and VTEP 5 .
- Server S 2 includes virtual machine A 2 , virtual machine B 2 , and VTEP 6 .
- each server and virtual machine is associated with its own Internet Protocol (IP) address and its own media access control (MAC) address.
- each VTEP on a server is associated with the IP address and MAC address of the server on which it is located.
- each VTEP includes functionality to generate VXLAN frames and process received VXLAN frames, in accordance with the VXLAN protocol, as described in FIGS. 4 A- 9 .
- Each VTEP may be implemented as a combination of software and storage (volatile and/or persistent storage).
- each VTEP may be implemented as a combination of hardware and storage (volatile and/or persistent storage).
- each VTEP may be implemented as a combination of hardware and software.
- each server is associated with two VXLANs.
- virtual machine A 1 and virtual machine A 2 are associated with VXLAN A
- virtual machine B 1 and virtual machine B 2 are associated with VXLAN B.
- VXLAN A and VXLAN B are distinct VXLANs and, as such, are associated with separate VNIs.
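- as a minimal sketch of the scoping rule, assuming each virtual machine is tagged with the VNI of its VXLAN, the check below shows why virtual machine A 1 can reach virtual machine A 2 without routing but cannot reach virtual machine B 2 . The VNI values and the names used in the code are hypothetical.

```python
from dataclasses import dataclass

VNI_A = 10001  # hypothetical VNI for VXLAN A
VNI_B = 10002  # hypothetical VNI for VXLAN B


@dataclass(frozen=True)
class VirtualMachine:
    name: str
    vni: int  # VNI of the layer 2 domain the virtual machine belongs to


def in_same_layer2_domain(source: VirtualMachine, destination: VirtualMachine) -> bool:
    """A MAC frame may only be received by destinations associated with the same VNI."""
    return source.vni == destination.vni


vm_a1 = VirtualMachine("A1", VNI_A)
vm_a2 = VirtualMachine("A2", VNI_A)
vm_b2 = VirtualMachine("B2", VNI_B)
```

When the check fails, as it does for A 1 and B 2 , the MAC frame has to be routed between the two layer 2 domains, which is the purpose of the overlay routing mechanisms.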
- server S 1 is directly connected to ToR switch 1 and server S 2 is directly connected to ToR switch 4 .
- each server is only connected to a single ToR switch.
- Each ToR switch (ToR switch 1 -ToR switch 4 ) includes a VTEP (VTEP 1 - 4 ).
- Each of the ToR switches is directly connected to every spine switch (Spine Switch 1 - 3 ) in the spine tier.
- Each of the spine switches is, in turn, directly connected to an edge switch, where the edge switch includes a VTEP (VTEP 7 ).
- the edge switch is directly connected to a router.
- each VTEP on a ToR switch (e.g., ToR switch 1 ) is associated with the IP address and MAC address of the ToR switch on which it is located.
- the aforementioned system is used below to describe various embodiments of the invention. Specifically, the aforementioned system is used to illustrate the different embodiments of overlay routing. However, the invention is not limited to the system shown in FIG. 3 .
- FIGS. 4 A- 4 B show flowcharts in accordance with one or more embodiments of the invention. While the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIGS. 4 A- 4 B may be performed in parallel with any other steps shown in FIGS. 6 and 8 without departing from the invention.
- FIGS. 4 A and 4 B show a method for direct overlay routing in accordance with one or more embodiments of the invention.
- the following discussion of direct overlay routing is described in relation to the system in FIG. 3 ; however, embodiments of the invention are not limited to the system shown in FIG. 3 .
- FIGS. 4 A- 4 B describe direct overlay routing to enable virtual machine A 1 (hereafter referred to as a source VM) in VXLAN A to communicate with virtual machine B 2 (hereafter referred to as a destination VM) on VXLAN B.
- virtual machine A 1 is not aware of the VXLAN protocol or of any overlay routing mechanisms; rather, virtual machine A 1 operates as if it can communicate directly with virtual machine B 2 using conventional routing mechanisms.
- the source VM issues an ARP request using the VARP IP address that is associated with VXLAN A.
- Prior to issuing the ARP request in step 400 , the VARP IP address is set as the default gateway address for the overlay network.
- a ToR switch implementing one or more embodiments of the invention (e.g., a ToR Switch in the leaf tier (as discussed above)), receives the ARP request and subsequently generates an ARP response that includes the VARP MAC address.
- the ToR switch that sent the ARP response is the ToR Switch that is directly connected to the source server upon which the source VM is executing.
- each ToR switch includes a VARP IP address configured on each switch virtual interface (SVI) for every layer 2 domain with which the ToR switch is associated. For example, if the ToR switch is associated with VXLAN A and VXLAN B, then the VARP IP address assigned to the SVI for VXLAN A may be 192.168.1.1 and the VARP IP address assigned to the SVI for VXLAN B may be 192.168.2.1.
- Each ToR Switch includes a VARP IP address to VARP MAC address mapping, such that when an Address Resolution Protocol (ARP) request includes any VARP IP address, the VARP MAC address is returned in the ARP response. There may be one VARP MAC address for each layer 2 domain.
- the VARP MAC address corresponds to the MAC address that hosts (or virtual machines) use to send MAC frames that require routing. Accordingly, when a ToR switch receives a MAC frame that includes a VARP MAC address as the destination address, the ToR switch removes the Ethernet header from the MAC frame and determines the next hop for the IP packet (i.e., IP header and payload).
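- the VARP IP address to VARP MAC address mapping may be sketched as a simple lookup, as shown below. The two VARP IP addresses are the example values given above; the VARP MAC addresses and the function names are hypothetical, and one VARP MAC address per layer 2 domain is assumed.

```python
# VARP IP address (one per SVI) -> VARP MAC address (hypothetical values).
VARP_IP_TO_MAC = {
    "192.168.1.1": "02:00:00:00:00:aa",  # SVI for VXLAN A
    "192.168.2.1": "02:00:00:00:00:bb",  # SVI for VXLAN B
}


def answer_arp(requested_ip):
    """Return the VARP MAC address for a VARP IP address, or None for any other IP."""
    return VARP_IP_TO_MAC.get(requested_ip)


def requires_routing(destination_mac):
    """A MAC frame addressed to a VARP MAC address is routed rather than bridged."""
    return destination_mac in VARP_IP_TO_MAC.values()
```

In this sketch, a source VM whose default gateway is 192.168.1.1 learns the first VARP MAC address from the ARP response and uses it as the destination MAC address of every MAC frame that leaves its layer 2 domain.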
- the source VM receives the VARP MAC address (via the ARP response).
- the source VM generates a source MAC frame that includes, at least, (i) the source VM MAC address as the source MAC address, (ii) the VARP MAC address as the destination MAC address, (iii) VM A 1 IP address as the source IP address, and (iv) VM B 2 IP address as the destination IP address.
- the source MAC frame (generated in Step 404 ) is transmitted towards a virtual switch (also referred to as vswitch) and/or hypervisor on the source server.
- the source server's vswitch receives the aforementioned source MAC frame.
- the source server is the server upon which the source VM is executing. Further, the source server is executing a VTEP (e.g., in a hypervisor). The source server may also be executing a virtual switch (vswitch).
- a lookup is performed in the vswitch MAC table using the VARP MAC address. The result of the lookup is identification of a VXLAN binding that indicates the VTEP corresponding to the VARP MAC address is the VTEP associated with a VARP VTEP IP address. A second lookup is then performed using the source server's physical IP routing table.
- the result of the second lookup is a determination that the VARP VTEP IP address matches the default route in the physical IP routing table.
- the default route indicates that the next hop for the VXLAN frame is the ToR Switch directly connected to the SS. More specifically, the default route includes the IP address of the ToR Switch.
- the MAC address for the ToR Switch (identified in Step 410 ) is determined. More specifically, a determination is made that the source server's network interface card is configured with a subnet that includes an IP address of the ToR Switch. The IP address of the ToR switch is used to obtain the MAC address of the ToR Switch.
- the MAC address of the ToR Switch may be determined using, for example, ARP, if the MAC address is not already present in an ARP table on the source server.
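- the chain of lookups performed on the source server may be sketched as follows. This is an illustration under stated assumptions: the table layouts, the function name, and all addresses are hypothetical, the routing lookup is assumed to select the most specific matching prefix, and the ToR switch MAC address is assumed to already be present in the ARP table.

```python
import ipaddress


def resolve_outer_addresses(inner_dst_mac, vswitch_mac_table, ip_routes, arp_table):
    """Return (destination VTEP IP, next hop IP, next hop MAC) for the outer header."""
    # First lookup: the VXLAN binding for the VARP MAC address.
    vtep_ip = vswitch_mac_table[inner_dst_mac]
    # Second lookup: the physical IP routing table (most specific match wins).
    addr = ipaddress.ip_address(vtep_ip)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in ip_routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    next_hop_ip = max(candidates, key=lambda c: c[0].prefixlen)[1]
    # Final step: resolve the next hop IP address to a MAC address.
    return vtep_ip, next_hop_ip, arp_table[next_hop_ip]


VARP_MAC = "02:00:00:00:00:aa"
vswitch_mac_table = {VARP_MAC: "10.255.0.1"}   # VARP MAC -> VARP VTEP IP
ip_routes = {"0.0.0.0/0": "10.0.1.1"}          # default route via the ToR switch
arp_table = {"10.0.1.1": "02:00:00:00:70:01"}  # ToR switch IP -> ToR switch MAC

result = resolve_outer_addresses(VARP_MAC, vswitch_mac_table, ip_routes, arp_table)
```

Because only the default route matches the VARP VTEP IP address in this example, the next hop resolves to the directly connected ToR switch.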
- the SS VTEP encapsulates the source MAC frame within a VXLAN frame (see e.g., FIG. 2 ). More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of the source server (as the source MAC address), a MAC address of the ToR Switch (as the destination MAC address) (i.e., the MAC address obtained in Step 412 ), an IP address of the SS (as the source IP address), the VARP VTEP IP address (as the destination IP address), and VNI A (i.e., the VNI associated with VXLAN A).
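- a minimal sketch of this encapsulation is given below, using plain dictionaries in place of real frame buffers. The helper name `encapsulate`, the dictionary keys, and every address and VNI value are hypothetical.

```python
def encapsulate(mac_frame, *, src_mac, dst_mac, src_ip, dst_ip, vni):
    """Wrap a MAC frame in an outer header (outer Ethernet, outer IP, VXLAN)."""
    return {
        "outer_ethernet": {"source_mac": src_mac, "destination_mac": dst_mac},
        "outer_ip": {"source_ip": src_ip, "destination_ip": dst_ip},
        "vxlan": {"vni": vni},
        "mac_frame": mac_frame,
    }


# Source MAC frame generated by VM A1 (destination MAC is the VARP MAC address).
source_mac_frame = {
    "source_mac": "02:00:00:00:a1:01",       # VM A1 MAC address
    "destination_mac": "02:00:00:00:00:aa",  # VARP MAC address
    "source_ip": "192.168.1.10",             # VM A1 IP address
    "destination_ip": "192.168.2.20",        # VM B2 IP address
}

vxlan_frame = encapsulate(
    source_mac_frame,
    src_mac="02:00:00:00:51:01",  # source server MAC address
    dst_mac="02:00:00:00:70:01",  # ToR switch MAC address
    src_ip="10.0.1.11",           # source server IP address
    dst_ip="10.255.0.1",          # VARP VTEP IP address
    vni=10001,                    # VNI A
)
```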
- the SS transmits the VXLAN frame to the ToR switch that is directly connected to the SS.
- the VTEP on the ToR Switch receives the VXLAN frame and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame.
- the received VXLAN frame is trapped and decapsulated because the VXLAN frame includes the ToR switch MAC address as the destination MAC address in the outer Ethernet header and includes the VARP VTEP IP address as the destination IP address in the outer IP header.
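- the trap condition may be expressed as a two-part check, sketched below under the assumption that a VXLAN frame is represented as a dictionary of headers. The function name, dictionary keys, and addresses are hypothetical.

```python
def should_trap_and_decapsulate(vxlan_frame, tor_mac, varp_vtep_ip):
    """True when both the outer destination MAC and the outer destination IP match."""
    return (
        vxlan_frame["outer_ethernet"]["destination_mac"] == tor_mac
        and vxlan_frame["outer_ip"]["destination_ip"] == varp_vtep_ip
    )


TOR_MAC = "02:00:00:00:70:01"
VARP_VTEP_IP = "10.255.0.1"

to_varp_vtep = {
    "outer_ethernet": {"source_mac": "02:00:00:00:51:01", "destination_mac": TOR_MAC},
    "outer_ip": {"source_ip": "10.0.1.11", "destination_ip": VARP_VTEP_IP},
}
# Same outer Ethernet header, but the outer IP destination is another VTEP.
to_other_vtep = {
    "outer_ethernet": {"source_mac": "02:00:00:00:51:01", "destination_mac": TOR_MAC},
    "outer_ip": {"source_ip": "10.0.1.11", "destination_ip": "10.0.4.12"},
}
```

In this sketch, a frame that matches only the ToR switch MAC address is not trapped and would instead be IP forwarded towards the VTEP named in its outer IP header.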
- the ToR Switch processes the MAC frame in order to obtain a rewritten MAC frame.
- the ToR Switch performs a routing function using the VM B 2 IP address in order to determine that the ToR switch is directly connected (from an IP point of view) to VM B 2 .
- the ToR switch routes the MAC frame because it is operating as the default gateway. Based on this determination, the VM B 2 MAC address is obtained.
- ARP may be used to obtain the VM B 2 MAC address.
- the ToR switch includes a routing table entry for each subnet that includes servers connected to the leaf tier and for each subnet that includes virtual machines (see e.g., FIG. 3 ).
- the ToR switch includes two routing tables: one for the overlay network, and one for the underlay network.
- the underlay routing table includes a route for each subnet of servers or other equipment attached to the leaf tier, and one or more routes (possibly including a default route) pointing towards external network elements.
- the overlay routing table includes information about the IP segments carried by each layer 2 domain.
- there is one underlay routing table and a number of overlay routing tables (e.g., one overlay routing table per routing domain, which possibly correspond to different tenants in a multi-tenant data center).
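- the separation between the underlay routing table and the per-domain overlay routing tables may be sketched as follows. The prefixes, the routing domain name "tenant-1", the table contents, and the `lookup` helper are hypothetical, and a most-specific-prefix lookup is assumed.

```python
import ipaddress


def lookup(table, ip):
    """Return the value of the most specific prefix in `table` containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [
        (ipaddress.ip_network(prefix), value)
        for prefix, value in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# One underlay table: server subnets plus a default route towards external elements.
underlay_table = {
    "10.0.1.0/24": "servers attached to ToR switch 1",
    "10.0.4.0/24": "servers attached to ToR switch 4",
    "0.0.0.0/0": "external network elements",
}

# One overlay table per routing domain: IP segments carried by each layer 2 domain.
overlay_tables = {
    "tenant-1": {
        "192.168.1.0/24": "VXLAN A",
        "192.168.2.0/24": "VXLAN B",
    },
}
```

Keeping the tables separate means an inner (virtual machine) address is never resolved against server subnets, and different routing domains may reuse the same overlay prefixes.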
- the inner MAC frame received in the VXLAN frame in step 420 is rewritten to remove the ToR Switch MAC address as the destination MAC address and to replace it with the VM B 2 MAC address. Further, the source MAC address in the inner MAC frame may be replaced with VARP MAC address. (See e.g., FIG. 5 D ).
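- the rewrite performed for direct overlay routing may be sketched as below: the destination MAC address becomes the VM B 2 MAC address, the source MAC address becomes the VARP MAC address, and the inner IP header is left untouched. The function name, the dictionary layout, and all addresses are hypothetical, and the VM B 2 MAC address is assumed to already be present in an ARP table.

```python
def rewrite_for_direct_routing(mac_frame, arp_table, varp_mac):
    """Return a rewritten copy of the inner MAC frame; the input is not modified."""
    rewritten = dict(mac_frame)
    # Destination MAC: resolved from the destination VM IP address (e.g., via ARP).
    rewritten["destination_mac"] = arp_table[mac_frame["destination_ip"]]
    # Source MAC: may be replaced with the VARP MAC address.
    rewritten["source_mac"] = varp_mac
    return rewritten


VARP_MAC = "02:00:00:00:00:aa"
received = {
    "source_mac": "02:00:00:00:a1:01",   # VM A1 MAC address
    "destination_mac": VARP_MAC,
    "source_ip": "192.168.1.10",         # VM A1 IP address
    "destination_ip": "192.168.2.20",    # VM B2 IP address
}
arp_table = {"192.168.2.20": "02:00:00:00:b2:02"}  # VM B2 IP -> VM B2 MAC

rewritten = rewrite_for_direct_routing(received, arp_table, VARP_MAC)
```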
- the VTEP on the ToR Switch encapsulates the rewritten MAC frame (obtained in step 422 ) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: ToR switch MAC address (as the source MAC address), a MAC address of next hop (as the destination MAC address), a VARP VTEP IP address (as the source IP address), an IP address of server S 2 (as the destination IP address), and VNI B (i.e., the VNI associated with VXLAN B).
- the destination IP address in the outer header corresponds to a destination server (i.e., server S 2 ) that includes the VTEP that will decapsulate the VXLAN frame generated in step 424 .
- the destination IP address may be determined using the VM B 2 IP address.
- VNI B is included in the VXLAN frame because VM B 2 is associated with VNI B and, as such, VNI B is required to be included for VM B 2 to ultimately receive the MAC frame generated in step 422 .
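- the second encapsulation may be sketched as follows, assuming the ToR switch holds a table that maps a destination VM IP address to the IP address of the server hosting it and to the VNI of its layer 2 domain. That table (`vm_locations`), the function name, and all values are hypothetical.

```python
def reencapsulate(rewritten_frame, vm_locations, tor_mac, next_hop_mac, varp_vtep_ip):
    """Build the VXLAN frame that carries the rewritten MAC frame on the destination VNI."""
    server_ip, vni = vm_locations[rewritten_frame["destination_ip"]]
    return {
        "outer_ethernet": {"source_mac": tor_mac, "destination_mac": next_hop_mac},
        "outer_ip": {"source_ip": varp_vtep_ip, "destination_ip": server_ip},
        "vxlan": {"vni": vni},
        "mac_frame": rewritten_frame,
    }


VNI_B = 10002
vm_locations = {"192.168.2.20": ("10.0.4.12", VNI_B)}  # VM B2 IP -> (server S2 IP, VNI B)

rewritten_frame = {
    "source_mac": "02:00:00:00:00:aa",       # VARP MAC address
    "destination_mac": "02:00:00:00:b2:02",  # VM B2 MAC address
    "source_ip": "192.168.1.10",
    "destination_ip": "192.168.2.20",
}

second_vxlan_frame = reencapsulate(
    rewritten_frame,
    vm_locations,
    tor_mac="02:00:00:00:70:01",       # ToR switch MAC address
    next_hop_mac="02:00:00:00:5e:02",  # next hop (e.g., a spine switch) MAC address
    varp_vtep_ip="10.255.0.1",
)
```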
- the VXLAN frame generated in step 424 is transmitted, via the IP Fabric, to the VTEP on server S 2 .
- the VXLAN frame is transmitted in accordance with standard IP routing mechanisms through the IP fabric until it reaches server S 2 .
- the VXLAN frame may be transmitted to spine switch 2 and spine switch 2 may subsequently transmit the VXLAN frame to ToR switch 4 .
- ToR switch 4 may subsequently transmit the VXLAN frame to server S 2 .
- the outer Ethernet header of the VXLAN frame is rewritten at each hop in the IP fabric until it reaches server S 2 .
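- the hop-by-hop behavior may be illustrated as follows: only the outer Ethernet header changes at each hop, while the outer IP header, the VXLAN header, and the encapsulated MAC frame are carried unchanged. The function name and all MAC addresses are hypothetical; the path follows the example above (ToR switch 1 , spine switch 2 , ToR switch 4 , server S 2 ).

```python
def forward_along_path(vxlan_frame, hops):
    """Rewrite only the outer Ethernet header at each (sender MAC, receiver MAC) hop."""
    for sender_mac, receiver_mac in hops:
        vxlan_frame = {
            **vxlan_frame,
            "outer_ethernet": {"source_mac": sender_mac, "destination_mac": receiver_mac},
        }
    return vxlan_frame


TOR1_MAC = "02:00:00:00:70:01"
SPINE2_MAC = "02:00:00:00:5e:02"
TOR4_MAC = "02:00:00:00:70:04"
S2_MAC = "02:00:00:00:51:02"

frame_at_tor1 = {
    "outer_ethernet": {"source_mac": TOR1_MAC, "destination_mac": SPINE2_MAC},
    "outer_ip": {"source_ip": "10.255.0.1", "destination_ip": "10.0.4.12"},
    "vxlan": {"vni": 10002},
    "mac_frame": {"destination_mac": "02:00:00:00:b2:02"},
}

hops = [(TOR1_MAC, SPINE2_MAC), (SPINE2_MAC, TOR4_MAC), (TOR4_MAC, S2_MAC)]
frame_at_s2 = forward_along_path(frame_at_tor1, hops)
```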
- the VTEP on server S 2 receives the VXLAN frame from ToR switch 4 and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame (generated in Step 422 ).
- the VTEP on Server S 2 bridges (i.e., sends using the destination MAC address in the MAC frame) the MAC frame to virtual machine B 2 .
- VM B 2 subsequently processes the MAC frame and extracts the payload.
- FIG. 5 A shows an exemplary path of a payload transmitted using direct overlay routing in accordance with one or more embodiments of the invention. More specifically, FIG. 5 A shows an exemplary path the payload from VM A 1 may take to reach VM B 2 . The exemplary path tracks the path described in FIGS. 4 A- 4 B .
- the components shown in FIG. 5 A correspond to like named components in FIG. 3 and FIGS. 4 A- 4 B .
- the initial VXLAN frame (which encapsulates the initial MAC frame including the payload) is transmitted by server S 1 to ToR switch 1 ; the VXLAN frame is transmitted on VXLAN A.
- the initial VXLAN frame is generated in accordance with FIG. 4 A .
- FIG. 5 B shows a source MAC frame ( 500 ) generated in accordance with FIG. 4 A and FIG. 5 C shows a VXLAN frame ( 502 ) generated in accordance with FIG. 4 A .
- the new resulting MAC frame (see FIG. 5 D, 504 ) is encapsulated into a new VXLAN frame (see FIG. 5 E, 506 ) and transmitted towards server S 2 .
- the new VXLAN frame is transmitted on VXLAN B.
- Embodiments of the invention enable ToR switch 1 to take a MAC frame received via one VXLAN and transmit the MAC frame (a portion of which is rewritten) via a separate VXLAN.
- this functionality is achieved by first routing the MAC frame and then generating and IP forwarding the VXLAN frame.
- FIG. 6 shows a flowchart in accordance with one or more embodiments of the invention. While the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIG. 6 may be performed in parallel with any other steps shown in FIGS. 4 A, 4 B and 8 without departing from the invention.
- FIG. 6 shows a method for indirect overlay routing in accordance with one or more embodiments of the invention.
- the following discussion of indirect overlay routing is described in relation to the system in FIG. 3 ; however, embodiments of the invention are not limited to the system shown in FIG. 3 .
- the method shown in FIG. 6 describes indirect overlay routing to enable virtual machine A 1 in VXLAN A to communicate with virtual machine B 2 on VXLAN B. From the perspective of virtual machine A 1 , virtual machine A 1 is not aware of the VXLAN protocol or of any overlay routing; rather, virtual machine A 1 operates as if it can communicate directly with virtual machine B 2 using conventional routing mechanisms.
- the generation of the VXLAN frame that is transmitted from the source server to a ToR switch is performed in accordance with FIG. 4 A .
- the specific VARP MAC address and VARP VTEP IP address pair that is present in a given VXLAN Frame may vary based upon the layer 2 domain in which source VM and source server are located.
- the VTEP on ToR switch 2 receives the VXLAN frame and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame.
- the VXLAN frame received in Step 600 includes a VARP VTEP IP address for VXLAN A (as the outer destination IP address) and a VARP MAC address for VXLAN A (as the inner header destination MAC address).
- the ToR Switch referenced in step 600 receives, traps, and decapsulates the VXLAN frame because the VXLAN frame includes the ToR switch MAC address of the ToR switch as the destination MAC address in the outer Ethernet header and includes the VARP VTEP IP address as the destination IP address in the outer IP header.
- prior to the generation of the aforementioned MAC frame, VM A 1 is configured to use a VARP VTEP IP address as the default gateway, which is implemented on ToR Switch 2 and other ToR Switches thereby providing active-active redundancy. Further, ToR switch 2 is also associated with a specific VARP MAC address, which in combination with the aforementioned VARP VTEP IP address, enables the VXLAN frame transmitted by the source server to reach ToR switch 2 .
- the ToR switch 2 processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, the ToR switch 2 performs a routing function using the VM B 2 IP address in order to determine that ToR switch 2 is directly connected to ToR Switch 3 (from an IP point of view). Based on this determination, the next hop MAC address for the MAC frame is obtained, which in this example is the MAC address of ToR Switch 3 .
- the IP fabric includes a dedicated layer 2 network (with a dedicated VNI) interconnecting all ToR switch routing functions thereby enabling the ToR switches to exchange information (e.g., using interior gateway protocol (IGP)) about which ToR switch provides routes to which overlay subnet(s).
- the routing table in ToR switch 2 includes a routing table entry specifying a route to the appropriate ToR switch from which VM B 2 may be accessed. Further, assume that the routing table entry indicates that VM B 2 is reachable via ToR switch 3 . Accordingly, the MAC frame received in the VXLAN frame in step 600 is rewritten to remove the VARP MAC address as the destination MAC address and to replace it with the ToR switch 3 MAC address. Further, the source MAC address is the ToR Switch 2 MAC address.
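- The inner-frame rewrite described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `MacFrame` type, the `ROUTES` and `TOR_MACS` tables, the longest-prefix lookup, and every address shown are hypothetical placeholders.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MacFrame:
    dst_mac: str
    src_mac: str
    src_ip: str   # inner IP header: source VM
    dst_ip: str   # inner IP header: destination VM
    payload: bytes = b""


# Hypothetical state on ToR switch 2 (all values are placeholders).
TOR2_MAC = "02:00:00:00:00:02"
ROUTES = {ipaddress.ip_network("10.2.0.0/16"): "tor3"}   # overlay subnet -> ToR
TOR_MACS = {"tor3": "02:00:00:00:00:03"}                 # ToR -> routing MAC


def route_inner_frame(frame: MacFrame) -> MacFrame:
    """Return the rewritten MAC frame addressed to the next-hop ToR switch."""
    dst = ipaddress.ip_address(frame.dst_ip)
    candidates = [net for net in ROUTES if dst in net]
    if not candidates:
        raise LookupError(f"no overlay route to {frame.dst_ip}")
    best = max(candidates, key=lambda net: net.prefixlen)  # longest prefix wins
    next_hop_mac = TOR_MACS[ROUTES[best]]
    # Only the Ethernet addresses change; the inner IP header is untouched.
    return replace(frame, dst_mac=next_hop_mac, src_mac=TOR2_MAC)
```

Calling `route_inner_frame` on a frame whose destination MAC is the VARP MAC address yields a frame whose destination is the (placeholder) ToR switch 3 MAC address and whose source is the ToR switch 2 MAC address, while the VM IP addresses and payload are unchanged.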
- the VTEP on ToR Switch 2 encapsulates the rewritten MAC frame (obtained in step 602 ) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of the ToR Switch 2 (e.g., ToR switch 2 router MAC address) (as the source MAC address), a MAC address of the next hop (e.g., the MAC address of Spine Tier Switch 2 ) (as the destination MAC address), an IP address of ToR switch 2 (as the source IP address) (e.g., ToR switch 2 VTEP IP address), an IP address of ToR switch 3 (as the destination IP address) (e.g., ToR switch 3 VTEP IP address), and VNI C (i.e., the VNI associated with VXLAN C).
- the destination IP address in the outer header corresponds to the ToR switch that includes the VTEP that will decapsulate the VXLAN frame generated in step 604 .
- the destination VTEP may be determined using the VM B 2 IP address.
- VNI C is included in the VXLAN frame because ToR switch 3 is associated with VNI C and, as such, VNI C is required to be included for ToR switch 3 to ultimately receive the MAC frame generated in step 604 .
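- The outer header assembled in step 604 can be sketched as follows. This is an illustrative sketch only: the `VxlanFrame` type, the `VTEP_BY_SUBNET` table, the numeric value chosen for VNI C, and all MAC and IP addresses are hypothetical and do not come from the description.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanFrame:
    outer_src_mac: str
    outer_dst_mac: str
    outer_src_ip: str
    outer_dst_ip: str
    vni: int
    inner: bytes  # the rewritten MAC frame, carried opaquely


# Hypothetical values for ToR switch 2.
TOR2_ROUTER_MAC = "02:00:00:00:00:02"
TOR2_VTEP_IP = "192.0.2.2"
SPINE2_MAC = "02:00:00:00:01:02"   # next hop in the IP fabric
VNI_C = 30000                       # placeholder VNI for the ToR-to-ToR domain
# Destination VTEP selected from the destination VM's IP address.
VTEP_BY_SUBNET = {ipaddress.ip_network("10.2.0.0/16"): "192.0.2.3"}  # ToR switch 3


def encapsulate(inner: bytes, dst_vm_ip: str) -> VxlanFrame:
    """Wrap the rewritten MAC frame in an outer header bound for the remote VTEP."""
    dst = ipaddress.ip_address(dst_vm_ip)
    for subnet, vtep_ip in VTEP_BY_SUBNET.items():
        if dst in subnet:
            return VxlanFrame(
                outer_src_mac=TOR2_ROUTER_MAC,
                outer_dst_mac=SPINE2_MAC,
                outer_src_ip=TOR2_VTEP_IP,
                outer_dst_ip=vtep_ip,
                vni=VNI_C,
                inner=inner,
            )
    raise LookupError(f"no VTEP known for {dst_vm_ip}")
```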
- the VXLAN frame generated in step 604 is transmitted, via the IP Fabric, to ToR switch 3 .
- the VXLAN frame is forwarded in accordance with standard IP routing mechanisms through the IP fabric until it reaches ToR switch 3 .
- the VXLAN frame may be transmitted to spine switch 2 and spine switch 2 may subsequently route the VXLAN frame to ToR switch 3 .
- the outer Ethernet header of the VXLAN frame is rewritten at each hop it traverses in the IP Fabric.
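- The per-hop behavior can be sketched as follows: each hop in the IP fabric replaces the outer Ethernet addresses, while the outer IP addresses and the VNI stay fixed. The `OuterHeader` type, the `forward` and `trace` helpers, and the addresses used with them are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class OuterHeader:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    vni: int


def forward(header: OuterHeader, router_mac: str, next_hop_mac: str) -> OuterHeader:
    """One fabric hop: rewrite the outer Ethernet addresses only."""
    return replace(header, src_mac=router_mac, dst_mac=next_hop_mac)


def trace(header: OuterHeader, hops: List[Tuple[str, str]]) -> List[OuterHeader]:
    """Apply each (router MAC, next-hop MAC) pair in order and record every header."""
    seen = [header]
    for router_mac, next_hop_mac in hops:
        seen.append(forward(seen[-1], router_mac, next_hop_mac))
    return seen
```

For example, a frame leaving ToR switch 2 toward a spine switch and then on to ToR switch 3 would pass through `trace` with one `(spine MAC, ToR switch 3 MAC)` pair; every recorded header carries the same source IP address, destination IP address, and VNI.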
- the VTEP on ToR switch 3 receives the VXLAN frame from ToR switch 2 and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame (generated in Step 602 ).
- ToR switch 3 subsequently processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, ToR switch 3 performs a routing function using the VM B 2 IP address in order to obtain the VM B 2 MAC address.
- ToR switch 3 includes a routing table, where the routing table includes a routing table entry for VM B 2 .
- the MAC frame received in the VXLAN frame in step 608 is rewritten to remove the ToR switch 3 MAC address as the destination MAC address and to replace it with the VM B 2 MAC address.
- the source MAC address in the inner frame is the VARP MAC address for VXLAN B.
- the VTEP on ToR switch 3 encapsulates the rewritten MAC frame (obtained in step 608 ) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of ToR switch 3 (as the source MAC address) (e.g., ToR switch 3 router MAC address), a MAC address of the next hop (e.g., Spine Tier Switch 3 ) (as the destination MAC address), a VARP VTEP IP address for VXLAN B (as the source IP address), an IP address of server S 2 (as the destination IP address), and VNI B (i.e., the VNI associated with VXLAN B).
- the destination IP address in the outer header corresponds to server S 2 , which includes the VTEP that will decapsulate the VXLAN frame generated in step 610 .
- the destination VTEP may be determined using the VM B 2 IP address.
- VNI B is included in the VXLAN frame because VM B 2 is associated with VNI B and, as such, VNI B is required to be included for VM B 2 to ultimately receive the MAC frame generated in step 610 .
- the ToR switch 3 MAC address may be used in place of the VARP MAC address and the ToR switch 3 IP address may be used in place of the VARP VTEP IP address.
- the VXLAN frame generated in step 610 is transmitted, via the IP Fabric, to server S 2 .
- the VXLAN frame is routed in accordance with standard IP routing mechanisms through the IP fabric until it reaches server S 2 .
- the VXLAN frame may be transmitted to spine switch 3 and spine switch 3 may subsequently route the VXLAN frame to ToR switch 4 .
- ToR switch 4 may subsequently route the VXLAN frame to server S 2 .
- the outer Ethernet header of the VXLAN frame is rewritten at each hop it traverses in the IP Fabric.
- the VTEP on the server S 2 receives the VXLAN frame from ToR switch 4 and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame (generated in Step 608 ).
- the VTEP on server S 2 bridges (i.e., sends using the destination MAC address in the MAC frame) the MAC frame to VM B 2 .
- VM B 2 subsequently processes the MAC frame and extracts the payload.
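- The final decapsulate-and-bridge step on server S 2 can be sketched as follows. This is a simplified model under stated assumptions: the `ServerVtep` class, its `attach_vm` and `receive` methods, and the use of a Python list as a VM's receive queue are all hypothetical constructs, not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MacFrame:
    dst_mac: str
    src_mac: str
    payload: bytes


@dataclass(frozen=True)
class VxlanFrame:
    outer_dst_ip: str
    vni: int
    inner: MacFrame


class ServerVtep:
    """Toy VTEP: strips the outer header and bridges on the inner destination MAC."""

    def __init__(self, vtep_ip: str) -> None:
        self.vtep_ip = vtep_ip
        # (VNI, VM MAC) -> list standing in for that VM's receive queue
        self._ports: Dict[Tuple[int, str], List[MacFrame]] = {}

    def attach_vm(self, vni: int, vm_mac: str) -> List[MacFrame]:
        queue: List[MacFrame] = []
        self._ports[(vni, vm_mac)] = queue
        return queue

    def receive(self, frame: VxlanFrame) -> bool:
        if frame.outer_dst_ip != self.vtep_ip:
            return False  # not addressed to this VTEP
        queue = self._ports.get((frame.vni, frame.inner.dst_mac))
        if queue is None:
            return False  # no VM with that MAC in that VNI
        queue.append(frame.inner)  # bridge: deliver the MAC frame unchanged
        return True
```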
- each of the ToR switches in the leaf tier only includes routing table entries for a subset of servers and/or virtual machines.
- each of the ToR switches includes routing table entries for each of the other ToR switches, where the routing table entries indicate which subset of servers and/or virtual machines may be directly routed to by a given ToR switch.
- the ToR switches share the aforementioned routing information, for example, using interior gateway protocol (IGP).
- the indirect overlay routing embodiment uses a separate layer 2 domain for ToR switch-to-ToR switch communication.
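- One way to picture the shared routing information is sketched below. The sketch assumes each ToR switch advertises the overlay subnets it routes directly and that a receiving switch folds those advertisements into a table keyed by subnet; the advertisement format, the function names, and the longest-prefix lookup are assumptions made for illustration, and no particular IGP is modeled.

```python
import ipaddress
from typing import Dict, List, Optional


def build_route_table(local_tor: str, advertisements: Dict[str, List[str]]):
    """Map each advertised overlay subnet to 'local' or to the advertising ToR."""
    table = {}
    for tor, subnets in advertisements.items():
        for cidr in subnets:
            next_hop = "local" if tor == local_tor else tor
            table[ipaddress.ip_network(cidr)] = next_hop
    return table


def lookup(table, ip: str) -> Optional[str]:
    """Longest-prefix match; returns None when no subnet covers the address."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]
```

With advertisements such as `{"tor2": ["10.1.0.0/16"], "tor3": ["10.2.0.0/16"]}`, a table built on `tor2` resolves an address in `10.2.0.0/16` to `tor3` and an address in `10.1.0.0/16` to `local`.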
- FIG. 7 shows an exemplary path of a payload transmitted using indirect overlay routing in accordance with one or more embodiments of the invention. More specifically, FIG. 7 shows an exemplary path the payload from VM A 1 may take to reach VM B 2 . The exemplary path tracks the path described in FIG. 6 .
- the components shown in FIG. 7 correspond to like named components in FIG. 3 and FIG. 6 .
- the initial VXLAN frame (which encapsulated the initial MAC frame including the payload) is routed by server S 1 (via ToR Switch 1 ) to ToR switch 2 ; the VXLAN frame is transmitted on VXLAN A.
- the initial VXLAN frame is generated in accordance with FIG. 4 A as described above with respect to FIGS.
- After ToR switch 2 routes the MAC frame (see Step 602 in FIG. 6 ), the new resulting MAC frame is encapsulated into a new VXLAN frame and routed to ToR switch 3 (via a spine tier switch). The new VXLAN frame is transmitted on VXLAN C. After receiving the VXLAN frame from ToR switch 2 , ToR switch 3 routes the MAC frame (see Step 608 in FIG. 6 ). The new resulting MAC frame is encapsulated into a new VXLAN frame and transmitted to server S 2 on VXLAN B.
- Embodiments of the invention enable ToR switch 2 and ToR switch 3 to take a MAC frame received via one VXLAN and transmit the MAC frame (a portion of which is rewritten) via a separate VXLAN.
- this functionality is achieved by first routing the MAC frame and then forwarding the VXLAN frame.
- FIG. 8 shows a flowchart in accordance with one or more embodiments of the invention. While the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIG. 8 may be performed in parallel with any other steps shown in FIGS. 4 A- 4 B and 6 without departing from the invention.
- FIG. 8 shows a method for naked overlay routing in accordance with one or more embodiments of the invention.
- the following discussion of naked overlay routing is described in relation to the system in FIG. 3 ; however, embodiments of the invention are not limited to the system shown in FIG. 3 .
- the method shown in FIG. 8 describes naked overlay routing to enable virtual machine A 1 in VXLAN A to communicate with virtual machine B 2 on VXLAN B. From the perspective of virtual machine A 1 , virtual machine A 1 is not aware of the VXLAN protocol or of any overlay routing; rather, virtual machine A 1 operates as if it can communicate directly with virtual machine B 2 using conventional routing mechanisms.
- the generation of the VXLAN frame that is transmitted from the source server to a ToR switch is performed in accordance with FIG. 4 A . However, instead of a single VARP MAC address and a single VARP VTEP IP address for all ToR switches, there are multiple VARP MAC addresses and VARP VTEP IP addresses, where different VARP MAC addresses and VARP VTEP IP addresses are used for different layer 2 domains.
- the specific VARP MAC address and VARP VTEP IP address pair that is present in a given VXLAN Frame may vary based upon the layer 2 domain in which source VM and source server are located. Said another way, because different ToR switches route in and out of different layer 2 domains of VXLAN, it is essential that the VXLAN frames issued by the source server reach the appropriate ToR Switch (i.e., the ToR switch that has the appropriate routing information). This is enabled by using distinct VARP VTEP IP address and VARP MAC address combinations.
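- The per-domain address selection can be sketched as a simple lookup keyed by the source layer 2 domain. The `VarpGateway` type, the `VARP_BY_VNI` table, the VNI values, and the addresses are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, NamedTuple, Tuple


class VarpGateway(NamedTuple):
    varp_mac: str       # used as the inner destination MAC address
    varp_vtep_ip: str   # used as the outer destination IP address


# Hypothetical mapping: one distinct pair per layer 2 domain (VNI).
VARP_BY_VNI: Dict[int, VarpGateway] = {
    10000: VarpGateway("02:00:5e:00:0a:01", "192.0.2.101"),  # e.g., VXLAN A
    20000: VarpGateway("02:00:5e:00:0b:01", "192.0.2.102"),  # e.g., VXLAN B
}


def first_hop_addresses(source_vni: int) -> Tuple[str, str]:
    """Return (inner destination MAC, outer destination IP) for the source domain."""
    gateway = VARP_BY_VNI[source_vni]
    return gateway.varp_mac, gateway.varp_vtep_ip
```

Because each domain maps to its own pair, frames from different layer 2 domains are steered to the ToR switch (or switches) holding the routing information for that domain.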
- the VTEP on ToR switch 2 receives the VXLAN frame and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame.
- the VXLAN frame received in Step 800 includes a VARP VTEP IP address for VXLAN A and a VARP MAC address for VXLAN A.
- the ToR Switch referenced in step 800 receives, traps, and decapsulates the VXLAN frame because the VXLAN frame includes the ToR MAC address of the ToR Switch as the destination MAC address in the outer Ethernet header and includes the VARP VTEP IP address for VXLAN A as the destination IP address in the outer IP header.
- prior to the generation of the aforementioned MAC frame, VM A 1 is configured to use a VARP VTEP IP address for VXLAN A as the default gateway, which is implemented on ToR switch 2 and other ToR switches thereby providing active-active redundancy. Further, ToR switch 2 is associated with a specific VARP MAC address, which in combination with the aforementioned VARP VTEP IP address enables the VXLAN frame transmitted by the source server to reach ToR switch 2 .
- the MAC frame is routed, via the IP fabric, to a ToR switch from which VM B 2 may be reached.
- VM B 2 may be reached via ToR switch 3 .
- the routing table in ToR switch 2 includes a routing table entry specifying a route determined using VM B 2 IP address, where the routing table entry indicates that VM B 2 is reachable via spine switch 2 . Accordingly, the MAC frame received in the VXLAN frame in step 800 is rewritten to remove the VARP MAC address as the destination MAC address and to replace it with the spine switch 2 MAC address.
- the ToR switch 2 MAC address is included as the source MAC address in the rewritten MAC frame.
- the rewritten MAC frame is subsequently transmitted to spine switch 2 .
- Spine switch 2 upon receipt of the rewritten MAC frame, performs a routing function using the VM B 2 IP address and determines that the next hop is ToR switch 3 .
- Spine switch 2 rewrites the MAC frame it received to remove the Spine switch 2 MAC address as the destination MAC address and to replace it with the ToR switch 3 MAC address.
- the rewritten MAC frame is subsequently transmitted to ToR switch 3 .
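- The two un-encapsulated rewrites described above (at ToR switch 2 and at spine switch 2 ) can be sketched as follows. The `MacFrame` type, the function names, and all MAC addresses are hypothetical. The description states only the destination rewrite at the spine switch, so this sketch leaves the source MAC address as received at that hop; that choice is an assumption, not a statement about the described system.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MacFrame:
    dst_mac: str
    src_mac: str
    dst_ip: str  # inner IP header: destination VM


# Placeholder addresses.
VARP_MAC_A = "02:00:5e:00:0a:01"
TOR2_MAC = "02:00:00:00:00:02"
SPINE2_MAC = "02:00:00:00:01:02"
TOR3_MAC = "02:00:00:00:00:03"


def tor2_route(frame: MacFrame) -> MacFrame:
    """ToR switch 2: VARP MAC -> spine switch 2 MAC; source becomes ToR switch 2."""
    if frame.dst_mac != VARP_MAC_A:
        raise ValueError("frame is not addressed to the VARP MAC address")
    return replace(frame, dst_mac=SPINE2_MAC, src_mac=TOR2_MAC)


def spine2_route(frame: MacFrame) -> MacFrame:
    """Spine switch 2: spine switch 2 MAC -> ToR switch 3 MAC."""
    if frame.dst_mac != SPINE2_MAC:
        raise ValueError("frame is not addressed to spine switch 2")
    # Source MAC handling at this hop is not specified in the text; left as received.
    return replace(frame, dst_mac=TOR3_MAC)
```

No VXLAN header appears anywhere in this sketch, which mirrors the point that the frame crosses the spine tier as a plain routed MAC frame.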
- the ToR switch 3 receives the MAC frame from spine switch 2 .
- ToR switch 3 processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, ToR switch 3 performs a routing function using the VM B 2 IP address in order to obtain the VM B 2 MAC address.
- ToR switch 3 includes a routing table, where the routing table includes a routing table entry for VM B 2 . Accordingly, in the instant example, the MAC frame received in step 804 is rewritten to remove the ToR switch 3 MAC address as the destination MAC address and to replace it with the VM B 2 MAC address. Further, the source MAC address in the inner frame is the VARP MAC address for VXLAN B.
- the VTEP on ToR switch 3 encapsulates the rewritten MAC frame (obtained in step 806 ) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of ToR switch 3 (as the source MAC address), a MAC address of the next hop (i.e., MAC address of Server S 2 ) (as the destination MAC address), a VARP VTEP IP address for VXLAN B (as the source IP address), an IP address of server S 2 (as the destination IP address), and VNI B (i.e., the VNI associated with VXLAN B).
- the destination IP address in the outer header corresponds to the server that includes the VTEP that will decapsulate the VXLAN frame generated in step 808 .
- the destination server may be determined using the VM B 2 IP address.
- VNI B is included in the VXLAN frame because VM B 2 is associated with VNI B and, as such, VNI B is required to be included for VM B 2 to ultimately receive the MAC frame generated in step 808 .
- the ToR switch 3 MAC address may be used in place of the VARP MAC address and the ToR switch 3 IP address may be used in place of the VARP VTEP IP address.
- the VXLAN frame generated in step 808 is transmitted, via the IP fabric, to the VTEP on server S 2 .
- the VXLAN frame is routed in accordance with standard IP routing mechanisms through the IP fabric until it reaches server S 2 .
- the VXLAN frame may be transmitted to spine switch 2 and spine switch 2 may subsequently transmit the VXLAN frame to ToR switch 4 .
- ToR switch 4 may subsequently transmit the VXLAN frame to server S 2 .
- the outer Ethernet header of the VXLAN frame is rewritten at each hop it traverses in the IP Fabric.
- in step 812 , the VTEP on the server S 2 receives the VXLAN frame from ToR switch 4 and removes the outer header (see e.g., 232 in FIG. 2 ) to obtain the MAC frame (generated in Step 806 ).
- in step 814 , the VTEP on server S 2 bridges (i.e., sends using the destination MAC address in the MAC frame) the MAC frame to VM B 2 .
- VM B 2 subsequently processes the MAC frame and extracts the payload.
- Naked overlay routing is similar to indirect overlay routing in that the payload from VM A 1 traverses the same number of switches in both of the aforementioned embodiments of overlay routing.
- naked overlay routing does not require the additional layer 2 domain from the ToR switches. Instead, naked overlay routing requires the participation of the spine switches, where the spine switches have knowledge (via their routing tables) about which layer 2 domains are accessible by each ToR. In contrast, in the indirect overlay routing embodiment, the spine switches are not aware of which layer 2 domains are accessible by each ToR.
- FIG. 9 shows an exemplary path of a payload transmitted using naked overlay routing in accordance with one or more embodiments of the invention. More specifically, FIG. 9 shows an exemplary path the payload from VM A 1 may take to reach VM B 2 . The exemplary path tracks the path described in FIG. 8 .
- the components shown in FIG. 9 correspond to like named components in FIG. 3 and FIG. 8 .
- the initial VXLAN frame (which encapsulated the initial MAC frame including the payload) is transmitted by server S 1 to ToR switch 2 ; the VXLAN frame is transmitted on VXLAN A.
- the initial VXLAN frame is generated in accordance with FIGS. 4 A and 8 .
- the MAC frame is routed (without VXLAN) to ToR switch 3 via a spine tier switch. After receiving the MAC frame from the spine tier switch, ToR switch 3 routes the MAC frame (see Step 808 in FIG. 8 ). The new resulting MAC frame is encapsulated into a new VXLAN frame and transmitted to server S 2 on VXLAN B.
- Embodiments of the invention enable ToR switch 2 to take a MAC frame received via one VXLAN and transmit the MAC frame (a portion of which is rewritten) via a separate VXLAN.
- this functionality is achieved by first routing the MAC frame and then transmitting the VXLAN frame.
- the network topology may be arranged such that for a given layer 2 domain it may (i) use direct overlay routing to communicate with a first set of other layer 2 domains and (ii) use indirect and/or naked routing to communicate with a second set of layer 2 domains.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
In general, embodiments of the invention relate to routing packets between hosts or virtual machines in different layer 2 domains. More specifically, embodiments of the invention relate to using overlay routing mechanisms in an Internet Protocol (IP) fabric to enable hosts or virtual machines in different layer 2 domains to communicate. The overlay routing mechanisms may include direct routing, indirect routing, naked routing, or a combination thereof (e.g., hybrid routing).
Description
- Pursuant to 35 U.S.C. § 119(e), this application claims benefit of U.S. Provisional Application No. 61/842,132 filed on Jul. 2, 2013, entitled “METHOD AND SYSTEM FOR OVERLAY ROUTING WITH VXLAN.” The disclosure of the U.S. Provisional Application is incorporated herein by reference in its entirety.
- Pursuant to 35 U.S.C. § 119(e), this application claims benefit of U.S. Provisional Application No. 61/846,259 filed on Jul. 15, 2013, entitled “METHOD AND SYSTEM FOR TOP OF RACK SWITCH ROUTING WITH VXLAN AND NSX.” The disclosure of the U.S. Provisional Application is incorporated herein by reference in its entirety.
- Data centers typically include multiple hosts where the hosts, in turn, each execute multiple virtual machines. The virtual machines may belong to virtual layer 2 segments that span across a physical layer-3 data center network using an overlay technology. Traditionally, when using an overlay technology, virtual machines in different layer 2 segments are unable to communicate.
- In general, in one aspect, the invention relates to a method for routing. The method includes receiving, by a first Top of Rack (ToR) switch, a first VXLAN frame comprising a first server media access control (MAC) address, a first ToR switch MAC address, a first server Internet Protocol (IP) address, a VARP VTEP IP address, a first VNI, and a MAC frame, wherein the MAC frame comprises a VARP MAC address, a first virtual machine (VM) IP address associated with a first VM, and a second VM IP address associated with a second VM, wherein the first VM is executing on the first server, decapsulating, by the first ToR switch, the first VXLAN frame to obtain the MAC frame, processing, on the first ToR switch, the MAC frame to obtain a rewritten MAC frame, wherein the rewritten MAC frame comprises a second VM MAC address associated with the second VM and the VARP MAC address, generating, by the first ToR switch, a second VXLAN frame comprising the first ToR Switch MAC address, a next hop MAC address, a VARP VTEP IP address, a second server IP address, a second VNI, and the rewritten MAC frame, wherein the second server IP address is associated with a second server, and wherein the second VM executes on the second server, wherein the first VM does not execute on the second server, and routing the second VXLAN frame through an IP fabric to the second server, wherein the IP Fabric comprises a spine tier comprising a spine switch and a leaf tier comprising the first ToR switch, and a second ToR switch and wherein the second server is connected to the second ToR switch.
- In general, in one aspect, the invention relates to a method for routing. The method includes receiving, by a first Top of Rack (ToR) switch, a first VXLAN frame comprising a first media access control (MAC) address, a first ToR switch MAC address, a first server Internet Protocol (IP) address, a first VARP VTEP IP address, a first VNI, and a MAC frame, wherein the MAC frame comprises a first VARP MAC address, a first virtual machine (VM) IP address associated with the first VM, and a second VM IP address associated with a second VM, decapsulating, by the first ToR switch, the first VXLAN frame to obtain the MAC frame, processing, on the first ToR switch, the MAC frame to obtain a rewritten MAC frame, wherein the rewritten MAC frame comprises the first ToR switch MAC address and a second MAC address associated with a second ToR switch, generating, by the first ToR switch, a second VXLAN frame comprising the first ToR switch MAC address, a first next hop MAC address, a first ToR switch IP address, a second ToR IP address, a second VNI, and the rewritten MAC frame, routing the second VXLAN frame through an IP fabric to the second ToR switch, wherein the IP Fabric comprises a spine switch, the first ToR switch, and the second ToR switch, receiving, by the second ToR switch, the second VXLAN frame, decapsulating, by the second ToR switch, the second VXLAN frame to obtain the rewritten MAC frame, processing, on the second ToR switch, the rewritten MAC frame to obtain a second rewritten MAC frame, wherein the second rewritten MAC frame comprises a second VM MAC address and a second VARP MAC address, generating, by the second ToR switch, a third VXLAN frame comprising a second ToR switch MAC address, a second next hop MAC address, a second VARP VTEP IP address, a second server IP address, a third VNI, and the second rewritten MAC frame, wherein the second server IP address is associated with the second server, and wherein the second VM does not execute on the second 
server, and routing the third VXLAN frame through the IP fabric to the second server.
- In general, in one aspect, the invention relates to a method for routing. The method includes receiving, by a first Top of Rack (ToR) switch, a first VXLAN frame comprising a first server media access control (MAC) address, a first ToR switch MAC address, a first server Internet Protocol (IP) address, a first VARP VTEP IP address, a first VNI, and a first MAC frame, wherein the first MAC frame comprises a first VARP MAC address, and an inner IP header, wherein the inner header comprises a first virtual machine (VM) IP address associated with the first VM, and a second VM IP address associated with a second VM, decapsulating, by the first ToR switch, the first VXLAN frame to obtain the MAC frame, processing, on the first ToR switch, the MAC frame to obtain a second MAC frame, wherein the second MAC frame comprises a second MAC address associated with a second ToR switch and the first ToR switch MAC address, routing the second MAC frame to the second ToR switch via a spine tier, wherein the second MAC frame is not transmitted using a VXLAN protocol, receiving, by the second ToR switch from the spine tier, a third MAC frame comprising the inner IP header, processing, on the second ToR switch, the third MAC frame to obtain a fourth MAC frame, wherein the fourth MAC frame comprises a second VM MAC address and a second VARP MAC address, generating, by the second ToR switch, a second VXLAN frame comprising the second MAC address, a second next hop MAC address, a second VARP VTEP IP address, a second server IP address, a second VNI, and the second rewritten MAC frame, wherein the second server IP address is associated with a second server and wherein the second VM executes on second server, wherein the first VM does not execute on the second server, and routing the second VXLAN frame through a IP fabric towards the second server, wherein the IP fabric comprises the spine tier, the first ToR, and the second ToR switch.
- Other aspects of the invention will be apparent from the following description and the appended claims.
- FIG. 1 shows a system in accordance with one or more embodiments of the invention.
- FIG. 2 shows a VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 3 shows an exemplary system in accordance with one or more embodiments of the invention.
- FIG. 4A shows a method for generating a VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 4B shows a method for direct overlay routing in accordance with one or more embodiments of the invention.
- FIG. 5A shows an exemplary path of a payload transmitted using direct overlay routing in accordance with one or more embodiments of the invention.
- FIG. 5B shows an exemplary MAC frame in accordance with one or more embodiments of the invention.
- FIG. 5C shows an exemplary VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 5D shows an exemplary MAC frame in accordance with one or more embodiments of the invention.
- FIG. 5E shows an exemplary VXLAN frame in accordance with one or more embodiments of the invention.
- FIG. 6 shows a method for indirect overlay routing in accordance with one or more embodiments of the invention.
- FIG. 7 shows an exemplary path of a payload transmitted using indirect overlay routing in accordance with one or more embodiments of the invention.
- FIG. 8 shows a method for naked overlay routing in accordance with one or more embodiments of the invention.
- FIG. 9 shows an exemplary path of a payload transmitted using naked overlay routing in accordance with one or more embodiments of the invention.
- Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
- In the following description of FIGS. 1-9, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
- In general, embodiments of the invention relate to routing packets between hosts or virtual machines in different layer 2 domains. More specifically, embodiments of the invention relate to using overlay routing mechanisms in an Internet Protocol (IP) fabric to enable hosts or virtual machines in different layer 2 domains to communicate. The overlay routing mechanisms may include direct routing (see e.g., FIGS. 4 and 5A-5E), indirect routing (see e.g., FIGS. 6 and 7), naked routing (see e.g., FIGS. 8 and 9), or a combination thereof (e.g., hybrid routing).
- In one embodiment of the invention, the overlay routing mechanisms use, at least in part, the VXLAN protocol. One version of the VXLAN protocol is defined in the document entitled “VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks” version 09 dated April 2014. The VXLAN protocol is hereby incorporated by reference in its entirety. The invention is not limited to a particular version of VXLAN.
- In one embodiment of the invention, a layer 2 domain is defined as the set of virtual machines and/or hosts (also referred to as servers) that communicate using the same virtual network identifier (VNI), where the VNI is defined by the VXLAN protocol (see e.g., FIG. 2). The VNI scopes the MAC frame originated by the virtual machine (or host) such that the MAC frame may only be received by destinations (hosts or virtual machines) associated with the same VNI.
- In the following description all references to specific MAC addresses, e.g., ToR switch MAC, refer to a MAC address associated with a specific component in the system, e.g., a virtual machine, a server, a ToR Switch, a Spine switch, etc. but should not be interpreted to mean that such component only has one such MAC address. Rather, in various embodiments of the invention, one or more of the aforementioned components may be associated with multiple MAC addresses.
- In the following description all references to specific IP addresses, e.g., SS IP address, refer to an IP address associated with a specific component in the system, e.g., a virtual machine, a server, a ToR Switch, a Spine switch, etc. but should not be interpreted to mean that such component only has one such IP address. Rather, in various embodiments of the invention, one or more of the aforementioned components may be associated with multiple IP addresses.
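- Returning to the VNI that defines a layer 2 domain, the following sketch shows one way the VNI could be carried and checked. It assumes the 8-byte VXLAN header layout published in RFC 7348 (a flags byte with the I bit set, 24 reserved bits, a 24-bit VNI, and 8 reserved bits); the function names and the `may_deliver` check are hypothetical and are not taken from this description.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag in the first header byte


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits). Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    first_word, second_word = struct.unpack("!II", header[:8])
    if not (first_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return second_word >> 8


def may_deliver(frame_vni: int, endpoint_vni: int) -> bool:
    """A MAC frame is only delivered to destinations associated with the same VNI."""
    return frame_vni == endpoint_vni
```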
-
FIG. 1 shows a system in accordance with one or more embodiments of the invention. The system includes one or more servers (100A-100I), a leaf tier (108), a spine tier (116), and one or more routers (118, 120). The leaf tier and the spine tier may be collectively referred to as the IP Fabric. Further, all of the aforementioned components may be co-located in the same physical location. Alternatively, the aforementioned components may not all be co-located. Additional details regarding each of the aforementioned components are provided below. - In one embodiment of the invention, a server (also referred to as a host) (100A-100I) is a computer system. A computer system may include any type of physical system that is configured to generate, send, receive, and/or process MAC frames (see e.g.,
FIGS. 4A-9). In addition, each of the servers may include or be configured to execute one or more virtual tunnel end points (VTEPs) (see FIG. 3). The computer system may also include functionality to execute one or more virtual machines, where each virtual machine may be configured to generate, send, receive, and/or process MAC frames. In one embodiment of the invention, each virtual machine corresponds to an execution environment that is distinct from the execution environment provided by the server upon which it is executing. Examples of virtual machines include, but are not limited to, Oracle® VM and VMware® Virtual Server. (Oracle is a registered trademark of Oracle International Corporation and VMware is a registered trademark of VMware, Inc.) The computer system may include a processor, memory, and one or more physical network interfaces.
- Each server is directly connected to at least one Top of Rack (ToR) switch (102, 104, 106) in the leaf tier (108). In one embodiment of the invention, each server is only directly connected to a single ToR switch in the leaf tier (108). In one embodiment of the invention, the ToR switches in the leaf tier (108) are not directly connected to each other. Alternatively, if the ToR switches implement Multichassis Link Aggregation (MLAG), then a given ToR switch may be directly connected to one other ToR switch in the leaf tier and a given server may be connected to each of the ToR switches in the MLAG domain. Each of the ToR switches may include or be configured to execute one or more VTEPs (see
FIG. 3).
- Each ToR switch in the leaf tier (108) is connected to at least one spine switch (110, 112, 114) in the spine tier (116). In one embodiment of the invention, each ToR switch is connected to every spine switch in the spine tier. Further, in one embodiment of the invention, the spine switches in the spine tier (116) are not directly connected to each other. Alternatively, if the spine switches implement Multichassis Link Aggregation (MLAG), then a given spine switch may be directly connected to one other spine switch in the spine tier.
- In one embodiment of the invention, each leaf switch and each spine switch is a physical device that includes persistent storage, memory (e.g., Random Access Memory), one or more processors, and two or more physical ports. Each port may be connected to either: (i) a computer system (described above), or (ii) a network device (i.e., any device that is part of the network infrastructure such as a leaf switch, a spine switch or a router). Each switch (leaf switch and spine switch) is configured to receive VXLAN frames and/or MAC frames via the ports and determine whether to process the VXLAN and/or MAC frames in accordance with the methods described below in
FIGS. 4A-4B, 6, and 8.
- Continuing the discussion of
FIG. 1, the spine switches may be directly connected to one or more routers (118, 120) or may be indirectly connected to one or more routers (see FIG. 3). In the latter scenario, the spine switches may be connected to one or more edge switches (not shown in FIG. 1) that, in turn, are directly connected to one or more routers (118, 120).
- In one embodiment of the invention, the routers (118, 120) are configured to receive MAC frames from other networks (e.g., the Internet) and route the MAC frames towards the appropriate server (100A-100I). In one embodiment of the invention, each router includes a number of physical ports (hereafter ports) and is configured to receive MAC frames via the ports and determine whether to (i) drop the MAC frame, or (ii) send the MAC frame out over another one of the ports on the router. The router uses the destination internet protocol (IP) address in the received MAC frame along with a routing table to determine out of which port to send the MAC frame.
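The port selection described above can be illustrated with a minimal sketch. It assumes a longest-prefix-match lookup, which the description does not spell out; the prefixes, port numbers, and the name `egress_port` are hypothetical.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical routing table: destination prefix -> egress port number.
ROUTES: Dict[ipaddress.IPv4Network, int] = {
    ipaddress.ip_network("0.0.0.0/0"): 1,        # default route
    ipaddress.ip_network("192.168.0.0/16"): 2,
    ipaddress.ip_network("192.168.2.0/24"): 3,
}


def egress_port(dst_ip: str, routes: Optional[Dict] = None) -> Optional[int]:
    """Pick the port for dst_ip using the most specific matching prefix.

    Returns None when no route matches, i.e. the frame would be dropped.
    """
    table = ROUTES if routes is None else routes
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in table if dst in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]
```

With this table, an address inside 192.168.2.0/24 leaves on port 3 even though the /16 and default routes also match, because the /24 prefix is the most specific.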
-
FIG. 2 shows a VXLAN frame in accordance with one or more embodiments of the invention. The VXLAN frame (200) includes: (i) a MAC frame (208), (ii) a VXLAN header (206), (iii) an outer IP header (204), and (iv) an outer Ethernet header (202). Each of the aforementioned components is described below.
- In one embodiment of the invention, the MAC frame (208) is generated by a source host or virtual machine and may include an inner header (234) and a payload (222). The payload (222) may include the content that the source host or virtual machine is attempting to transmit to the destination host or virtual machine. The inner header (234) includes an inner Ethernet header (218) and an inner IP header (220). The inner Ethernet header (218) includes a source MAC address (224) and a destination MAC address (226). The inner IP header (220) includes a source IP address (228) and a destination IP address (230). The MAC frame may include other information/content without departing from the invention.
- In one embodiment of the invention, the VXLAN header (206) may include, but is not limited to, a virtual network identifier (VNI). The VNI scopes the MAC frame (208) originated by the host or virtual machine such that the MAC frame (208) may only be received by destination servers or virtual machines associated (via a VTEP) with the same VNI. The VXLAN header may include other information/content without departing from the invention.
- In one embodiment of the invention, the outer Ethernet header (202) and the outer IP header (204) are used to route the VXLAN frame from the source VTEP to the destination VTEP. To this end, the outer Ethernet header (202) includes the source MAC address (210) and the next hop MAC address (212), and the outer IP header (204) includes the source VTEP IP address (214) and the destination VTEP IP address (216). The aforementioned components may include other information/content without departing from the invention. The outer Ethernet header (202), the outer IP header (204), and the VXLAN header (206) may be collectively referred to as an outer header (232).
- The VXLAN frame may include other components without departing from the invention.
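The frame layout described above can be modeled with a short sketch. The field names are hypothetical and simply mirror the reference numerals of FIG. 2. The 8-byte header layout (a flags byte with the VNI-valid bit 0x08, 24 reserved bits, a 24-bit VNI, 8 reserved bits) is taken from RFC 7348 rather than from this description, which only states that the VXLAN header carries a VNI.

```python
import struct
from dataclasses import dataclass

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag per RFC 7348 (assumption: not stated in this text)


def pack_vxlan_header(vni: int) -> bytes:
    """Build an 8-byte VXLAN header: flags, reserved, 24-bit VNI, reserved."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


@dataclass
class MacFrame:          # inner MAC frame (208)
    src_mac: str         # source MAC address (224)
    dst_mac: str         # destination MAC address (226)
    src_ip: str          # source IP address (228)
    dst_ip: str          # destination IP address (230)
    payload: bytes       # payload (222)


@dataclass
class VxlanFrame:        # VXLAN frame (200)
    outer_src_mac: str   # source MAC address (210)
    next_hop_mac: str    # next hop MAC address (212)
    src_vtep_ip: str     # source VTEP IP address (214)
    dst_vtep_ip: str     # destination VTEP IP address (216)
    vni: int             # carried in the VXLAN header (206)
    inner: MacFrame
```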
-
FIG. 3 shows an exemplary system in accordance with one or more embodiments of the invention. The invention is not limited to the system shown in FIG. 3. Turning to FIG. 3, the system includes two servers (Server S1 and Server S2), where each of the servers includes two virtual machines and a VTEP. Specifically, Server S1 includes virtual machine A1, virtual machine B1, and VTEP 5, and Server S2 includes virtual machine A2, virtual machine B2, and VTEP 6.
- In one embodiment of the invention, each server and virtual machine is associated with its own Internet Protocol (IP) address and its own media access control (MAC) address. Further, each VTEP on a server (e.g., S1) is associated with the IP address and MAC address of the server on which it is located. Further, each VTEP includes functionality to generate VXLAN frames and process received VXLAN frames, in accordance with the VXLAN protocol, as described in
FIGS. 4A-9 . Each VTEP may be implemented as a combination of software and storage (volatile and/or persistent storage). Alternatively, each VTEP may be implemented as a combination of hardware and storage (volatile and/or persistent storage). In another alternative, each VTEP may be implemented as a combination of hardware and software. - In the example shown in
FIG. 3 , each server is associated with two VXLANs. Specifically, virtual machine A1 and virtual machine A2 are associated with VXLAN A and virtual machine B1 and virtual machine B2 are associated with VXLAN B. VXLAN A and VXLAN B are distinct VXLANs and, as such, are associated with separate VNIs. - Continuing with the discussion of
FIG. 3, server S1 is directly connected to ToR switch 1 and server S2 is directly connected to ToR switch 4. In this example, each server is only connected to a single ToR switch. Each ToR switch (ToR switch 1-ToR switch 4) includes a VTEP (VTEP 1-4). Each of the ToR switches is directly connected to every spine switch (Spine Switch 1-3) in the spine tier. Each of the spine switches is, in turn, directly connected to an edge switch, where the edge switch includes a VTEP (VTEP 7). Finally, the edge switch is directly connected to a router. In one embodiment of the invention, each VTEP on a ToR switch (e.g., ToR switch 1) is associated with the IP address and MAC address of the ToR switch on which it is located.
- The aforementioned system is used below to describe various embodiments of the invention. Specifically, the aforementioned system is used to illustrate the different embodiments of overlay routing. However, the invention is not limited to the system shown in
FIG. 3 . -
FIGS. 4A-4B show flowcharts in accordance with one or more embodiments of the invention. While the various steps in the flowcharts are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIGS. 4A-4B may be performed in parallel with any other steps shown in FIGS. 6 and 8 without departing from the invention.
- Turning to
FIGS. 4A and 4B, FIGS. 4A and 4B show a method for direct overlay routing in accordance with one or more embodiments of the invention. The following discussion of direct overlay routing is described in relation to the system in FIG. 3; however, embodiments of the invention are not limited to the system shown in FIG. 3.
- The method shown in
FIGS. 4A-4B describes direct overlay routing to enable virtual machine A1 (hereafter referred to as a source VM) in VXLAN A to communicate with virtual machine B2 (hereafter referred to as a destination VM) on VXLAN B. From the perspective of virtual machine A1, virtual machine A1 is not aware of the VXLAN protocol or of any overlay routing mechanisms; rather, virtual machine A1 operates as if it can communicate directly with virtual machine B2 using conventional routing mechanisms.
- In
step 400, the source VM issues an ARP request using the VARP IP address that is associated with VXLAN A. Prior to issuing the ARP request in step 400, the VARP IP address is set as the default gateway address for the overlay network. A ToR switch implementing one or more embodiments of the invention (e.g., a ToR Switch in the leaf tier, as discussed above) receives the ARP request and subsequently generates an ARP response that includes the VARP MAC address. In one embodiment of the invention, the ToR switch that sent the ARP response is the ToR Switch that is directly connected to the source server upon which the source VM is executing.
- In one embodiment of the invention, each ToR switch includes a VARP IP address configured on each switch virtual interface (SVI) for every
layer 2 domain with which the ToR switch is associated. For example, if the ToR switch is associated with VXLAN A and VXLAN B, then the VARP IP address assigned to the SVI for VXLAN A may be 192.168.1.1 and the VARP IP address assigned to the SVI for VXLAN B may be 192.168.2.1. Each ToR Switch includes a VARP IP address to VARP MAC address mapping, such that when an Address Resolution Protocol (ARP) request includes any VARP IP address, the VARP MAC address is returned in the ARP response. There may be one VARP MAC address for each layer 2 domain.
- In one embodiment of the invention, the VARP MAC address corresponds to the MAC address that hosts (or virtual machines) use to send MAC frames that require routing. Accordingly, when a ToR switch receives a MAC frame that includes a VARP MAC address as the destination address, the ToR Switch removes the Ethernet header from the MAC frame and determines the next hop for the IP packet (i.e., IP header and payload).
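The VARP IP address to VARP MAC address mapping described above can be sketched as follows. The two SVI addresses come from the example in the text; the MAC value and the function names are hypothetical, and the sketch assumes a single VARP MAC address shared by both SVIs.

```python
from typing import Optional

# Hypothetical VARP MAC address, invented for illustration.
VARP_MAC = "00:1c:73:00:00:99"
# VARP IP addresses on the SVIs for VXLAN A and VXLAN B (from the example above).
VARP_IPS = {"192.168.1.1", "192.168.2.1"}


def arp_reply(target_ip: str) -> Optional[str]:
    """Return the VARP MAC if the ARP request targets any VARP IP, else None."""
    return VARP_MAC if target_ip in VARP_IPS else None


def needs_routing(dst_mac: str) -> bool:
    """A MAC frame addressed to the VARP MAC is routed rather than bridged."""
    return dst_mac.lower() == VARP_MAC
```

Because every ToR switch in this sketch answers with the same MAC address, a virtual machine keeps the same default gateway entry regardless of which ToR switch responds.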
- In
Step 402, the source VM receives the VARP MAC address (via the ARP response). In Step 404, the source VM generates a source MAC frame that includes, at least, (i) the source VM MAC address as the source MAC address, (ii) the VARP MAC address as the destination MAC address, (iii) the VM A1 IP address as the source IP address, and (iv) the VM B2 IP address as the destination IP address. In Step 406, the source MAC frame (generated in Step 404) is transmitted towards a virtual switch (also referred to as vswitch) and/or hypervisor on the source server.
- In
Step 408, the source server's vswitch receives the aforementioned source MAC frame. The source server (SS) is the server upon which the source VM is executing. Further, the source server is executing a VTEP (e.g., in a hypervisor). The source server may also be executing a virtual switch (vswitch). In step 410, a lookup is performed in the vswitch MAC table using the VARP MAC address. The result of the lookup is identification of a VXLAN binding that indicates the VTEP corresponding to the VARP MAC address is the VTEP associated with a VARP VTEP IP address. A second lookup is then performed using the source server's physical IP routing table. The result of the second lookup is a determination that the VARP VTEP IP address matches the default route in the physical IP routing table. The default route indicates that the next hop for the VXLAN frame is the ToR Switch directly connected to the SS. More specifically, the default route includes the IP address of the ToR Switch. In Step 412, the MAC address for the ToR Switch (identified in Step 410) is determined. More specifically, a determination is made that the source server's network interface card is configured with a subnet that includes an IP address of the ToR Switch. The IP address of the ToR switch is used to obtain the MAC address of the ToR Switch. The MAC address of the ToR Switch may be determined using, for example, ARP, if the MAC address is not already present in an ARP table on the source server.
- In
step 414, the SS VTEP encapsulates the source MAC frame within a VXLAN frame (see e.g., FIG. 2). More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of the source server (as the source MAC address), a MAC address of the ToR Switch (as the destination MAC address) (i.e., the MAC address obtained in Step 412), an IP address of the SS (as the source IP address), the VARP VTEP IP address (as the destination IP address), and VNI A (i.e., the VNI associated with VXLAN A). In step 416, the SS transmits the VXLAN frame to the ToR switch that is directly connected to the SS.
- Referring to
FIG. 4B, in step 420, the VTEP on the ToR Switch receives the VXLAN frame and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame. In one embodiment of the invention, the received VXLAN frame is trapped and decapsulated because the VXLAN frame includes the ToR switch MAC address as the destination MAC address in the outer Ethernet header and includes the VARP VTEP IP address as the destination IP address in the outer IP header. In step 422, the ToR Switch processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, the ToR Switch performs a routing function using the VM B2 IP address in order to determine that the ToR switch is directly connected (from an IP point of view) to VM B2. In one embodiment of the invention, the ToR switch routes the MAC frame because it is operating as a default gateway. Based on this determination, the VM B2 MAC address is obtained. In one embodiment of the invention, ARP may be used to obtain the VM B2 MAC address. In one embodiment of the invention, the ToR switch includes a routing table entry for each subnet that includes servers connected to the leaf tier and for each subnet that includes virtual machines (see e.g., FIG. 3). In one embodiment of the invention, the ToR switch includes two routing tables: one for the overlay network, and one for the underlay network. The underlay routing table includes a route for each subnet of servers or other equipment attached to the leaf tier, and one or more routes (possibly including a default route) pointing towards external network elements. The overlay routing table includes information about the IP segments carried by each layer 2 domain. In another embodiment, there is only one routing table that includes both underlay network and overlay network routes.
In another embodiment, there is one underlay routing table and a number of overlay routing tables (e.g., one overlay routing table per routing domain, which possibly correspond to different tenants in a multi-tenant data center). - Continuing with the discussion of
FIG. 4B, in the instant example, the inner MAC frame received in the VXLAN frame in step 420 is rewritten to remove the ToR Switch MAC address as the destination MAC address and to replace it with the VM B2 MAC address. Further, the source MAC address in the inner MAC frame may be replaced with the VARP MAC address. (See e.g., FIG. 5D.)
- Continuing with the discussion of
FIG. 4B, in step 424, the VTEP on the ToR Switch encapsulates the rewritten MAC frame (obtained in step 422) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: the ToR switch MAC address (as the source MAC address), a MAC address of the next hop (as the destination MAC address), a VARP VTEP IP address (as the source IP address), an IP address of server S2 (as the destination IP address), and VNI B (i.e., the VNI associated with VXLAN B). The destination IP address in the outer header corresponds to a destination server (i.e., server S2) that includes the VTEP that will decapsulate the VXLAN frame generated in step 424. The destination IP address may be determined using the VM B2 IP address. Finally, VNI B is included in the VXLAN frame because VM B2 is associated with VNI B and, as such, VNI B is required to be included for VM B2 to ultimately receive the MAC frame generated in step 422.
- Continuing with the discussion of
FIG. 4B, in step 426, the VXLAN frame generated in step 424 is transmitted, via the IP Fabric, to the VTEP on server S2. The VXLAN frame is transmitted in accordance with standard IP routing mechanisms through the IP fabric until it reaches server S2. In this example, the VXLAN frame may be transmitted to spine switch 2 and spine switch 2 may subsequently transmit the VXLAN frame to ToR switch 4. Upon receipt of the VXLAN frame, ToR switch 4 may subsequently transmit the VXLAN frame to server S2. Those skilled in the art will appreciate that the outer Ethernet header of the VXLAN frame is rewritten at each hop in the IP fabric until it reaches server S2.
- In
step 428, the VTEP on server S2 receives the VXLAN frame from ToR switch 4 and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame (generated in Step 422). In step 430, the VTEP on Server S2 bridges (i.e., sends using the destination MAC address in the MAC frame) the MAC frame to virtual machine B2. VM B2 subsequently processes the MAC frame and extracts the payload.
-
FIG. 5A shows an exemplary path of a payload transmitted using direct overlay routing in accordance with one or more embodiments of the invention. More specifically, FIG. 5A shows an exemplary path the payload from VM A1 may take to reach VM B2. The exemplary path tracks the path described in FIGS. 4A-4B. The components shown in FIG. 5A correspond to like named components in FIG. 3 and FIGS. 4A-4B. Turning to FIG. 5A, when the initial VXLAN frame (which encapsulates the initial MAC frame including the payload) is transmitted by server S1 to ToR switch 1, the VXLAN frame is transmitted on VXLAN A. The initial VXLAN frame is generated in accordance with FIG. 4A. FIG. 5B shows a source MAC frame (500) generated in accordance with FIG. 4A and FIG. 5C shows a VXLAN frame (502) generated in accordance with FIG. 4A. Continuing with the discussion of FIG. 5A, at ToR switch 1, after the routing of the MAC frame (see Step 422 in FIG. 4B), the new resulting MAC frame (see FIG. 5D, 504) is encapsulated into a new VXLAN frame (see FIG. 5E, 506) and transmitted towards server S2. The new VXLAN frame is transmitted on VXLAN B. Embodiments of the invention enable ToR switch 1 to take a MAC frame received via one VXLAN and transmit the MAC frame (a portion of which is rewritten) via a separate VXLAN. In one embodiment of the invention, this functionality is achieved by first routing the MAC frame and then generating and IP forwarding the VXLAN frame.
-
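The route-then-re-encapsulate behavior of ToR switch 1 in the direct overlay routing example can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: all addresses, VNI values, table contents, and names such as `route_direct` are hypothetical, and the lookups are reduced to in-memory dictionaries.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MacFrame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class VxlanFrame:
    outer_src_mac: str
    outer_dst_mac: str
    outer_src_ip: str
    outer_dst_ip: str
    vni: int
    inner: MacFrame


# Hypothetical state held by ToR switch 1; every value is illustrative.
VARP_MAC = "varp-mac"
VARP_VTEP_IP = "10.0.0.100"
TOR1_MAC = "tor1-mac"
NEXT_HOP_MAC = "spine2-mac"
OVERLAY_ROUTES = {ipaddress.ip_network("192.168.2.0/24"): 20}  # subnet -> VNI B
ARP_TABLE = {"192.168.2.12": "vm-b2-mac"}   # VM IP -> VM MAC
VTEP_TABLE = {"192.168.2.12": "10.0.0.2"}   # VM IP -> IP of the server hosting it


def route_direct(frame: VxlanFrame) -> VxlanFrame:
    """Decapsulate, route the inner frame, and re-encapsulate on the destination VNI."""
    inner = frame.inner                                   # step 420: strip the outer header
    dst = ipaddress.ip_address(inner.dst_ip)
    vni = next(v for net, v in OVERLAY_ROUTES.items() if dst in net)  # step 422: route
    rewritten = replace(inner, src_mac=VARP_MAC, dst_mac=ARP_TABLE[inner.dst_ip])
    return VxlanFrame(                                    # step 424: new outer header
        outer_src_mac=TOR1_MAC,
        outer_dst_mac=NEXT_HOP_MAC,
        outer_src_ip=VARP_VTEP_IP,
        outer_dst_ip=VTEP_TABLE[inner.dst_ip],
        vni=vni,
        inner=rewritten,
    )
```

The payload and the inner IP addresses pass through unchanged; only the inner MAC addresses, the outer header, and the VNI differ between the frame received on VXLAN A and the frame sent on VXLAN B.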
FIG. 6 shows a flowchart in accordance with one or more embodiments of the invention. While the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIG. 6 may be performed in parallel with any other steps shown in FIGS. 4A, 4B and 8 without departing from the invention.
- Turning to
FIG. 6, FIG. 6 shows a method for indirect overlay routing in accordance with one or more embodiments of the invention. The following discussion of indirect overlay routing is described in relation to the system in FIG. 3; however, embodiments of the invention are not limited to the system shown in FIG. 3.
- The method shown in
FIG. 6 describes indirect overlay routing to enable virtual machine A1 in VXLAN A to communicate with virtual machine B2 on VXLAN B. From the perspective of virtual machine A1, virtual machine A1 is not aware of the VXLAN protocol or of any overlay routing; rather, virtual machine A1 operates as if it can communicate directly with virtual machine B2 using conventional routing mechanisms. - The generation of the VXLAN frame that is transmitted from the source server to a ToR switch is performed in accordance with
FIG. 4A. However, instead of a single VARP MAC address and a single VARP VTEP IP address for all ToR switches, there are multiple VARP MAC addresses and VARP VTEP IP addresses, where different VARP MAC addresses and VARP VTEP IP addresses are used for different layer 2 domains. Accordingly, the specific VARP MAC address and VARP VTEP IP address pair that is present in a given VXLAN frame may vary based upon the layer 2 domain in which the source VM and source server are located. Said another way, because different ToR switches route in and out of different layer 2 domains of VXLAN, it is essential that the VXLAN frames issued by the source server reach the appropriate ToR Switch (i.e., the ToR switch that has the appropriate routing information). This is enabled by using distinct VARP VTEP IP address and VARP MAC address combinations.
- Continuing with
FIG. 6, in step 600, the VTEP on ToR switch 2 receives the VXLAN frame and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame. The VXLAN frame received in Step 600 includes a VARP VTEP IP address for VXLAN A (as the outer destination IP address) and a VARP MAC address for VXLAN A (as the inner header destination MAC address). In one embodiment of the invention, the ToR Switch referenced in step 600 receives, traps, and decapsulates the VXLAN frame because the VXLAN frame includes the MAC address of the ToR switch as the destination MAC address in the outer Ethernet header and includes the VARP VTEP IP address as the destination IP address in the outer IP header.
- In one embodiment of the invention, prior to the generation of the aforementioned MAC frame, VM A1 is configured to use a VARP VTEP IP address as the default gateway, which is implemented on
ToR Switch 2 and other ToR Switches, thereby providing active-active redundancy. Further, ToR switch 2 is also associated with a specific VARP MAC address, which, in combination with the aforementioned VARP VTEP IP address, enables the VXLAN frame transmitted by the source server to reach ToR switch 2.
- Continuing with the discussion of
FIG. 6, in step 602, ToR switch 2 processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, ToR switch 2 performs a routing function using the VM B2 IP address in order to determine that ToR switch 2 is directly connected to ToR Switch 3 (from an IP point of view). Based on this determination, the next hop MAC address for the MAC frame is obtained, which in this example is the MAC address of ToR Switch 3.
- In one embodiment of the invention, the IP fabric includes a
dedicated layer 2 network (with a dedicated VNI) interconnecting all ToR switch routing functions thereby enabling the ToR switches to exchange information (e.g., using interior gateway protocol (IGP)) about which ToR switch provides routes to which overlay subnet(s). - For purposes of this explanation, assume that the routing table in
ToR switch 2 includes a routing table entry specifying a route to the appropriate ToR switch from which VM B2 may be accessed. Further, assume that the routing table entry indicates that VM B2 is reachable via ToR switch 3. Accordingly, the MAC frame received in the VXLAN frame in step 600 is rewritten to remove the VARP MAC address as the destination MAC address and to replace it with the ToR switch 3 MAC address. Further, the source MAC address is the ToR Switch 2 MAC address.
- Continuing with the discussion of
FIG. 6, in step 604, the VTEP on ToR Switch 2 encapsulates the rewritten MAC frame (obtained in step 602) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of ToR Switch 2 (e.g., ToR switch 2 router MAC address) (as the source MAC address), a MAC address of the next hop (e.g., the MAC address of Spine Tier Switch 2) (as the destination MAC address), an IP address of ToR switch 2 (as the source IP address) (e.g., ToR switch 2 VTEP IP address), an IP address of ToR switch 3 (as the destination IP address) (e.g., ToR switch 3 VTEP IP address), and VNI C (i.e., the VNI associated with VXLAN C). The destination IP address in the outer header corresponds to the ToR switch that includes the VTEP that will decapsulate the VXLAN frame generated in step 604. The destination VTEP may be determined using the VM B2 IP address. Finally, VNI C is included in the VXLAN frame because ToR switch 3 is associated with VNI C and, as such, VNI C is required to be included for ToR switch 3 to ultimately receive the MAC frame generated in step 602.
- Continuing with the discussion of
FIG. 6, in step 606, the VXLAN frame generated in step 604 is transmitted, via the IP Fabric, to ToR switch 3. The VXLAN frame is forwarded in accordance with standard IP routing mechanisms through the IP fabric until it reaches ToR switch 3. In this example, the VXLAN frame may be transmitted to spine switch 2 and spine switch 2 may subsequently route the VXLAN frame to ToR switch 3. Those skilled in the art will appreciate that the outer Ethernet header of the VXLAN frame is rewritten at each hop it traverses in the IP Fabric.
- In
step 608, the VTEP on ToR switch 3 receives the VXLAN frame from ToR switch 2 and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame (generated in Step 602). ToR switch 3 subsequently processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, ToR switch 3 performs a routing function using the VM B2 IP address in order to obtain the VM B2 MAC address. In one embodiment of the invention, ToR switch 3 includes a routing table, where the routing table includes a routing table entry for VM B2. Accordingly, in the instant example, the MAC frame received in the VXLAN frame in step 608 is rewritten to remove the ToR switch 3 MAC address as the destination MAC address and to replace it with the VM B2 MAC address. Further, the source MAC address in the inner frame is the VARP MAC address for VXLAN B.
- Continuing with the discussion of
FIG. 6, in step 610, the VTEP on ToR switch 3 encapsulates the rewritten MAC frame (obtained in step 608) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of ToR switch 3 (as the source MAC address) (e.g., ToR switch 3 router MAC address), a MAC address of the next hop (e.g., Spine Tier Switch 3) (as the destination MAC address), a VARP VTEP IP address for VXLAN B (as the source IP address), an IP address of server S2 (as the destination IP address), and VNI B (i.e., the VNI associated with VXLAN B). The destination IP address in the outer header corresponds to server S2, which includes the VTEP that will decapsulate the VXLAN frame generated in step 610. The destination VTEP may be determined using the VM B2 IP address. Finally, VNI B is included in the VXLAN frame because VM B2 is associated with VNI B and, as such, VNI B is required to be included for VM B2 to ultimately receive the MAC frame generated in step 608. In one embodiment of the invention, the ToR switch 3 MAC address may be used in place of the VARP MAC address and the ToR switch 3 IP address may be used in place of the VARP VTEP IP address.
- Continuing with the discussion of
FIG. 6, in step 612, the VXLAN frame generated in step 610 is transmitted, via the IP Fabric, to server S2. The VXLAN frame is routed in accordance with standard IP routing mechanisms through the IP fabric until it reaches server S2. In this example, the VXLAN frame may be transmitted to spine switch 3 and spine switch 3 may subsequently route the VXLAN frame to ToR switch 4. Upon receipt of the VXLAN frame, ToR switch 4 may subsequently route the VXLAN frame to server S2. Those skilled in the art will appreciate that the outer Ethernet header of the VXLAN frame is rewritten at each hop it traverses in the IP Fabric.
- In
step 614, the VTEP on server S2 receives the VXLAN frame from ToR switch 4 and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame (generated in Step 608). In step 616, the VTEP on server S2 bridges (i.e., sends using the destination MAC address in the MAC frame) the MAC frame to VM B2. VM B2 subsequently processes the MAC frame and extracts the payload.
- In one or more embodiments of the invention, unlike the direct overlay routing embodiment, each of the ToR switches in the leaf tier includes routing table entries for only a subset of servers and/or virtual machines. However, each of the ToR switches includes routing table entries for each of the other ToR switches, where the routing table entries indicate which subset of servers and/or virtual machines a given ToR switch may route to directly. The ToR switches share the aforementioned routing information, for example, using interior gateway protocol (IGP). In addition, unlike the direct overlay routing embodiment, the indirect overlay routing embodiment uses a
separate layer 2 domain for ToR switch-to-ToR switch communication. -
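The two routing hops of the indirect embodiment can be traced with a small sketch. It assumes each ToR switch holds a per-subnet overlay route naming the VNI to encapsulate with and the next routing hop; the VNI numbers, the subnet, and the function name `trace` are hypothetical.

```python
import ipaddress

VNI_A, VNI_B, VNI_C = 10, 20, 30  # hypothetical VNI values
SUBNET_B = ipaddress.ip_network("192.168.2.0/24")  # hypothetical subnet carried by VXLAN B

# Per-switch overlay routes: subnet -> (VNI to encapsulate with, next routing hop).
# ToR switch 2 has no direct route into VXLAN B, only a route via ToR switch 3,
# learned over the dedicated ToR-to-ToR layer 2 domain (VXLAN C).
OVERLAY_ROUTES = {
    "ToR switch 2": {SUBNET_B: (VNI_C, "ToR switch 3")},
    "ToR switch 3": {SUBNET_B: (VNI_B, "server S2")},
}


def trace(first_router: str, dst_ip: str):
    """Return (routing switch, VNI used on the next leg) pairs for an inner frame."""
    dst = ipaddress.ip_address(dst_ip)
    hops, node = [], first_router
    while node in OVERLAY_ROUTES:
        vni, next_node = next(v for net, v in OVERLAY_ROUTES[node].items() if dst in net)
        hops.append((node, vni))
        node = next_node
    return hops
```

Calling `trace("ToR switch 2", "192.168.2.12")` walks the frame from VXLAN A through VXLAN C to VXLAN B, matching steps 602 and 608 of FIG. 6.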
FIG. 7 shows an exemplary path of a payload transmitted using indirect overlay routing in accordance with one or more embodiments of the invention. More specifically, FIG. 7 shows an exemplary path the payload from VM A1 may take to reach VM B2. The exemplary path tracks the path described in FIG. 6. The components shown in FIG. 7 correspond to like named components in FIG. 3 and FIG. 6. Turning to FIG. 7, when the initial VXLAN frame (which encapsulates the initial MAC frame including the payload) is routed by server S1 (via ToR Switch 1) to ToR switch 2, the VXLAN frame is transmitted on VXLAN A. The initial VXLAN frame is generated in accordance with FIG. 4A as described above with respect to FIGS. 4A and 6. At ToR switch 2, after the routing of the MAC frame (see Step 602 in FIG. 6), the new resulting MAC frame is encapsulated into a new VXLAN frame and routed to ToR switch 3 (via a spine tier switch). The new VXLAN frame is transmitted on VXLAN C. After receiving the VXLAN frame from ToR switch 2, ToR switch 3 routes the MAC frame (see Step 608 in FIG. 6). The new resulting MAC frame is encapsulated into a new VXLAN frame and transmitted to server S2 on VXLAN B.
- Embodiments of the invention enable
ToR switch 2 and ToR switch 3 to take a MAC frame received via one VXLAN, rewrite a portion of the MAC frame, and transmit it via a separate VXLAN. In one embodiment of the invention, this functionality is achieved by first routing the MAC frame and then forwarding the VXLAN frame. -
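The decapsulate, route, and re-encapsulate sequence performed at each ToR switch in the indirect overlay path can be sketched as follows. This is an illustrative model only: the dataclasses, the helper `route_between_vxlans`, and every MAC and IP value are assumptions chosen for the example, and only the VNI sequence A, C, B is taken from the description of FIG. 7.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MacFrame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class VxlanFrame:
    vni: str            # "A", "B" or "C", matching VXLAN A/B/C in FIG. 7
    outer_src_ip: str
    outer_dst_ip: str
    inner: MacFrame


def route_between_vxlans(frame, new_src_mac, new_dst_mac,
                         out_vni, out_src_ip, out_dst_ip):
    """Take a MAC frame received on one VXLAN and send it on another."""
    inner = frame.inner                                   # remove outer header
    rewritten = replace(inner, src_mac=new_src_mac,       # route: rewrite only
                        dst_mac=new_dst_mac)              # the MAC addresses
    return VxlanFrame(out_vni, out_src_ip, out_dst_ip, rewritten)


# All addresses below are placeholders, not values from the disclosure.
initial = VxlanFrame(
    vni="A", outer_src_ip="s1-vtep", outer_dst_ip="varp-vtep-ip",
    inner=MacFrame("vm-a1-mac", "varp-mac", "10.0.1.5", "10.0.2.7", b"hello"),
)
# At ToR switch 2: re-encapsulate onto VXLAN C toward ToR switch 3.
at_tor2 = route_between_vxlans(initial, "tor2-mac", "tor3-mac",
                               "C", "tor2-vtep", "tor3-vtep")
# At ToR switch 3: re-encapsulate onto VXLAN B toward server S2.
at_tor3 = route_between_vxlans(at_tor2, "varp-mac", "vm-b2-mac",
                               "B", "tor3-vtep", "s2-vtep")

print([f.vni for f in (initial, at_tor2, at_tor3)])
```

The inner IP addresses and the payload are left untouched at each hop; only the inner MAC addresses and the outer VXLAN header change, which is the sense in which the MAC frame is routed first and the VXLAN frame is forwarded afterwards.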
FIG. 8 shows a flowchart in accordance with one or more embodiments of the invention. While the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all of the steps may be executed in different orders, may be combined or omitted, and some or all of the steps may be executed in parallel. In one embodiment of the invention, the steps shown in FIG. 8 may be performed in parallel with any other steps shown in FIGS. 4A-4B and 6 without departing from the invention. - Turning to
FIG. 8, FIG. 8 shows a method for naked overlay routing in accordance with one or more embodiments of the invention. The following discussion of naked overlay routing is described in relation to the system in FIG. 3; however, embodiments of the invention are not limited to the system shown in FIG. 3. - The method shown in
FIG. 8 describes naked overlay routing to enable virtual machine A1 in VXLAN A to communicate with virtual machine B2 on VXLAN B. From the perspective of virtual machine A1, virtual machine A1 is not aware of the VXLAN protocol or of any overlay routing; rather, virtual machine A1 operates as if it can communicate directly with virtual machine B2 using conventional routing mechanisms. The generation of the VXLAN frame that is transmitted from the source server to a ToR switch is performed in accordance with FIG. 4A. However, instead of a single VARP MAC address and a single VARP VTEP IP address for all ToR switches, there are multiple VARP MAC addresses and VARP VTEP IP addresses, where different VARP MAC addresses and VARP VTEP IP addresses are used for different layer 2 domains. Accordingly, the specific VARP MAC address and VARP VTEP IP address pair that is present in a given VXLAN frame may vary based upon the layer 2 domain in which the source VM and source server are located. Said another way, because different ToR switches route in and out of different layer 2 domains of VXLAN, it is essential that the VXLAN frames issued by the source server reach the appropriate ToR switch (i.e., the ToR switch that has the appropriate routing information). This is enabled by using distinct VARP VTEP IP address and VARP MAC address combinations. - Continuing with the discussion of
FIG. 8, in step 800, the VTEP on ToR switch 2 receives the VXLAN frame and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame. The VXLAN frame received in Step 800 includes a VARP VTEP IP address for VXLAN A and a VARP MAC address for VXLAN A. In one embodiment of the invention, the ToR switch referenced in step 800 receives, traps, and decapsulates the VXLAN frame because the VXLAN frame includes the ToR MAC address of the ToR switch as the destination MAC address in the outer Ethernet header and includes the VARP VTEP IP address for VXLAN A as the destination IP address in the outer IP header. - In one embodiment of the invention, prior to the generation of the aforementioned MAC frame, VM A1 is configured to use a VARP VTEP IP address for VXLAN A as the default gateway, which is implemented on
ToR switch 2 and other ToR switches, thereby providing active-active redundancy. Further, ToR switch 2 is associated with a specific VARP MAC address, which in combination with the aforementioned VARP VTEP IP address enables the VXLAN frame transmitted by the source server to reach ToR switch 2. - Continuing with the discussion of
FIG. 8, in step 802, the MAC frame is routed, via the IP fabric, to a ToR switch from which VM B2 may be reached. For purposes of this explanation, assume VM B2 may be reached via ToR switch 3. Further, assume that the routing table in ToR switch 2 includes a routing table entry specifying a route determined using the VM B2 IP address, where the routing table entry indicates that VM B2 is reachable via spine switch 2. Accordingly, the MAC frame received in the VXLAN frame in step 800 is rewritten to remove the VARP MAC address as the destination MAC address and to replace it with the spine switch 2 MAC address. Further, the ToR switch 2 MAC address is included as the source MAC address in the rewritten MAC frame. The rewritten MAC frame is subsequently transmitted to spine switch 2. Spine switch 2, upon receipt of the rewritten MAC frame, performs a routing function using the VM B2 IP address and determines that the next hop is ToR switch 3. Spine switch 2 rewrites the MAC frame it received to remove the spine switch 2 MAC address as the destination MAC address and to replace it with the ToR switch 3 MAC address. The rewritten MAC frame is subsequently transmitted to ToR switch 3. - Continuing with the discussion of
FIG. 8, in step 804, ToR switch 3 receives the MAC frame from spine switch 2. In Step 806, ToR switch 3 processes the MAC frame in order to obtain a rewritten MAC frame. More specifically, in one embodiment of the invention, ToR switch 3 performs a routing function using the VM B2 IP address in order to obtain the VM B2 MAC address. In one embodiment of the invention, ToR switch 3 includes a routing table, where the routing table includes a routing table entry for VM B2. Accordingly, in the instant example, the MAC frame received in step 804 is rewritten to remove the ToR switch 3 MAC address as the destination MAC address and to replace it with the VM B2 MAC address. Further, the source MAC address in the inner frame is the VARP MAC address for VXLAN B. - Continuing with the discussion of
FIG. 8, in step 808, the VTEP on ToR switch 3 encapsulates the rewritten MAC frame (obtained in step 806) in a VXLAN frame. More specifically, the VXLAN frame includes an outer header with the following information: a MAC address of ToR switch 3 (as the source MAC address), a MAC address of the next hop (i.e., MAC address of server S2) (as the destination MAC address), a VARP VTEP IP address for VXLAN B (as the source IP address), an IP address of server S2 (as the destination IP address), and VNI B (i.e., the VNI associated with VXLAN B). The destination IP address in the outer header corresponds to the server that includes the VTEP that will decapsulate the VXLAN frame generated in step 808. The destination server (i.e., server S2) may be determined using the VM B2 IP address. Finally, VNI B is included in the VXLAN frame because VM B2 is associated with VNI B and, as such, VNI B is required to be included for VM B2 to ultimately receive the MAC frame generated in step 808. In one embodiment of the invention, the ToR switch 3 MAC address may be used in place of the VARP MAC address and the ToR switch 3 IP address may be used in place of the VARP VTEP IP address. - Continuing with the discussion on
FIG. 8, in step 810, the VXLAN frame generated in step 808 is transmitted, via the IP fabric, to the VTEP on server S2. The VXLAN frame is routed in accordance with standard IP routing mechanisms through the IP fabric until it reaches server S2. In this example, the VXLAN frame may be transmitted to spine switch 2 and spine switch 2 may subsequently transmit the VXLAN frame to ToR switch 4. Upon receipt of the VXLAN frame, ToR switch 4 may subsequently transmit the VXLAN frame to server S2. Those skilled in the art will appreciate that the outer Ethernet header of the VXLAN frame is rewritten as each hop is traversed in the IP fabric. - In
step 812, the VTEP on server S2 receives the VXLAN frame from ToR switch 4 and removes the outer header (see e.g., 232 in FIG. 2) to obtain the MAC frame (generated in Step 806). In step 814, the VTEP on server S2 bridges (i.e., sends using the destination MAC address in the MAC frame) the MAC frame to VM B2. VM B2 subsequently processes the MAC frame and extracts the payload. - Naked overlay routing is similar to indirect overlay routing in that the payload from VM A1 traverses the same number of switches in both of the aforementioned embodiments of overlay routing. However, naked overlay routing does not require the
additional layer 2 domain between the ToR switches. Instead, naked overlay routing requires the participation of the spine switches, where the spine switches have knowledge (via their routing tables) about which layer 2 domains are accessible by each ToR switch. In contrast, in the indirect overlay routing embodiment, the spine switches are not aware of which layer 2 domains are accessible by each ToR switch. -
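The naked overlay routing steps of FIG. 8 (steps 800 through 808) can be sketched as a chain of MAC rewrites followed by a single encapsulation at the last ToR switch. The sketch is illustrative only: the dataclasses, function names, and all address constants are hypothetical placeholders, while the order of rewrites and the outer header fields follow the description above.

```python
from dataclasses import dataclass, replace

# Placeholder addresses (assumptions, not values from the disclosure).
VARP_MAC_A, VARP_MAC_B = "varp-mac-a", "varp-mac-b"
VARP_VTEP_IP_A, VARP_VTEP_IP_B = "varp-vtep-ip-a", "varp-vtep-ip-b"
TOR2_MAC, SPINE2_MAC, TOR3_MAC = "tor2-mac", "spine2-mac", "tor3-mac"
VM_A1_MAC, VM_B2_MAC = "vm-a1-mac", "vm-b2-mac"
NEXT_HOP_MAC, SERVER_S2_IP = "next-hop-mac", "server-s2-ip"


@dataclass(frozen=True)
class MacFrame:
    src_mac: str
    dst_mac: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class OuterHeader:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    vni: str


@dataclass(frozen=True)
class VxlanFrame:
    outer: OuterHeader
    inner: MacFrame


def tor2_route(frame: VxlanFrame) -> MacFrame:
    # Steps 800-802: decapsulate, then send a plain MAC frame to the spine.
    return replace(frame.inner, src_mac=TOR2_MAC, dst_mac=SPINE2_MAC)


def spine2_route(frame: MacFrame) -> MacFrame:
    # The text only describes the destination MAC rewrite at the spine.
    return replace(frame, dst_mac=TOR3_MAC)


def tor3_route_and_encapsulate(frame: MacFrame) -> VxlanFrame:
    # Steps 804-806: rewrite the MAC frame for delivery to VM B2.
    inner = replace(frame, src_mac=VARP_MAC_B, dst_mac=VM_B2_MAC)
    # Step 808: encapsulate on VXLAN B toward the VTEP on server S2.
    outer = OuterHeader(TOR3_MAC, NEXT_HOP_MAC, VARP_VTEP_IP_B, SERVER_S2_IP, "B")
    return VxlanFrame(outer, inner)


initial = VxlanFrame(
    OuterHeader("server-s1-mac", TOR2_MAC, "server-s1-ip", VARP_VTEP_IP_A, "A"),
    MacFrame(VM_A1_MAC, VARP_MAC_A, "vm-b2-ip", b"hello"),
)
to_spine = tor2_route(initial)
to_tor3 = spine2_route(to_spine)
final = tor3_route_and_encapsulate(to_tor3)
print(final.outer.vni, final.inner.dst_mac)
```

Between ToR switch 2 and ToR switch 3 the frame carries no VXLAN header at all, which is why the spine switch must be able to route on the VM B2 IP address in this embodiment.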
FIG. 9 shows an exemplary path of a payload transmitted using naked overlay routing in accordance with one or more embodiments of the invention. More specifically, FIG. 9 shows an exemplary path the payload from VM A1 may take to reach VM B2. The exemplary path tracks the path described in FIG. 8. The components shown in FIG. 9 correspond to like named components in FIG. 3 and FIG. 8. Turning to FIG. 9, when the initial VXLAN frame (which encapsulates the initial MAC frame including the payload) is transmitted by server S1 to ToR switch 2, the VXLAN frame is transmitted on VXLAN A. The initial VXLAN frame is generated in accordance with FIGS. 4A and 8. At ToR switch 2, the MAC frame is routed (without VXLAN) to ToR switch 3 via a spine tier switch. After receiving the MAC frame from the spine tier switch, ToR switch 3 routes the MAC frame (see Step 808 in FIG. 8). The new resulting MAC frame is encapsulated into a new VXLAN frame and transmitted to server S2 on VXLAN B. - Embodiments of the invention enable
ToR switch 2 to take a MAC frame received via one VXLAN, rewrite a portion of the MAC frame, and transmit it via a separate VXLAN. In one embodiment of the invention, this functionality is achieved by first routing the MAC frame and then transmitting the VXLAN frame. - In one embodiment of the invention, the network topology may be arranged such that for a given
layer 2 domain it may (i) use direct overlay routing to communicate with a first set of other layer 2 domains and (ii) use indirect and/or naked routing to communicate with a second set of layer 2 domains. - While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
Claims (21)
1.-20. (canceled)
21. A system for routing a first packet comprising:
a first network device, wherein the first network device is configured to receive a first encapsulated packet and rewrite a source media access control (MAC) frame to create a rewritten source MAC frame,
wherein said first encapsulated packet is an encapsulation of the first packet comprising the source MAC frame,
wherein said source MAC frame comprises: a payload, the first source MAC address, a first destination MAC address, a first source Internet Protocol (IP) address, and a first destination IP address,
said first source MAC address is associated with the first VXLAN, and said first destination MAC address is associated with a virtual address resolution protocol (VARP) virtual tunnel end point (VTEP),
wherein said rewritten source MAC frame replaces the first source MAC address with a MAC address associated with the VARP VTEP, and replaces the first destination MAC address with a MAC address associated with a second Virtual Extensible Local Area Network (VXLAN),
wherein a first server is associated with the first VXLAN and the second VXLAN,
wherein the first VXLAN and the second VXLAN are distinct,
wherein the first server is configured to receive the first packet and encapsulate the first packet in a first VXLAN frame to create the first encapsulated packet,
wherein said first packet comprises: a source media access control (MAC) frame,
wherein said first source IP address is associated with the first VXLAN and said first destination IP address is associated with the second VXLAN,
wherein said first VXLAN frame comprises a second destination IP address,
wherein the second destination IP address is a VARP VTEP IP address, and
wherein the first server and the first network device are configured to communicate with a second server associated with the first VXLAN and the second VXLAN.
22. The system of claim 21, wherein the first network device is a Top of Rack (ToR) switch.
23. The system of claim 21, wherein the first network device is further configured to decapsulate the first encapsulated packet to reveal the first packet before rewriting the source MAC frame.
24. The system of claim 21, wherein the first network device is further configured to re-encapsulate the rewritten source MAC frame in a second VXLAN frame to create a re-encapsulated packet.
25. The system of claim 24, wherein the second VXLAN frame allows the re-encapsulated packet to be routed to the second server on the second VXLAN.
26. The system of claim 21, wherein the first network device is configured to route the first packet using a direct overlay.
27. The system of claim 21, wherein the first server comprises a first routing table portion for an underlay network and a second routing table portion for an overlay network, and wherein the second routing table portion comprises information relating an IP address to a layer-2 domain.
28. The system of claim 21, further comprising a spine tier comprising at least one spine switch.
29. The system of claim 21, further comprising a leaf tier comprising the first network device.
30. The system of claim 29, wherein the leaf tier further comprises a second network device for connecting the first network device to the second server.
31. The system of claim 30, wherein the second network device is a ToR switch.
32. A system for routing a first packet comprising:
a first server associated with a first Virtual Extensible Local Area Network (VXLAN) and a second VXLAN, wherein the first VXLAN and the second VXLAN are distinct;
the first server configured to receive a first packet and encapsulate the first packet in a first VXLAN frame to create a first encapsulated packet,
said first packet comprising: a source media access control (MAC) frame,
said source MAC frame comprising: a payload, a first source MAC address, a first destination MAC address, a first source Internet Protocol (IP) address, and a first destination IP address, said first source MAC address associated with the first VXLAN, and said first destination MAC address associated with a virtual address resolution protocol (VARP) virtual tunnel end point (VTEP),
said first source IP address associated with the first VXLAN and said first destination IP address associated with the second VXLAN,
said first VXLAN frame comprising a second destination IP address, wherein the second destination IP address is a VARP VTEP IP address; and
a first network device configured to receive the first encapsulated packet, rewrite the source MAC frame to create a rewritten source MAC frame, and re-encapsulate the first packet in a second VXLAN frame to create a re-encapsulated packet,
said rewritten source MAC frame replacing the first source MAC address with a MAC address associated with a second network device, and replacing the first destination MAC address with a MAC address associated with the VARP VTEP;
the second network device configured to receive the re-encapsulated packet, rewrite the source MAC frame to create a second rewritten source MAC frame, and re-encapsulate the packet in a third VXLAN frame,
said second rewritten source MAC frame replacing the first source MAC address with a MAC address associated with the VARP VTEP, and replacing the first destination MAC address with a MAC address associated with the second VXLAN,
wherein the first server and the first network device are configured to communicate with a second server associated with the first VXLAN and the second VXLAN.
33. The system of claim 32, wherein the first network device and the second network device are configured to route the first packet using an indirect overlay.
34. The system of claim 33, wherein the first network device and the second network device are configured to route the first packet on a third VXLAN therebetween.
35. The system of claim 32, wherein the first network device is a first Top of Rack (ToR) switch, and the second network device is a second ToR switch.
36. The system of claim 32, wherein the first network device and the second network device are configured to route the first packet using a naked overlay.
37. The system of claim 36, wherein the first network device and the second network device are configured to route the first packet therebetween not using a VXLAN protocol.
38. A first network device comprising:
at least one processor; and
a non-transitory computer-readable medium storing computer-executable instructions to cause the at least one processor to perform a method for routing a first packet from a first layer-2 domain to a second layer-2 domain, the method comprising:
receiving the first packet addressed to the first network device, wherein the first packet comprises a first VXLAN frame encapsulating a source MAC frame to form a first encapsulated packet,
decapsulating the first encapsulated packet to obtain the source MAC frame;
rewriting the source MAC frame by changing a destination address from a VARP VTEP MAC address to a second layer-2 domain MAC address, and changing a source address from a first layer-2 domain MAC address to the VARP VTEP MAC address;
generating a re-encapsulated packet comprising the rewritten source MAC frame and a second VXLAN frame; and
routing the re-encapsulated packet towards the destination address in the second layer-2 domain,
wherein the first network device maintains a plurality of VARP VTEP MAC addresses, each VARP VTEP MAC address being associated with a different layer-2 domain.
39. The first network device of claim 38, wherein the first network device is configured to route the first packet using a direct overlay.
40. The first network device of claim 38, wherein the first network device is a ToR switch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/057,558 US20230077717A1 (en) | 2013-07-02 | 2022-11-21 | Method and system for overlay routing with vxlan |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361842132P | 2013-07-02 | 2013-07-02 | |
US201361846259P | 2013-07-15 | 2013-07-15 | |
US14/321,335 US9369383B2 (en) | 2013-07-02 | 2014-07-01 | Method and system for overlay routing with VXLAN |
US15/155,940 US11539618B2 (en) | 2013-07-02 | 2016-05-16 | Method and system for overlay routing with VXLAN |
US18/057,558 US20230077717A1 (en) | 2013-07-02 | 2022-11-21 | Method and system for overlay routing with vxlan |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/155,940 Continuation US11539618B2 (en) | 2013-07-02 | 2016-05-16 | Method and system for overlay routing with VXLAN |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230077717A1 true US20230077717A1 (en) | 2023-03-16 |
Family
ID=52132803
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/321,335 Active US9369383B2 (en) | 2013-07-02 | 2014-07-01 | Method and system for overlay routing with VXLAN |
US14/321,381 Active 2034-07-26 US9749231B2 (en) | 2013-07-02 | 2014-07-01 | Method and system for overlay routing with VXLAN on bare metal servers |
US15/155,940 Active 2035-06-29 US11539618B2 (en) | 2013-07-02 | 2016-05-16 | Method and system for overlay routing with VXLAN |
US18/057,558 Pending US20230077717A1 (en) | 2013-07-02 | 2022-11-21 | Method and system for overlay routing with vxlan |
Family Applications Before (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/321,335 Active US9369383B2 (en) | 2013-07-02 | 2014-07-01 | Method and system for overlay routing with VXLAN |
US14/321,381 Active 2034-07-26 US9749231B2 (en) | 2013-07-02 | 2014-07-01 | Method and system for overlay routing with VXLAN on bare metal servers |
US15/155,940 Active 2035-06-29 US11539618B2 (en) | 2013-07-02 | 2016-05-16 | Method and system for overlay routing with VXLAN |
Country Status (2)
Country | Link |
---|---|
US (4) | US9369383B2 (en) |
WO (1) | WO2015003029A1 (en) |
Families Citing this family (93)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8611355B1 (en) | 2013-09-03 | 2013-12-17 | tw telecom holdings inc. | Buffer-less virtual routing |
US9785455B2 (en) | 2013-10-13 | 2017-10-10 | Nicira, Inc. | Logical router |
US9374294B1 (en) | 2013-11-05 | 2016-06-21 | Cisco Technology, Inc. | On-demand learning in overlay networks |
US10951522B2 (en) * | 2013-11-05 | 2021-03-16 | Cisco Technology, Inc. | IP-based forwarding of bridged and routed IP packets and unicast ARP |
US9769078B2 (en) | 2013-11-05 | 2017-09-19 | Cisco Technology, Inc. | Dynamic flowlet prioritization |
US10778584B2 (en) | 2013-11-05 | 2020-09-15 | Cisco Technology, Inc. | System and method for multi-path load balancing in network fabrics |
US9655232B2 (en) | 2013-11-05 | 2017-05-16 | Cisco Technology, Inc. | Spanning tree protocol (STP) optimization techniques |
US9674086B2 (en) | 2013-11-05 | 2017-06-06 | Cisco Technology, Inc. | Work conserving schedular based on ranking |
US9888405B2 (en) | 2013-11-05 | 2018-02-06 | Cisco Technology, Inc. | Networking apparatuses and packet statistic determination methods employing atomic counters |
US9502111B2 (en) | 2013-11-05 | 2016-11-22 | Cisco Technology, Inc. | Weighted equal cost multipath routing |
US9825857B2 (en) | 2013-11-05 | 2017-11-21 | Cisco Technology, Inc. | Method for increasing Layer-3 longest prefix match scale |
EP3066796B1 (en) * | 2013-11-05 | 2020-01-01 | Cisco Technology, Inc. | Network fabric overlay |
US9397946B1 (en) | 2013-11-05 | 2016-07-19 | Cisco Technology, Inc. | Forwarding to clusters of service nodes |
US9876711B2 (en) | 2013-11-05 | 2018-01-23 | Cisco Technology, Inc. | Source address translation in overlay networks |
US20150257081A1 (en) | 2014-02-04 | 2015-09-10 | Architecture Technology, Inc. | Hybrid autonomous network and router for communication between heterogeneous subnets |
US10587509B2 (en) | 2014-02-04 | 2020-03-10 | Architecture Technology Corporation | Low-overhead routing |
US9893988B2 (en) | 2014-03-27 | 2018-02-13 | Nicira, Inc. | Address resolution using multiple designated instances of a logical router |
US10250443B2 (en) | 2014-09-30 | 2019-04-02 | Nicira, Inc. | Using physical location to modify behavior of a distributed virtual network element |
US10511458B2 (en) | 2014-09-30 | 2019-12-17 | Nicira, Inc. | Virtual distributed bridging |
US9768980B2 (en) | 2014-09-30 | 2017-09-19 | Nicira, Inc. | Virtual distributed bridging |
US10020960B2 (en) | 2014-09-30 | 2018-07-10 | Nicira, Inc. | Virtual distributed bridging |
JP2016092549A (en) * | 2014-10-31 | 2016-05-23 | 日立金属株式会社 | Relay system and switch device |
US10050876B2 (en) * | 2014-11-12 | 2018-08-14 | Cisco Technology, Inc. | Optimized inter-VRF (virtual routing and forwarding) route leaking in network overlay based environments |
JP6355536B2 (en) * | 2014-11-27 | 2018-07-11 | APRESIA Systems株式会社 | Relay system and switch device |
CN105704036B (en) * | 2014-11-27 | 2019-05-28 | 华为技术有限公司 | Message forwarding method, device and system |
US9853873B2 (en) | 2015-01-10 | 2017-12-26 | Cisco Technology, Inc. | Diagnosis and throughput measurement of fibre channel ports in a storage area network environment |
US20160226753A1 (en) * | 2015-02-04 | 2016-08-04 | Mediatek Inc. | Scheme for performing one-pass tunnel forwarding function on two-layer network structure |
US10103902B1 (en) * | 2015-03-05 | 2018-10-16 | Juniper Networks, Inc. | Auto-discovery of replication node and remote VTEPs in VXLANs |
US9900250B2 (en) * | 2015-03-26 | 2018-02-20 | Cisco Technology, Inc. | Scalable handling of BGP route information in VXLAN with EVPN control plane |
WO2016160043A1 (en) * | 2015-04-03 | 2016-10-06 | Hewlett Packard Enterprise Development Lp | Address cache for tunnel endpoint associated with an overlay network |
CN106209689B (en) | 2015-05-04 | 2019-06-14 | 新华三技术有限公司 | Multicast data packet forwarding method and apparatus from VXLAN to VLAN |
CN106209648B (en) | 2015-05-04 | 2019-06-14 | 新华三技术有限公司 | Multicast data packet forwarding method and apparatus across virtual expansible local area network |
CN106209636B (en) | 2015-05-04 | 2019-08-02 | 新华三技术有限公司 | Multicast data packet forwarding method and apparatus from VLAN to VXLAN |
US10222986B2 (en) | 2015-05-15 | 2019-03-05 | Cisco Technology, Inc. | Tenant-level sharding of disks with tenant-specific storage modules to enable policies per tenant in a distributed storage system |
US9628379B2 (en) * | 2015-06-01 | 2017-04-18 | Cisco Technology, Inc. | Large scale residential cloud based application centric infrastructures |
US11588783B2 (en) | 2015-06-10 | 2023-02-21 | Cisco Technology, Inc. | Techniques for implementing IPV6-based distributed storage space |
US10178024B2 (en) * | 2015-06-26 | 2019-01-08 | Nicira, Inc. | Traffic forwarding in a network with geographically dispersed sites |
US10225184B2 (en) | 2015-06-30 | 2019-03-05 | Nicira, Inc. | Redirecting traffic in a virtual distributed router environment |
US10778765B2 (en) | 2015-07-15 | 2020-09-15 | Cisco Technology, Inc. | Bid/ask protocol in scale-out NVMe storage |
CN106713158B (en) * | 2015-07-16 | 2019-11-29 | 华为技术有限公司 | The method and device of load balancing in Clos network |
US9838315B2 (en) * | 2015-07-29 | 2017-12-05 | Cisco Technology, Inc. | Stretched subnet routing |
US10719341B2 (en) | 2015-12-02 | 2020-07-21 | Nicira, Inc. | Learning of tunnel endpoint selections |
US10164885B2 (en) | 2015-12-02 | 2018-12-25 | Nicira, Inc. | Load balancing over multiple tunnel endpoints |
US9912616B2 (en) * | 2015-12-02 | 2018-03-06 | Nicira, Inc. | Grouping tunnel endpoints of a bridge cluster |
US10069646B2 (en) | 2015-12-02 | 2018-09-04 | Nicira, Inc. | Distribution of tunnel endpoint mapping information |
US9892075B2 (en) | 2015-12-10 | 2018-02-13 | Cisco Technology, Inc. | Policy driven storage in a microserver computing environment |
US10382529B2 (en) | 2016-01-29 | 2019-08-13 | Nicira, Inc. | Directed graph based span computation and configuration dispatching |
US10326617B2 (en) | 2016-04-15 | 2019-06-18 | Architecture Technology, Inc. | Wearable intelligent communication hub |
US10581793B1 (en) * | 2016-04-29 | 2020-03-03 | Arista Networks, Inc. | Address resolution in virtual extensible networks |
US10140172B2 (en) | 2016-05-18 | 2018-11-27 | Cisco Technology, Inc. | Network-aware storage repairs |
US11212240B2 (en) | 2016-05-26 | 2021-12-28 | Avago Technologies International Sales Pte. Limited | Efficient convergence in network events |
US20170351639A1 (en) | 2016-06-06 | 2017-12-07 | Cisco Technology, Inc. | Remote memory access using memory mapped addressing among multiple compute nodes |
US10664169B2 (en) | 2016-06-24 | 2020-05-26 | Cisco Technology, Inc. | Performance of object storage system by reconfiguring storage devices based on latency that includes identifying a number of fragments that has a particular storage device as its primary storage device and another number of fragments that has said particular storage device as its replica storage device |
US10142264B2 (en) * | 2016-07-29 | 2018-11-27 | Cisco Technology, Inc. | Techniques for integration of blade switches with programmable fabric |
US10129099B2 (en) | 2016-08-16 | 2018-11-13 | International Business Machines Corporation | Differentiated services for protocol suitable network virtualization overlays |
US11563695B2 (en) | 2016-08-29 | 2023-01-24 | Cisco Technology, Inc. | Queue protection using a shared global memory reserve |
CN107800549B (en) | 2016-08-30 | 2020-01-03 | 新华三技术有限公司 | Method and device for realizing multi-tenant equipment environment MDC (media data center) based on port of switching equipment |
KR102168047B1 (en) * | 2016-09-26 | 2020-10-20 | 난트 홀딩스 아이피, 엘엘씨 | Virtual circuits in cloud networks |
CN108234362B (en) * | 2016-12-15 | 2020-08-11 | 中国电信股份有限公司 | VXLAN message accelerated forwarding method and system, VNF and NFVI |
CN106789529B (en) * | 2016-12-16 | 2020-04-14 | 平安科技(深圳)有限公司 | Method and terminal for implementing OVERLAY network |
US10545914B2 (en) | 2017-01-17 | 2020-01-28 | Cisco Technology, Inc. | Distributed object storage |
US10243823B1 (en) | 2017-02-24 | 2019-03-26 | Cisco Technology, Inc. | Techniques for using frame deep loopback capabilities for extended link diagnostics in fibre channel storage area networks |
US10713203B2 (en) | 2017-02-28 | 2020-07-14 | Cisco Technology, Inc. | Dynamic partition of PCIe disk arrays based on software configuration / policy distribution |
US10254991B2 (en) | 2017-03-06 | 2019-04-09 | Cisco Technology, Inc. | Storage area network based extended I/O metrics computation for deep insight into application performance |
US10757004B2 (en) | 2017-04-12 | 2020-08-25 | Nicira, Inc. | Routing domain identifier assignment in logical network environments |
US10333836B2 (en) * | 2017-04-13 | 2019-06-25 | Cisco Technology, Inc. | Convergence for EVPN multi-homed networks |
US10243846B2 (en) | 2017-05-15 | 2019-03-26 | Nicira, Inc. | Defining routing domain for distributed packet processing |
CN108934058B (en) * | 2017-05-25 | 2020-11-27 | 华为技术有限公司 | Communication method and device |
CN109218158B (en) * | 2017-07-05 | 2021-05-11 | 中国电信股份有限公司 | VxLAN-based data transmission method, control method, controller, gateway, intermediate network element and system |
US10419389B2 (en) * | 2017-07-20 | 2019-09-17 | Arista Networks, Inc. | Method and system for using a top of rack switch as an overlay routing intermediate point |
US10303534B2 (en) | 2017-07-20 | 2019-05-28 | Cisco Technology, Inc. | System and method for self-healing of application centric infrastructure fabric memory |
CN107612808B (en) * | 2017-09-13 | 2020-09-08 | 新华三技术有限公司 | Tunnel establishment method and device |
US10404596B2 (en) | 2017-10-03 | 2019-09-03 | Cisco Technology, Inc. | Dynamic route profile storage in a hardware trie routing table |
US10693769B2 (en) * | 2017-10-10 | 2020-06-23 | Vmware, Inc. | Methods and apparatus to perform network fabric migration in virtualized server systems |
US10942666B2 (en) | 2017-10-13 | 2021-03-09 | Cisco Technology, Inc. | Using network device replication in distributed storage clusters |
US10374827B2 (en) | 2017-11-14 | 2019-08-06 | Nicira, Inc. | Identifier that maps to different networks at different datacenters |
US10511459B2 (en) | 2017-11-14 | 2019-12-17 | Nicira, Inc. | Selection of managed forwarding element for bridge spanning multiple datacenters |
CN109995638A (en) * | 2018-01-02 | 2019-07-09 | 中国移动通信有限公司研究院 | A kind of method and apparatus carrying out double layer intercommunication |
US10904148B2 (en) | 2018-03-12 | 2021-01-26 | Nicira, Inc. | Flow-based local egress in a multisite datacenter |
TWI819072B (en) | 2018-08-23 | 2023-10-21 | 美商阿爾克斯股份有限公司 | System, non-transitory computer readable storage media and computer-implemented method for loop conflict avoidance in a network computing environment |
US11627080B2 (en) * | 2019-01-18 | 2023-04-11 | Vmware, Inc. | Service insertion in public cloud environments |
US10892989B2 (en) | 2019-01-18 | 2021-01-12 | Vmware, Inc. | Tunnel-based service insertion in public cloud environments |
US11128490B2 (en) | 2019-04-26 | 2021-09-21 | Microsoft Technology Licensing, Llc | Enabling access to dedicated resources in a virtual network using top of rack switches |
US11374879B2 (en) * | 2019-06-17 | 2022-06-28 | Cyxtera Data Centers, Inc. | Network configuration of top-of-rack switches across multiple racks in a data center |
US11032162B2 (en) | 2019-07-18 | 2021-06-08 | Vmware, Inc. | Method, non-transitory computer-readable storage medium, and computer system for endpoint to perform east-west service insertion in public cloud environments |
US11323287B2 (en) | 2019-07-18 | 2022-05-03 | International Business Machines Corporation | Link layer method of configuring a bare-metal server in a virtual network |
US11102080B2 (en) | 2019-07-18 | 2021-08-24 | International Business Machines Corporation | Network layer method of configuration of a bare-metal server in a virtual network |
US11323409B2 (en) | 2020-01-17 | 2022-05-03 | Arista Networks, Inc. | Efficient ARP bindings distribution in VPN networks |
US11538562B1 (en) | 2020-02-04 | 2022-12-27 | Architecture Technology Corporation | Transmission of medical information in disrupted communication networks |
US11178041B1 (en) * | 2020-07-07 | 2021-11-16 | Juniper Networks, Inc. | Service chaining with physical network functions and virtualized network functions |
US11469987B2 (en) | 2020-09-30 | 2022-10-11 | Vmware, Inc. | Incremental and parallel routing domain computation |
CN113746717B (en) * | 2021-09-07 | 2023-04-18 | 中国联合网络通信集团有限公司 | Network equipment communication method and network equipment communication device |
US11743191B1 (en) | 2022-07-25 | 2023-08-29 | Vmware, Inc. | Load balancing over tunnel endpoint groups |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4647825B2 (en) * | 2001-04-27 | 2011-03-09 | 富士通セミコンダクター株式会社 | Packet transmission / reception system, host, and program |
US7890633B2 (en) * | 2003-02-13 | 2011-02-15 | Oracle America, Inc. | System and method of extending virtual address resolution for mapping networks |
US8296459B1 (en) | 2010-06-30 | 2012-10-23 | Amazon Technologies, Inc. | Custom routing decisions |
JP5618886B2 (en) * | 2011-03-31 | 2014-11-05 | 株式会社日立製作所 | Network system, computer distribution apparatus, and computer distribution method |
US8670450B2 (en) | 2011-05-13 | 2014-03-11 | International Business Machines Corporation | Efficient software-based private VLAN solution for distributed virtual switches |
US8819267B2 (en) | 2011-11-16 | 2014-08-26 | Force10 Networks, Inc. | Network virtualization without gateway function |
US9898317B2 (en) * | 2012-06-06 | 2018-02-20 | Juniper Networks, Inc. | Physical path determination for virtual network packet flows |
US9036639B2 (en) * | 2012-11-29 | 2015-05-19 | Futurewei Technologies, Inc. | System and method for VXLAN inter-domain communications |
US20140317616A1 (en) * | 2013-04-23 | 2014-10-23 | Thomas P. Chu | Cloud computing resource management |
2014
- 2014-07-01: US14/321,335, patent US9369383B2 (Active)
- 2014-07-01: US14/321,381, patent US9749231B2 (Active)
- 2014-07-02: PCT/US2014/045183, published as WO2015003029A1 (Application Filing)
2016
- 2016-05-16: US15/155,940, patent US11539618B2 (Active)
2022
- 2022-11-21: US18/057,558, published as US20230077717A1 (Pending)
Also Published As
Publication number | Publication date |
---|---|
US20150010001A1 (en) | 2015-01-08 |
US20160337234A1 (en) | 2016-11-17 |
US11539618B2 (en) | 2022-12-27 |
US9749231B2 (en) | 2017-08-29 |
US20150010002A1 (en) | 2015-01-08 |
US9369383B2 (en) | 2016-06-14 |
WO2015003029A1 (en) | 2015-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230077717A1 (en) | | Method and system for overlay routing with vxlan |
US9866409B2 (en) | | Method and system for VXLAN encapsulation offload |
US12021826B2 (en) | | Techniques for managing software defined networking controller in-band communications in a data center network |
US10171357B2 (en) | | Techniques for managing software defined networking controller in-band communications in a data center network |
EP2981031B1 (en) | | Method and system for vtep redundancy in a multichassis link aggregation domain |
US9621508B2 (en) | | System and method for sharing VXLAN table information with a network controller |
US10237230B2 (en) | | Method and system for inspecting network traffic between end points of a zone |
US10536297B2 (en) | | Indirect VXLAN bridging |
US10205657B2 (en) | | Packet forwarding in data center network |
KR102054338B1 (en) | | Routing vlan tagged packets to far end addresses of virtual forwarding instances using separate administrations |
US9509603B2 (en) | | System and method for route health injection using virtual tunnel endpoints |
US11336485B2 (en) | | Hitless linkup of ethernet segment |
US20130163594A1 (en) | | Overlay-Based Packet Steering |
US10848457B2 (en) | | Method and system for cross-zone network traffic between different zones using virtual network identifiers and virtual layer-2 broadcast domains |
US10855733B2 (en) | | Method and system for inspecting unicast network traffic between end points residing within a same zone |
US10419389B2 (en) | | Method and system for using a top of rack switch as an overlay routing intermediate point |
US11012412B2 (en) | | Method and system for network traffic steering towards a service device |
US12126598B2 (en) | | Managing exchanges between edge gateways in a cloud environment to support a private network connection |
US10749789B2 (en) | | Method and system for inspecting broadcast network traffic between end points residing within a same zone |
US20230239274A1 (en) | | Managing exchanges between edge gateways in a cloud environment to support a private network connection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: ARISTA NETWORKS, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: DUDA, KENNETH JAMES; SWEENEY, ADAM JAMES; REEL/FRAME: 061844/0960. Effective date: 20150210 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |