EP3804236A1 - Method and apparatus for optimized dissemination of layer 3 forwarding information in software defined networking (sdn) networks - Google Patents

Method and apparatus for optimized dissemination of layer 3 forwarding information in software defined networking (sdn) networks

Info

Publication number
EP3804236A1
Authority
EP
European Patent Office
Prior art keywords
network
designated
network elements
elements
forwarding table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP18920579.2A
Other languages
German (de)
French (fr)
Other versions
EP3804236A4 (en)
Inventor
Vinayak Joshi
Kandagal HANAMANTAGOUD
Siva Kumar V V K A PERUMALLA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Publication of EP3804236A1
Publication of EP3804236A4


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/64 Routing or path finding of packets in data switching networks using an overlay routing layer
    • H04L 45/02 Topology update or discovery
    • H04L 45/036 Updating the topology between route computation elements, e.g. between OpenFlow controllers

Definitions

  • Embodiments of the invention relate to the field of packet networking; and more specifically, to the optimized dissemination of Layer 3 forwarding information in Software Defined Networking (SDN) networks.
  • a network controller, which can be deployed as a cluster of server nodes, has the role of the control plane and is coupled to one or more network elements (NEs) that have the role of the data plane.
  • the Open Networking Foundation, an industrial consortium focusing on commercializing SDN and its underlying technologies, has defined a set of open commands, functions, and protocols.
  • the defined protocol suite is known as the OpenFlow (OF) protocol.
  • the network controller, acting as the control plane, may then program the data plane on the network elements by causing packet handling rules to be installed on the forwarding network elements using OF commands and messages. These packet handling rules may have criteria to match various packet types as well as actions that may be performed on those packets.
  • L3 forwarding tables can also be referred to as the forwarding information base (FIB).
  • the network controller translates local and remote prefix information into OpenFlow rules and programs the rules into the forwarding tables of the network elements (i.e., the OpenFlow switches).
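  • As a purely illustrative aid (not part of the claimed subject matter), the following Python sketch shows the longest-prefix-match lookup that such programmed rules rely on; the table contents and action strings are hypothetical placeholders:

        import ipaddress

        # Hypothetical Layer 3 forwarding table: (prefix, action) pairs.
        FORWARDING_TABLE = [
            ("1.1.1.1/32", "output to local port"),
            ("6.6.6.0/24", "forward to edge NE"),
            ("0.0.0.0/0",  "forward to designated NE"),
        ]

        def lpm_lookup(dst_ip, table=FORWARDING_TABLE):
            """Return the action of the longest prefix containing dst_ip."""
            dst = ipaddress.ip_address(dst_ip)
            best = None
            for prefix, action in table:
                net = ipaddress.ip_network(prefix)
                if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
                    best = (net, action)
            return best[1] if best else None

        print(lpm_lookup("6.6.6.42"))  # -> forward to edge NE
        print(lpm_lookup("9.9.9.9"))   # -> forward to designated NE (default route)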
  • For traffic to flow over a given L3 network (e.g., a Layer 3 Virtual Private Network (VPN)), the network controller needs to configure the forwarding tables for each one of the network elements of the data plane forming the L3 network. However, the network controller is not aware of and cannot predict which traffic flows will be present in the network when programming the network.
  • One general aspect includes a method, in a network controller of a network, of configuring a plurality of network elements in a data plane of the network, the method including: selecting a set of one or more designated network elements from the plurality of network elements, where the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the network, and the remaining network elements from the plurality of network elements in the data plane (120) of the network are non-designated network elements; configuring each designated network element from the set of designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the network; and configuring each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
  • One general aspect includes a machine-readable medium including computer program code which when executed by a computer carries out the method including: selecting a set of one or more designated network elements from the plurality of network elements, where the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the network, and the remaining network elements from the plurality of network elements in the data plane (120) of the network are non-designated network elements; configuring each designated network element from the set of designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the network; and configuring each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
  • One general aspect includes a network controller of a network for configuring a plurality of network elements in a data plane of the network, the network controller including: a non-transitory machine-readable storage medium that provides instructions that, if executed by a processor, will cause the network controller to perform operations including: selecting a set of one or more designated network elements from the plurality of network elements, where the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the network, and the remaining network elements from the plurality of network elements in the data plane of the network are non-designated network elements; configuring each designated network element from the set of designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the network; and configuring each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the set of designated network elements as a next hop destination for a plurality of traffic flows, causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
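  • The claimed configuration steps can be pictured with the following simplified Python sketch; it is only a model under the assumption of a single designated NE per L3 network, and all function and variable names are hypothetical:

        def configure_data_plane(network_elements, designated, full_fib):
            """network_elements: dict ne_id -> list of local (prefix, port) pairs.
            designated: ne_id selected as the designated network element.
            full_fib: list of (prefix, next_hop) entries for the whole L3 network."""
            tables = {}
            for ne_id, local_routes in network_elements.items():
                if ne_id == designated:
                    # Designated NE: a forwarding table entry for every network element.
                    tables[ne_id] = list(full_fib)
                else:
                    # Non-designated NE: local routes plus a default entry whose
                    # next hop is the designated NE.
                    tables[ne_id] = [(prefix, f"port {port}") for prefix, port in local_routes]
                    tables[ne_id].append(("0.0.0.0/0", f"next hop {designated}"))
            return tables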
  • Figure 1 illustrates a block diagram of an exemplary network for optimized dissemination of Layer 3 forwarding information in accordance with some embodiments.
  • Figure 2A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE of a network, in accordance with some embodiments.
  • Figure 2B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE of a network, in accordance with some embodiments.
  • Figure 2C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE of a network, in accordance with some embodiments.
  • Figure 3 illustrates a block diagram of an exemplary network where heavy flows are forwarded through the designated NE in accordance with some embodiments.
  • Figure 4A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
  • Figure 4B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
  • Figure 4C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
  • Figure 5 illustrates a block diagram of an exemplary network where heavy flows are forwarded through non-designated NEs in accordance with some embodiments.
  • Figure 6 illustrates a flow diagram of exemplary operations for optimized dissemination of Layer 3 forwarding information in a network in accordance with some embodiments.
  • Figure 7 illustrates a flow diagram of exemplary operations performed by a network controller when determining that a heavy flow is forwarded at a designated network element, in accordance with some embodiments.
  • Figure 8A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
  • Figure 8B illustrates an exemplary way to implement a special-purpose network device according to some embodiments of the invention.
  • Figure 8C illustrates various exemplary ways in which virtual network elements (VNEs) may be coupled according to some embodiments of the invention.
  • Figure 8D illustrates a network with a single network element (NE) on each of the NDs, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
  • Figure 8E illustrates the simple case where each of the NDs implements a single NE, but a centralized control plane has abstracted multiple of the NEs in different NDs into (to represent) a single NE in one of the virtual network(s), according to some embodiments of the invention.
  • Figure 8F illustrates a case where multiple VNEs are implemented on different NDs and are coupled to each other, and where a centralized control plane has abstracted these multiple VNEs such that they appear as a single VNE within one of the virtual networks, according to some embodiments of the invention.
  • Figure 9 illustrates a general purpose control plane device with centralized control plane (CCP) software, according to some embodiments of the invention.
  • numerous specific details such as logic implementations, opcodes, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
  • references in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • Bracketed text and blocks with dashed borders may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
  • Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • Connected is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • each network element of the L3 network includes all potential flows configured as forwarding table entries in the L3 forwarding table(s) of the network element.
  • the number of network elements can reach thousands of elements resulting in large forwarding tables that need to be programmed on all the elements of the network.
  • the number of routes advertised by edge network elements (such as gateway devices) to the network controller can significantly increase and cannot be controlled by the network controller.
  • These problems are exacerbated for virtual switches on SR-IOV-capable hosts, where the virtual switch runs on the network interface controller (NIC) instead of running on regular cores of the compute host.
  • Another disadvantage of this first approach is that the configuration of the network elements can prove too heavy for the network controller, as the controller needs to program large L3 forwarding tables across the multiple network elements. While the rate of network element configuration may be low at regular run time, after the occurrence of some network events (e.g., reboot of an NE, reboot of the network controller, or VM migration, etc.) a large set of forwarding table entries has to be programmed on the network elements of the data plane, and this can take a significant amount of time, causing significant delays in the network.
  • the network controller may program forwarding table entries that will never be used, consequently using storage resources that could otherwise be used for other purposes. Such forwarding table entries or routes contribute to the bloating of the forwarding tables. However, since the network controller cannot predict if an entry in the forwarding table is needed for actual traffic that will flow in the network, not programming these entries on a network element can lead to traffic drop.
  • In a second approach, the network controller does not pre-configure the forwarding table entries for all flows in all the network elements. Instead, when a packet arrives at a network element, the forwarding pipeline performs a match look-up on the destination IP address in the forwarding table, and any packet with no matching table entry (i.e., a table-miss) is punted to the network controller. If the flow is known to the network controller, it programs forwarding of the flow on the network element from which the packets of the flow were received, consequently adding the corresponding flow as a forwarding table entry.
  • the second approach avoids the cost of pre-configuring at each network element all the possible table entries (i.e., flows) for the network.
  • this approach is more suitable for Layer 2 forwarding where the matching of traffic to an entry in a forwarding table is based on an exact match on the destination media access control (DMAC) address as opposed to Layer 3 forwarding that is typically done based on Longest Prefix Matching. Longest Prefix Matching may result in packets being matched against undesired prefixes instead of being punted to the network controller.
  • In a third approach, the network controller can program a prefix in the forwarding table that causes more flows received at the network element to be punted to the network controller when there is a table-miss. The network controller may then decide to program a more specific forwarding table entry for that flow.
  • This third approach, which can be referred to as need-based flow programming, solves the flow scale programming problem on the network element, as forwarding table entries are programmed when a packet corresponding to that flow is received and punted to the network controller.
  • However, the third approach suffers from the following disadvantage: punting packets of a flow to the network controller and processing them there always incurs latency on the first packet of the flow, which may not be desirable for some flows and some network applications.
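  • For illustration only, a minimal Python sketch of this need-based (reactive) programming is given below; the table format and the controller_routes mapping are hypothetical simplifications:

        import ipaddress

        def forward_or_punt(dst_ip, table, controller_routes):
            """table: list of (prefix, action); controller_routes: dict dst_ip -> action."""
            dst = ipaddress.ip_address(dst_ip)
            matches = [(ipaddress.ip_network(p), a) for p, a in table
                       if dst in ipaddress.ip_network(p)]
            if matches:                                  # longest prefix wins
                return max(matches, key=lambda m: m[0].prefixlen)[1]
            # Table miss: the packet is punted to the controller, which adds
            # latency to the first packet of the flow.
            action = controller_routes.get(dst_ip, "drop")
            if action != "drop":
                table.append((dst_ip + "/32", action))   # program a specific entry
            return action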
  • the network controller of the network determines one or more designated network elements from the plurality of network elements.
  • the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the SDN network.
  • the network controller configures each one of the designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the SDN network.
  • the network controller configures each one of the network elements from the plurality of network elements that are different from the designated network elements to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the designated network elements as a next hop destination for a plurality of traffic flows causing each one of the network elements from the plurality of network elements that are different from the designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
  • Figure 1 illustrates a block diagram of an exemplary network for optimized dissemination of Layer 3 forwarding information in accordance with some embodiments.
  • Figure 1 includes a network controller 110 and a data plane 120 of a first network 100 coupled with a second network 130 through an edge network element 105.
  • the second network 130 is further coupled with the remote network 140 through one or more edge network elements, which are not illustrated.
  • the data plane 120 can be, or includes, a data center (DC) including resources (e.g., Virtual Machines, Containers, bare metal applications, etc.) that can be allocated and used by multiple tenants.
  • the data plane 120 includes multiple network elements 101-104 that are coupled with local hosts 111, 112A-B, 113, and 114.
  • the network elements 101-104 form the data plane 120 of the network 100 and are controlled by the network controller 110. While Figure 1 illustrates 5 network elements (101-105) as part of the network 100, this is intended to be exemplary only and one of ordinary skill in the art would understand that the network 100 may include various numbers of network elements. In typical implementations, the network 100 includes hundreds to thousands of network elements.
  • the network elements 101-105 provide Layer 2 and Layer 3 connectivity to the hosts 111, 112A-B, 113 and 114.
  • the L2/L3 traffic forwarded within the data plane 120 can be exchanged between the hosts, exchanged between networks (e.g., intra data centers), or forwarded towards/from the Internet.
  • each one of the NEs 101-104 is an OF switch (e.g., OpenFlow Virtual Switch (OVS), or a ToR switch supporting OF).
  • While Figure 1 illustrates a determined number of hosts coupled with respective switches, this is intended to be exemplary only and should not be considered a limitation of the present invention.
  • the data plane 120 may include several NEs (typically hundreds or thousands) and each NE may be coupled to one or more host devices.
  • the network elements 101-105 and the network controller 110 are implemented as described with reference to the various embodiments of Figures 8A-F and Figure 9.
  • the tenants of the data plane 120 can be separated into different Layer 3 Virtual Private Networks (L3VPN).
  • Figure 1 shows an example where local host 111, local host 112A and local host 113 form a first L3VPN (indicated with shaded boxes), and the local host 112B and local host 114 form another L3VPN within the same data plane 120.
  • When the data plane 120 is, or includes, a data center, the resources used by or allocated to a tenant can form a single L3VPN.
  • While the illustrated embodiment shows the elements of a single L3VPN located within the data plane 120, in other embodiments the L3VPN might extend beyond the data plane 120 (e.g., an enterprise tenant's VPN might extend to multiple premises).
  • the network controller 110 is a centralized control plane and has the responsibility for generating reachability and forwarding information.
  • the network controller 110 is sometimes referred to as an SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity.
  • the network controller 110 enables a centralized process of neighbor discovery and topology discovery.
  • the network controller 110 has a southbound interface with a data plane (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane) that includes the NEs 101-104 (sometimes referred to as switches, forwarding elements, data plane elements, or nodes).
  • the network controller 110 determines the reachability within the network and distributes the forwarding information to the NEs 101-104 of the data plane over the southbound interface (which may use the OpenFlow protocol).
  • the network intelligence is centralized in the network controller 110 executing on electronic devices that are typically separate from the network devices on which the NEs are implemented.
  • the network controller 110 further communicates with edge network elements, such as the edge NE 105 (e.g., a Data Center Gateway), that couple the data plane 120 with external and remote networks (e.g., other data centers, the Internet, etc.).
  • the network controller 110 can use the Multi-Protocol Border Gateway Protocol (MP-BGP) to exchange routes with the edge NE 105.
  • the data plane 120 allows the distribution and forwarding of traffic flows to and from the hosts 111-114. To enable the distribution of the flows across the data plane 120, the network controller configures forwarding information on the network elements 101-104.
  • the network controller 110 discovers the prefixes of each one of the network elements within the data plane 120 through various mechanisms. Sniffing Address Resolution Protocol (ARP) or Dynamic Host Configuration Protocol (DHCP) traffic, or interacting with a cloud orchestrator (e.g., OpenStack), are non-limiting examples of mechanisms that can be used by the network controller 110 to discover the prefixes of the network elements forming the data plane 120.
  • the network controller 110 advertises the L3VPN routes for reaching the local hosts to the edge NE 105. Similarly, the network controller 110 learns L3VPN routes from the DC-gateways that can be used for reaching the remote hosts and networks. For example, the network controller 110 learns from the edge NE 105 the prefixes of the remote network 140 (e.g., prefix 6.6.6.0/24) and the prefix (e.g., prefix 7.7.7.7) of the remote host 117 that is located in the network 130. The network controller 110 translates the prefixes of the local and remote NEs and networks into control and forwarding rules (e.g., OF rules) and programs the rules into the local NEs forming the data plane 120.
  • the network controller 110 is operative to perform an optimized dissemination of Layer 3 forwarding information in the data plane 120 by configuring the network elements 101-104 according to the techniques described herein.
  • the network controller 110 selects one or more designated NEs from the set of network elements 101-104.
  • the number of designated network elements is strictly less than the number of all of the NEs 101-104 in the data plane of the network 100.
  • the network controller 110 selects the NE 103 as a designated NE for NEs 101, 102 and 103.
  • the NEs 101, 102, and 103 are network elements that enable the hosts 111, 112A and 113 to communicate.
  • the hosts 111, 112A, and 113 are part of a first L3VPN network within the data plane 120.
  • the network controller 110 may select more than one NE as a designated NE.
  • the network controller 110 configures the designated network element to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the SDN network.
  • the designated NE 103 receives one or more control commands and messages from the network controller 110 that include rules for programming the L3 forwarding table(s). As a result of these rules, the designated NE 103 includes the L3 forwarding table for the entire subset of NEs forming the L3VPN network.
  • the L3 forwarding table in the designated NE 103 is configured to include all routes for reaching all of the NEs 101 and 102 regardless of whether or not traffic will actually flow towards those NEs. While the embodiments describe an L3 forwarding table, in other embodiments, the NE includes multiple L3 forwarding table(s) that result from the configuration of the designated NE by the network controller 110.
  • the network controller 110 may select a network element from the data plane 120 that is coupled with the least number of hosts.
  • the network element having the least number of hosts in the network has the least number of forwarding table entries associated with local hosts served by the network element and can accommodate several forwarding table entries of many non-designated network elements.
  • the network controller 110 may select all of these NEs or alternatively select a subset of these NEs to be designated NEs.
  • the network controller 110 can receive from an administrator (e.g., a data center administrator or the network administrator) a list of network elements that can be used as designated network elements.
  • the administrator can take into consideration the memory and/or processing power of the network elements.
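  • By way of a hypothetical example of such a selection, the following Python sketch prefers the network element serving the fewest local hosts, optionally restricted to an administrator-supplied candidate list; all names are illustrative:

        def select_designated(ne_to_host_count, candidates=None, count=1):
            """ne_to_host_count: dict ne_id -> number of locally attached hosts."""
            pool = list(candidates) if candidates else list(ne_to_host_count)
            ranked = sorted(pool, key=lambda ne: ne_to_host_count.get(ne, 0))
            return ranked[:count]   # the least-loaded NEs become designated NEs

        # Example: NE 103 serves the fewest hosts and is selected.
        print(select_designated({"NE101": 3, "NE102": 2, "NE103": 1}))  # -> ['NE103']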
  • the network controller configures each non-designated network element to include a different L3 forwarding table.
  • the NEs that are referred to as non-designated NEs are network elements different from the designated NEs. These non-designated NEs are not part of the set of designated NEs.
  • NE 101 and NE 102 are non-designated NEs in the L3VPN which has NE 103 as a designated NE.
  • the L3 forwarding table configured at the non-designated NEs includes a default forwarding table entry having the designated network element NE 103 as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements (e.g., NE 101 and NE 102) to forward all traffic associated with the plurality of traffic flows to the designated network element.
  • the non-designated NEs 101, 102 receive one or more control commands and messages from the network controller 110 that include rules for programming the L3 forwarding table(s) of the NEs. As a result of these rules, the non-designated NEs 101-102 include an L3 forwarding table with a default forwarding table entry having the designated NE 103 as a next hop destination for the plurality of traffic flows.
  • all traffic flows forwarded at the non-designated NE that are not destined to a local port of the non-designated NE are forwarded towards the designated NE.
  • the network controller 110 configures NE 101 and NE 102 to forward all traffic that is not destined for one of the local ports P1 and P2 towards the designated NE 103.
  • These traffic flows may include packet flows originating from sources connected to non-designated NEs (e.g., host 111 for NE 101 and host 112A for NE 102).
  • When the packet flows are received at the NE 103, they are forwarded based on the L3 forwarding table that was configured by the network controller 110.
  • the NE 103 performs an LPM match based on the destination of the packet flows and sends the packets to the proper destination.
  • the destination of the packet flows can be internal or external to the data plane 120.
  • When multiple designated NEs are selected, these NEs may provide redundancy to the network and can be used for load balancing the traffic flows over the data plane 120.
  • the solution is applicable to all types of network elements of the network (e.g., vSwitches, hardware switches, etc.).
  • the solution does not require any new OF extension and can use standard OpenFlow messages for configuring each of the network elements.
  • the solution presented herein allows for a fast and efficient configuration of a L3 network that does not impose a heavy processing burden on the network controller.
  • the solution allows for a reduction of the number of control commands and messages exchanged between the network controller and the NEs of a network when compared with standard forwarding dissemination techniques.
  • the solution also allows for judicious use of memory resources in the network as only a subset of NEs includes larger L3 forwarding tables.
  • FIG. 2A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE of a network, in accordance with some embodiments.
  • the forwarding table 203 is a Layer 3 forwarding table configured in NE 103 by the network controller 110 when the network controller selects the NE 103 as a designated NE for the data plane 120.
  • the NE 103 is selected as a designated NE of the L3VPN formed by NE 101, NE 102, and NE 103.
  • the forwarding table 203 includes forwarding table entries for reaching each one of the other NEs of the L3VPN.
  • the first entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix "1.1.1.1/32" of the host 111 coupled with NE 101. This entry causes the matching flow packets to be sent to the NE 101 towards the host 111.
  • the second entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix "2.2.2.2/32" of the host 112A coupled with NE 102. This entry causes the matching flow packets to be sent to the NE 102 towards the host 112A.
  • the third entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix "3.3.3.3/32" of the local port P3 of the NE 103. This entry causes the matching flow packets to be sent to the port P3 towards the host 113.
  • the fourth entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix "6.6.6.0/24" of the remote network 140. This entry causes the matching flow packets to be sent to the edge NE 105 to be forwarded towards the remote network 140.
  • FIG. 2B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE of a network, in accordance with some embodiments.
  • the forwarding table 201 includes forwarding table entries for reaching the local port P1 of the NE 101 and for reaching the designated NE 103.
  • the first entry in the forwarding table 201 is accessed when flow packets are received at the NE 101 with a destination address of the packet that matches, based on the longest prefix match, the prefix "1.1.1.1/32" of the host 111 coupled with NE 101. This entry causes the matching flow packets to be sent to the port P1 of the NE 101 towards the host 111.
  • the second entry in the forwarding table 201 is accessed when flow packets are received at the NE 101 with a destination address of the packet that matches, based on the longest prefix match, the prefix "0.0.0.0/0". This entry causes all flow packets that are not destined to the local host coupled with the NE 101 to be sent to the designated NE 103. Once NE 103 is reached, the NE 103 can forward these packet flows towards another NE (which can be internal or external to the data plane 120).
  • FIG. 2C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE of a network, in accordance with some embodiments.
  • the forwarding table 202 includes forwarding table entries for reaching the local port P2 of the NE 102 and for reaching the designated NE 103.
  • the first entry in the forwarding table 202 is accessed when flow packets are received at the NE 102 with a destination address of the packet that matches, based on the longest prefix match, the prefix "2.2.2.2/32" of the host 112A coupled with NE 102. This entry causes the matching flow packets to be sent to the port P2 of the NE 102 towards the host 112A.
  • the second entry in the forwarding table 202 is accessed when flow packets are received at the NE 102 with a destination address of the packet that matches, based on the longest prefix match, the prefix "0.0.0.0/0". This entry causes all flow packets that are not destined to the local host 112A with prefix "2.2.2.2/32" coupled with the NE 102 to be sent to the designated NE 103. Once NE 103 is reached, the NE 103 can forward these packet flows towards another NE (which can be internal or external to the data plane 120).
  • the forwarding tables 201 and 202 include a reduced number of forwarding table entries.
  • Each of the forwarding tables 201 and 202 includes one or more forwarding table entries to forward the traffic flows to local host(s), if any.
  • the forwarding tables 201 and 202 also include a default route for all other traffic that is received at their respective NEs (i.e., NE 101 and NE 102). The default route causes all traffic that is not local to be forwarded towards a designated NE (here NE 103).
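  • The forwarding tables 201, 202, and 203 described above can be rendered as simple data for illustration; the Python sketch below (contents hypothetical, mirroring Figures 2A-C) shows a packet from host 111 towards the remote network 140 taking the default route to the designated NE 103, which then resolves the edge NE 105 as the next hop:

        import ipaddress

        TABLE_201 = [("1.1.1.1/32", "port P1"), ("0.0.0.0/0", "to designated NE 103")]
        TABLE_202 = [("2.2.2.2/32", "port P2"), ("0.0.0.0/0", "to designated NE 103")]
        TABLE_203 = [("1.1.1.1/32", "to NE 101"), ("2.2.2.2/32", "to NE 102"),
                     ("3.3.3.3/32", "port P3"), ("6.6.6.0/24", "to edge NE 105")]

        def lookup(dst_ip, table):
            """Longest-prefix match of dst_ip against (prefix, action) entries."""
            dst = ipaddress.ip_address(dst_ip)
            hits = [(ipaddress.ip_network(p), a) for p, a in table
                    if dst in ipaddress.ip_network(p)]
            return max(hits, key=lambda h: h[0].prefixlen)[1] if hits else None

        print(lookup("6.6.6.42", TABLE_201))  # -> to designated NE 103
        print(lookup("6.6.6.42", TABLE_203))  # -> to edge NE 105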
  • the L3 forwarding tables 201, 202, and 203 include routes towards one or more NEs that are part of the layer 3 network for which NE 103 is a designated NE.
  • the forwarding information to be configured on each one of the NEs of the data plane 120 might involve additional tables (flow tables, group tables, etc.) and/or additional routes in the existing tables based on different forwarding protocols (e.g., the forwarding table may include Multiprotocol Label Switching (MPLS) labels) that are not illustrated in Figures 2A-C.
  • each of the forwarding tables includes the prefix that is matched based on the longest prefix match as well as the overall action that is to be performed for this prefix. While the overall action is shown as a single action, this action may be the result of multiple sub-actions that are not illustrated.
  • the forwarding table of the designated NE has a size of 4 flows.
  • While the illustrated data plane 120 includes a relatively small number of NEs, typical networks will include hundreds to thousands of NEs, and the use of the proposed solution for dissemination of L3 forwarding information results in a significant reduction in the number of forwarding table entries that need to be programmed on the various NEs when compared with the number of forwarding table entries programmed based on standard forwarding information dissemination techniques. This results in an optimized use of network resources (e.g., message exchanges between the network controller and the different NEs during configuration operations) and memory resources.
  • the designated network elements may receive heavy flows (which can also be referred to as elephant flows).
  • a flow is considered to be a heavy flow when a network element receives more than a predetermined number N of packets for that flow during a predetermined period of time.
  • the number of packets N can be set by an administrator of the data plane 120 and may vary depending on the applications and other factors decided by the administrator.
  • the occurrence of heavy flows can cause additional latency and congestion in the network and may clog the designated NE of the data plane 120.
  • the network controller is operative to detect the heavy flows that are forwarded through a designated NE and configure one or more NEs of the network to allow the heavy flow to bypass the designated NE.
  • Figure 3 illustrates a block diagram of an exemplary network where heavy flows are forwarded through the designated NE in accordance with some embodiments.
  • the host 111 and the host 112A transmit packets to the remote network 140 as illustrated in Figure 3. These packets are forwarded by each of the NE 101 and the NE 102 towards the designated NE 103.
  • the designated NE 103 transmits the packets towards the edge NE 105 to be forwarded to the remote network 140.
  • the host 111 transmits a high number of packets per second in a sustained manner and the host 112A sends a lower number of packets per second.
  • the packets transmitted by the host 111 are considered a heavy flow of packets as their number exceeds a threshold set for this flow.
  • the network controller 110 is operative to configure the designated NE 103 to enable an efficient detection of the heavy flows and a rerouting of these flows such that the designated NE is avoided.
  • the network controller 110 configures each of the forwarding table entries with an associated predetermined threshold value which, when exceeded by the number of packets received at the designated NE, triggers an alert to the network controller 110.
  • For example, in OF networks, the L3FIB flow entries of the designated OF switch are programmed with an OFPIT_STAT_TRIGGER instruction including a predetermined value N.
  • the value of N can be determined by an administrator and is used to define which flows are considered heavy flows.
  • the value N is defined to be sufficiently large (e.g., thousands or millions of packets) for a determined period (e.g., per second).
  • the designated NE periodically monitors the number of packets received for a given flow in order to determine whether the flow is a heavy flow or not.
  • the designated NE 103 transmits a message to the network controller 110 each time a flow is determined to be a heavy flow. For example, in OF networks, when the OFPSTF_PERIODIC flag is set, the trigger will apply not only when the value N of the threshold is reached for the flow, but also when multiples of that value (e.g., 2N, 3N, ...) are reached for the flow.
  • the designated NE may monitor the number of packets and may transmit an alert when a threshold is reached once. For example, in OpenFlow, when OFPSTF_ONLY_FIRST is set, only the first threshold that is crossed is considered, and other thresholds (e.g., multiples of the first threshold) are ignored.
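  • The per-entry statistics trigger can be modelled with the short Python sketch below; it is only an approximation of the behaviour described above (counting packets per flow entry and alerting the controller at the threshold N, or at every multiple of N when the periodic behaviour is enabled), and the class and callback names are hypothetical:

        class FlowEntryStats:
            def __init__(self, prefix, threshold_n, periodic, notify):
                self.prefix = prefix
                self.threshold_n = threshold_n   # predetermined value N
                self.periodic = periodic         # alert at N, 2N, 3N, ... if True
                self.notify = notify             # callback towards the network controller
                self.packets = 0
                self._alerted = False

            def on_packet(self):
                self.packets += 1
                crossed = self.packets % self.threshold_n == 0
                if crossed and (self.periodic or not self._alerted):
                    self._alerted = True
                    self.notify(self.prefix, self.packets)   # heavy-flow alert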
  • FIG. 4A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
  • an instruction is added to indicate that when a number of packets of the flow reaches the number N, a message is triggered for transmission to the network controller 110.
  • the network controller 110 receives the message from the designated network element 103.
  • the message includes an indication that a number of packets of a flow forwarded by the designated network element 103 has exceeded a predetermined threshold N.
  • the network controller 110 configures one or more non-designated network elements to forward the packets of the flow. This causes the packets of the flow to bypass the designated network element when being forwarded in the network consequently reducing the backlog that may have been caused by the flow at the designated NE.
  • Figure 4B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
  • Figure 4C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
  • Upon the receipt of a message that includes an indication that a number of packets of a flow (e.g., a flow destined to the remote network 140, prefix 6.6.6.0/24) forwarded by the designated network element 103 has exceeded a predetermined threshold, the network controller configures the NE 101 and the NE 102 to directly forward the packets of this flow to the edge NE 105 and bypass the designated NE 103.
  • the network controller 110 adds to each one of the forwarding tables 401 and 402 the forwarding table entry for the prefix “6.6.6.0/24” for forwarding the flow of packets destined to the remote network 140. This flow of packets is forwarded towards the edge NE 105 and bypasses the designated NE 103.
  • Figure 5 illustrates a block diagram of an exemplary network where heavy flows are forwarded through non-designated NEs in accordance with some embodiments.
  • the heavy flow bypasses the designated NE 103 and is forwarded from each one of the NEs 101 and 102 towards the remote network 140 via the edge NE 105.
  • bypass of the designated network element by the packets of the flow is temporary and is set to expire after a predetermined period of time.
  • the configuration of the additional forwarding table entries in the non-designated NEs is set to expire after a predetermined time interval has elapsed causing the packets of the heavy flow to be forwarded through and by the designated NE 103 after the expiration of the interval of time.
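  • For illustration, the temporary bypass with an aging time could be modelled as in the Python sketch below; the entry format and time base are hypothetical simplifications of the behaviour described above:

        import time

        def install_bypass(table, prefix, action, aging_seconds):
            """Add a specific entry that bypasses the designated NE, with an expiry time."""
            table.append((prefix, action, time.time() + aging_seconds))

        def expire_entries(table, now=None):
            """Drop aged-out bypass entries so traffic falls back to the default route."""
            now = time.time() if now is None else now
            table[:] = [e for e in table if len(e) < 3 or e[2] > now]

        # Example: program the 6.6.6.0/24 bypass on a non-designated NE for 60 seconds.
        table_201 = [("1.1.1.1/32", "port P1"), ("0.0.0.0/0", "to designated NE 103")]
        install_bypass(table_201, "6.6.6.0/24", "to edge NE 105", aging_seconds=60)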
  • the detection of the heavy flows improves the forwarding information dissemination techniques described above by enabling the bypass of the designated NE by heavy flows.
  • This detection does not mandate any complex elephant flow detection algorithms to be run on the network controller 110 or the designated network element 103. Instead, a mechanism is used to configure each forwarding table entry with a predetermined threshold, and when certain packet flows exceed their respective thresholds, an alert is transmitted to the network controller 110.
  • one or more network elements of the network are programmed such that the heavy flows are not forwarded to the designated NE.
  • these flows are configured in the non-designated NEs with an aging time causing them to phase out when that time is reached.
  • FIG. 6 illustrates a flow diagram of exemplary operations for optimized dissemination of Layer 3 forwarding information in a network in accordance with some embodiments.
  • the network controller 110 selects, at operation 602, a set of one or more designated network elements (e.g., NE 103) from multiple network elements (e.g., NE 101- 103) of a data plane of a network.
  • the number of designated network elements is strictly less than the number of all of the network elements in the data plane of the network.
  • the remaining network elements from the multiple network elements of the data plane of the network are non-designated network elements (e.g., NE 101 and NE 102).
  • the network controller 110 configures each designated network element (e.g., NE 103) from the set of designated network elements to include a Layer 3 forwarding table (e.g., forwarding table 203).
  • the forwarding table includes a forwarding table entry for each network element from the multiple network elements in the data plane of the network.
  • the network controller 110 configures each non-designated network element (e.g., NE 101 and NE 102) to include a Layer 3 forwarding table (e.g., forwarding tables 201 and 202).
  • the forwarding table includes a default forwarding table entry having a designated network element (e.g., NE 103) from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element (e.g., NE 103).
  • FIG. 7 illustrates a flow diagram of exemplary operations performed by a network controller when determining that a heavy flow is forwarded at a designated network element, in accordance with some embodiments.
  • the network controller 110 configures, at operation 702, for each traffic flow a predetermined threshold that when exceeded by the packets of the traffic flow forwarded by the designated network element causes the designated network element to transmit a message to the network controller 110.
  • the network controller 110 receives the message from the designated network element (e.g., NE 103), where the message includes an indication that a number of packets of a traffic flow forwarded by the designated network element has exceeded the predetermined threshold.
  • the network controller 110 configures, at operation 706, one or more non-designated network elements (e.g., NE 101 and NE 102) to forward the packets of the traffic flow causing the packets of the traffic flow to bypass the designated network element (e.g., NE 103) when being forwarded in the network.
  • the embodiments described herein present efficient and optimized Layer 3 forwarding information dissemination techniques.
  • the network controller of a centralized network selects a subset of designated NEs to be configured with a complete L3 forwarding table (e.g., the entire L3FIB of the network) and configures the rest of the NEs of the network to forward any non-local traffic flows towards at least one of the designated NEs resulting in a significant reduction of the number of L3 forwarding table entries that are programmed by the network controller for a particular network.
  • the solution significantly reduces the number of flows to be written on multiple network elements of the network as only a subset of these NEs is to include the complete L3 forwarding table of the network. Further, the solution is applicable to all types of network elements of the network (e.g., vSwitches, hardware switches, etc.). In OF networks, the solution does not require any new OF extension and can use standard OpenFlow messages for configuring each of the network elements.
  • a dynamic programming of the NEs is performed by enabling the detection of heavy flows that may reach the designated NEs and by configuring the network to forward these flows while bypassing the designated NE. This immediately alleviates any potential backlog at the designated NE.
  • the solution presented herein does not overload the network controller with overheads (such as periodic statistics queries from the network element). Instead, the monitoring of the traffic at the designated NEs is solely performed based on events detected at the NEs.
  • An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals).
  • An electronic device (e.g., a computer) typically includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, or a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data.
  • an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device.
  • Typical electronic devices also include a set of one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices.
  • a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection and/or sending data out to other devices via a wireless connection.
  • This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication.
  • the radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s).
  • the set of physical NI(s) may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, or local area network (LAN) adapter.
  • the NIC(s) may facilitate in connecting the electronic device to other electronic devices allowing them to communicate via wire through plugging in a cable to a physical port connected to a NIC.
  • One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.
  • a network device is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices).
  • Some network devices are "multiple services network devices" that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).
  • Figure 8A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
  • Figure 8A shows NDs 800A-H, and their connectivity by way of lines between 800A-800B, 800B-800C, 800C-800D, 800D-800E, 800E-800F, 800F-800G, and 800A-800G, as well as between 800H and each of 800A, 800C, 800D, and 800G.
  • These NDs are physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link).
  • An additional line extending from NDs 800A, 800E, and 800F illustrates that these NDs act as ingress and egress points for the network (and thus, these NDs are sometimes referred to as edge NDs; while the other NDs may be called core NDs).
  • Two of the exemplary ND implementations in Figure 8A are: 1) a special-purpose network device 802 that uses custom application-specific integrated-circuits (ASICs) and a special-purpose operating system (OS); and 2) a general purpose network device 804 that uses common off-the-shelf (COTS) processors and a standard OS.
  • the special-purpose network device 802 includes networking hardware 810 comprising a set of one or more processor(s) 812, forwarding resource(s) 814 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 816 (through which network connections are made, such as those shown by the connectivity between NDs 800A-H), as well as non-transitory machine readable storage media 818 having stored therein networking software 820.
  • the networking software 820 may be executed by the networking hardware 810 to instantiate a set of one or more networking software instance(s) 822.
  • Each of the networking software instance(s) 822, and that part of the networking hardware 810 that executes that network software instance form a separate virtual network element 830A-R.
  • Each of the virtual network element(s) (VNEs) 830A-R includes a control communication and configuration module 832A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 834A-R, such that a given virtual network element (e.g., 830A) includes the control communication and configuration module (e.g., 832A), a set of one or more forwarding table(s) (e.g., 834A), and that portion of the networking hardware 810 that executes the virtual network element (e.g., 830A).
  • the special-purpose network device 802 is often physically and/or logically considered to include: 1) a ND control plane 824 (sometimes referred to as a control plane) comprising the processor(s) 812 that execute the control communication and configuration module(s) 832A-R; and 2) a ND forwarding plane 826 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising the forwarding resource(s) 814 that utilize the forwarding table(s) 834A-R and the physical NIs 816.
  • the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 834A-R, and the ND forwarding plane 826 is responsible for receiving that data on the physical NIs 816 and forwarding that data out the appropriate ones of the physical NIs 816 based on the forwarding table(s) 834A-R.
  • Figure 8B illustrates an exemplary way to implement the special-purpose network device 802 according to some embodiments of the invention.
  • Figure 8B shows a special- purpose network device including cards 838 (typically hot pluggable). While in some embodiments the cards 838 are of two types (one or more that operate as the ND forwarding plane 826 (sometimes called line cards), and one or more that operate to implement the ND control plane 824 (sometimes called control cards)), alternative embodiments may combine functionality onto a single card and/or include additional card types (e.g., one additional type of card is called a service card, resource card, or multi-application card).
  • a service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway)).
  • the general purpose network device 804 includes hardware 840 comprising a set of one or more processor(s) 842 (which are often COTS processors) and physical NIs 846, as well as non-transitory machine readable storage media 848 having stored therein software 850.
  • the processor(s) 842 execute the software 850 to instantiate one or more sets of one or more applications 864A-R. While one embodiment does not implement virtualization, alternative embodiments may use different forms of virtualization.
  • the virtualization layer 854 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 862A-R called software containers that may each be used to execute one (or more) of the sets of applications 864A-R; where the multiple software containers (also called virtualization engines, virtual private servers, or jails) are user spaces (typically a virtual memory space) that are separate from each other and separate from the kernel space in which the operating system is run; and where the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes.
  • the virtualization layer 854 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and each of the sets of applications 864A-R is run on top of a guest operating system within an instance 862A-R called a virtual machine (which may in some cases be considered a tightly isolated form of software container) that is run on top of the hypervisor - the guest operating system and application may not know they are running on a virtual machine as opposed to running on a “bare metal” host electronic device, or through para-virtualization the operating system and/or application may be aware of the presence of virtualization for optimization purposes.
  • one, some or all of the applications are implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application.
  • a unikernel can be implemented to run directly on hardware 840, directly on a hypervisor (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container
  • embodiments can be implemented fully with unikernels running directly on a hypervisor represented by virtualization layer 854, unikernels running within software containers represented by instances 862A-R, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both run directly on a hypervisor, unikernels and sets of applications that are run in different software containers).
  • the virtual network element(s) 860A-R perform similar functionality to the virtual network element(s) 830A-R - e.g., similar to the control communication and configuration module(s) 832A and forwarding table(s) 834A (this virtualization of the hardware 840 is sometimes referred to as network function virtualization (NFV)).
  • while some embodiments are described with each instance 862A-R corresponding to one VNE 860A-R, alternative embodiments may implement this correspondence at a finer level of granularity (e.g., line card virtual machines virtualize line cards, control card virtual machines virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of instances 862A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used.
  • the virtualization layer 854 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 862A-R and the physical NI(s) 846, as well as optionally between the instances 862A-R; in addition, this virtual switch may enforce network isolation between the VNEs 860A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).
  • the third exemplary ND implementation in Figure 8A is a hybrid network device 806, which includes both custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND.
  • a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 802) could provide for para-virtualization to the networking hardware present in the hybrid network device 806.
  • each of the VNEs receives data on the physical NIs (e.g., 816, 846) and forwards that data out the appropriate ones of the physical NIs (e.g., 816, 846).
  • a VNE implementing IP router functionality forwards IP packets on the basis of some of the IP header information in the IP packet; where IP header information includes source IP address, destination IP address, source port, destination port (where “source port” and “destination port” refer herein to protocol ports, as opposed to physical ports of a ND), transport protocol (e.g., user datagram protocol (UDP) or Transmission Control Protocol (TCP)), and differentiated services code point (DSCP) values.
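  • To make the header fields listed above concrete, the following is a minimal illustrative sketch (not part of the original description) that extracts such a flow key from an untagged Ethernet II frame carrying IPv4, using only the Python standard library; the FlowKey name and the fixed field offsets are assumptions made for illustration.

```python
import struct
from collections import namedtuple

# Hypothetical flow-key structure; names are illustrative only.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip proto src_port dst_port dscp")

def parse_flow_key(frame: bytes) -> FlowKey:
    """Extract an IPv4/TCP-or-UDP flow key from an untagged Ethernet II frame."""
    eth_type = struct.unpack("!H", frame[12:14])[0]
    if eth_type != 0x0800:                      # not IPv4
        raise ValueError("not an IPv4 packet")
    ihl = (frame[14] & 0x0F) * 4                # IPv4 header length in bytes
    dscp = frame[15] >> 2                       # upper 6 bits of the TOS/DS field
    proto = frame[23]                           # 6 = TCP, 17 = UDP
    src_ip = ".".join(str(b) for b in frame[26:30])
    dst_ip = ".".join(str(b) for b in frame[30:34])
    l4 = 14 + ihl                               # start of the transport header
    src_port, dst_port = struct.unpack("!HH", frame[l4:l4 + 4])
    return FlowKey(src_ip, dst_ip, proto, src_port, dst_port, dscp)
```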
  • Figure 8C illustrates various exemplary ways in which VNEs may be coupled according to some embodiments of the invention.
  • Figure 8C shows VNEs 870A.1-870A.P (and optionally VNEs 870A.Q-870A.R) implemented in ND 800A and VNE 870H.1 in ND 800H.
  • VNEs 870A.1-P are separate from each other in the sense that they can receive packets from outside ND 800A and forward packets outside of ND 800A; VNE 870A.1 is coupled with VNE 870H.1, and thus they communicate packets between their respective NDs; VNE 870A.2-870A.3 may optionally forward packets between themselves without forwarding them outside of the ND 800A; and VNE 870A.P may optionally be the first in a chain of VNEs that includes VNE 870A.Q followed by VNE 870A.R (this is sometimes referred to as dynamic service chaining, where each of the VNEs in the series of VNEs provides a different service - e.g., one or more layer 4-7 network services). While Figure 8C illustrates various exemplary relationships between the VNEs, alternative embodiments may support other relationships (e.g., more/fewer VNEs, more/fewer dynamic service chains, multiple different dynamic service chains with some common VNEs and some different VNEs).
  • the NDs of Figure 8A may form part of the Internet or a private network; and other electronic devices (not shown; such as end user devices including workstations, laptops, netbooks, tablets, palm tops, mobile phones, smartphones, phablets, multimedia phones, Voice Over Internet Protocol (VOIP) phones, terminals, portable media players, GPS units, wearable devices, gaming systems, set-top boxes, Internet enabled household appliances) may be coupled to the network (directly or through other networks such as access networks) to communicate over the network (e.g., the Internet or virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet) with each other (directly or through servers) and/or access content and/or services.
  • Such content and/or services are typically provided by one or more servers (not shown) belonging to a service/content provider or one or more end user devices (not shown) participating in a peer- to-peer (P2P) service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs.
  • end user devices may be coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge NDs, which are coupled (e.g., through one or more core NDs) to other edge NDs, which are coupled to electronic devices acting as servers.
  • one or more of the electronic devices operating as the NDs in Figure 8 A may also host one or more such servers (e.g., in the case of the general purpose network device 804, one or more of the software instances 862A-R may operate as servers; the same would be true for the hybrid network device 806; in the case of the special-purpose network device 802, one or more such servers could also be run on a virtualization layer executed by the processor(s) 812); in which case the servers are said to be co-located with the VNEs of that ND.
  • a virtual network is a logical abstraction of a physical network (such as that in Figure 8A) that provides network services (e.g., L2 and/or L3 services).
  • a virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., layer 2 (L2, data link layer) and/or layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), layer 2 tunneling protocol (L2TP), IPSec) to create the overlay network).
  • a network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network.
  • a virtual network instance (VNI) is a specific instance of a virtual network on a NVE (e.g., a NE/VNE on an ND, a part of a NE/VNE on a ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND).
  • a virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; a VAP can be a physical or virtual port identified through logical interface identifiers (e.g., a VLAN ID).
  • Examples of network services include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IP VPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network)).
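  • The split between L2 VNIs (virtual switching instances) and L3 VNIs (forwarding and routing instances) on an NVE can be pictured with a small, hypothetical data model; the class and field names below are illustrative assumptions rather than terminology required by the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class L2VNI:
    """Virtual switching instance: member VAPs plus a MAC table."""
    vlan_ids: List[int] = field(default_factory=list)        # VAPs identified by VLAN ID
    mac_table: Dict[str, str] = field(default_factory=dict)  # MAC -> VAP or tunnel endpoint

@dataclass
class L3VNI:
    """Forwarding/routing instance: IP prefixes mapped to next hops over the underlay."""
    route_table: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop / tunnel

@dataclass
class NVE:
    """Network virtualization edge hosting one or more VNIs over an IP underlay."""
    underlay_ip: str
    l2_vnis: Dict[int, L2VNI] = field(default_factory=dict)   # keyed by VNI identifier
    l3_vnis: Dict[int, L3VNI] = field(default_factory=dict)
```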
  • Network services may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network - originated attacks, to avoid malformed route announcements), and management capabilities (e.g., full detection and processing).
  • Figure 8D illustrates a network with a single network element on each of the NDs of Figure 8A, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
  • Figure 8D illustrates network elements (NEs) 870A-H with the same connectivity as the NDs 800A-H of Figure 8A.
  • Figure 8D illustrates that the distributed approach 872 distributes responsibility for generating the reachability and forwarding information across the NEs 870A-H; in other words, the process of neighbor discovery and topology discovery is distributed.
  • the control communication and configuration module(s) 832A-R of the ND control plane 824 typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP- Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi- Protocol Label Switching (GMPLS) Signaling RSVP-TE)) that communicate with other NEs to exchange routes, and then selects those routes based on one or more routing metrics.
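  • As a rough illustration of selecting routes “based on one or more routing metrics”, the following hypothetical sketch picks one best route per prefix; the administrative-distance values and the tie-break order are assumptions for illustration, not values mandated by any of the protocols listed above.

```python
from dataclasses import dataclass

# Illustrative preference values (lower wins); real deployments vary.
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ospf": 110, "isis": 115, "rip": 120, "bgp": 200}

@dataclass
class Route:
    prefix: str        # e.g. "10.1.0.0/16"
    next_hop: str
    protocol: str      # key into ADMIN_DISTANCE
    metric: int        # protocol-internal metric (e.g. OSPF cost, RIP hop count)

def select_best(candidates):
    """Pick one route per prefix: prefer lower administrative distance, then lower metric."""
    best = {}
    for r in candidates:
        current = best.get(r.prefix)
        key = (ADMIN_DISTANCE.get(r.protocol, 255), r.metric)
        if current is None or key < (ADMIN_DISTANCE.get(current.protocol, 255), current.metric):
            best[r.prefix] = r
    return best   # prefix -> winning Route, ready to be installed in the routing structures
```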
  • the NEs 870A-H (e.g., the processor(s) 812 executing the control communication and configuration module(s) 832A-R) perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by distributively determining the reachability within the network and calculating their respective forwarding information.
  • Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane 824.
  • routing structures e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures
  • the ND control plane 824 programs the ND forwarding plane 826 with information (e.g., adjacency and route information) based on the routing structure(s). For example, the ND control plane 824 programs the adjacency and route information into one or more forwarding table(s) 834A-R (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane 826.
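  • The programming of adjacency and route information from the routing structures into the forwarding table(s) can be sketched as follows; the dictionary-based RIB, adjacency table, and FIB layouts are simplifying assumptions for illustration only.

```python
def build_fib(rib, adjacencies):
    """Flatten selected RIB routes into FIB entries the forwarding plane can use directly.

    rib:         dict mapping prefix -> next-hop IP (the routes selected by the control plane)
    adjacencies: dict mapping next-hop IP -> (egress interface, next-hop MAC)
    """
    fib = {}
    for prefix, next_hop in rib.items():
        adj = adjacencies.get(next_hop)
        if adj is None:
            continue                     # unresolved next hop: leave the route out of the FIB
        egress_if, next_hop_mac = adj
        fib[prefix] = {
            "next_hop": next_hop,
            "egress_interface": egress_if,
            "rewrite_dst_mac": next_hop_mac,
        }
    return fib

# Example: two routes resolved over one adjacency.
fib = build_fib(
    rib={"10.1.0.0/16": "192.0.2.1", "10.2.0.0/16": "192.0.2.1"},
    adjacencies={"192.0.2.1": ("eth1", "00:11:22:33:44:55")},
)
```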
  • the ND can store one or more bridging tables that are used to forward data based on the layer 2 information in that data. While the above example uses the special-purpose network device 802, the same distributed approach 872 can be implemented on the general purpose network device 804 and the hybrid network device 806.
  • Figure 8D illustrates a centralized approach 874 (also known as software defined networking (SDN)) that decouples the system that makes decisions about where traffic is sent from the underlying systems that forward traffic to the selected destination.
  • the illustrated centralized approach 874 has the responsibility for the generation of reachability and forwarding information in a centralized control plane 876 (sometimes referred to as a SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity), and thus the process of neighbor discovery and topology discovery is centralized.
  • the centralized control plane 876 has a south bound interface 882 with a data plane 880 (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane (which should not be confused with a ND forwarding plane)) that includes the NEs 870A-H (sometimes referred to as switches, forwarding elements, data plane elements, or nodes).
  • the centralized control plane 876 includes a network controller 878, which includes a centralized reachability and forwarding information module 879 that determines the reachability within the network and distributes the forwarding information to the NEs 870A-H of the data plane 880 over the south bound interface 882 (which may use the OpenFlow protocol).
  • the network intelligence is centralized in the centralized control plane 876 executing on electronic devices that are typically separate from the NDs.
  • each of the control communication and configuration module(s) 832A-R of the ND control plane 824 typically includes a control agent that provides the VNE side of the south bound interface 882.
  • the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) performs its responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) through the control agent communicating with the centralized control plane 876 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 879 (it should be understood that in some embodiments of the invention, the control communication and configuration module(s) 832A-R, in addition to communicating with the centralized control plane 876, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach; such embodiments are generally considered to fall under the centralized approach 874, but may also be considered a hybrid approach).
  • the same centralized approach 874 can be implemented with the general purpose network device 804 (e.g., each of the VNE 860A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by communicating with the centralized control plane 876 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 879; it should be understood that in some embodiments of the invention, the VNEs 860A-R, in addition to communicating with the centralized control plane 876, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach) and the hybrid network device 806.
  • NFV is able to support SDN by providing an infrastructure upon which the SDN software can be run
  • NFV and SDN both aim to make use of commodity server hardware and physical switches.
  • Figure 8D also shows that the centralized control plane 876 has a north bound interface 884 to an application layer 886, in which resides application(s) 888.
  • the centralized control plane 876 has the ability to form virtual networks 892 (sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 870A-H of the data plane 880 being the underlay network)) for the application(s) 888.
  • the centralized control plane 876 maintains a global view of all NDs and configured NEs/VNEs, and it maps the virtual networks to the underlying NDs efficiently (including maintaining these mappings as the physical network changes either through hardware (ND, link, or ND component) failure, addition, or removal).
  • Figure 8D shows the distributed approach 872 separate from the centralized approach 874
  • the effort of network control may be distributed differently or the two combined in certain embodiments of the invention.
  • for example: 1) embodiments may generally use the centralized approach (SDN) 874, but have certain functions delegated to the NEs (e.g., the distributed approach may be used to implement one or more of fault monitoring, performance monitoring, protection switching, and primitives for neighbor and/or topology discovery); or 2) embodiments of the invention may perform neighbor discovery and topology discovery via both the centralized control plane and the distributed protocols, and the results compared to raise exceptions where they do not agree.
  • Such embodiments are generally considered to fall under the centralized approach 874, but may also be considered a hybrid approach.
  • Figure 8D illustrates the simple case where each of the NDs 800A-H implements a single NE 870A-H
  • the network control approaches described with reference to Figure 8D also work for networks where one or more of the NDs 800A-H implement multiple VNEs (e.g., VNEs 830A-R, VNEs 860A-R, those in the hybrid network device 806).
  • the network controller 878 may also emulate the implementation of multiple VNEs in a single ND.
  • the network controller 878 may present the implementation of a VNE/NE in a single ND as multiple VNEs in the virtual networks 892 (all in the same one of the virtual network(s) 892, each in different ones of the virtual network(s) 892, or some combination).
  • the network controller 878 may cause an ND to implement a single VNE (a NE) in the underlay network, and then logically divide up the resources of that NE within the centralized control plane 876 to present different VNEs in the virtual network(s) 892 (where these different VNEs in the overlay networks are sharing the resources of the single VNE/NE implementation on the ND in the underlay network).
  • Figures 8E and 8F respectively illustrate exemplary abstractions of NEs and VNEs that the network controller 878 may present as part of different ones of the virtual networks 892.
  • Figure 8E illustrates the simple case of where each of the NDs 800A-H implements a single NE 870A-H (see Figure 8D), but the centralized control plane 876 has abstracted multiple of the NEs in different NDs (the NEs 870A-C and G-H) into (to represent) a single NE 870I in one of the virtual network(s) 892 of Figure 8D, according to some embodiments of the invention.
  • Figure 8E shows that in this virtual network, the NE 870I is coupled to NE 870D and 870F, which are both still coupled to NE 870E.
  • Figure 8F illustrates a case where multiple VNEs (VNE 870A.1 and VNE 870H.1) are implemented on different NDs (ND 800A and ND 800H) and are coupled to each other, and where the centralized control plane 876 has abstracted these multiple VNEs such that they appear as a single VNE 870T within one of the virtual networks 892 of Figure 8D, according to some embodiments of the invention.
  • the abstraction of a NE or VNE can span multiple NDs.
  • the electronic device(s) running the centralized control plane 876 may be implemented in a variety of ways (e.g., a special purpose device, a general-purpose (e.g., COTS) device, or hybrid device). These electronic device(s) would similarly include processor(s), a set of one or more physical NIs, and a non-transitory machine-readable storage medium having stored thereon the centralized control plane software.
  • Figure 9 illustrates a general purpose control plane device 904 including hardware 940 comprising a set of one or more processor(s) 942 (which are often COTS processors) and physical NIs 946, as well as non-transitory machine readable storage media 948 having stored therein centralized control plane (CCP) software 950.
  • the processor(s) 942 typically execute software to instantiate a virtualization layer 954 (e.g., in one embodiment the virtualization layer 954 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 962A-R called software containers (representing separate user spaces and also called virtualization engines, virtual private servers, or jails) that may each be used to execute a set of one or more applications; in another embodiment the virtualization layer 954 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and an application is run on top of a guest operating system within an instance 962A-R called a virtual machine (which in some cases may be considered a tightly isolated form of software container) that is run by the hypervisor; in another embodiment, an application is implemented as a unikernel, which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application).
  • an instance of the CCP software 950 (illustrated as CCP instance 976A) is executed (e.g., within the instance 962A) on the virtualization layer 954.
  • the CCP instance 976A is executed, as a unikernel or on top of a host operating system, on the “bare metal” general purpose control plane device 904.
  • the instantiation of the CCP instance 976A, as well as the virtualization layer 954 and instances 962A-R if implemented, are collectively referred to as software instance(s) 952.
  • the CCP instance 976A includes a network controller instance 978.
  • the network controller instance 978 includes a centralized reachability and forwarding information module instance 979 (which is a middleware layer providing the context of the network controller 878 to the operating system and communicating with the various NEs), and a CCP application layer 980 (sometimes referred to as an application layer) over the middleware layer (providing the intelligence required for various network operations such as protocols, network situational awareness, and user interfaces).
  • this CCP application layer 980 within the centralized control plane 876 works with virtual network view(s) (logical view(s) of the network) and the middleware layer provides the conversion from the virtual networks to the physical view.
  • the centralized control plane 876 transmits relevant messages to the data plane 880 based on CCP application layer 980 calculations and middleware layer mapping for each flow.
  • a flow may be defined as a set of packets whose headers match a given pattern of bits; in this sense, traditional IP forwarding is also flow-based forwarding where the flows are defined by the destination IP address for example; however, in other implementations, the given pattern of bits used for a flow definition may include more fields (e.g., 10 or more) in the packet headers.
  • Different NDs/NEs/VNEs of the data plane 880 may receive different messages, and thus different forwarding information.
  • the data plane 880 processes these messages and programs the appropriate flow information and corresponding actions in the forwarding tables (sometime referred to as flow tables) of the appropriate NE/VNEs, and then the NEs/VNEs map incoming packets to flows represented in the forwarding tables and forward packets based on the matches in the forwarding tables.
  • Standards such as OpenFlow define the protocols used for the messages, as well as a model for processing the packets.
  • the model for processing packets includes header parsing, packet classification, and making forwarding decisions. Header parsing describes how to interpret a packet based upon a well-known set of protocols. Some protocol fields are used to build a match structure (or key) that will be used in packet classification (e.g., a first key field could be a source media access control (MAC) address, and a second key field could be a destination MAC address).
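  • A minimal sketch of building such a match key from parsed header fields (assuming, as in the example above, that the key fields are the source and destination MAC addresses; the function names are illustrative):

```python
def parse_ethernet(frame: bytes):
    """Parse the fields of an Ethernet II header into a dict of named values."""
    dst = ":".join(f"{b:02x}" for b in frame[0:6])
    src = ":".join(f"{b:02x}" for b in frame[6:12])
    eth_type = int.from_bytes(frame[12:14], "big")
    return {"dst_mac": dst, "src_mac": src, "eth_type": eth_type}

def build_match_key(headers: dict, key_fields=("src_mac", "dst_mac")):
    """Build the lookup key used for packet classification from selected header fields."""
    return tuple(headers[f] for f in key_fields)

# Usage with a hand-crafted 14-byte Ethernet header.
hdrs = parse_ethernet(bytes.fromhex("001122334455" "66778899aabb" "0800"))
key = build_match_key(hdrs)   # ('66:77:88:99:aa:bb', '00:11:22:33:44:55')
```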
  • Packet classification involves executing a lookup in memory to classify the packet by determining which entry (also referred to as a forwarding table entry or flow entry) in the forwarding tables best matches the packet based upon the match structure, or key, of the forwarding table entries. It is possible that many flows represented in the forwarding table entries can correspond/match to a packet; in this case the system is typically configured to determine one forwarding table entry from the many according to a defined scheme (e.g., selecting a first forwarding table entry that is matched).
  • Forwarding table entries include both a specific set of match criteria (a set of values or wildcards, or an indication of what portions of a packet should be compared to a particular value/values/wildcards, as defined by the matching capabilities - for specific fields in the packet header, or for some other packet content), and a set of one or more actions for the data plane to take on receiving a matching packet. For example, an action may be to push a header onto the packet, forward the packet using a particular port, flood the packet, or simply drop the packet.
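  • The classification-and-action behaviour described in the preceding two items (match criteria that may contain wildcards, entries checked in order so that a defined scheme such as “first match wins” applies, and a set of actions executed on a match) can be sketched as follows; the table layout and action tuples are assumptions for illustration only.

```python
WILDCARD = object()   # placeholder meaning "match any value for this field"

def matches(entry_match: dict, key: dict) -> bool:
    """True if every non-wildcard field in the entry equals the corresponding packet field."""
    return all(v is WILDCARD or key.get(f) == v for f, v in entry_match.items())

def classify(flow_table, key: dict):
    """Return the actions of the first matching entry (table assumed ordered by priority)."""
    for entry in flow_table:
        if matches(entry["match"], key):
            return entry["actions"]
    return [("punt_to_controller",)]       # table miss

# Illustrative table: forward one MAC out port 2, flood everything else.
flow_table = [
    {"match": {"dst_mac": "00:11:22:33:44:55"}, "actions": [("output", 2)]},
    {"match": {"dst_mac": WILDCARD}, "actions": [("flood",)]},
]
print(classify(flow_table, {"dst_mac": "aa:bb:cc:dd:ee:ff"}))   # -> [('flood',)]
```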
  • when an unknown packet (for example, a “missed packet” or a “match-miss” as used in OpenFlow parlance) arrives at the data plane 880, the packet (or a subset of the packet header and content) is typically forwarded to the centralized control plane 876.
  • the centralized control plane 876 will then program forwarding table entries into the data plane 880 to accommodate packets belonging to the flow of the unknown packet. Once a specific forwarding table entry has been programmed into the data plane 880 by the centralized control plane 876, the next packet with matching credentials will match that forwarding table entry and take the set of actions associated with that matched entry.
  • a network interface may be physical or virtual; and in the context of IP, an interface address is an IP address assigned to a NI, be it a physical NI or virtual NI.
  • a virtual NI may be associated with a physical NI, with another virtual interface, or stand on its own (e.g., a loopback interface, a point-to-point protocol interface).
  • a loopback interface (and its loopback address) is a specific type of virtual NI (and IP address) of a NE/VNE (physical or virtual) often used for management purposes; where such an IP address is referred to as the nodal loopback address.
  • the IP address(es) assigned to the NI(s) of a ND are referred to as IP addresses of that ND; at a more granular level, the IP address(es) assigned to NI(s) assigned to a NE/VNE implemented on a ND can be referred to as IP addresses of that NE/VNE.
  • Each VNE (e.g., a virtual router, a virtual bridge (which may act as a virtual switch instance in a Virtual Private LAN Service (VPLS))) is typically independently administrable.
  • each of the virtual routers may share system resources but is separate from the other virtual routers regarding its management domain, AAA (authentication, authorization, and accounting) name space, IP address, and routing database(s).
  • Multiple VNEs may be employed in an edge ND to provide direct network access and/or different classes of services for subscribers of service and/or content providers.
  • “interfaces” that are independent of physical NIs may be configured as part of the VNEs to provide higher-layer protocol and service information (e.g., Layer 3 addressing).
  • the subscriber records in the AAA server identify, in addition to the other subscriber configuration requirements, to which context (e.g., which of the VNEs/NEs) the corresponding subscribers should be bound within the ND.
  • a binding forms an association between a physical entity (e.g., physical NI, channel) or a logical entity (e.g., circuit such as a subscriber circuit or logical circuit (a set of one or more subscriber circuits)) and a context’s interface over which network protocols (e.g., routing protocols, bridging protocols) are configured for that context. Subscriber data flows on the physical entity when some higher-layer protocol interface is configured and associated with that physical entity.
  • Some NDs provide support for implementing VPNs (Virtual Private Networks) (e.g., Layer 3 VPNs).
  • the NDs where a provider’s network and a customer’s network are coupled are respectively referred to as PEs (Provider Edge) and CEs (Customer Edge).
  • in the case of Layer 3 VPNs, routing typically is performed by the PEs.
  • an edge ND that supports multiple VNEs may be deployed as a PE; and a VNE may be configured with a VPN protocol, and thus that VNE is referred as a VPN VNE.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Methods and apparatuses for configuring a plurality of network elements in a data plane of a network are described. A network controller selects designated network elements. The number of designated network elements is strictly less than the number of all of the plurality of network elements in the network. The network controller configures each designated network element to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements. The remaining network elements from the plurality of network elements are non-designated network elements. The network controller configures each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element as a next hop destination for traffic flows, causing each of the non-designated network elements to forward all traffic associated with those traffic flows to the designated network element.

Description

METHOD AND APPARATUS FOR OPTIMIZED DISSEMINATION OF LAYER 3 FORWARDING INFORMATION IN SOFTWARE DEFINED NETWORKING (SDN)
NETWORKS
TECHNICAL FIELD
[0001] Embodiments of the invention relate to the field of packet networking; and more specifically, to the optimized dissemination of Layer 3 forwarding information in Software Defined Networking (SDN) networks.
BACKGROUND ART
[0002] Software-Defined Networking (SDN) is an approach to computer networking that allows network administrators to manage network services through abstraction of lower- level functionality. This is done by decoupling the system that makes decisions about where traffic is sent (the control plane) from the underlying systems that forward traffic to the selected destination (the data plane). In such a system, a network controller, which can be deployed as a cluster of server nodes, has the role of the control plane and is coupled to one or more network elements (NEs) that have the role of the data plane.
[0003] For implementing SDN, the Open Networking Foundation (ONF), an industrial consortium focusing on commercializing SDN and its underlying technologies, has defined a set of open commands, functions, and protocols. The defined protocol suites are known as the OpenFlow (OF) protocol. The network controller, acting as the control plane, may then program the data plane on the network elements by causing packet handling rules to be installed on the forwarding network elements using OF commands and messages. These packet handling rules may have criteria to match various packet types as well as actions that may be performed on those packets.
[0004] In Layer 3 (L3) based networks, the network controller configures the network elements of the data plane to include L3 forwarding tables. The L3 forwarding tables (which can be referred to as forwarding information base (FIB)) include forwarding table entries which represent the information that a routing/switching network element uses to select the interface that a given packet received at the network element will use for egress. For example, when OpenFlow is used, the network controller translates local and remote prefix information into OpenFlow rules and programs the rules into the forwarding tables of the network elements (i.e., the OpenFlow switches). Typically, Longest Prefix Matching (LPM) is used on the destination Internet Protocol (IP) address of packets received at the network element to select the appropriate forwarding table entry in an L3 forwarding table of the network element and determine the egress interface for that packet.
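As an illustration of the Longest Prefix Matching step described above, the following sketch (not part of the original description) performs an LPM lookup over a hypothetical L3 forwarding table using only the Python standard library; the table contents and entry format are assumptions made for illustration.

```python
import ipaddress

def lpm_lookup(fib: dict, dst_ip: str):
    """Return the FIB entry whose prefix is the longest one containing dst_ip, or None."""
    addr = ipaddress.ip_address(dst_ip)
    best_prefix, best_entry = None, None
    for prefix, entry in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_prefix is None or net.prefixlen > best_prefix.prefixlen):
            best_prefix, best_entry = net, entry
    return best_entry

# Hypothetical L3 forwarding table: the more specific /24 wins over the /16 and the default.
fib = {
    "10.0.0.0/16": {"egress": "port1", "next_hop": "10.0.255.1"},
    "10.0.1.0/24": {"egress": "port2", "next_hop": "10.0.1.1"},
    "0.0.0.0/0":   {"egress": "port3", "next_hop": "192.0.2.1"},
}
assert lpm_lookup(fib, "10.0.1.7")["egress"] == "port2"
```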
[0005] For traffic to flow over a given L3 network (e.g., a Layer 3 Virtual Private Network (VPN)), the network controller needs to configure the forwarding tables for each one of the network elements of the data plane forming the L3 network. However, the network controller is not aware of and cannot predict which traffic flows will be present in the network when programming the network.
SUMMARY
[0006] One general aspect includes a method, in a network controller of a network, of configuring a plurality of network elements in a data plane of the network, the method including: selecting a set of one or more designated network elements from the plurality of network elements, where the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the network, and the remaining network elements from the plurality of network elements in the data plane (120) of the network are non-designated network elements; configuring each designated network element from the set of designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the network; and configuring each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
[0007] One general aspect includes a machine-readable medium including computer program code which when executed by a computer carries out the method including: selecting a set of one or more designated network elements from the plurality of network elements, where the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the network, and the remaining network elements from the plurality of network elements in the data plane (120) of the network are non-designated network elements; configuring each designated network element from the set of designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the network; and configuring each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
[0008] One general aspect includes a network controller of a network for configuring a plurality of network elements in a data plane of the network, the network controller including: a non-transitory machine -readable storage medium that provides instructions that, if executed by a processor, will cause the network controller to perform operations including: selecting a set of one or more designated network elements from the plurality of network elements, where the number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the network, and the remaining network elements from the plurality of network elements in the data plane of the network are non-designated network elements; configuring each designated network element from the set of designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the network; and configuring each non-designated network element to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
[0010] Figure 1 illustrates a block diagram of an exemplary network for optimized dissemination of Layer 3 forwarding information in accordance with some embodiments.
[0011] Figure 2A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE of a network, in accordance with some embodiments.
[0012] Figure 2B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE of a network, in accordance with some embodiments.
[0013] Figure 2C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE of a network, in accordance with some embodiments.
[0014] Figure 3 illustrates a block diagram of an exemplary network where heavy flows are forwarded through the designated NE in accordance with some embodiments.
[0015] Figure 4A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
[0016] Figure 4B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
[0017] Figure 4C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments.
[0018] Figure 5 illustrates a block diagram of an exemplary network where heavy flows are forwarded through non-designated NEs in accordance with some embodiments.
[0019] Figure 6 illustrates a flow diagram of exemplary operations for optimized dissemination of Layer 3 forwarding information in a network in accordance with some embodiments.
[0020] Figure 7 illustrates a flow diagram of exemplary operations performed by a network controller when determining that a heavy flow is forwarded at a designated network element, in accordance with some embodiments.
[0021] Figure 8A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
[0022] Figure 8B illustrates an exemplary way to implement a special-purpose network device according to some embodiments of the invention.
[0023] Figure 8C illustrates various exemplary ways in which virtual network elements (VNEs) may be coupled according to some embodiments of the invention.
[0024] Figure 8D illustrates a network with a single network element (NE) on each of the NDs, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
[0025] Figure 8E illustrates the simple case of where each of the NDs implements a single NE, but a centralized control plane has abstracted multiple of the NEs in different NDs into (to represent) a single NE in one of the virtual network(s), according to some embodiments of the invention.
[0026] Figure 8F illustrates a case where multiple VNEs are implemented on different NDs and are coupled to each other, and where a centralized control plane has abstracted these multiple VNEs such that they appear as a single VNE within one of the virtual networks, according to some embodiments of the invention.
[0027] Figure 9 illustrates a general purpose control plane device with centralized control plane (CCP) software, according to some embodiments of the invention.
DETAILED DESCRIPTION
[0028] The following description describes methods and apparatus for optimized dissemination of Layer 3 forwarding information in Software Defined Networking (SDN) networks. In the following description, numerous specific details such as logic implementations, opcodes, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
[0029] References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
[0030] Bracketed text and blocks with dashed borders (e.g., large dashes, small dashes, dot-dash, and dots) may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
[0031] In the following description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.
Prior Approaches for dissemination of Layer 3 forwarding information in a network
[0032] Several approaches exist for configuring the network elements forming the data plane of an L3-based network such as a L3VPN. In a first existing approach, the network controller configures each network element of the network as a next hop in all the other network elements such that each network element of the network can be reached from the other network elements. In other words, in this first approach each network element of the L3 network includes all potential flows configured as forwarding table entries in the L3 forwarding table(s) of the network element. In typical networks, the number of network elements can reach thousands of elements resulting in large forwarding tables that need to be programmed on all the elements of the network. In addition, the number of routes advertised by edge network elements (such as gateway devices) to the network controller can significantly increase and cannot be controlled by the network controller. With this approach an increase in the number of network elements in the L3 network has a significant impact on the size of the L3 forwarding tables. The increase in the size of the forwarding table to be configured on a large number of network elements in a network has several drawbacks and scaling issues. Several types of network elements cannot handle large forwarding tables due to a limit in memory size available to the network elements. Top of the Rack (ToR) switches connected to bare metal network elements or single root input/output virtualization (SR-IOV) virtual machines (VMs) are examples of network elements that can be limited in memory. Further, virtual switches (vSwitches) supporting SR-IOV (which are network elements housed inside the network interface controller (NIC) of the SR-IOV-capable hosts instead of running on regular cores of the compute host) also have limited memory resources. Another disadvantage of this first approach is that the configuration of the network elements can prove to be too heavy for the network controller as the controller needs to program large L3 forwarding tables across the multiple network elements. While at regular run time, the rate of network elements configuration may be low, after the occurrence of some network events (e.g., reboot of an NE, reboot of the network controller, or VM migration, etc.) a large set of forwarding table entries has to be programmed on the network elements of the data plane and this can take a significant amount of time causing significant delays in the network. Further, when the first approach is used to configure the network elements of the data plane, the network controller may program forwarding table entries that will never be used, consequently using storage resources that could otherwise be used for other purposes. Such forwarding table entries or routes contribute to the bloating of the forwarding tables. However, since the network controller cannot predict if an entry in the forwarding table is needed for actual traffic that will flow in the network, not programming these entries on a network element can lead to traffic drop.
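To put the scaling concern of this first approach into rough numbers, the following illustrative calculation compares the total number of programmed forwarding table entries under a full-mesh configuration with the designated-element scheme summarized earlier in this document; the formula is a simplification that ignores local and connected routes, and the element counts are hypothetical.

```python
def entries_full_mesh(n_elements: int) -> int:
    # Every element carries one entry per element in the network.
    return n_elements * n_elements

def entries_designated(n_elements: int, n_designated: int) -> int:
    # Designated elements carry full tables; the rest carry a single default entry.
    return n_designated * n_elements + (n_elements - n_designated)

for n in (100, 1000, 5000):
    print(n, entries_full_mesh(n), entries_designated(n, n_designated=2))
# 100    10000      298
# 1000   1000000    2998
# 5000   25000000   14998
```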
[0033] In a second approach, the network controller does not pre-configure the forwarding table entries for all flows in all the network elements. Instead, when a packet arrives at a network element, the forwarding pipeline performs a match look-up on the destination IP address in the forwarding table and transmits any table-miss (i.e., non-existing table entries matching the IP address) to the network controller. Packets with no corresponding table entries in the forwarding table are punted to the network controller. If the flow is known to the network controller, it programs forwarding of the flow on the network element from which the packets of the flow were received, consequently adding the corresponding flow as a forwarding table entry. The second approach avoids the cost of pre-configuring at each network element all the possible table entries (i.e., flows) for the network. However, this approach is more suitable for Layer 2 forwarding where the matching of traffic to an entry in a forwarding table is based on an exact match on the destination media access control (DMAC) address as opposed to Layer 3 forwarding that is typically done based on Longest Prefix Matching. Longest Prefix Matching may result in packets being matched against undesired prefixes instead of being punted to the network controller. Thus, applying the second approach to Layer 3 forwarding tables can lead to unintended behavior which can be detrimental to traffic flow in the network.
[0034] In a third approach, the network controller can program a prefix in the forwarding table that causes more flows received at the network element to be punted to the network controller if there is a table-miss. The network controller may then decide to program a more specific forwarding table entry for that flow. This third approach, which can be referred to as need-based flow programming, solves the flow scale programming problem on the network element as forwarding table entries are programmed when a packet corresponding to that flow is received and punted to the network controller. However, the third approach suffers from the following disadvantages: punting packets of a flow to the network controller and processing them always incurs a latency on the first packet of the flow, which may not be desirable for some flows and some network applications. In addition, punting packets at high rates, as may be desirable for some applications, can overwhelm the network controller. None of the existing techniques provides efficient and optimized forwarding information dissemination in a network. There is a need for more efficient and optimized techniques for dissemination of Layer 3 forwarding information in SDN networks.
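The reactive (need-based) programming loop of the second and third approaches can be sketched as follows; the controller class, its callback, and the route store are hypothetical and are shown only to make the punt-latency trade-off concrete, not as a recommended design.

```python
class ReactiveController:
    """Programs a flow entry only after the first packet of the flow is punted."""

    def __init__(self, known_routes: dict):
        self.known_routes = known_routes          # destination -> egress info
        self.programmed = {}                      # switch_id -> {destination: egress info}

    def on_table_miss(self, switch_id: str, dst_ip: str):
        """Called when a switch punts a packet that missed its forwarding table."""
        egress = self.known_routes.get(dst_ip)
        if egress is None:
            return None                           # unknown flow: drop or keep punting
        # The first packet pays the punt latency; later packets hit the programmed entry.
        self.programmed.setdefault(switch_id, {})[dst_ip] = egress
        return egress

ctrl = ReactiveController({"10.0.1.7": {"egress": "port2"}})
ctrl.on_table_miss("sw1", "10.0.1.7")             # programs the entry on sw1
```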
Optimized Dissemination of Layer 3 Forwarding Information in Software Defined Networking (SDN) Networks
[0035] Methods and apparatuses for configuring a plurality of network elements in a data plane of a network are described. In some embodiments, the network controller of the network determines one or more designated network elements from the plurality of network elements. The number of designated network elements is strictly less than the number of all of the plurality of network elements in the data plane of the SDN network. The network controller configures each one of the designated network elements to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the SDN network. The network controller configures each one of the network elements from the plurality of network elements that are different from the designated network elements to include a Layer 3 forwarding table including a default forwarding table entry having a designated network element from the designated network elements as a next hop destination for a plurality of traffic flows causing each one of the network elements from the plurality of network elements that are different from the designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element.
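Purely as a non-limiting illustration of the configuration logic summarized above, the Python sketch below builds the two kinds of Layer 3 forwarding tables: a full table for each designated network element, and a reduced table (local routes plus a default entry) for every other network element. All function and variable names, and the string form of the actions, are assumptions made for this example rather than anything prescribed by the embodiments.

def build_tables(all_nes, designated, local_prefixes, remote_routes):
    """all_nes: list of NE identifiers; designated: strict subset of all_nes;
    local_prefixes: {ne_id: [prefix, ...]} for hosts attached to each NE;
    remote_routes: {prefix: action} learned from edge NEs (e.g., a gateway)."""
    assert set(designated) < set(all_nes)
    tables = {}
    # Full Layer 3 forwarding table: one entry per NE's local prefixes plus remote routes.
    full_table = {p: "output toward " + ne
                  for ne, prefixes in local_prefixes.items() for p in prefixes}
    full_table.update(remote_routes)
    for ne in designated:
        tables[ne] = dict(full_table)
    # Non-designated NEs: only local entries plus a default route to a designated NE.
    for ne in all_nes:
        if ne not in designated:
            tables[ne] = {p: "output to local port" for p in local_prefixes.get(ne, [])}
            tables[ne]["0.0.0.0/0"] = "forward to designated " + designated[0]
    return tables

tables = build_tables(
    all_nes=["NE101", "NE102", "NE103"], designated=["NE103"],
    local_prefixes={"NE101": ["1.1.1.1/32"], "NE102": ["2.2.2.2/32"], "NE103": ["3.3.3.3/32"]},
    remote_routes={"6.6.6.0/24": "output toward edge NE105"})

With these example values the designated NE holds four entries while each non-designated NE holds two, which matches the entry counts of the tables discussed below with reference to Figures 2A-2C.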
[0036] Figure 1 illustrates a block diagram of an exemplary network for optimized dissemination of Layer 3 forwarding information in accordance with some embodiments. Figure 1 includes a network controller 110 and a data plane 120 of a first network 100 coupled with a second network 130 through an edge network element 105. The second network 130 is further coupled with the remote network 140 through one or more edge network elements, which are not illustrated. In a non-limiting exemplary implementation, the data plane 120 can be, or includes, a data center (DC) including resources (e.g., Virtual Machines, Containers, bare metal applications, etc.) that can be allocated and used by multiple tenants.
[0037] The data plane 120 includes multiple network elements 101-104 that are coupled with local hosts 111, 112A-B, 113, and 114. The network elements 101-104 form the data plane 120 of the network 100 and are controlled by the network controller 110. While Figure 1 illustrates 5 network elements (101-105) as part of the network 100, this is intended to be exemplary only and one of ordinary skill in the art would understand that the network 100 may include various numbers of network elements. In typical implementations, the network 100 includes hundreds to thousands of network elements. The network elements 101-105 provide Layer 2 and Layer 3 connectivity to the hosts 111, 112A-B, 113 and 114. The L2/L3 traffic forwarded within the data plane 120 can be exchanged between the hosts, exchanged between networks (e.g., intra data centers), or forwarded towards/from the Internet. An NE, such as NE 101, can be a virtual network element when the end hosts are virtual machines. Alternatively, an NE, such as NE 102, can be a Top of the Rack (ToR) network device that is coupled with legacy bare metal appliances and servers (such as local host 112A and local host 112B). In some embodiments, when OpenFlow is used between the network controller 110 and the NEs, each one of the NEs 101-104 is an OF switch (e.g., OpenFlow Virtual Switch (OVS), or a ToR switch supporting OF). While Figure 1 illustrates a determined number of hosts coupled with respective switches, this is intended to be exemplary only and should not be considered as a limitation to the present invention. The data plane 120 may include several NEs (typically hundreds or thousands) and each NE may be coupled to one or more host devices. In some embodiments, the network elements 101-105 and the network controller 110 are implemented as described with reference to the various embodiments of Figures 8A-F and Figure 9.
[0038] In some embodiments, the tenants of the data plane 120 can be separated into different Layer 3 Virtual Private Networks (L3VPN). Figure 1 shows an example where local host 111, local host 112A and local host 113 form a first L3VPN (indicated with shaded boxes), and the local host 112B and local host 114 form another L3VPN within the same data plane 120. For example, when the data plane 120 is, or includes, a data center, the resources used by or allocated to a tenant can form a single L3VPN. While the illustrated embodiment shows the elements of a single L3VPN located within the data plane 120, in other embodiments, the L3VPN might extend beyond the data plane 120 (e.g., enterprise tenant's VPN might extend to multiple premises).
[0039] The network controller 110 is a centralized control plane and has the responsibility for generating reachability and forwarding information. The network controller 110 is sometimes referred to as a SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity. The network controller 110 enables a centralized process of neighbor discovery and topology discovery. The network controller 110 has a south bound interface with a data plane (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane) that includes the NEs 101-104 (sometimes referred to as switches, forwarding elements, data plane elements, or nodes). The network controller 110 determines the reachability within the network and distributes the forwarding information to the NEs 101-104 of the data plane over the south bound interface (which may use the OpenFlow protocol). Thus, the network intelligence is centralized in the network controller 110 executing on electronic devices that are typically separate from the network devices on which the NEs are implemented. The network controller 110 further communicates with edge network elements, such as the edge NE 105 (e.g., a Data Center Gateway), that couple the data plane 120 with external and remote networks (e.g., other data centers, the Internet, etc.). The network controller 110 can use the Multi-Protocol Border Gateway Protocol (MP-BGP) to exchange routes with the edge NE 105.
[0040] The data plane 120 allows the distribution and forwarding of traffic flows to and from the hosts 111-114. To enable the distribution of the flows across the data plane 120, the network controller configures forwarding information on the network elements 101-104. The network controller 110 discovers the prefixes of each one of the network elements within the data plane 120 through various mechanisms. Sniffing Address Resolution Protocol (ARP) or Dynamic Host Configuration Protocol (DHCP) traffic, or interaction with a cloud orchestrator (e.g., OpenStack), are non-limiting examples of mechanisms that can be used by the network controller 110 to discover the prefixes of the network elements forming the data plane 120. When the L3VPNs present within the data plane 120 extend beyond the data plane 120, the network controller 110 advertises the L3VPN routes for reaching the local hosts to the edge NE 105. Similarly, the network controller 110 learns L3VPN routes from the DC-gateways that can be used for reaching the remote hosts and networks. For example, the network controller 110 learns from the edge NE 105 the prefixes of the remote network 140 (e.g., prefix 6.6.6.0/24) and the prefix (e.g., prefix 7.7.7.7) of the remote host 117 that is located in the network 130. The network controller 110 translates the prefixes of the local and remote NEs and networks into control and forwarding rules (e.g., OF rules) and programs the rules into the local NEs forming the data plane 120.
[0041] The network controller 110 is operative to perform an optimized dissemination of Layer 3 forwarding information in the data plane 120 by configuring the network elements 101-104 according to the techniques described herein. The network controller 110 selects one or more designated NEs from the set of network elements 101-104. The number of designated network elements is strictly less than the number of all of the NEs 101-104 in the data plane of the network 100. For example, in Figure 1, the network controller 110 selects the NE 103 as a designated NE for NEs 101, 102 and 103. In some embodiments, the NEs 101, 102, and 103 are network elements that enable the hosts 111, 112A and 113 to communicate. The hosts 111, 112A, and 113 are part of a first L3VPN network within the data plane 120. In some embodiments, the network controller 110 may select more than one NE as a designated NE. The network controller 110 configures the designated network element to include a Layer 3 forwarding table including a forwarding table entry for each network element from the plurality of network elements in the data plane of the SDN network. The designated NE 103 receives one or more control commands and messages from the network controller 110 that include rules for programming the L3 forwarding table(s). As a result of these rules, the designated NE 103 includes the L3 forwarding table for the entire subset of NEs forming the L3VPN network. The L3 forwarding table in the designated NE 103 is configured to include all routes for reaching all of the NEs 101 and 102 regardless of whether or not traffic will actually flow towards those NEs. While the embodiments describe an L3 forwarding table, in other embodiments, the NE includes multiple L3 forwarding table(s) that result from the configuration of the designated NE by the network controller 110.
[0042] Several strategies can be used by the network controller 110 for selecting the set of one or more designated network elements without departing from the scope of the present invention. In one embodiment, the network controller 110 may select a network element from the data plane 120 that is coupled with the smallest number of hosts. The network element having the smallest number of hosts in the network has the fewest forwarding table entries associated with local hosts served by the network element and can accommodate the forwarding table entries of many non-designated network elements. When the network controller 110 determines that there is more than one network element that can be selected based on this criterion, it may select all of these NEs or alternatively select a subset of these NEs to be designated NEs.
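As a purely illustrative sketch of the host-count strategy just described (the heuristic, its parameters, and the example values are assumptions, not a prescribed algorithm):

def select_designated(hosts_per_ne, max_designated=1):
    # hosts_per_ne: {ne_id: number of locally attached hosts}
    fewest = min(hosts_per_ne.values())
    candidates = sorted(ne for ne, count in hosts_per_ne.items() if count == fewest)
    # All tied candidates could be kept as designated NEs, or only a subset of them;
    # here the returned set is capped at max_designated.
    return candidates[:max_designated]

print(select_designated({"NE-A": 3, "NE-B": 1, "NE-C": 2}))  # -> ['NE-B']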
[0043] In another embodiment, the network controller 110 can receive from an administrator (e.g., a data center administrator or the network administrator) a list of network elements that can be used as designated network elements. When making the selection, the administrator can take into consideration the memory and/or processing power of the network elements.
[0044] The network controller configures each non-designated network element to include a different L3 forwarding table. In the following description, the NEs that are referred to as non-designated NEs are network elements different from the designated NEs. These non-designated NEs are not part of the set of designated NEs. For example, NE 101 and NE 102 are non-designated NEs in the L3VPN which has NE 103 as a designated NE. The L3 forwarding table configured at the non-designated NEs includes a default forwarding table entry having the designated network element NE 103 as a next hop destination for a plurality of traffic flows, causing each of the non-designated network elements (e.g., NE 101 and NE 102) to forward all traffic associated with the plurality of traffic flows to the designated network element. The non-designated NEs 101, 102 receive one or more control commands and messages from the network controller 110 that include rules for programming the L3 forwarding table(s) of the NEs. As a result of these rules, the non-designated NEs 101-102 include an L3 forwarding table with a default forwarding table entry having the designated NE 103 as a next hop destination for the plurality of traffic flows. In some embodiments, all traffic flows forwarded at the non-designated NE that are not destined to a local port of the non-designated NE (i.e., the flows that are destined to a network device other than a host coupled directly with the non-designated NE) are forwarded towards the designated NE. With reference to the data plane 120, the network controller 110 configures NE 101 and NE 102 to forward all traffic that is not destined for one of the local ports P1 and P2 towards the designated NE 103. These traffic flows may include packet flows originating from sources connected to non-designated NEs (e.g., host 111 for NE 101 and host 112A for NE 102).
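One possible realization of the default forwarding table entry on a non-designated NE is sketched below. It is a hedged example written against the Ryu OpenFlow 1.3 controller framework rather than anything mandated by the embodiments, and it assumes the controller already holds a datapath handle for the non-designated NE and knows the output port leading toward the designated NE.

def install_default_to_designated(datapath, port_to_designated):
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser
    # Match every IPv4 packet (equivalent to prefix 0.0.0.0/0) at the lowest
    # priority so that more specific local-host entries are matched first.
    match = parser.OFPMatch(eth_type=0x0800)
    actions = [parser.OFPActionOutput(port_to_designated)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=0,
                                        match=match, instructions=inst))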
[0045] When the packet flows are received at the NE 103, they are forwarded based on the L3-forwarding table that was configured by the network controller 110. For example, the NE 103 performs a LPM match based on the destination of the packet flows and sends the packets to the proper destination. The destination of the packet flows can be internal or external to the data plane 120. When the network controller 110 has selected multiple designated NEs, these NEs may provide redundancy to the network and can be used for load balancing the traffic flows over the data plane 120.
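When more than one designated NE is selected, one simple way for the controller to spread load, offered here only as an illustrative assumption, is to assign each non-designated NE's default route to one of the designated NEs deterministically, for example by hashing the NE identifier:

import hashlib

def assign_designated(non_designated, designated):
    # Deterministically map each non-designated NE to one designated NE so that
    # default routes, and therefore traffic, are spread across the designated set.
    assignment = {}
    for ne in sorted(non_designated):
        digest = hashlib.sha256(ne.encode()).digest()
        assignment[ne] = designated[digest[0] % len(designated)]
    return assignment

print(assign_designated(["NE-A", "NE-B", "NE-C"], ["NE-X", "NE-Y"]))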
[0046] Selecting a subset of designated NEs from the set of NEs that form a network to be configured with a complete L3 forwarding table (e.g., the entire L3FIB of the network) and configuring the rest of the NEs of the network to forward any non-local traffic flows towards at least one of the designated NEs result in a significant reduction of the number of L3 forwarding table entries that are programmed by the network controller for a particular network. Thus, the novel approach described herein for dissemination of L3 forwarding network information has several advantages with respect to the standard existing approaches. The solution significantly reduces the number of flows to be written on multiple network elements of the network as only a subset of these NEs is to include the complete L3-forwarding table of the network. Further, the solution is applicable to all types of network elements of the network (e.g., vSwitches, hardware switches, etc.). In OF networks, the solution does not require any new OF extension and can use standard OpenFlow messages for configuring each of the network elements. Thus, the solution presented herein allows for a fast and efficient configuration of a L3 network that does not impose a heavy processing burden on the network controller. The solution allows for a reduction of the number of control commands and messages exchanged between the network controller and the NEs of a network when compared with standard forwarding dissemination techniques. The solution also allows for judicious use of memory resources in the network as only a subset of NEs includes larger L3 forwarding tables.
[0047] Figure 2A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE of a network, in accordance with some embodiments. The forwarding table 203 is a Layer 3 forwarding table configured in NE 103 by the network controller 110 when the network controller selects the NE 103 as a designated NE for the data plane 120. For example, as discussed above the NE 103 is selected as a designated NE of the L3VPN formed by NE 101, NE 102, and NE 103. As illustrated in Figure 2A, the forwarding table 203 includes forwarding table entries for reaching each one of the other NEs of the L3VPN. The first entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix“1.1.1.1/32” of the host 111 coupled with NE 101. This entry causes the matching flow packets to be sent to the NE 101 towards the host 111. The second entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix“2.2.2.2/32” of the host 112A coupled with NE 102. This entry causes the matching flow packets to be sent to the NE 102 towards the host 112A. The third entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix“3.3.3.3/32” of the local port P3 of the NE 103. This entry causes the matching flow packets to be sent to the port P3 towards the host 113. The fourth entry in the forwarding table 203 is accessed when flow packets are received at the NE 103 with a destination address of the packet that matches, based on the longest prefix match, the prefix “6.6.6.0/24” of the remote network 140. This entry causes the matching flow packets to be sent to the edge NE 105 to be forwarded towards the remote network 140.
[0048] Figure 2B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE of a network, in accordance with some embodiments. As illustrated in Figure 2B, the forwarding table 201 includes forwarding table entries for reaching the local port P1 of the NE 101 and for reaching the designated NE 103. The first entry in the forwarding table 201 is accessed when flow packets are received at the NE 101 with a destination address of the packet that matches, based on the longest prefix match, the prefix “1.1.1.1/32” of the host 111 coupled with NE 101. This entry causes the matching flow packets to be sent to the port P1 of the NE 101 towards the host 111. The second entry in the forwarding table 201 is accessed when flow packets are received at the NE 101 with a destination address of the packet that matches, based on the longest prefix match, the prefix “0.0.0.0/0”. This entry causes all flow packets that are not destined to the local host coupled with the NE 101 to be sent to the designated NE 103. Once NE 103 is reached, the NE 103 can forward these packet flows towards another NE (which can be internal or external to the data plane 120).
[0049] Figure 2C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE of a network, in accordance with some embodiments. As illustrated in Figure 2C, the forwarding table 202 includes forwarding table entries for reaching the local port P2 of the NE 102 and for reaching the designated NE 103. The first entry in the forwarding table 202 is accessed when flow packets are received at the NE 102 with a destination address of the packet that matches, based on the longest prefix match, the prefix“2.2.2.2/32” of the host 112A coupled with NE 102. This entry causes the matching flow packets to be sent to the port P2 of the NE 102 towards the host 112A. The second entry in the forwarding table 202 is accessed when flow packets are received at the NE 102 with a destination address of the packet that matches, based on the longest prefix match, the prefix“0.0.0.0/0”. This entry causes all flow packets that are not destined to the local host 112A with prefix“2.2.2.2/32” coupled with the NE 102 to be sent to the designated NE 103. Once NE 103 is reached, the NE 103 can forward these packet flows towards another NE (which can be internal or external to the data plane 120).
[0050] As opposed to the L3 forwarding table 203, which includes a route towards each one of the NEs that are part of the layer 3 network for which NE 103 is a designated NE, the forwarding tables 201 and 202 include a reduced number of forwarding table entries. Each of the forwarding tables 201 and 202 includes one or more forwarding table entries to forward the traffic flows to local host(s), if any. The forwarding tables 201 and 202 also include a default route for all other traffic that is received at their respective NEs (i.e., NE 101 and NE 102). The default route causes all traffic that is not local to be forwarded towards a designated NE (here NE 103).
[0051] The L3 forwarding tables 201, 202, and 203 include routes towards one or more NEs that are part of the layer 3 network for which NE 103 is a designated NE. In some embodiments, the forwarding information to be configured on each one of the NEs of the data plane 120 might involve additional tables (flow tables, group tables, etc.) and/or additional routes in the existing tables based on different forwarding protocols (e.g., the forwarding table may include Multiprotocol Label Switching (MPLS) labels) that are not illustrated in Figures 2A-C. Further, each of the forwarding tables includes the prefix that is matched based on the longest prefix match as well as the overall action that is to be performed for this prefix. While the overall action is shown as a single action, this action may be the result of multiple sub-actions that are not illustrated.
[0052] In the example illustrated in Figures 1 and 2A-C, the forwarding table of the designated NE has a size of 4 flows. The total number of flows programmed by the network controller 110 across all the NEs of the L3VPN network is 4 + 2 + 2 = 8. This is a significant improvement when compared with the number of flows that would have been programmed with the naive approach of programming all the Layer 3 flows on all the network elements of the L3VPN. While the illustrated data plane 120 includes a relatively small number of NEs, typical networks include hundreds to thousands of NEs, and the use of the proposed solution for dissemination of L3 forwarding information results in a significant reduction in the number of forwarding table entries that need to be programmed on the various NEs when compared with the number of forwarding table entries programmed based on standard forwarding information dissemination techniques. This results in an optimized use of network resources (e.g., message exchanges between the network controller and the different NEs during configuration operations) and memory resources.
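The scaling benefit can be checked with the simple count below, a back-of-the-envelope model under the simplifying assumptions (not taken from the embodiments) of a single designated NE, one local prefix per NE, and a configurable number of remote routes:

def entries_with_designated_ne(num_nes, remote_routes=1):
    full_table = num_nes + remote_routes        # the single designated NE
    per_non_designated = 1 + 1                  # one local entry plus the default route
    return full_table + (num_nes - 1) * per_non_designated

def entries_naive(num_nes, remote_routes=1):
    return num_nes * (num_nes + remote_routes)  # every NE programmed with the full table

print(entries_with_designated_ne(3), entries_naive(3))        # 8 vs 12, as in Figure 1
print(entries_with_designated_ne(1000), entries_naive(1000))  # 2999 vs 1001000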
Detection of Heavy Flows
[0053] In some embodiments, the designated network elements may receive heavy flows (which can also be referred to as elephant flows). A flow is considered to be a heavy flow when a network element receives more than a predetermined number N of packets for that flow during a predetermined period of time. The number of packets N can be set by an administrator of the data plane 120 and may vary depending on the applications and other factors decided by the administrator. The occurrence of heavy flows can cause additional latency and congestion in the network and may clog the designated NE of the data plane 120. To address this potential issue, the network controller is operative to detect the heavy flows that are forwarded through a designated NE and configure one or more NEs of the network to allow the heavy flow to bypass the designated NE.
[0054] Figure 3 illustrates a block diagram of an exemplary network where heavy flows are forwarded through the designated NE in accordance with some embodiments. The host 111 and the host 112A transmit packets to the remote network 140 as illustrated in Figure 3. These packets are forwarded by each of the NE 101 and the NE 102 towards the designated NE 103. The designated NE 103 transmits the packets towards the edge NE 105 to be forwarded to the remote network 140. The host 111 transmits a high number of packets per second in a sustained manner and the host 112A sends a lower number of packets per second. The packets transmitted by the host 111 are considered a heavy flow of packets as their number exceeds a threshold set for this flow.
[0055] The network controller 110 is operative to configure the designated NE 103 to enable an efficient detection of the heavy flows and a rerouting of these flows such that the designated NE is avoided. At the time of configuration of the designated NEs (e.g., designated NE 103), the network controller 110 configures each of the forwarding table entries with an associated predetermined threshold value which, when exceeded by the number of packets received at the designated NE, triggers an alert to the network controller 110. For example, in OF networks, L3FIB flow entries of the designated OF switch are programmed with OFPIT_STAT_TRIGGER including a predetermined value N. The value of N can be determined by an administrator and is used to define which flows are considered heavy flows. The value N is defined to be sufficiently large (e.g., thousands or millions of packets) for a determined period (e.g., per second).
[0056] In some embodiments, the designated NE periodically monitors the number of packets received for a given flow in order to determine whether the flow is a heavy flow or not. The designated NE 103 transmits a message to the network controller 110 each time a flow is determined to be a heavy flow. For example, in OF networks, when the OFPSTF_PERIODIC flag is set, the trigger will apply not only when the value N of the threshold is reached for the flow, but also when multiples of that value (e.g., 2N, 3N, ...) are reached for the flow. In other embodiments, the designated NE may monitor the number of packets and may transmit an alert when a threshold is reached once. For example, in OpenFlow, when the OFPSTF_ONLY_FIRST flag is set, only the first threshold that is crossed is considered, and other thresholds (e.g., multiples of the first threshold) are ignored.
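The trigger semantics can be modelled in plain Python as follows. This is only a behavioural sketch of the periodic versus first-only modes described above; it is not OpenFlow code, and the class name, threshold value, and callback are hypothetical.

class StatTrigger:
    """Counts packets for one forwarding table entry and notifies the controller
    when the configured threshold (or, in periodic mode, any multiple of it) is crossed."""
    def __init__(self, threshold, periodic=True):
        self.threshold = threshold
        self.periodic = periodic
        self.packets = 0
        self.alerts_sent = 0

    def on_packet(self, notify_controller):
        self.packets += 1
        crossings = self.packets // self.threshold
        if crossings > self.alerts_sent and (self.periodic or self.alerts_sent == 0):
            self.alerts_sent = crossings
            notify_controller(self.packets)

trigger = StatTrigger(threshold=1000, periodic=True)
for _ in range(2500):
    trigger.on_packet(lambda count: print("alert to controller after", count, "packets"))
# Periodic mode alerts at 1000 and 2000 packets; first-only mode would alert once, at 1000.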
[0057] Figure 4A illustrates an exemplary Layer 3 forwarding table that may be configured in a designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments. In the illustrated table 403, for each forwarding table entry configured (except for the entry of the local port P3 of the designated NE), an instruction is added to indicate that when the number of packets of the flow reaches the number N, a message is triggered for transmission to the network controller 110. The network controller 110 receives the message from the designated network element 103. The message includes an indication that a number of packets of a flow forwarded by the designated network element 103 has exceeded a predetermined threshold N.
[0058] In response to receiving the message, the network controller 110 configures one or more non-designated network elements to forward the packets of the flow. This causes the packets of the flow to bypass the designated network element when being forwarded in the network consequently reducing the backlog that may have been caused by the flow at the designated NE.
[0059] Figure 4B illustrates an exemplary Layer 3 forwarding table that may be configured in a first non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments. Figure 4C illustrates an exemplary Layer 3 forwarding table that may be configured in a second non-designated NE to efficiently forward heavy flows in the network, in accordance with some embodiments. Referring to Figures 4B and 4C, upon receipt of a message that includes an indication that a number of packets of a flow (e.g., the flow destined to the remote network 140 with prefix 6.6.6.0/24) forwarded by the designated network element 103 has exceeded a predetermined threshold, the network controller configures the NE 101 and the NE 102 to directly forward the packets of this flow to the edge NE 105 and bypass the designated NE 103. The network controller 110 adds to each one of the forwarding tables 401 and 402 the forwarding table entry for the prefix “6.6.6.0/24” for forwarding the flow of packets destined to the remote network 140. This flow of packets is forwarded towards the edge NE 105 and bypasses the designated NE 103. Figure 5 illustrates a block diagram of an exemplary network where heavy flows are forwarded through non-designated NEs in accordance with some embodiments. The heavy flow bypasses the designated NE 103 and is forwarded from each one of the NEs 101 and 102 towards the remote network 140 via the edge NE 105. In some embodiments, the bypass of the designated network element by the packets of the flow is temporary and is set to expire after a predetermined period of time. In these embodiments, the configuration of the additional forwarding table entries in the non-designated NEs (e.g., NE 101 and NE 102) is set to expire after a predetermined time interval has elapsed, causing the packets of the heavy flow to be forwarded through the designated NE 103 again after the expiration of that interval of time.
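A hedged sketch of such a temporary bypass entry is given below, again using the Ryu OpenFlow framework as one possible (assumed) implementation vehicle; the priority, lifetime, and port arguments are arbitrary illustrative choices. The entry is installed at a higher priority than the default entry so it is matched first, and it ages out via hard_timeout, after which traffic falls back to the default route through the designated NE.

def install_temporary_bypass(datapath, dst_prefix, dst_mask, port_to_edge, lifetime_s=300):
    ofp = datapath.ofproto
    parser = datapath.ofproto_parser
    # e.g. dst_prefix='6.6.6.0', dst_mask='255.255.255.0' for the remote network 140.
    match = parser.OFPMatch(eth_type=0x0800, ipv4_dst=(dst_prefix, dst_mask))
    actions = [parser.OFPActionOutput(port_to_edge)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    # hard_timeout makes the bypass expire, so the heavy flow later returns to the
    # default path through the designated NE.
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=10,
                                        hard_timeout=lifetime_s,
                                        match=match, instructions=inst))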
[0060] The detection of the heavy flows improves the forwarding information dissemination techniques described above by enabling the bypass of the designated NE by heavy flows. This detection does not mandate any complex elephant flow detection algorithms to be run on the network controller 110 or the designated network element 103. Instead a mechanism is used to configure each forwarding table entry with a predetermined threshold, and when certain packet flows exceed their respective threshold, an alert is transmitted to the network controller 110. In response to the detection of the heavy flows in a designated NE, one or more network elements of the network are programmed such that the heavy flows are not forwarded to the designated NE. In some embodiments, these flows are configured in the non-designated NEs with an aging time causing them to phase out when that time is reached.
[0061] The operations in the flow diagrams will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagrams.
[0062] Figure 6 illustrates a flow diagram of exemplary operations for optimized dissemination of Layer 3 forwarding information in a network in accordance with some embodiments. The network controller 110 selects, at operation 602, a set of one or more designated network elements (e.g., NE 103) from multiple network elements (e.g., NE 101-103) of a data plane of a network. The number of designated network elements is strictly less than the number of all of the network elements in the data plane of the network. The remaining network elements from the multiple network elements of the data plane of the network are non-designated network elements (e.g., NE 101 and NE 102).
[0063] At operation 604, the network controller 110 configures each designated network element (e.g., NE 103) from the set of designated network elements to include a Layer 3 forwarding table (e.g., forwarding table 203). The forwarding table includes a forwarding table entry for each network element from the multiple network elements in the data plane of the network. At operation 606, the network controller 110 configures each non-designated network element (e.g., NE 101 and NE 102) to include a Layer 3 forwarding table (e.g., forwarding tables 201 and 202). The forwarding table includes a default forwarding table entry having a designated network element (e.g., NE 103) from the set of designated network elements as a next hop destination for a plurality of traffic flows, causing each of the non-designated network elements to forward all traffic associated with the plurality of traffic flows to the designated network element (e.g., NE 103).
[0064] Figure 7 illustrates a flow diagram of exemplary operations performed by a network controller when determining that a heavy flow is forwarded at a designated network element, in accordance with some embodiments. In some embodiments, the network controller 110, configures, at operation 702, for each traffic flow a predetermined threshold that when exceeded by the packets of the traffic flow forwarded by the designated network element causes the designated network element to transmit a message to the network controller 110. At operation 704, the network controller 110 receives the message from the designated network element (e.g., NE 103), where the message includes an indication that a number of packets of a traffic flow forwarded by the designated network element has exceeded the predetermined threshold. Responsive to receiving the message, the network controller 110 configures, at operation 706, one or more non-designated network elements (e.g., NE 101 and NE 102) to forward the packets of the traffic flow causing the packets of the traffic flow to bypass the designated network element (e.g., NE 103) when being forwarded in the network.
[0065] The embodiments described herein present efficient and optimized Layer 3 forwarding information dissemination techniques. The network controller of a centralized network selects a subset of designated NEs to be configured with a complete L3 forwarding table (e.g., the entire L3FIB of the network) and configures the rest of the NEs of the network to forward any non-local traffic flows towards at least one of the designated NEs resulting in a significant reduction of the number of L3 forwarding table entries that are programmed by the network controller for a particular network. The novel approach described herein for dissemination of L3 forwarding network information has several advantages with respect to the standard existing approaches. The solution significantly reduces the number of flows to be written on multiple network elements of the network as only a subset of these NEs is to include the complete L3 -forwarding table of the network. Further, the solution is applicable to all types of network elements of the network (e.g., vSwitches, hardware switches, etc.). In OF networks, the solution does not require any new OF extension and can use standard OpenFlow messages for configuring each of the network elements. In some embodiments, a dynamic programming of the NEs is performed by enabling the detection of heavy flows that may reach the designated NEs and by configuring the network to forward these flows while bypassing the designated NE. This immediately alleviates any potential backlog at the designated NE. The solution presented herein does not overload the network controller with overheads (such as periodic statistics queries from the network element). Instead, the monitoring of the traffic at the designated NEs is solely performed based on events detected at the NEs.
[0066] An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals). Thus, an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data. For instance, an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device. Typical electronic devices also include a set or one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices. For example, the set of physical NIs (or the set of physical NI(s) in combination with the set of processors executing code) may perform any formatting, coding, or translating to allow the electronic device to send and receive data whether over a wired and/or a wireless connection. In some embodiments, a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection and/or sending data out to other devices via a wireless connection. This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication. The radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s). In some embodiments, the set of physical NI(s) may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, or local area network (LAN) adapter. The NIC(s) may facilitate in connecting the electronic device to other electronic devices allowing them to communicate via wire through plugging in a cable to a physical port connected to a NIC. One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.
[0067] A network device (ND) is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices). Some network devices are“multiple services network devices” that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).
[0068] Figure 8A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention. Figure 8A shows NDs 800A-H, and their connectivity by way of lines between 800A-800B, 800B-800C, 800C-800D, 800D-800E, 800E-800F, 800F-800G, and 800A-800G, as well as between 800H and each of 800A, 800C, 800D, and 800G. These NDs are physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link). An additional line extending from NDs 800A, 800E, and 800F illustrates that these NDs act as ingress and egress points for the network (and thus, these NDs are sometimes referred to as edge NDs; while the other NDs may be called core NDs).
[0069] Two of the exemplary ND implementations in Figure 8A are: 1) a special-purpose network device 802 that uses custom application-specific integrated-circuits (ASICs) and a special-purpose operating system (OS); and 2) a general purpose network device 804 that uses common off-the-shelf (COTS) processors and a standard OS.
[0070] The special-purpose network device 802 includes networking hardware 810 comprising a set of one or more processor(s) 812, forwarding resource(s) 814 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 816 (through which network connections are made, such as those shown by the connectivity between NDs 800A-H), as well as non-transitory machine readable storage media 818 having stored therein networking software 820. During operation, the networking software 820 may be executed by the networking hardware 810 to instantiate a set of one or more networking software instance(s) 822. Each of the networking software instance(s) 822, and that part of the networking hardware 810 that executes that network software instance (be it hardware dedicated to that networking software instance and/or time slices of hardware temporally shared by that networking software instance with others of the networking software instance(s) 822), form a separate virtual network element 830A-R. Each of the virtual network element(s) (VNEs) 830A-R includes a control communication and configuration module 832A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 834A-R, such that a given virtual network element (e.g., 830A) includes the control communication and configuration module (e.g., 832A), a set of one or more forwarding table(s) (e.g., 834A), and that portion of the networking hardware 810 that executes the virtual network element (e.g., 830A).
[0071] The special-purpose network device 802 is often physically and/or logically considered to include: 1) a ND control plane 824 (sometimes referred to as a control plane) comprising the processor(s) 812 that execute the control communication and configuration module(s) 832A-R; and 2) a ND forwarding plane 826 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising the forwarding resource(s) 814 that utilize the forwarding table(s) 834A-R and the physical NIs 816. By way of example, where the ND is a router (or is implementing routing functionality), the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 834A-R, and the ND forwarding plane 826 is responsible for receiving that data on the physical NIs 816 and forwarding that data out the appropriate ones of the physical NIs 816 based on the forwarding table(s) 834A-R.
[0072] Figure 8B illustrates an exemplary way to implement the special-purpose network device 802 according to some embodiments of the invention. Figure 8B shows a special- purpose network device including cards 838 (typically hot pluggable). While in some embodiments the cards 838 are of two types (one or more that operate as the ND forwarding plane 826 (sometimes called line cards), and one or more that operate to implement the ND control plane 824 (sometimes called control cards)), alternative embodiments may combine functionality onto a single card and/or include additional card types (e.g., one additional type of card is called a service card, resource card, or multi-application card). A service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway)). By way of example, a service card may be used to terminate IPsec tunnels and execute the attendant authentication and encryption algorithms. These cards are coupled together through one or more interconnect mechanisms illustrated as backplane 836 (e.g., a first full mesh coupling the line cards and a second full mesh coupling all of the cards).
[0073] Returning to Figure 8A, the general purpose network device 804 includes hardware 840 comprising a set of one or more processor(s) 842 (which are often COTS processors) and physical NIs 846, as well as non-transitory machine readable storage media 848 having stored therein software 850. During operation, the processor(s) 842 execute the software 850 to instantiate one or more sets of one or more applications 864A-R. While one embodiment does not implement virtualization, alternative embodiments may use different forms of virtualization. For example, in one such alternative embodiment the virtualization layer 854 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 862A-R called software containers that may each be used to execute one (or more) of the sets of applications 864A-R; where the multiple software containers (also called virtualization engines, virtual private servers, or jails) are user spaces (typically a virtual memory space) that are separate from each other and separate from the kernel space in which the operating system is run; and where the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes. In another such alternative embodiment the virtualization layer 854 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and each of the sets of applications 864A-R is run on top of a guest operating system within an instance 862A-R called a virtual machine (which may in some cases be considered a tightly isolated form of software container) that is run on top of the hypervisor - the guest operating system and application may not know they are running on a virtual machine as opposed to running on a “bare metal” host electronic device, or through para-virtualization the operating system and/or application may be aware of the presence of virtualization for optimization purposes. In yet other alternative embodiments, one, some or all of the applications are implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application. As a unikernel can be implemented to run directly on hardware 840, directly on a hypervisor (in which case the unikemel is sometimes described as running within a LibOS virtual machine), or in a software container, embodiments can be implemented fully with unikernels running directly on a hypervisor represented by virtualization layer 854, unikernels running within software containers represented by instances 862A-R, or as a combination of unikemels and the above-described techniques (e.g., unikemels and virtual machines both run directly on a hypervisor, unikemels and sets of applications that are run in different software containers).
[0074] The instantiation of the one or more sets of one or more applications 864A-R, as well as virtualization if implemented, are collectively referred to as software instance(s) 852. Each set of applications 864A-R, corresponding virtualization construct (e.g., instance 862A-R) if implemented, and that part of the hardware 840 that executes them (be it hardware dedicated to that execution and/or time slices of hardware temporally shared), forms a separate virtual network element(s) 860A-R.
[0075] The virtual network element(s) 860A-R perform similar functionality to the virtual network element(s) 830A-R - e.g., similar to the control communication and configuration module(s) 832A and forwarding table(s) 834A (this virtualization of the hardware 840 is sometimes referred to as network function virtualization (NFV)). Thus, NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which could be located in Data centers, NDs, and customer premise equipment (CPE). While embodiments of the invention are illustrated with each instance 862A-R corresponding to one VNE 860A-R, alternative embodiments may implement this correspondence at a finer level granularity (e.g., line card virtual machines virtualize line cards, control card virtual machine virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of instances 862A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used.
[0076] In certain embodiments, the virtualization layer 854 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 862A-R and the physical NI(s) 846, as well as optionally between the instances 862A-R; in addition, this virtual switch may enforce network isolation between the VNEs 860A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).
[0077] The third exemplary ND implementation in Figure 8A is a hybrid network device 806, which includes both custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND. In certain embodiments of such a hybrid network device, a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 802) could provide for para-virtualization to the networking hardware present in the hybrid network device 806.
[0078] Regardless of the above exemplary implementations of an ND, when a single one of multiple VNEs implemented by an ND is being considered (e.g., only one of the VNEs is part of a given virtual network) or where only a single VNE is currently being implemented by an ND, the shortened term network element (NE) is sometimes used to refer to that VNE. Also in all of the above exemplary implementations, each of the VNEs (e.g., VNE(s) 830A-R, VNEs 860A-R, and those in the hybrid network device 806) receives data on the physical NIs (e.g., 816, 846) and forwards that data out the appropriate ones of the physical NIs (e.g., 816, 846). For example, a VNE implementing IP router functionality forwards IP packets on the basis of some of the IP header information in the IP packet; where IP header information includes source IP address, destination IP address, source port, destination port (where “source port” and “destination port” refer herein to protocol ports, as opposed to physical ports of a ND), transport protocol (e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)), and differentiated services code point (DSCP) values.
[0079] Figure 8C illustrates various exemplary ways in which VNEs may be coupled according to some embodiments of the invention. Figure 8C shows VNEs 870A.1-870A.P (and optionally VNEs 870A.Q-870A.R) implemented in ND 800A and VNE 870H.1 in ND 800H. In Figure 8C, VNEs 870A.1-P are separate from each other in the sense that they can receive packets from outside ND 800A and forward packets outside of ND 800A; VNE 870A.1 is coupled with VNE 870H.1, and thus they communicate packets between their respective NDs; VNE 870A.2-870A.3 may optionally forward packets between themselves without forwarding them outside of the ND 800A; and VNE 870A.P may optionally be the first in a chain of VNEs that includes VNE 870A.Q followed by VNE 870A.R (this is sometimes referred to as dynamic service chaining, where each of the VNEs in the series of VNEs provides a different service - e.g., one or more layer 4-7 network services). While Figure 8C illustrates various exemplary relationships between the VNEs, alternative embodiments may support other relationships (e.g., more/fewer VNEs, more/fewer dynamic service chains, multiple different dynamic service chains with some common VNEs and some different VNEs).
[0080] The NDs of Figure 8A, for example, may form part of the Internet or a private network; and other electronic devices (not shown; such as end user devices including workstations, laptops, netbooks, tablets, palm tops, mobile phones, smartphones, phablets, multimedia phones, Voice Over Internet Protocol (VOIP) phones, terminals, portable media players, GPS units, wearable devices, gaming systems, set-top boxes, Internet enabled household appliances) may be coupled to the network (directly or through other networks such as access networks) to communicate over the network (e.g., the Internet or virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet) with each other (directly or through servers) and/or access content and/or services. Such content and/or services are typically provided by one or more servers (not shown) belonging to a service/content provider or one or more end user devices (not shown) participating in a peer- to-peer (P2P) service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs. For instance, end user devices may be coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge NDs, which are coupled (e.g., through one or more core NDs) to other edge NDs, which are coupled to electronic devices acting as servers. However, through compute and storage virtualization, one or more of the electronic devices operating as the NDs in Figure 8 A may also host one or more such servers (e.g., in the case of the general purpose network device 804, one or more of the software instances 862A-R may operate as servers; the same would be true for the hybrid network device 806; in the case of the special-purpose network device 802, one or more such servers could also be run on a virtualization layer executed by the processor(s) 812); in which case the servers are said to be co-located with the VNEs of that ND.
[0081] A virtual network is a logical abstraction of a physical network (such as that in Figure 8 A) that provides network services (e.g., L2 and/or L3 services). A virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., layer 2 (L2, data link layer) and/or layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), layer 2 tunneling protocol (L2TP), IPSec) to create the overlay network).
[0082] A network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network. A virtual network instance (VNI) is a specific instance of a virtual network on an NVE (e.g., a NE/VNE on an ND, a part of a NE/VNE on a ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND). A virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; a VAP can be a physical or virtual port identified through a logical interface identifier (e.g., a VLAN ID).
[0083] Examples of network services include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IP VPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network)). Network services may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network - originated attacks, to avoid malformed route announcements), and management capabilities (e.g., full detection and processing).
[0084] Figure 8D illustrates a network with a single network element on each of the NDs of Figure 8A, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention. Specifically, Figure 8D illustrates network elements (NEs) 870A-H with the same connectivity as the NDs 800A-H of Figure 8A.
[0085] Figure 8D illustrates that the distributed approach 872 distributes responsibility for generating the reachability and forwarding information across the NEs 870A-H; in other words, the process of neighbor discovery and topology discovery is distributed.
[0086] For example, where the special-purpose network device 802 is used, the control communication and configuration module(s) 832A-R of the ND control plane 824 typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP-Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi-Protocol Label Switching (GMPLS) Signaling RSVP-TE))) that communicate with other NEs to exchange routes, and then select those routes based on one or more routing metrics. Thus, the NEs 870A-H (e.g., the processor(s) 812 executing the control communication and configuration module(s) 832A-R) perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by distributively determining the reachability within the network and calculating their respective forwarding information. Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane 824. The ND control plane 824 programs the ND forwarding plane 826 with information (e.g., adjacency and route information) based on the routing structure(s). For example, the ND control plane 824 programs the adjacency and route information into one or more forwarding table(s) 834A-R (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane 826. For layer 2 forwarding, the ND can store one or more bridging tables that are used to forward data based on the layer 2 information in that data. While the above example uses the special-purpose network device 802, the same distributed approach 872 can be implemented on the general purpose network device 804 and the hybrid network device 806.
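As a minimal sketch of the distributed model just described (invented names, not any specific router implementation): each network element collects the candidate routes it learned from its routing protocols into a RIB, selects a best route per prefix by metric, and programs the result into its local FIB, with no controller involved.

```python
# Toy sketch of distributed RIB -> FIB programming on a single network element.
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str      # e.g. "10.0.1.0/24"
    next_hop: str    # e.g. "10.0.0.3"
    metric: int      # lower is preferred in this sketch

class DistributedNE:
    def __init__(self, name):
        self.name = name
        self.rib = {}    # prefix -> list of candidate Routes (control plane state)
        self.fib = {}    # prefix -> chosen next hop (forwarding plane state)

    def learn_route(self, route):
        self.rib.setdefault(route.prefix, []).append(route)

    def program_fib(self):
        # Independent, local best-path selection per prefix.
        for prefix, candidates in self.rib.items():
            best = min(candidates, key=lambda r: r.metric)
            self.fib[prefix] = best.next_hop

ne = DistributedNE("NE-870A")
ne.learn_route(Route("10.0.1.0/24", "10.0.0.2", metric=20))
ne.learn_route(Route("10.0.1.0/24", "10.0.0.3", metric=10))
ne.program_fib()
print(ne.fib)  # {'10.0.1.0/24': '10.0.0.3'}
```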
[0087] Figure 8D illustrates that a centralized approach 874 (also known as software defined networking (SDN)) decouples the system that makes decisions about where traffic is sent from the underlying systems that forward traffic to the selected destination. The illustrated centralized approach 874 has the responsibility for the generation of reachability and forwarding information in a centralized control plane 876 (sometimes referred to as a SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity), and thus the process of neighbor discovery and topology discovery is centralized. The centralized control plane 876 has a south bound interface 882 with a data plane 880 (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane (which should not be confused with a ND forwarding plane)) that includes the NEs 870A-H (sometimes referred to as switches, forwarding elements, data plane elements, or nodes). The centralized control plane 876 includes a network controller 878, which includes a centralized reachability and forwarding information module 879 that determines the reachability within the network and distributes the forwarding information to the NEs 870A-H of the data plane 880 over the south bound interface 882 (which may use the OpenFlow protocol). Thus, the network intelligence is centralized in the centralized control plane 876 executing on electronic devices that are typically separate from the NDs.
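A schematic sketch of this centralized model follows (hypothetical classes, not the OpenFlow wire protocol): the controller holds the global reachability view, computes per-element forwarding entries, and pushes them "south" to the data-plane elements, which only install what they are told.

```python
# Toy sketch of a controller pushing forwarding state over a southbound interface.
class ForwardingElement:
    """Data-plane element that only installs what the controller sends it."""
    def __init__(self, name):
        self.name = name
        self.forwarding_table = {}   # prefix -> next hop

    def install_entry(self, prefix, next_hop):
        self.forwarding_table[prefix] = next_hop

class CentralController:
    def __init__(self):
        self.elements = {}           # name -> ForwardingElement
        self.reachability = {}       # prefix -> {element name: next hop}

    def register(self, element):
        self.elements[element.name] = element

    def set_reachability(self, prefix, next_hop_by_element):
        self.reachability[prefix] = next_hop_by_element

    def push_forwarding_state(self):
        # Different elements may receive different entries for the same prefix.
        for prefix, per_element in self.reachability.items():
            for name, next_hop in per_element.items():
                self.elements[name].install_entry(prefix, next_hop)

ctrl = CentralController()
ne_a, ne_b = ForwardingElement("NE-870A"), ForwardingElement("NE-870B")
ctrl.register(ne_a)
ctrl.register(ne_b)
ctrl.set_reachability("10.0.1.0/24", {"NE-870A": "10.0.0.2", "NE-870B": "10.0.0.9"})
ctrl.push_forwarding_state()
print(ne_a.forwarding_table, ne_b.forwarding_table)
```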
[0088] For example, where the special-purpose network device 802 is used in the data plane 880, each of the control communication and configuration module(s) 832A-R of the ND control plane 824 typically include a control agent that provides the VNE side of the south bound interface 882. In this case, the ND control plane 824 (the processor(s) 812 executing the control communication and configuration module(s) 832A-R) performs its responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) through the control agent communicating with the centralized control plane 876 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 879 (it should be understood that in some embodiments of the invention, the control communication and configuration module(s) 832A-R, in addition to communicating with the centralized control plane 876, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach; such embodiments are generally considered to fall under the centralized approach 874, but may also be considered a hybrid approach).
[0089] While the above example uses the special-purpose network device 802, the same centralized approach 874 can be implemented with the general purpose network device 804 (e.g., each of the VNE 860A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by communicating with the centralized control plane 876 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 879; it should be understood that in some embodiments of the invention, the VNEs 860A-R, in addition to communicating with the centralized control plane 876, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach) and the hybrid network device 806. In fact, the use of SDN techniques can enhance the NFV techniques typically used in the general purpose network device 804 or hybrid network device 806 implementations as NFV is able to support SDN by providing an infrastructure upon which the SDN software can be run, and NFV and SDN both aim to make use of commodity server hardware and physical switches.
[0090] Figure 8D also shows that the centralized control plane 876 has a north bound interface 884 to an application layer 886, in which resides application(s) 888. The centralized control plane 876 has the ability to form virtual networks 892 (sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 870A-H of the data plane 880 being the underlay network)) for the application(s) 888. Thus, the centralized control plane 876 maintains a global view of all NDs and configured NEs/VNEs, and it maps the virtual networks to the underlying NDs efficiently (including maintaining these mappings as the physical network changes either through hardware (ND, link, or ND component) failure, addition, or removal).
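Illustrative only (invented names): the sketch below shows the kind of bookkeeping implied by the global view described above, where the controller records which NDs back each virtual network and keeps that mapping consistent when an ND fails or is removed.

```python
# Toy sketch of maintaining virtual-network-to-ND mappings against physical changes.
class GlobalView:
    def __init__(self):
        self.healthy_nds = set()
        self.virtual_networks = {}   # virtual network name -> set of backing NDs

    def add_nd(self, nd):
        self.healthy_nds.add(nd)

    def map_virtual_network(self, vnet, nds):
        # Only map onto NDs that are currently known and healthy.
        self.virtual_networks[vnet] = set(nds) & self.healthy_nds

    def nd_failed(self, nd):
        # Keep the mapping consistent with the physical view after a failure.
        self.healthy_nds.discard(nd)
        for backing in self.virtual_networks.values():
            backing.discard(nd)

view = GlobalView()
for nd in ("ND-800A", "ND-800B", "ND-800C"):
    view.add_nd(nd)
view.map_virtual_network("tenant-green", ["ND-800A", "ND-800B"])
view.nd_failed("ND-800B")
print(view.virtual_networks)  # {'tenant-green': {'ND-800A'}}
```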
[0091] While Figure 8D shows the distributed approach 872 separate from the centralized approach 874, the effort of network control may be distributed differently or the two combined in certain embodiments of the invention. For example: 1) embodiments may generally use the centralized approach (SDN) 874, but have certain functions delegated to the NEs (e.g., the distributed approach may be used to implement one or more of fault monitoring, performance monitoring, protection switching, and primitives for neighbor and/or topology discovery); or 2) embodiments of the invention may perform neighbor discovery and topology discovery via both the centralized control plane and the distributed protocols, and the results compared to raise exceptions where they do not agree. Such embodiments are generally considered to fall under the centralized approach 874, but may also be considered a hybrid approach.
[0092] While Figure 8D illustrates the simple case where each of the NDs 800A-H implements a single NE 870A-H, it should be understood that the network control approaches described with reference to Figure 8D also work for networks where one or more of the NDs 800A-H implement multiple VNEs (e.g., VNEs 830A-R, VNEs 860A-R, those in the hybrid network device 806). Alternatively or in addition, the network controller 878 may also emulate the implementation of multiple VNEs in a single ND. Specifically, instead of (or in addition to) implementing multiple VNEs in a single ND, the network controller 878 may present the implementation of a VNE/NE in a single ND as multiple VNEs in the virtual networks 892 (all in the same one of the virtual network(s) 892, each in different ones of the virtual network(s) 892, or some combination). For example, the network controller 878 may cause an ND to implement a single VNE (a NE) in the underlay network, and then logically divide up the resources of that NE within the centralized control plane 876 to present different VNEs in the virtual network(s) 892 (where these different VNEs in the overlay networks are sharing the resources of the single VNE/NE implementation on the ND in the underlay network).
[0093] On the other hand, Figures 8E and 8F respectively illustrate exemplary abstractions of NEs and VNEs that the network controller 878 may present as part of different ones of the virtual networks 892. Figure 8E illustrates the simple case where each of the NDs 800A-H implements a single NE 870A-H (see Figure 8D), but the centralized control plane 876 has abstracted multiple of the NEs in different NDs (the NEs 870A-C and G-H) into (to represent) a single NE 870I in one of the virtual network(s) 892 of Figure 8D, according to some embodiments of the invention. Figure 8E shows that in this virtual network, the NE 870I is coupled to NE 870D and 870F, which are both still coupled to NE 870E.
[0094] Figure 8F illustrates a case where multiple VNEs (VNE 870A.1 and VNE 870H.1) are implemented on different NDs (ND 800A and ND 800H) and are coupled to each other, and where the centralized control plane 876 has abstracted these multiple VNEs such that they appear as a single VNE 870T within one of the virtual networks 892 of Figure 8D, according to some embodiments of the invention. Thus, the abstraction of a NE or VNE can span multiple NDs.
[0095] While some embodiments of the invention implement the centralized control plane 876 as a single entity (e.g., a single instance of software running on a single electronic device), alternative embodiments may spread the functionality across multiple entities for redundancy and/or scalability purposes (e.g., multiple instances of software running on different electronic devices).
[0096] Similar to the network device implementations, the electronic device(s) running the centralized control plane 876, and thus the network controller 878 including the centralized reachability and forwarding information module 879, may be implemented in a variety of ways (e.g., a special purpose device, a general-purpose (e.g., COTS) device, or hybrid device). These electronic device(s) would similarly include processor(s), a set of one or more physical NIs, and a non-transitory machine-readable storage medium having stored thereon the centralized control plane software. For instance, Figure 9 illustrates a general purpose control plane device 904 including hardware 940 comprising a set of one or more processor(s) 942 (which are often COTS processors) and physical NIs 946, as well as non-transitory machine readable storage media 948 having stored therein centralized control plane (CCP) software 950.
[0097] In embodiments that use compute virtualization, the processor(s) 942 typically execute software to instantiate a virtualization layer 954 (e.g., in one embodiment the virtualization layer 954 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 962A-R called software containers (representing separate user spaces and also called virtualization engines, virtual private servers, or jails) that may each be used to execute a set of one or more applications; in another embodiment the virtualization layer 954 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and an application is run on top of a guest operating system within an instance 962A-R called a virtual machine (which in some cases may be considered a tightly isolated form of software container) that is run by the hypervisor; in another embodiment, an application is implemented as a unikernel, which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application, and the unikernel can run directly on hardware 940, directly on a hypervisor represented by virtualization layer 954 (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container represented by one of instances 962A-R). Again, in embodiments where compute virtualization is used, during operation an instance of the CCP software 950 (illustrated as CCP instance 976A) is executed (e.g., within the instance 962A) on the virtualization layer 954. In embodiments where compute virtualization is not used, the CCP instance 976A is executed, as a unikernel or on top of a host operating system, on the “bare metal” general purpose control plane device 904. The instantiation of the CCP instance 976A, as well as the virtualization layer 954 and instances 962A-R if implemented, are collectively referred to as software instance(s) 952.
[0098] In some embodiments, the CCP instance 976A includes a network controller instance 978. The network controller instance 978 includes a centralized reachability and forwarding information module instance 979 (which is a middleware layer providing the context of the network controller 878 to the operating system and communicating with the various NEs), and a CCP application layer 980 (sometimes referred to as an application layer) over the middleware layer (providing the intelligence required for various network operations such as protocols, network situational awareness, and user interfaces). At a more abstract level, this CCP application layer 980 within the centralized control plane 876 works with virtual network view(s) (logical view(s) of the network) and the middleware layer provides the conversion from the virtual networks to the physical view.
[0099] The centralized control plane 876 transmits relevant messages to the data plane 880 based on CCP application layer 980 calculations and middleware layer mapping for each flow. A flow may be defined as a set of packets whose headers match a given pattern of bits; in this sense, traditional IP forwarding is also flow-based forwarding where the flows are defined by, for example, the destination IP address; however, in other implementations, the given pattern of bits used for a flow definition may include more fields (e.g., 10 or more) in the packet headers. Different NDs/NEs/VNEs of the data plane 880 may receive different messages, and thus different forwarding information. The data plane 880 processes these messages and programs the appropriate flow information and corresponding actions in the forwarding tables (sometimes referred to as flow tables) of the appropriate NE/VNEs, and then the NEs/VNEs map incoming packets to flows represented in the forwarding tables and forward packets based on the matches in the forwarding tables.
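A small sketch of the flow notion above (field names invented for the example): a flow is simply the set of packets sharing a header pattern; classic IP forwarding keys only on the destination address, while a wider SDN-style match may key on many more fields.

```python
# Two flow-key granularities over the same packet headers.
def flow_key_destination_only(packet):
    return (packet["dst_ip"],)

def flow_key_multifield(packet):
    return (packet["in_port"], packet["src_mac"], packet["dst_mac"],
            packet["src_ip"], packet["dst_ip"], packet["ip_proto"],
            packet["src_port"], packet["dst_port"])

pkt = {"in_port": 3, "src_mac": "00:00:00:00:00:01", "dst_mac": "00:00:00:00:00:02",
       "src_ip": "10.0.0.1", "dst_ip": "10.0.1.5", "ip_proto": 6,
       "src_port": 51515, "dst_port": 443}
print(flow_key_destination_only(pkt))   # coarse flow: one entry per destination
print(flow_key_multifield(pkt))         # fine-grained flow: roughly per connection
```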
[00100] Standards such as OpenFlow define the protocols used for the messages, as well as a model for processing the packets. The model for processing packets includes header parsing, packet classification, and making forwarding decisions. Header parsing describes how to interpret a packet based upon a well-known set of protocols. Some protocol fields are used to build a match structure (or key) that will be used in packet classification (e.g., a first key field could be a source media access control (MAC) address, and a second key field could be a destination MAC address).
[00101] Packet classification involves executing a lookup in memory to classify the packet by determining which entry (also referred to as a forwarding table entry or flow entry) in the forwarding tables best matches the packet based upon the match structure, or key, of the forwarding table entries. It is possible that many flows represented in the forwarding table entries can correspond/match to a packet; in this case the system is typically configured to determine one forwarding table entry from the many according to a defined scheme (e.g., selecting a first forwarding table entry that is matched). Forwarding table entries include both a specific set of match criteria (a set of values or wildcards, or an indication of what portions of a packet should be compared to a particular value/values/wildcards, as defined by the matching capabilities - for specific fields in the packet header, or for some other packet content), and a set of one or more actions for the data plane to take on receiving a matching packet. For example, an action may be to push a header onto the packet, forward the packet using a particular port, flood the packet, or simply drop the packet. Thus, a forwarding table entry for IPv4/IPv6 packets with a particular transmission control protocol (TCP) destination port could contain an action specifying that these packets should be dropped.
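In the spirit of the classification step above, the following toy lookup (illustrative, not an OpenFlow implementation) pairs match criteria with actions, treats None as a wildcard, and lets the first matching entry supply the actions, including a drop rule keyed on a TCP destination port and a final table-miss entry.

```python
# Toy flow-table classification: first matching entry wins.
FLOW_TABLE = [
    # (match criteria: field -> required value, None = wildcard, actions)
    ({"ip_proto": 6, "tcp_dst": 23, "dst_ip": None}, ["drop"]),               # drop this TCP port
    ({"ip_proto": None, "tcp_dst": None, "dst_ip": "10.0.1.5"}, ["output:2"]),
    ({"ip_proto": None, "tcp_dst": None, "dst_ip": None}, ["to-controller"]), # table-miss entry
]

def classify(packet):
    for match, actions in FLOW_TABLE:
        if all(value is None or packet.get(field) == value
               for field, value in match.items()):
            return actions   # first matching entry is selected in this sketch
    return []

print(classify({"ip_proto": 6, "tcp_dst": 23,  "dst_ip": "10.0.1.5"}))   # ['drop']
print(classify({"ip_proto": 6, "tcp_dst": 443, "dst_ip": "10.0.1.5"}))   # ['output:2']
print(classify({"ip_proto": 17, "dst_ip": "10.9.9.9"}))                  # ['to-controller']
```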
[00102] Making forwarding decisions and performing actions occurs by executing, on the packet, the set of actions identified in the forwarding table entry that was matched during packet classification.
[00103] However, when an unknown packet (for example, a “missed packet” or a “match-miss” as used in OpenFlow parlance) arrives at the data plane 880, the packet (or a subset of the packet header and content) is typically forwarded to the centralized control plane 876. The centralized control plane 876 will then program forwarding table entries into the data plane 880 to accommodate packets belonging to the flow of the unknown packet. Once a specific forwarding table entry has been programmed into the data plane 880 by the centralized control plane 876, the next packet with matching credentials will match that forwarding table entry and take the set of actions associated with that matched entry.
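The match-miss handling just described can be sketched as follows (hypothetical classes, not the actual OpenFlow messages): on a miss the data-plane element punts the packet to the controller, which installs an entry so that subsequent packets of the same flow are handled entirely in the data plane.

```python
# Toy sketch of the miss -> punt -> program -> hit sequence.
class MissController:
    def packet_in(self, switch, packet):
        # Decide how this new flow should be handled and program the switch.
        action = "output:1"                       # invented action for the example
        switch.flow_table[(packet["dst_ip"],)] = action
        return action

class DataPlaneElement:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}                      # flow key -> action

    def handle_packet(self, packet):
        key = (packet["dst_ip"],)
        if key in self.flow_table:
            return f"matched, applying {self.flow_table[key]}"
        installed = self.controller.packet_in(self, packet)   # the match-miss path
        return f"missed, punted to controller, entry {installed} installed"

sw = DataPlaneElement(MissController())
print(sw.handle_packet({"dst_ip": "10.0.1.5"}))   # first packet of the flow: miss
print(sw.handle_packet({"dst_ip": "10.0.1.5"}))   # next packet: handled by the installed entry
```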
[00104] A network interface (NI) may be physical or virtual; and in the context of IP, an interface address is an IP address assigned to a NI, be it a physical NI or virtual NI. A virtual NI may be associated with a physical NI, with another virtual interface, or stand on its own (e.g., a loopback interface, a point-to-point protocol interface). A NI (physical or virtual) may be numbered (a NI with an IP address) or unnumbered (a NI without an IP address). A loopback interface (and its loopback address) is a specific type of virtual NI (and IP address) of a NE/VNE (physical or virtual) often used for management purposes; where such an IP address is referred to as the nodal loopback address. The IP address(es) assigned to the NI(s) of a ND are referred to as IP addresses of that ND; at a more granular level, the IP address(es) assigned to NI(s) assigned to a NE/VNE implemented on a ND can be referred to as IP addresses of that NE/VNE.
[00105] Each VNE (e.g., a virtual router, a virtual bridge (which may act as a virtual switch instance in a Virtual Private LAN Service (VPLS))) is typically independently administrable. For example, in the case of multiple virtual routers, each of the virtual routers may share system resources but is separate from the other virtual routers regarding its management domain, AAA (authentication, authorization, and accounting) name space, IP address, and routing database(s). Multiple VNEs may be employed in an edge ND to provide direct network access and/or different classes of services for subscribers of service and/or content providers.
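As a brief illustration of the per-VNE separation noted above (names invented): two virtual routers can share the same host resources yet keep entirely separate routing databases, so the same prefix may resolve differently in each.

```python
# Toy sketch: independently administered routing tables per virtual router.
class VirtualRouter:
    def __init__(self, name):
        self.name = name
        self.routing_table = {}    # administered independently per virtual router

    def add_route(self, prefix, next_hop):
        self.routing_table[prefix] = next_hop

vr_red, vr_blue = VirtualRouter("red"), VirtualRouter("blue")
vr_red.add_route("10.0.0.0/24", "192.0.2.1")
vr_blue.add_route("10.0.0.0/24", "198.51.100.1")   # same prefix, different management domain
print(vr_red.routing_table)
print(vr_blue.routing_table)
```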
[00106] Within certain NDs, “interfaces” that are independent of physical NIs may be configured as part of the VNEs to provide higher-layer protocol and service information (e.g., Layer 3 addressing). The subscriber records in the AAA server identify, in addition to the other subscriber configuration requirements, to which context (e.g., which of the VNEs/NEs) the corresponding subscribers should be bound within the ND. As used herein, a binding forms an association between a physical entity (e.g., physical NI, channel) or a logical entity (e.g., circuit such as a subscriber circuit or logical circuit (a set of one or more subscriber circuits)) and a context’s interface over which network protocols (e.g., routing protocols, bridging protocols) are configured for that context. Subscriber data flows on the physical entity when some higher-layer protocol interface is configured and associated with that physical entity.
[00107] Some NDs provide support for implementing VPNs (Virtual Private Networks) (e.g., Layer 3 VPNs). For example, the NDs where a provider’s network and a customer’s network are coupled are respectively referred to as PEs (Provider Edge) and CEs (Customer Edge). In a Layer 3 VPN, routing typically is performed by the PEs. By way of example, an edge ND that supports multiple VNEs may be deployed as a PE; and a VNE may be configured with a VPN protocol, and thus that VNE is referred to as a VPN VNE.
[00108] While the flow diagrams in the figures show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
[00109] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.

Claims

CLAIMS:
1. A method, in a network controller (110) of a network (100), of configuring a plurality of network elements (101-103) in a data plane (120) of the network (100), the method comprising:
selecting (602) a set of one or more designated network elements (103) from the plurality of network elements (101-103), wherein the number of designated network elements (103) is strictly less than the number of all of the plurality of network elements (101-103) in the data plane (120) of the network (100), and the remaining network elements from the plurality of network elements (101-103) in the data plane (120) of the network (100) are non-designated network elements (101-102);
configuring (604) each designated network element from the set of designated network elements (103) to include a Layer 3 forwarding table (203) including a forwarding table entry for each network element from the plurality of network elements (101-103) in the data plane (120) of the network (100); and configuring (606) each non-designated network element (101, 102) to include a Layer 3 forwarding table (201, 202) including a default forwarding table entry having a designated network element (103) from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements (101, 102) to forward all traffic associated with the plurality of traffic flows to the designated network element (103).
2. The method of claim 1 further comprising:
receiving (704) a message from a designated network element from the set of designated network elements (103), wherein the message includes an indication that a number of packets of a traffic flow forwarded by the designated network element has exceeded a predetermined threshold; and responsive to receiving the message, configuring (706) one or more non-designated network elements (101-102) to forward packets of the traffic flow causing the packets of the traffic flow to bypass the designated network element (103) when being forwarded in the network (100).
3. The method of claim 2, wherein configuring (604) each designated network element from the set of designated network elements (103) to include a Layer 3 forwarding table includes:
configuring (702) for each traffic flow the predetermined threshold that when exceeded by the number of packets of the traffic flow forwarded by the designated network element (103) causes the designated network element (103) to transmit the message to the network controller (110).
4. The method of claim 2, wherein the bypass (708) of the designated network element (103) by the packets of the traffic flow is temporary and is set to expire after a predetermined period of time.
5. The method of any of claims 1-4, wherein the network is a software defined networking (SDN) network.
6. The method of claim 5, wherein the data plane of the SDN network is a data center, and the plurality of network elements (101-103) form a Layer 3 Virtual Private Network (VPN) in the data center.
7. The method of any of claims 1-6, wherein configuring the designated network elements (103) and configuring the non-designated network elements (101, 102) is performed by transmitting OpenFlow messages.
8. A machine-readable medium comprising computer program code which when executed by a computer carries out the method steps of any of claims 1-7.
9. A network controller (110) of a network (100) for configuring a plurality of network elements (101-103) in a data plane (120) of the network (100), the network controller (110) comprising:
a non-transitory machine-readable storage medium that provides instructions that, if executed by a processor, will cause the network controller to perform operations comprising:
select (602) a set of one or more designated network elements (103) from the plurality of network elements (101-103), wherein the number of designated network elements (103) is strictly less than the number of all of the plurality of network elements (101-103) in the data plane (120) of the network (100), and the remaining network elements from the plurality of network elements (101-103) in the data plane (120) of the network (100) are non-designated network elements (101-102), configure (604) each designated network element from the set of designated network elements (103) to include a Layer 3 forwarding table (203) including a forwarding table entry for each network element from the plurality of network elements (101-103) in the data plane (120) of the network (100), and
configure (606) each non-designated network element (101, 102) to include a Layer 3 forwarding table (201, 202) including a default forwarding table entry having a designated network element (103) from the set of designated network elements as a next hop destination for a plurality of traffic flows causing each of the non-designated network elements (101, 102) to forward all traffic associated with the plurality of traffic flows to the designated network element (103).
10. The network controller of claim 9, wherein the processor is further to:
receive (704) a message from a designated network element from the set of designated network elements (103), wherein the message includes an indication that a number of packets of a traffic flow forwarded by the designated network element has exceeded a predetermined threshold, and responsive to receiving the message, configure (706) one or more non-designated network elements (101-102) to forward packets of the traffic flow causing the packets of the traffic flow to bypass the designated network element (103) when being forwarded in the network (100).
11. The network controller of claim 10, wherein to configure (604) each designated network element from the set of designated network elements (103) to include a Layer 3 forwarding table includes to:
configure (702) for each traffic flow the predetermined threshold that when exceeded by the number of packets of the traffic flow forwarded by the designated network element (103) causes the designated network element (103) to transmit the message to the network controller (110).
12. The network controller of claim 10, wherein the bypass (708) of the designated network element (103) by the packets of the traffic flow is temporary and is set to expire after a predetermined period of time.
13. The network controller of any of claims 9-12, wherein the network is a software defined networking (SDN) network.
14. The network controller of claim 13, wherein the data plane of the SDN network is a data center, and the plurality of network elements (101-103) form a Layer 3 Virtual Private Network (VPN) in the data center.
15. The network controller of any of claims 9-14, wherein to configure the designated network elements (103) and to configure the non-designated network elements (101, 102) is performed by transmitting OpenFlow messages.
EP18920579.2A 2018-05-30 2018-05-30 Method and apparatus for optimized dissemination of layer 3 forwarding information in software defined networking (sdn) networks Withdrawn EP3804236A4 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IN2018/050346 WO2019229760A1 (en) 2018-05-30 2018-05-30 Method and apparatus for optimized dissemination of layer 3 forwarding information in software defined networking (sdn) networks

Publications (2)

Publication Number Publication Date
EP3804236A1 (en)
EP3804236A4 EP3804236A4 (en) 2021-06-09

Family

ID=68697901

Family Applications (1)

Application Number Priority Date Filing Date Title
EP18920579.2A Withdrawn EP3804236A4 (en) 2018-05-30 2018-05-30 Method and apparatus for optimized dissemination of layer 3 forwarding information in software defined networking (sdn) networks

Country Status (2)

Country Link
EP (1) EP3804236A4 (en)
WO (1) WO2019229760A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741382B (en) * 2020-06-11 2022-06-17 北京全路通信信号研究设计院集团有限公司 Dynamic network topology management system and method
US11336570B1 (en) * 2021-03-09 2022-05-17 Juniper Networks, Inc. Layer three multi-homing for virtual networks

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8644149B2 (en) * 2011-11-22 2014-02-04 Telefonaktiebolaget L M Ericsson (Publ) Mechanism for packet forwarding using switch pools in flow-based, split-architecture networks
US9350671B2 (en) * 2012-03-22 2016-05-24 Futurewei Technologies, Inc. Supporting software defined networking with application layer traffic optimization
US20150180769A1 (en) * 2013-12-20 2015-06-25 Alcatel-Lucent Usa Inc. Scale-up of sdn control plane using virtual switch based overlay
US9641429B2 (en) * 2014-06-18 2017-05-02 Radware, Ltd. Predictive traffic steering over software defined networks
US9853874B2 (en) * 2015-03-23 2017-12-26 Brocade Communications Systems, Inc. Flow-specific failure detection in SDN networks

Also Published As

Publication number Publication date
EP3804236A4 (en) 2021-06-09
WO2019229760A1 (en) 2019-12-05

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20201025

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

A4 Supplementary search report drawn up and despatched

Effective date: 20210510

RIC1 Information provided on ipc code assigned before grant

Ipc: H04L 12/715 20130101AFI20210503BHEP

Ipc: H04L 12/751 20130101ALI20210503BHEP

Ipc: H04L 12/70 20130101ALI20210503BHEP

Ipc: H04L 12/24 20060101ALI20210503BHEP

Ipc: G06F 9/44 20180101ALI20210503BHEP

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20210112