WO2017175033A1 - Method and apparatus for enabling nonstop routing (NSR) in a packet transmission network

Method and apparatus for enabling nonstop routing (NSR) in a packet transmission network

Info

Publication number
WO2017175033A1
Authority
WO
WIPO (PCT)
Application number
PCT/IB2016/051951
Other languages
English (en)
Inventor
Abhay RAJURE
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2016/051951
Publication of WO2017175033A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00: Routing or path finding of packets in data switching networks
    • H04L 45/02: Topology update or discovery
    • H04L 45/021: Ensuring consistency of routing table updates, e.g. by using epoch numbers
    • H04L 45/24: Multipath
    • H04L 45/247: Multipath using M:N active or standby paths

Description

  • Embodiments of the invention relate to the field of packet networks; and more specifically, to nonstop routing in a packet network.
  • Redundancy systems aim at providing a high availability (HA) solution that increases the availability of network elements, and may optionally be used to provide geographical redundancy. Redundancy systems are commonly implemented through a mated pair of an active network element and a standby network element.
  • The active network element handles current sessions using session states.
  • The session data is synchronized or replicated from the active network element to the standby network element.
  • The standby network element begins to handle the sessions when a switchover event occurs.
  • A switchover occurring in a redundancy system causes routing protocol adjacencies to break and corresponding routing protocol sessions to fail.
  • During a switchover from a primary network element (i.e., the NE which was in an active state prior to the switchover) to the backup NE (i.e., the NE which was in a standby state prior to the switchover), a neighbor may have advertised to its own neighbors that the NE X (i.e., the redundancy system) is no longer a valid next hop to any destinations beyond it, and that they should find another path.
  • Abbreviations used herein include NSR (nonstop routing), GR (Graceful Restart), PE (provider edge), and CE (customer edge).
  • NSR uses internal processes to keep the standby NE aware of routing protocol state and adjacency maintenance activities, so that after a switchover the standby NE can take charge of the existing routing protocol sessions rather than having to establish new ones. The switchover is then transparent to the neighbors, and because the NSR process is internal there is no need for the neighbors to support any kind of protocol extension.
  • One technique for enabling a standby/backup NE to synchronize its routing protocol states stores a copy of the neighbor state information and transmits route refresh messages from the newly active NE (i.e., the NE which has switched from the standby state to the active state) to the neighbors of the redundancy system, based on the stored copy of neighbor state information, in order to obtain the best routes.
  • In another technique, packet replication is used to enable both the active and standby NEs of the redundancy system to be synchronized to the same states of a routing protocol.
  • In this technique, packets are processed at both NEs of the redundancy system in parallel, causing the two NEs to maintain identical routing protocol states.
  • However, the routing protocol states of the two NEs easily fall out of sync due to processing delays and processing priorities in the respective NEs, creating a complex synchronization problem.
  • Routing protocols (e.g., Border Gateway Protocol (BGP)) typically run over a lower layer transport protocol such as the Transmission Control Protocol (TCP).
  • In yet another technique, protocol information (e.g., TCP segments as well as BGP packets, etc.) is replicated: TCP segments are transmitted from the active NE to the standby NE to be processed along with the BGP packets.
  • In this technique, both NEs process the TCP and BGP messages received at the active NE and achieve synchronization by enabling the standby NE to process the BGP packets only once the active NE has successfully processed the BGP packets.
  • One general aspect includes a method in a first network element coupled with a second network element, where the first and the second network elements are part of a redundancy system, and the first network element is in an active state, of enabling nonstop routing.
  • the method includes receiving one or more transport protocol packets from a peer network device that has a routing protocol session established with the first network element, where the one or more transport protocol packets include a routing protocol message associated with the routing protocol session; processing the transport protocol packets to retrieve the routing protocol message; and transmitting a synchronization message to the second network element, where the synchronization message includes the retrieved routing protocol message, an identifier of a transport protocol session associated with the routing protocol session, and current transport protocol states associated with the routing protocol session.
  • One general aspect includes a first network element to be coupled with a second network element, where the first and the second network elements are part of a redundancy system, and where the first network element is in an active state, for enabling nonstop routing.
  • the first network element includes a non-transitory computer readable medium to store instructions; and a processor coupled with the non-transitory computer readable medium to process the stored instructions to receive one or more transport protocol packets from a peer network device that has a routing protocol session established with the first network element, where the one or more transport protocol packets include a routing protocol message associated with the routing protocol session.
  • the first network element is further to process the transport protocol packets to retrieve the routing protocol message and transmit a synchronization message to the second network element, where the synchronization message includes the retrieved routing protocol message, an identifier of a transport protocol session associated with the routing protocol session, and current transport protocol states associated with the transport protocol session.
  • One general aspect includes a non-transitory computer readable storage medium that provides instructions, which when executed by a processor of a first network element to be coupled with a second network element, where the first and the second network elements are part of a redundancy system, and where the first network element is in an active state, cause said processor to perform operations including: receiving one or more transport protocol packets from a peer network device that has a routing protocol session established with the first network element, where the one or more transport protocol packets include a routing protocol message associated with the routing protocol session; processing the transport protocol packets to retrieve the routing protocol message; and transmitting a synchronization message to the second network element, where the synchronization message includes the retrieved routing protocol message, an identifier of a transport protocol session associated with the routing protocol session, and current transport protocol states associated with the transport protocol session.
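  • As a rough illustration only, the synchronization message described in the aspects above can be modeled as a small data structure, as in the Python sketch below. The class and field names (SyncMessage, TransportState, and so on) are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class TransportState:
    """Hypothetical snapshot of the transport protocol (e.g., TCP) session state."""
    local_addr: str
    remote_addr: str
    snd_nxt: int    # next sequence number to send
    rcv_nxt: int    # next sequence number expected from the peer
    snd_wnd: int    # current send window

@dataclass
class SyncMessage:
    """Synchronization message sent from the active NE to the standby NE."""
    transport_session_id: int        # identifier of the transport protocol session (e.g., TCP ID)
    routing_protocol_message: bytes  # the retrieved routing protocol message (e.g., a BGP message)
    transport_states: TransportState # current transport protocol states for the session
```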
  • Figure 1 illustrates a block diagram of a redundancy system for enabling nonstop routing in a packet network according to some embodiments of the invention.
  • Figure 2 illustrates a flow diagram of operations performed for updating the standby NE upon receipt of information indicating a mapping between the routing protocol sessions and the transport protocol sessions according to some embodiments of the invention.
  • Figure 3 illustrates a block diagram of exemplary state transitions when performing an initialization process within the redundancy system according to some embodiments of the invention.
  • Figure 4 illustrates a block diagram of operations performed for synchronization of NEs of a redundancy system to enable nonstop routing according to some embodiments of the invention.
  • Figure 5 illustrates a flow diagram of operations performed for synchronization of NEs of a redundancy system to enable nonstop routing according to some embodiments of the invention.
  • Figure 6A illustrates a flow diagram of exemplary detailed operations performed at the active network device upon receipt of transport protocol packets including a routing protocol message according to some embodiments of the invention.
  • Figure 6B illustrates a flow diagram of exemplary operations performed at the standby network device upon receipt of a synchronization message including a BGP message according to some embodiments of the invention.
  • Figure 7 illustrates a block diagram of operations performed for synchronization of NEs of a redundancy system to enable nonstop routing according to some embodiments of the invention.
  • Figure 8A illustrates a flow diagram of exemplary detailed operations performed at the active network element for transmitting a BGP message according to some embodiments of the invention.
  • Figure 8B illustrates a flow diagram of exemplary operations performed at the standby network device upon receipt of a synchronization message including a BGP message according to some embodiments of the invention.
  • Figure 9 illustrates a flow diagram of operations performed at the NE 101B when a switchover occurs and the NE assumes an active role within the redundancy system in which a nonstop routing is enabled according to some embodiments of the invention.
  • Figure 10A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
  • Figure 10B illustrates an exemplary way to implement a special-purpose network device according to some embodiments of the invention.
  • Figure 10C illustrates various exemplary ways in which virtual network elements (VNEs) may be coupled according to some embodiments of the invention.
  • Figure 10D illustrates a network with a single network element (NE) on each of the NDs, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
  • Figure 10E illustrates the simple case of where each of the NDs implements a single NE, but a centralized control plane has abstracted multiple of the NEs in different NDs into (to represent) a single NE in one of the virtual network(s), according to some embodiments of the invention.
  • Figure 10F illustrates a case where multiple VNEs are implemented on different NDs and are coupled to each other, and where a centralized control plane has abstracted these multiple VNEs such that they appear as a single VNE within one of the virtual networks, according to some embodiments of the invention.
  • Figure 11 illustrates a general purpose control plane device with centralized control plane (CCP) software 1150, according to some embodiments of the invention.
  • In the following description, numerous specific details such as resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
  • References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • Bracketed text and blocks with dashed borders may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
  • "Coupled" is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • "Connected" is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • Figure 1 illustrates a block diagram of an exemplary redundancy system 101 for enabling nonstop routing in a packet network 100 in accordance with some embodiments.
  • Packet network 100 includes one or more end users' devices 107A-N.
  • The end user devices may belong to subscribers to a service offered over the packet network.
  • Examples of suitable end user devices include, but are not limited to, servers, workstations, laptops, netbooks, palm tops, mobile phones, smartphones, multimedia phones, tablets, phablets, Voice Over Internet Protocol (VOIP) phones, user equipment, terminals, portable media players, GPS units, gaming systems, set-top boxes, and combinations thereof.
  • End users' devices 107A-N access content/services provided over the Internet and/or content/services provided on virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet.
  • The content and/or services are typically provided by one or more provider end stations (e.g., application servers) belonging to a service or content provider.
  • Examples of such content and/or services include, but are not limited to, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs, etc.
  • End users' devices 107A-N are communicatively coupled to network 105 (e.g., a backhaul network), which includes one or more network devices such as peer ND 102.
  • The network devices (e.g., peer ND 102) can be communicatively coupled to redundancy system 101.
  • The redundancy system can implement a provider edge of a provider network.
  • The provider edge may be communicatively coupled to one or more provider end stations or application servers (not illustrated). While two end user devices (107A and 107N) are illustrated in Figure 1, the provider edge may host on the order of thousands to millions of wireline and/or wireless end users' devices, and the scope of the invention is not limited to any particular number.
  • Network elements NE 101A and NE 101B form the redundancy system/cluster 101.
  • In a redundancy system there are typically two network elements; however, the system may include more than two network elements.
  • One network element operates in an active role (herein referred to as active network element (NE) 101A) while the other operates in a standby role (herein referred to as standby NE 101B).
  • The active NE is responsible for handling network traffic with a plurality of other network devices (e.g., end users' devices 107A-N, or neighbor network devices as identified with respect to routing protocols).
  • In some embodiments, each of the NEs 101A and 101B is a control plane of a single network device, where the control planes are separate such that a failure of a first one of the control planes does not cause a failure of the other control plane.
  • In other embodiments, each of the NEs includes a control plane of a separate network device associated with a corresponding forwarding plane.
  • Each one of the NEs 101A and 101B is operative to run one or more higher layer routing protocols as well as a lower layer transport protocol.
  • The embodiments will be described with respect to the higher layer routing protocol BGP; however, alternative embodiments could use other higher layer routing protocols, such as Multiprotocol Label Switching (MPLS) Label Distribution Protocol (LDP).
  • While BGP messages encapsulated within TCP segments are mentioned herein for transmitting and receiving routing protocol information and updates, one of ordinary skill in the art would understand that routes can also arrive within messages of other protocols (e.g., Open Shortest Path First (OSPF)), or alternatively as a result of configuration changes (e.g., static routes).
  • One or more peer network devices (e.g., peer network device 102 within network 105) communicate with the redundancy system 101 through routing protocol sessions to exchange routing protocol messages which will be used to dynamically update routing protocol information at the redundancy system.
  • Prior to the standby NE joining the redundancy system 101 (where the standby NE may join the redundancy system as a result of a first activation of the standby NE, its coupling with the active NE, and/or a restart), the active NE 101A includes initial routing protocol information and transport protocol session information.
  • The NE 101A may have one or more transport protocol sessions (e.g., TCP sockets) established between the redundancy system (i.e., the active NE 101A) and multiple peer network devices (e.g., exemplary peer network device 102 of network 105), and may include transport protocol session states and information associated with each transport protocol session.
  • Further, NE 101A includes routing protocol information stored in one or more databases related to a routing protocol (e.g., BGP).
  • The NE 101A may include a Routing Information Base (RIB)-In (which includes IP prefixes received locally or from a routing protocol peer), a Label Information Base (LIB) (which includes a database of labels allocated/received for one or more paths on the network), one or more adjacency structures (which may be referred to as a next hop (NH) database, including the next hop network device for each IP prefix), a path attribute database (which includes attributes for one or more paths in the network), and a RIB-out (which includes the last BGP messages sent by the NE to each peer network device for a given IP prefix's reachability advertisement, in other words the best paths advertised by the active NE for each IP prefix).
  • The NE 101A may further include routing protocol states associated with current routing protocol sessions.
  • For example, the NE 101A may include a routing protocol instance identifier (ID) identifying an instance of the protocol running on the NE; an identifier of a peer network device; the peer's BGP Finite State Machine (FSM) state; and any additional dynamic information that is generated by the active NE 101A while processing BGP messages prior to the addition of the standby NE 101B to the redundancy system.
  • When NE 101B joins the redundancy system 101 at operation 1 (i.e., the NE 101B is added to the system for the first time or it is restarting from a failure), initial routing protocol information and transport protocol information are transmitted from NE 101A to NE 101B at operation 2a.
  • NE 101B updates corresponding databases and data structures, enabling the NE 101B to be synchronized with the current routing protocol information of NE 101A at this initial time of synchronization.
  • In some embodiments, all routing databases will be synchronized (e.g., RIB, Labels, etc.) from NE 101A to NE 101B; in other embodiments, only a subset of these databases is copied.
  • For example, the RIB-out database is copied from NE 101A to NE 101B, where the RIB-out database includes the best paths computed and advertised at NE 101A for each IP prefix.
  • Each BGP session (i.e., a session established between NE 101A and a BGP peer network device) is mapped to a corresponding TCP socket on which the BGP session is established.
  • Each BGP session is identified by a peer identifier (BGP-peer ID) uniquely identifying the peer network device, and a BGP instance identifier uniquely identifying a BGP instance to which the BGP session belongs.
  • At operation 2b, NE 101A transmits the following information to NE 101B indicating the mapping between the BGP sessions and the TCP sockets established at the active NE 101A: a TCP identifier (TCP ID) identifying the TCP socket (e.g., a socket file descriptor (fd) identifying the active TCP socket, i.e., the TCP socket running on the active NE 101A, established between the NE 101A and the BGP peer), as well as TCP states associated with the TCP ID.
  • These TCP states can be used to achieve a seamless TCP socket transition when a switchover occurs within the redundancy system.
  • The embodiments of the present invention process the TCP segments at the active NE 101A and transmit only the relevant information that may be needed to seamlessly transition the TCP sockets from the NE 101A to the NE 101B when a switchover occurs in the redundancy system, consequently reducing the amount of data transmitted from NE 101A to NE 101B and the amount of processing performed at the standby NE 101B during runtime, as will be described in further detail below.
  • The TCP state information for each TCP ID transmitted to NE 101B may include the TCP socket's local and remote address tuples (e.g., the address family (sin_family), the TCP port associated with the TCP socket for the peer network device (sin_port), and the IP address of the peer network device (sin_addr)).
  • The TCP state information may further include TCP control block flags associated with the TCP ID, and TCP control block fields (e.g., ts_recent, ts_val, t_family, the peer's maximum segment size, the local TCP's maximum segment size, the current segment size in use, the maximum segment lifetime, the initial send sequence number, the send unacknowledged sequence number, the initial receive sequence number, the receive next, the highest sequence number sent, the send next, the send window, the interface maximum transmission unit (MTU), the path MTU, etc.).
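  • The TCP state information enumerated above could be captured in a structure along the lines of the following sketch. The field names follow conventional TCP control block terminology and are assumptions, not the exact fields of any particular TCP implementation.

```python
from dataclasses import dataclass

@dataclass
class TcpStateSnapshot:
    """Illustrative subset of the TCP state synchronized for each TCP ID."""
    sin_family: int   # address family of the peer endpoint
    sin_port: int     # TCP port associated with the peer network device
    sin_addr: str     # IP address of the peer network device
    tcb_flags: int    # TCP control block flags
    ts_recent: int    # most recent timestamp value received from the peer
    peer_mss: int     # peer's maximum segment size
    local_mss: int    # local maximum segment size
    iss: int          # initial send sequence number
    snd_una: int      # send unacknowledged sequence number
    irs: int          # initial receive sequence number
    rcv_nxt: int      # receive next
    snd_max: int      # highest sequence number sent
    snd_nxt: int      # send next
    snd_wnd: int      # send window
    if_mtu: int       # interface maximum transmission unit
    path_mtu: int     # path MTU
```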
  • Operation 2b is performed for each TCP socket established at the active NE 101A when the standby NE 101B joins the redundancy system and is ready to receive synchronization messages.
  • The synchronization messages are transmitted through an inter-process communication (IPC) channel established between NE 101A and NE 101B.
  • NE 101B is updated, at operation 3, to include the routing protocol information and transport protocol states received, and the information indicating the mapping between the higher layer routing protocol sessions (BGP sessions) and the transport protocol sessions (TCP sockets).
  • Figure 2 illustrates a flow diagram of operations performed to update the standby NE 101B upon receipt of information indicating a mapping between the routing protocol sessions and the transport protocol sessions in accordance with some embodiments. The operations in the flow diagram will be described with reference to the exemplary embodiments of Figure 1.
  • Upon receipt of a synchronization message including the information indicating the mapping between routing protocol sessions (e.g., BGP sessions) and transport protocol sessions (e.g., TCP sockets) of the active NE, NE 101B allocates a new local transport protocol session (e.g., a new local TCP socket, i.e., a new TCP socket between the peer network device 102 and NE 101B). Flow then moves to operation 204, at which NE 101B obtains a new identifier (TCP ID) for this session.
  • NE 101B creates and stores a mapping 141B between the transport protocol session on the active network element and the newly created transport protocol session on the standby NE.
  • The mapping includes a mapping between the TCP ID associated with a BGP session on the active NE and the TCP ID associated with that same BGP session on the standby NE (operation 216).
  • The mapping 141B is stored at the NE 101B and will be used to enable a seamless switchover of the transport sessions for each routing protocol session.
  • NE 101B then updates the newly allocated transport protocol session according to the transport protocol information received from NE 101A.
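  • A minimal sketch of how the standby NE might build and store the mapping 141B between active-side and standby-side TCP IDs is shown below; the helper names (allocate_local_socket, apply_tcp_states, tcp_id_map) are hypothetical and only illustrate the bookkeeping described above.

```python
# Mapping 141B: active-side TCP ID -> standby-side TCP ID (one entry per BGP session).
tcp_id_map: dict[int, int] = {}

def allocate_local_socket() -> int:
    """Hypothetical helper: allocate a new local TCP socket on the standby NE and
    return its identifier (e.g., a socket file descriptor)."""
    return 0  # placeholder value

def apply_tcp_states(tcp_id: int, tcp_states: dict) -> None:
    """Hypothetical helper: program the received TCP states onto the standby socket."""
    pass

def on_mapping_sync(active_tcp_id: int, tcp_states: dict) -> None:
    """Standby NE: handle a synchronization message carrying a BGP-session-to-TCP-socket mapping."""
    standby_tcp_id = allocate_local_socket()      # obtain a new local session and a new TCP ID (operation 204)
    tcp_id_map[active_tcp_id] = standby_tcp_id    # store the active-to-standby mapping (operation 216)
    apply_tcp_states(standby_tcp_id, tcp_states)  # update the new session from the received TCP states
```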
  • At this point, the redundancy system 101 is determined to be in a nonstop routing (NSR) ready state, such that if a switchover occurs at this moment in time, a seamless transition of the routing protocol sessions (e.g., BGP sessions) and the transport protocol sessions (e.g., TCP sockets) occurs between NE 101A and NE 101B, where the transition is transparent to the peer network devices and does not require any processing at the peer network devices.
  • Figure 3 illustrates a block diagram of exemplary state transitions when performing an initialization process within the redundancy system in accordance with some embodiments.
  • the state transitions in the block diagram will be described with reference to the exemplary embodiments of Figure 1.
  • the state transitions of the block diagram of Figure 3 can be performed by embodiments of the invention other than those discussed with reference to Figure 1, and the embodiments of the invention discussed with reference to Figure 1 can perform operations different than those discussed with reference to the block diagram.
  • The active NE 101A is in a "not ready" state prior to the initialization of the synchronization of the routing protocol information and the transport protocol information with the standby NE 101B.
  • Prior to NE 101B joining the redundancy system (operation 1 of Figure 1), the NE 101B is set to be in a "not ready" NSR state, indicating that the NE 101B is not ready to undergo a successful and seamless switchover of the redundancy system.
  • The NSR state of the standby NE 101B is then set to "init-sync-ready," indicating that the NE is ready to start the initial synchronization process.
  • The synchronization process is then in progress at the standby NE 101B, and a message is transmitted to the NE 101A requesting the initialization of the synchronization.
  • The active NE 101A starts the synchronization process upon receipt of the request from the standby NE and transmits a request to initiate the synchronization process to the NE 101B.
  • The NE 101B determines that the NE 101A may now start the synchronization process (operations 306A and 306B), at which point the NE 101A transmits routing protocol information and mapping information (operations 2a and 2b of Figure 1) to the standby NE 101B.
  • The NE 101A then transmits an "end-init-sync" message indicating to the NE 101B that all information is synchronized and that the initial synchronization process is now terminated.
  • Upon receipt of this message, the standby NE 101B completes the synchronization process (operation 308B).
  • The NSR states of the two network elements are then set to "ready," such that if a switchover occurs at this moment a seamless transition occurs between the two network elements with respect to the routing protocol sessions and the transport protocol sessions.
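  • Under the assumptions above, the initialization handshake of Figure 3 can be summarized as a small state machine, as in the illustrative sketch below; the intermediate state name for the in-progress synchronization is hypothetical, since the disclosure only names the "not ready," "init-sync-ready," and "ready" states.

```python
from enum import Enum

class NsrState(Enum):
    NOT_READY = "not ready"
    INIT_SYNC_READY = "init-sync-ready"
    SYNC_IN_PROGRESS = "sync-in-progress"  # hypothetical name for the in-progress phase
    READY = "ready"

def standby_on_join() -> NsrState:
    # The standby NE joins the redundancy system and becomes ready for the initial synchronization.
    return NsrState.INIT_SYNC_READY

def standby_on_sync_request_sent() -> NsrState:
    # The standby NE has requested initialization of the synchronization from the active NE.
    return NsrState.SYNC_IN_PROGRESS

def on_end_init_sync() -> NsrState:
    # Upon the "end-init-sync" message, both NEs consider NSR ready: a switchover at this
    # point is seamless for the routing protocol sessions and the transport protocol sessions.
    return NsrState.READY
```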
  • The active NE 101A may receive higher layer routing protocol messages or updates (through static configuration) that may alter the routing protocol information stored at the active NE.
  • The routing protocol information is then synchronized to the standby NE in order to enable nonstop routing at the redundancy system 101.
  • Figure 4 illustrates a block diagram of operations performed for synchronization of NEs of a redundancy system to enable nonstop routing in accordance with some embodiments.
  • The active NE 101A receives, at operation 4a, one or more lower layer transport protocol packets including a routing protocol message over a lower layer transport protocol session (e.g., a TCP socket established between peer network device 102 and NE 101A).
  • A lower layer transport protocol packet (e.g., a TCP segment) is not necessarily congruent with a higher layer routing protocol packet (e.g., a BGP message).
  • For example, a TCP stack of the peer network device 102 may transmit a single BGP message within one or more TCP segments in order to conform with the protocol's and the network's requirements and capabilities.
  • Upon receipt of these packets, NE 101A processes the transport protocol packets to retrieve, at operation 4b, the routing protocol message transmitted from the peer network device (e.g., a BGP message).
  • The active NE 101A processes the higher layer routing protocol message (e.g., at the BGP stack of the NE 101A) and updates, at operation 4c, the routing protocol information 111B according to the routing protocol message (e.g., by updating or adding entries in one or more of the routing databases).
  • NE 101A then transmits a synchronization message including the routing protocol message and an identifier of the transport protocol session through which the routing protocol message was received from the peer network device 102.
  • Flow then moves to operation 6a, at which NE 101B transmits an acknowledgment message to the active NE confirming the receipt of the synchronization message.
  • The flow then moves to operation 6b, at which the standby NE 101B updates routing protocol information and transport protocol information according to the routing protocol message, the transport session identifier, and the states of the transport protocol session received in the synchronization message.
  • The standby NE 101B updates routing protocol information and uses the mapping information stored at the NE to update the local transport protocol session associated with the transport protocol session of the active NE 101A identified in the synchronization message.
  • Thus, the embodiments of the present invention process the transport packets at the active NE 101A and transmit only the relevant information that may be needed to seamlessly transition the TCP sockets from the NE 101A to the NE 101B when a switchover occurs in the redundancy system, consequently reducing the amount of data transmitted from NE 101A towards the NE 101B and the amount of processing performed at the standby NE 101B during this synchronization process.
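  • A rough sketch of the active NE's receive-path handling (operations 4a through 4c and the subsequent synchronization message) follows, assuming hypothetical helpers (extract_bgp_messages, snapshot_tcp_state, send_over_ipc) standing in for the TCP/BGP stacks and the IPC channel.

```python
def on_transport_packets(tcp_id: int, bgp_instance_id: int, peer_id: int,
                         segments: list[bytes]) -> None:
    """Active NE: process received TCP segments carrying BGP messages (operations 4a to 4c)."""
    for bgp_message in extract_bgp_messages(tcp_id, segments):  # reassemble BGP messages from the segments
        update_routing_databases(bgp_message)                   # update the active NE's routing databases
        sync_msg = {
            "tcp_id": tcp_id,                         # identifier of the transport protocol session
            "tcp_states": snapshot_tcp_state(tcp_id), # current TCP states for that session
            "bgp_instance_id": bgp_instance_id,
            "peer_id": peer_id,
            "bgp_message": bgp_message,               # the retrieved routing protocol message
        }
        send_over_ipc(sync_msg)                       # forward to the standby NE over the IPC channel

# Placeholders standing in for the real TCP/BGP stacks and the IPC channel.
def extract_bgp_messages(tcp_id: int, segments: list[bytes]) -> list[bytes]:
    return []   # real code reassembles BGP messages from the TCP byte stream
def update_routing_databases(bgp_message: bytes) -> None:
    pass        # real code updates RIB-In, LIB, path attributes, RIB-out, etc.
def snapshot_tcp_state(tcp_id: int) -> dict:
    return {}   # real code captures the TCP control block fields listed earlier
def send_over_ipc(msg: dict) -> None:
    pass        # real code sends the message over the IPC channel to NE 101B
```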
  • Figure 5 illustrates a flow diagram of operations performed for synchronization of NEs of a redundancy system to enable nonstop routing in accordance with some embodiments.
  • the operations in the flow diagram will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagram can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagram.
  • By way of example, a single TCP socket and a single BGP session established between the NE 101A and the peer network device 102 will be described.
  • One of ordinary skill in the art would understand that the invention is not so limited and that NE 101A may have multiple BGP sessions established over multiple TCP sockets with peer network devices.
  • The embodiments of the present invention apply to each one of these BGP sessions established between a BGP peer network device and the NE 101A.
  • The active NE 101A receives, at operation 502, one or more transport protocol packets including a routing protocol message over a transport protocol session (e.g., a TCP socket established between peer network device 102 and NE 101A). Upon receipt of these packets, NE 101A processes the packets to retrieve, at operation 504, the routing protocol message transmitted from the peer network device (e.g., a BGP message).
  • A synchronization message including the retrieved routing protocol message and an identifier of the transport protocol session is then transmitted to the standby NE 101B; the synchronization message further includes current transport protocol states associated with the transport protocol session established at the active NE.
  • The active NE 101A processes the higher layer routing protocol message (e.g., at the BGP stack of the NE 101A) and updates the routing protocol information 111B according to the routing protocol message (e.g., by updating or adding entries in one or more of the routing databases).
  • In some embodiments, the receipt of an acknowledgment message from the standby NE 101B confirming receipt of the synchronization message will cause the active NE 101A to transmit a transport protocol acknowledgment message (e.g., a TCP ACK) to the peer network device.
  • Alternatively, this operation can be skipped, and the transport protocol acknowledgment message may be automatically transmitted to the peer network device when the active NE 101A transmits the synchronization message to the standby NE 101B.
  • The standby NE 101B updates routing protocol information and uses the mapping information stored at the NE to update the local transport protocol session associated with the transport protocol session of the active NE identified in the synchronization message.
  • Thus, the embodiments of the present invention process the transport packets at the active NE 101A and transmit only the relevant information that may be needed to seamlessly transition the TCP sockets from the NE 101A to the NE 101B when a switchover occurs in the redundancy system, consequently reducing the amount of data transmitted from NE 101A towards the NE 101B and the amount of processing performed at the standby NE 101B during this synchronization process.
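  • The acknowledgment handling described above might be gated as in the sketch below: the TCP ACK towards the peer is released only once the standby NE has confirmed receipt of the synchronization message, or, in the alternative embodiment, as soon as the synchronization message is sent. The flag and helper names are assumptions.

```python
ACK_ON_STANDBY_CONFIRMATION = True  # if False, ACK the peer as soon as the sync message is sent

def after_sync_message_sent(tcp_id: int) -> None:
    if not ACK_ON_STANDBY_CONFIRMATION:
        signal_tcp_ack(tcp_id)   # alternative embodiment: ACK the peer immediately

def on_standby_acknowledgment(tcp_id: int) -> None:
    if ACK_ON_STANDBY_CONFIRMATION:
        signal_tcp_ack(tcp_id)   # ACK the peer only after the standby confirmed the sync message

def signal_tcp_ack(tcp_id: int) -> None:
    """Hypothetical helper: instruct the TCP stack to acknowledge the received segments."""
    pass
```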
  • Figure 6A illustrates a flow diagram of exemplary detailed operations performed at the active network device upon receipt of transport protocol packets including a routing protocol message in accordance with some embodiments.
  • Figure 6A illustrates operations performed for a BGP session established between NE 101A and a peer network device and a TCP socket at the active NE 101A, where the TCP socket is associated with a transport protocol session identifier (e.g., TCP ID).
  • The NE 101A receives one or more TCP segments that include a BGP message associated with the BGP session.
  • Upon receipt of the BGP message, NE 101A initiates a synchronization procedure towards the standby NE 101B.
  • In some embodiments, upon receipt of the BGP message, the flow moves directly to operation 618, at which the message is transmitted to the standby NE 101B. In other embodiments, flow moves to operation 604, at which the BGP message is read from a socket buffer and added to a local buffer. Flow then moves to operation 606, at which NE 101A determines the value of a nonstop routing (NSR) state associated with the NE.
  • The NSR state of a network element indicates whether the initial synchronization of the NEs has been performed as described with reference to Figures 1-3. If the NSR state of the NE 101A is ready, flow moves to operation 610, at which the NE 101A determines whether the standby NE 101B has restarted or not.
  • If the standby NE 101B has not restarted, flow moves to operation 616, at which the NE determines whether the NSR state is still in a ready state. If it is determined that the NSR state is not ready, then flow moves to operation 622, at which the NE signals the TCP stack to send an acknowledgment message to the peer network device 102 for the received TCP segments. Alternatively, if it is determined that the NSR state is ready, flow moves to operation 618, at which the NE 101A transmits a synchronization message to the standby NE including the BGP message, the TCP states associated with the TCP socket on which the BGP message was received, and the TCP ID.
  • The synchronization message is transmitted through an IPC messaging channel and includes the TCP ID of the TCP socket on which the BGP message was received, a BGP instance identifier, a BGP peer identifier, and the BGP message.
  • The NE 101A may then process the BGP message and update routing protocol information accordingly.
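  • The decision logic of Figure 6A (operations 604 through 622) might look roughly like the sketch below; the operation numbers appear as comments, while the function and helper names are hypothetical and the exact branching of the figure may differ.

```python
def handle_received_bgp_message(tcp_id: int, bgp_message: bytes) -> None:
    buffer_locally(bgp_message)                            # operation 604: read from the socket buffer
    if nsr_state_is_ready() and not standby_restarted():   # operations 606 and 610
        if nsr_state_is_ready():                           # operation 616: re-check just before sending
            send_sync_message(tcp_id, bgp_message)         # operation 618: BGP message + TCP states + TCP ID
            return
    signal_tcp_ack(tcp_id)                                 # operation 622: only acknowledge the peer

# Placeholder helpers.
def buffer_locally(bgp_message: bytes) -> None:
    pass
def nsr_state_is_ready() -> bool:
    return True
def standby_restarted() -> bool:
    return False
def send_sync_message(tcp_id: int, bgp_message: bytes) -> None:
    pass
def signal_tcp_ack(tcp_id: int) -> None:
    pass
```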
  • Figure 6B illustrates a flow diagram of exemplary operations performed at the standby network element upon receipt of a synchronization message including a BGP message in accordance with some embodiments.
  • The standby NE 101B receives the synchronization message including an identifier of an active transport protocol session, current states of the active transport protocol session, and a routing protocol message which was received through the active transport protocol session at the active network device.
  • In some embodiments, the synchronization message further includes a routing protocol instance identifier (BGP instance ID) and a peer session identifier (peer ID).
  • The NE 101B then determines the standby transport protocol session associated with the active transport protocol session; the determination is performed by determining (operation 644) an identifier of the standby transport protocol session (e.g., a TCP ID) based on the identifier of the active transport protocol session (which is established at the active network element 101A between the NE 101A and the peer network device 102), using the mapping 141B established between active transport protocol sessions (active TCP sockets) at the NE 101A and standby transport protocol sessions (standby TCP sockets) at the NE 101B.
  • The BGP configuration is synchronized between the active NE 101A and the standby NE 101B such that the standby NE has up-to-date information of any configuration changes occurring on the active NE. For example, any changes made to BGP instances and BGP peers are synchronized with the standby to keep the BGP instance identifier and peer identifier mappings consistent.
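  • Under the same assumptions, the standby NE's handling of such a synchronization message (Figure 6B) could be sketched as follows; the mapping lookup and update helpers are hypothetical stand-ins for the behavior described above.

```python
def on_sync_message_standby(sync_msg: dict, tcp_id_map: dict[int, int]) -> None:
    """Standby NE: apply a receive-path synchronization message from the active NE."""
    active_tcp_id = sync_msg["tcp_id"]
    standby_tcp_id = tcp_id_map[active_tcp_id]                # operation 644: map active TCP ID to standby TCP ID
    apply_tcp_states(standby_tcp_id, sync_msg["tcp_states"])  # keep the standby socket ready for a switchover
    session = locate_bgp_session(sync_msg["bgp_instance_id"], sync_msg["peer_id"])
    process_bgp_message(session, sync_msg["bgp_message"])     # update the standby's routing protocol information
    send_acknowledgment(active_tcp_id)                        # confirm receipt back to the active NE

# Placeholder helpers.
def apply_tcp_states(tcp_id: int, tcp_states: dict) -> None:
    pass
def locate_bgp_session(bgp_instance_id: int, peer_id: int):
    return None
def process_bgp_message(session, bgp_message: bytes) -> None:
    pass
def send_acknowledgment(active_tcp_id: int) -> None:
    pass
```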
  • Figure 7 illustrates a block diagram of operations performed for synchronization of NEs of a redundancy system to enable nonstop routing in accordance with some embodiments.
  • The active NE 101A determines that a BGP message is to be transmitted to a peer network device (e.g., ND 102) over a lower layer transport protocol session (e.g., a TCP socket established between peer network device 102 and NE 101A).
  • The NE 101A synchronizes the BGP message with the standby NE 101B prior to transmitting the BGP message to the peer ND.
  • NE 101A formats a BGP message including an update to routing protocol information to be transmitted over a given BGP session associated with a BGP instance identifier.
  • The BGP message is to be forwarded to a peer network device (102) through a TCP socket associated with a TCP ID.
  • The NE 101A adds the BGP message to a TCP transmission buffer of the active NE without sending the message.
  • The NE 101A then constructs a synchronization message including an identifier of the TCP socket (TCP ID) associated with the BGP message to be transmitted; relevant lower layer protocol state information (TCP states) from the current TCP session; an identifier of the peer ND; and the BGP instance identifier. The synchronization message, together with the BGP message, is transmitted to the standby NE 101B.
  • The standby NE 101B transmits an acknowledgment message to the active NE 101A confirming the receipt of the BGP message.
  • The NE 101B updates routing protocol information to include the last routing protocol message (BGP message) 131B advertised to a peer ND by the active NE 101A.
  • The standby NE 101B also updates transport protocol session states according to the current transport protocol states received in the synchronization message.
  • Having received the acknowledgment from the standby NE 101B, the active NE 101A may now transmit the routing protocol message (BGP message), at operation 10, to the peer network device.
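  • The send-path sequence of Figure 7 (hold the outgoing BGP message, synchronize it to the standby NE, then release it to the peer) is sketched below using the same hypothetical helper names.

```python
def send_bgp_update(tcp_id: int, bgp_instance_id: int, peer_id: int, bgp_message: bytes) -> None:
    """Active NE: synchronize an outgoing BGP message with the standby NE before sending it."""
    enqueue_without_sending(tcp_id, bgp_message)   # hold the message in the TCP transmission buffer
    sync_msg = {
        "tcp_id": tcp_id,                          # TCP socket the message will be sent on
        "tcp_states": snapshot_tcp_state(tcp_id),  # relevant TCP state from the current session
        "bgp_instance_id": bgp_instance_id,
        "peer_id": peer_id,
        "bgp_message": bgp_message,
    }
    send_over_ipc(sync_msg)
    wait_for_standby_acknowledgment(tcp_id)        # block until the standby confirms receipt
    release_to_peer(tcp_id)                        # operation 10: actually transmit to the peer ND

# Placeholder helpers.
def enqueue_without_sending(tcp_id: int, bgp_message: bytes) -> None:
    pass
def snapshot_tcp_state(tcp_id: int) -> dict:
    return {}
def send_over_ipc(msg: dict) -> None:
    pass
def wait_for_standby_acknowledgment(tcp_id: int) -> None:
    pass
def release_to_peer(tcp_id: int) -> None:
    pass
```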
  • FIG. 8A illustrates a flow diagram of exemplary detailed operations performed at the active network element for transmitting a BGP message in accordance with some embodiments.
  • By way of example, a single TCP socket and a single BGP session established between the NE 101A and the peer network device 102 will be described.
  • However, NE 101A may have multiple BGP sessions established over multiple TCP sockets with peer network devices.
  • The embodiments of the present invention described with reference to Figure 8 apply to each one of these BGP sessions established between a BGP peer network device and the active NE 101A.
  • FIG. 8A illustrates operations performed for a BGP session established between NE 101A and a peer network device 102 and an associated TCP socket at the active NE 101A, where the TCP socket is associated with a transport protocol session identifier (e.g., TCP ID).
  • The NE 101A determines that a BGP IP prefix is to be updated.
  • Flow then moves to operation 804, at which the NE 101A formats a BGP message, which is to be transmitted to the peer ND 102 to advertise the update.
  • Flow then moves to operation 806, at which NE 101A determines whether the initial synchronization process of the active NE and the standby NE is complete.
  • If the initial synchronization is not complete, the NSR state of the NE 101A is set to wait until the standby NE is synchronized with NE 101A and the NSR process is in a ready state. Flow then moves from operation 812 or 814 to operation 818, at which the BGP message is added to a socket buffer (the socket associated with the TCP ID) and held there until a confirmation is received from the standby NE 101B that the BGP message has been received. Flow then moves from operation 818 to operation 820, at which the NE 101A determines whether at least one of the following conditions is false: (a) the synchronization process is completed; (b) the NSR state of the NE 101A is ready; or (c) the BGP session is established.
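  • A small sketch of the check performed at operation 820 follows; what is done with the held BGP message once one of the conditions becomes false is implementation specific and is not shown.

```python
def should_stop_holding(sync_complete: bool, nsr_ready: bool, session_established: bool) -> bool:
    """Operation 820: True when at least one of the three conditions is false, i.e., the held
    BGP message should no longer wait for a confirmation from the standby NE."""
    return not (sync_complete and nsr_ready and session_established)

# Example: if the standby NE is still synchronizing, holding is abandoned.
assert should_stop_holding(sync_complete=False, nsr_ready=True, session_established=True)
```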
  • FIG. 8B illustrates a flow diagram of exemplary operations performed at the standby network device upon receipt of a synchronization message including a BGP message in accordance with some embodiments.
  • The standby NE 101B receives a synchronization message including an identifier of an active transport protocol session (TCP ID), a routing protocol message (BGP message) which is to be transmitted on the active transport protocol session to a peer network device, and current transport protocol states associated with the active transport protocol session (TCP states).
  • In some embodiments, the synchronization message further includes a routing protocol instance identifier (BGP instance ID) and a peer session identifier (peer ID).
  • Flow then moves to operation 834 at which the NE 101B determines a standby transport protocol session associated with the active transport protocol session on which the routing protocol message is to be transmitted from the active NE 101A.
  • The determination is performed by determining (operation 844) an identifier of a standby transport protocol session (e.g., TCP ID) based on the identifier of the active transport protocol session (which is established at the active network element 101A between the NE 101A and the peer network device 102) using the mapping 141B established between active transport protocol sessions (active TCP sockets) at the NE 101A and standby transport protocol sessions (standby TCP sockets) at the NE 101B.
  • Flow then moves to operation 836, at which the NE 101B determines a routing protocol session (using the BGP instance ID) at the standby NE and locates the peer network device to which the routing protocol session belongs according to the peer ID included in the synchronization message.
  • Flow then moves to operation 838 at which the NE 101B adds the BGP message received within the synchronization message to a TCP transmission queue without transmitting the BGP message.
  • The NE 101B further processes (operation 840) the BGP message which is to be transmitted by the active NE 101A to the peer network device 102.
  • The NE 101B copies the message to a receiving buffer and sets a flag indicating that the BGP message is to be transmitted at the active NE 101A towards the ND 102.
  • The NE 101B then processes the BGP message by parsing the message and determining the IP prefix which is to be updated or alternatively withdrawn.
  • The new routing information associated with that IP prefix is then added to the routing protocol information 111B of the standby NE 101B (e.g., an entry of the RIB-out database can be added/updated or alternatively withdrawn).
  • Additional databases may further be updated enabling the standby NE 101B to have routing protocol information synchronized with the NE 101A.
  • In other embodiments, the BGP message is not parsed, and a peer-bit is "set" in the prefix's peer bitmap, which indicates that the particular peer network device has been updated with the best path for this prefix.
  • A best route is not computed by the standby NE 101B until after a switchover occurs, as described in further detail below with reference to Figure 9.
  • In some embodiments, a peer-group can be used to store only a single last-published update for all the peer network devices belonging to the group.
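  • The peer-bitmap bookkeeping described above might be kept per prefix roughly as in the sketch below; the structures and names are assumptions used only to make the idea concrete.

```python
from collections import defaultdict

# Per-prefix bookkeeping on the standby NE: which peers were sent the best path, and the
# last update published for the prefix (a single copy per peer-group).
peer_bitmap: dict[str, set[int]] = defaultdict(set)        # prefix -> peer IDs whose bit is "set"
last_published_update: dict[tuple[str, int], bytes] = {}   # (prefix, peer_group_id) -> last BGP update

def record_outgoing_update(prefix: str, peer_id: int, peer_group_id: int, bgp_message: bytes) -> None:
    """Standby NE: note that the active NE advertised the best path for this prefix to this peer,
    without parsing the message or computing a best route."""
    peer_bitmap[prefix].add(peer_id)
    last_published_update[(prefix, peer_group_id)] = bgp_message
```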
  • Figure 9 illustrates a flow diagram of operations performed at the NE 101B when a switchover occurs and the NE assumes an active role within the redundancy system in which a nonstop routing is enabled in accordance with some embodiments.
  • Upon the switchover, the NE 101A assumes a standby role and the NE 101B assumes an active role.
  • The operations of Figure 9 will be described with reference to a single TCP socket to be established between a peer network device (e.g., 102) and the NE 101B, by way of example and not limitation.
  • The newly active NE 101B activates a transport protocol session (TCP socket) coupling the peer ND 102 and NE 101B.
  • The newly active NE 101B includes a set of standby transport protocol sessions that were established during multiple synchronization processes (the initial synchronization process and synchronization updates received from the previously active NE 101A during runtime prior to the switchover).
  • Upon detection of the switchover, NE 101B starts activating all of these transport protocol sessions.
  • Flow then moves to operation 934, at which NE 101B starts transmission of transport protocol keep-alive messages (TCP keepalive) in order to indicate that the transport session (TCP socket) is alive. In some embodiments, this is performed by starting a "keep-alive" timer and initiating the transmission of the keepalive packets.
  • The NE 101B may further ensure that the TCP states are adjusted (corrected) for the newly activated TCP socket.
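  • On a switchover, the activation of the pre-built standby sockets and the start of transport keep-alives (operations 932 and 934) could be sketched as follows, again with hypothetical helper names.

```python
def on_switchover(standby_tcp_ids: list[int]) -> None:
    """Newly active NE 101B: bring up the transport sessions prepared during synchronization."""
    for tcp_id in standby_tcp_ids:
        activate_socket(tcp_id)        # make the standby TCP socket the live socket towards the peer
        adjust_tcp_states(tcp_id)      # correct the TCP states for the newly activated socket
        start_keepalive_timer(tcp_id)  # operation 934: start sending TCP keepalives on the session

# Placeholder helpers.
def activate_socket(tcp_id: int) -> None:
    pass
def adjust_tcp_states(tcp_id: int) -> None:
    pass
def start_keepalive_timer(tcp_id: int) -> None:
    pass
```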
  • NE 101B similarly starts transmitting keep-alive routing protocol messages (e.g., BGP keepalive messages) to indicate that the routing protocol sessions are alive.
  • NE 101B then waits for the routing protocols (running on the NE 101B) to converge, causing the receipt of a route redistribution from the RIB.
  • The BGP stack of the NE 101B computes the BEST-PATH for each IP prefix.
  • Flow then moves to operation 944, at which NE 101B determines whether the newly computed best path is different from the best path already stored in the routing protocol information 111B (e.g., stored in the RIB-out). If the best path is identical, this information is discarded and not transmitted to the peer ND 102 (since the peer ND already includes this routing information).
  • If the newly computed best path is different, the flow moves to operation 948, at which the NE 101B advertises the best path to the peer network device and updates the routing protocol information database accordingly.
  • The transmission of this new BGP message will cause the synchronization process of the redundancy system to be initiated such that the new standby NE 101A is synchronized according to the routing protocol states of NE 101B.
  • In some embodiments, the peer bit in the prefix's peer bitmap is checked to see if the best path for this prefix has been advertised to this peer. If yes, then the update message being advertised to the peer is compared with the newly computed best path (which is determined at the NE 101B after the switchover occurs, following the convergence of the routing protocol(s)) to see if it is different.
  • The best path for the prefix will be advertised to the peer only if at least one of two conditions occurs: (a) the peer bit is set and the last published update for the peer is different from the newly computed best path; or (b) the peer bit is not set, indicating that no update was advertised to the peer for this prefix.
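  • The re-advertisement decision after convergence (operations 944 and 948, combined with the peer-bit check) might be expressed as in the sketch below, reusing the hypothetical bookkeeping from the earlier peer-bitmap sketch.

```python
def should_advertise(prefix: str, peer_id: int, peer_group_id: int, new_best_path: bytes,
                     peer_bitmap: dict[str, set[int]],
                     last_published_update: dict[tuple[str, int], bytes]) -> bool:
    """Newly active NE: advertise the prefix only if the peer has stale or missing information."""
    if peer_id not in peer_bitmap.get(prefix, set()):
        return True   # no update was ever advertised to this peer for the prefix
    previous = last_published_update.get((prefix, peer_group_id))
    return previous != new_best_path   # advertise only if the newly computed best path differs
```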
  • Thus, the embodiments described herein provide methods and apparatuses for enabling nonstop routing in a packet network.
  • The embodiments provide a minimal runtime synchronization overhead by having synchronization messages include the BGP message and an identifier of the TCP session on which the BGP message is to be transmitted, along with the TCP states.
  • In some embodiments, a single routing protocol message is transmitted to peer network devices to achieve the nonstop routing and routing protocol synchronization after a switchover. Further, in some embodiments, the message is only transmitted when the newly active NE determines that the peer network device does not include the updated routing protocol information.
  • The embodiments provide a generic synchronization mechanism which is protocol independent and can be used with multiple versions of routing protocols (for example, multiple versions of BGP, or MPLS LDP).
  • An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals).
  • In addition, an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data.
  • For instance, an electronic device may include non-volatile memory containing the code, since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device.
  • Typical electronic devices also include a set of one or more physical network interface(s) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices.
  • One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.
  • A network device is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices).
  • Some network devices are "multiple services network devices" that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).
  • Figure 10A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
  • Figure 10A shows NDs 1000A-H, and their connectivity by way of lines between 1000A-1000B, 1000B-1000C, 1000C-1000D, 1000D-1000E, 1000E-1000F, 1000F-1000G, and 1000A-1000G, as well as between 1000H and each of 1000A, 1000C, 1000D, and 1000G.
  • These NDs are physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link).
  • An additional line extending from NDs 1000A, 1000E, and 1000F illustrates that these NDs act as ingress and egress points for the network (and thus, these NDs are sometimes referred to as edge NDs; while the other NDs may be called core NDs).
  • Two of the exemplary ND implementations in Figure 10A are: 1) a special-purpose network device 1002 that uses custom application-specific integrated-circuits (ASICs) and a special-purpose operating system (OS); and 2) a general purpose network device 1004 that uses common off-the-shelf (COTS) processors and a standard OS.
  • The special-purpose network device 1002 includes networking hardware 1010 comprising compute resource(s) 1012 (which typically include a set of one or more processors), forwarding resource(s) 1014 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 1016 (sometimes called physical ports), as well as non-transitory machine readable storage media 1018 having stored therein networking software 1020.
  • A physical NI is hardware in a ND through which a network connection (e.g., wirelessly through a wireless network interface controller (WNIC) or through plugging in a cable to a physical port connected to a network interface controller (NIC)) is made, such as those shown by the connectivity between NDs 1000A-H.
  • WNIC wireless network interface controller
  • NIC network interface controller
  • the networking software 1020 may be executed by the networking hardware 1010 to instantiate a set of one or more networking software instance(s) 1022.
  • Each of the networking software instance(s) 1022, and that part of the networking hardware 1010 that executes that network software instance form a separate virtual network element 1030A-R.
  • Each of the virtual network element(s) (VNEs) 1030A-R includes a control communication and configuration module 1032A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 1034A-R, such that a given virtual network element (e.g., 1030A) includes the control communication and configuration module (e.g., 1032A), a set of one or more forwarding table(s) (e.g., 1034A), and that portion of the networking hardware 1010 that executes the virtual network element (e.g., 1030A).
  • a control communication and configuration module 1032A-R sometimes referred to as a local control module or control communication module
  • forwarding table(s) 1034A-R
  • the networking software 1020 further includes NonStop Routing Unit (NSRU) 1021, which when executed by the networking hardware 1010 instantiates a set of one or more NSRU instances 1033 as part of the instances 1022, enabling the network device 1002 to perform the operations described with reference to Figures 1-9.
  • NSRU NonStop Routing Unit
  • the special-purpose network device 1002 is often physically and/or logically considered to include: 1) a ND control plane 1024 (sometimes referred to as a control plane) comprising the compute resource(s) 1012 that execute the control communication and configuration module(s) 1032A-R; and 2) a ND forwarding plane 1026 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising the forwarding resource(s) 1014 that utilize the forwarding table(s) 1034A-R and the physical NIs 1016.
  • a ND control plane 1024 (sometimes referred to as a control plane) comprising the compute resource(s) 1012 that execute the control communication and configuration module(s) 1032A-R
  • a ND forwarding plane 1026 sometimes referred to as a forwarding plane, a data plane, or a media plane
  • the forwarding resource(s) 1014 that utilize the forwarding table(s) 1034A-R and the physical NIs 1016.
  • the ND control plane 1024 (the compute resource(s) 1012 executing the control communication and configuration module(s) 1032A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 1034A-R, and the ND forwarding plane 1026 is responsible for receiving that data on the physical NIs 1016 and forwarding that data out the appropriate ones of the physical NIs 1016 based on the forwarding table(s) 1034A-R.
  • data e.g., packets
  • the ND forwarding plane 1026 is responsible for receiving that data on the physical NIs 1016 and forwarding that data out the appropriate ones of the physical NIs 1016 based on the forwarding table(s) 1034A-R.
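  • By way of illustration only, the division of labor described above can be sketched as a small program in which a control-plane object computes and installs routes and a forwarding-plane object merely performs longest-prefix-match lookups; all class and function names below are invented for this sketch and do not appear in the figures.

```python
import ipaddress

class ForwardingPlane:
    """Plays the role of the ND forwarding plane: holds programmed entries and forwards."""
    def __init__(self):
        self.forwarding_table = {}          # prefix -> (next_hop, out_interface)

    def program(self, prefix, next_hop, out_interface):
        self.forwarding_table[ipaddress.ip_network(prefix)] = (next_hop, out_interface)

    def forward(self, dst_ip):
        dst = ipaddress.ip_address(dst_ip)
        matches = [(net, entry) for net, entry in self.forwarding_table.items() if dst in net]
        if not matches:
            return None                     # no route: drop in this sketch
        return max(matches, key=lambda m: m[0].prefixlen)[1]   # longest-prefix match

class ControlPlane:
    """Plays the role of the ND control plane: decides routes and programs the forwarding plane."""
    def __init__(self, forwarding_plane):
        self.fp = forwarding_plane

    def install_route(self, prefix, next_hop, out_interface):
        # In a real NE the route would come from BGP/OSPF/IS-IS computation.
        self.fp.program(prefix, next_hop, out_interface)

fp = ForwardingPlane()
cp = ControlPlane(fp)
cp.install_route("10.0.0.0/8", "192.0.2.1", "eth0")
cp.install_route("10.1.0.0/16", "192.0.2.2", "eth1")
print(fp.forward("10.1.2.3"))   # ('192.0.2.2', 'eth1'): the more specific prefix wins
```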
  • Figure 10B illustrates an exemplary way to implement the special-purpose network device 1002 according to some embodiments of the invention.
  • Figure 10B shows a special-purpose network device including cards 1038 (typically hot pluggable). While in some embodiments the cards 1038 are of two types (one or more that operate as the ND forwarding plane 1026 (sometimes called line cards), and one or more that operate to implement the ND control plane 1024 (sometimes called control cards)), alternative embodiments may combine functionality onto a single card and/or include additional card types (e.g., one additional type of card is called a service card, resource card, or multi-application card).
  • additional card types e.g., one additional type of card is called a service card, resource card, or multi-application card.
  • a service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway)).
  • Layer 4 to Layer 7 services e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway)
  • GPRS General Packet Radio Service
  • the general purpose network device 1004 includes hardware 1040 comprising a set of one or more processor(s) 1042 (which are often COTS processors) and network interface controller(s) 1044 (NICs; also known as network interface cards) (which include physical NIs 1046), as well as non-transitory machine readable storage media 1048 having stored therein software 1050.
  • the processor(s) 1042 execute the software 1050 to instantiate one or more sets of one or more applications 1064A-R.
  • the software 1050 further includes NonStop Routing Unit (NSRU) 1051, which when executed by the hardware 1040 enables the instance(s) 1052 of the network device 1004 to perform operations described with reference to Figures 1-9.
  • NSRU NonStop Routing Unit
  • the virtualization layer 1054 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 1062A-R called software containers that may each be used to execute one (or more) of the sets of applications 1064A-R; where the multiple software containers (also called virtualization engines, virtual private servers, or jails) are user spaces (typically a virtual memory space) that are separate from each other and separate from the kernel space in which the operating system is run; and where the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes.
  • the virtualization layer 1054 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and each of the sets of applications 1064A-R is run on top of a guest operating system within an instance 1062A-R called a virtual machine (which may in some cases be considered a tightly isolated form of software container) that is run on top of the hypervisor - the guest operating system and application may not know they are running on a virtual machine as opposed to running on a "bare metal" host electronic device, or through para-virtualization the operating system and/or application may be aware of the presence of virtualization for optimization purposes.
  • a hypervisor sometimes referred to as a virtual machine monitor (VMM)
  • VMM virtual machine monitor
  • one, some or all of the applications are implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application.
  • libraries e.g., from a library operating system (LibOS) including drivers/libraries of OS services
  • a unikernel can be implemented to run directly on hardware 1040, directly on a hypervisor (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container.
  • embodiments can be implemented fully with unikernels running directly on a hypervisor represented by virtualization layer 1054, unikernels running within software containers represented by instances 1062A-R, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both run directly on a hypervisor, unikernels and sets of applications that are run in different software containers).
  • the instantiation of the one or more sets of one or more applications 1064A-R, as well as virtualization if implemented, are collectively referred to as software instance(s) 1052.
  • the virtual network element(s) 1060A-R perform similar functionality to the virtual network element(s) 1030A-R - e.g., similar to the control communication and configuration module(s) 1032A and forwarding table(s) 1034A (this virtualization of the hardware 1040 is sometimes referred to as network function virtualization (NFV)).
  • NFV network function virtualization
  • CPE customer premise equipment
  • each instance 1062A-R corresponding to one VNE 1060A-R
  • alternative embodiments may implement this correspondence at a finer level granularity (e.g., line card virtual machines virtualize line cards, control card virtual machine virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of instances 1062A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used.
  • the virtualization layer 1054 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 1062A-R and the NIC(s) 1044, as well as optionally between the instances 1062A-R; in addition, this virtual switch may enforce network isolation between the VNEs 1060A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).
  • VLANs virtual local area networks
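  • As a rough illustration of such a virtual switch (the names and the flooding behavior below are assumptions of this sketch, not a description of any particular product), frames can be forwarded between instance ports and the NIC while a per-port VLAN membership set enforces isolation:

```python
class VirtualSwitch:
    def __init__(self):
        self.allowed_vlans = {}   # port -> set of VLAN IDs the port may send/receive
        self.mac_table = {}       # (vlan, mac) -> port, learned from traffic

    def add_port(self, port, vlans):
        self.allowed_vlans[port] = set(vlans)

    def forward(self, in_port, vlan, src_mac, dst_mac):
        # Drop frames on VLANs the ingress port is not a member of (isolation policy).
        if vlan not in self.allowed_vlans.get(in_port, set()):
            return []
        self.mac_table[(vlan, src_mac)] = in_port  # MAC learning
        out = self.mac_table.get((vlan, dst_mac))
        if out is not None and vlan in self.allowed_vlans.get(out, set()):
            return [out]
        # Unknown destination: flood only to ports in the same VLAN.
        return [p for p, v in self.allowed_vlans.items() if vlan in v and p != in_port]

vs = VirtualSwitch()
vs.add_port("vnet0", [10])       # e.g. an instance port
vs.add_port("vnet1", [10, 20])   # e.g. another instance port
vs.add_port("nic0", [10, 20])    # e.g. the physical NIC
print(vs.forward("vnet0", 10, "aa:aa", "ff:ff"))  # flooded within VLAN 10 only
```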
  • the third exemplary ND implementation in Figure 10A is a hybrid network device 1006, which includes both custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND.
  • a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 1002) could provide for para-virtualization to the networking hardware present in the hybrid network device 1006.
  • NE network element
  • each of the VNEs receives data on the physical NIs (e.g., 1016, 1046) and forwards that data out the appropriate ones of the physical NIs (e.g., 1016, 1046).
  • a VNE implementing IP router functionality forwards IP packets on the basis of some of the IP header information in the IP packet; where IP header information includes source IP address, destination IP address, source port, destination port (where "source port" and "destination port" refer herein to protocol ports, as opposed to physical ports of a ND), transport protocol (e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)), and differentiated services code point (DSCP) values.
  • transport protocol e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)
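  • For illustration only, the header fields listed above can be extracted from a raw IPv4 packet as in the following sketch; the function name is invented and the parsing covers only the simple case (IPv4 carrying TCP or UDP):

```python
import struct

def extract_forwarding_fields(packet: bytes):
    """Pull out the IPv4/transport header fields a VNE might use for forwarding decisions."""
    version_ihl, tos = packet[0], packet[1]
    ihl = (version_ihl & 0x0F) * 4            # IPv4 header length in bytes
    dscp = tos >> 2                           # DSCP is the upper six bits of the ToS byte
    protocol = packet[9]                      # 6 = TCP, 17 = UDP
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    src_port = dst_port = None
    if protocol in (6, 17):                   # TCP and UDP both begin with src/dst ports
        src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return {"src_ip": src_ip, "dst_ip": dst_ip, "protocol": protocol,
            "src_port": src_port, "dst_port": dst_port, "dscp": dscp}

# Minimal hand-built IPv4 header (20 bytes, DSCP 46, protocol TCP) plus the TCP port fields.
pkt = (bytes([0x45, 0xB8, 0x00, 0x28]) + b"\x00\x00\x00\x00" +
       bytes([0x40, 0x06, 0x00, 0x00]) + bytes([10, 0, 0, 5]) + bytes([10, 1, 2, 3]) +
       struct.pack("!HH", 40000, 179))
print(extract_forwarding_fields(pkt))
```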
  • Figure 10C illustrates various exemplary ways in which VNEs may be coupled according to some embodiments of the invention.
  • Figure 10C shows VNEs 1070A.1-1070A.P (and optionally VNEs 1070A.Q-1070A.R) implemented in ND 1000A and VNE 1070H.1 in ND 1000H.
  • VNEs 1070A.1-P are separate from each other in the sense that they can receive packets from outside ND 1000A and forward packets outside of ND 1000A; VNE 1070A.1 is coupled with VNE 1070H.1, and thus they communicate packets between their respective NDs; VNE 1070A.2-1070A.3 may optionally forward packets between themselves without forwarding them outside of the ND 1000A; and VNE 1070A.P may optionally be the first in a chain of VNEs that includes VNE 1070A.Q followed by VNE 1070A.R (this is sometimes referred to as dynamic service chaining, where each of the VNEs in the series of VNEs provides a different service - e.g., one or more layer 4-7 network services). While Figure 10C illustrates various exemplary relationships between the VNEs, alternative embodiments may support other relationships (e.g., more/fewer VNEs, more/fewer dynamic service chains, multiple different dynamic service chains with some common VNEs and some different VNEs).
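  • Dynamic service chaining can be pictured, purely as a conceptual sketch, as passing a packet through an ordered list of service functions; the service names and toy policies below are invented and stand in for the layer 4-7 services a real VNE chain would provide.

```python
def firewall(packet):
    # Drop anything destined to a blocked port (toy policy).
    return None if packet.get("dst_port") == 23 else packet

def nat(packet):
    packet = dict(packet)
    packet["src_ip"] = "203.0.113.1"  # rewrite the source address
    return packet

def apply_service_chain(packet, chain):
    """Apply each service in order; a None result means the packet was dropped."""
    for service in chain:
        packet = service(packet)
        if packet is None:
            return None
    return packet

chain = [firewall, nat]   # a longer chain would correspond to more VNEs in the series
print(apply_service_chain({"src_ip": "10.0.0.5", "dst_port": 80}, chain))
```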
  • the NDs of Figure 10A may form part of the Internet or a private network; and other electronic devices (not shown; such as end user devices including workstations, laptops, netbooks, tablets, palm tops, mobile phones, smartphones, phablets, multimedia phones, Voice Over Internet Protocol (VOIP) phones, terminals, portable media players, GPS units, wearable devices, gaming systems, set-top boxes, Internet enabled household appliances) may be coupled to the network (directly or through other networks such as access networks) to communicate over the network (e.g., the Internet or virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet) with each other (directly or through servers) and/or access content and/or services.
  • VOIP Voice Over Internet Protocol
  • VPNs virtual private networks
  • Such content and/or services are typically provided by one or more servers (not shown) belonging to a service/content provider or one or more end user devices (not shown) participating in a peer-to-peer (P2P) service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs.
  • end user devices may be coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge NDs, which are coupled (e.g., through one or more core NDs) to other edge NDs, which are coupled to electronic devices acting as servers.
  • one or more of the electronic devices operating as the NDs in Figure 10A may also host one or more such servers (e.g., in the case of the general purpose network device 1004, one or more of the software instances 1062A-R may operate as servers; the same would be true for the hybrid network device 1006; in the case of the special-purpose network device 1002, one or more such servers could also be run on a virtualization layer executed by the compute resource(s) 1012); in which case the servers are said to be co-located with the VNEs of that ND.
  • the servers are said to be co-located with the VNEs of that ND.
  • a virtual network is a logical abstraction of a physical network (such as that in Figure 10A) that provides network services (e.g., L2 and/or L3 services).
  • a virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., layer 2 (L2, data link layer) and/or layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), layer 2 tunneling protocol (L2TP), IPSec) to create the overlay network).
  • IP Internet Protocol
  • a network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network.
  • a virtual network instance (VNI) is a specific instance of a virtual network on a NVE (e.g., a NE/VNE on an ND, a part of a NE/VNE on a ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND).
  • a virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; a VAP can be a physical or virtual port identified through logical interface identifiers (e.g., a VLAN ID).
  • Examples of network services include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IP VPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network)).
  • Network services may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network - originated attacks, to avoid malformed route announcements), and management capabilities (e.g., full detection and processing).
  • quality of service capabilities e.g., traffic classification marking, traffic conditioning and scheduling
  • security capabilities e.g., filters to protect customer premises from network - originated attacks, to avoid malformed route announcements
  • management capabilities e.g., full detection and processing
  • FIG. 10D illustrates a network with a single network element on each of the NDs of Figure 10A, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
  • Figure 10D illustrates network elements (NEs) 1070A-H with the same connectivity as the NDs 1000A-H of Figure 10A.
  • Figure 10D illustrates that the distributed approach 1072 distributes responsibility for generating the reachability and forwarding information across the NEs 1070A-H; in other words, the process of neighbor discovery and topology discovery is distributed.
  • the control communication and configuration module(s) 1032A-R of the ND control plane 1024 typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP-Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi-Protocol Label Switching (GMPLS)))).
  • Border Gateway Protocol BGP
  • IGP Interior Gateway Protocol
  • OSPF Open Shortest Path First
  • IS-IS Intermediate System to Intermediate System
  • RIP Routing Information Protocol
  • LDP Label Distribution Protocol
  • RSVP Resource Reservation Protocol
  • the NEs 1070A-H e.g., the compute resource(s) 1012 executing the control communication and configuration module(s) 1032A-R
  • the NEs 1070A-H perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by distributively determining the reachability within the network and calculating their respective forwarding information.
  • Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane 1024.
  • the ND control plane 1024 programs the ND forwarding plane 1026 with information (e.g., adjacency and route information) based on the routing structure(s). For example, the ND control plane 1024 programs the adjacency and route information into one or more forwarding table(s) 1034A-R (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane 1026.
  • FIB Forwarding Information Base
  • LFIB Label Forwarding Information Base
  • the ND can store one or more bridging tables that are used to forward data based on the layer 2 information in that data. While the above example uses the special-purpose network device 1002, the same distributed approach 1072 can be implemented on the general purpose network device 1004 and the hybrid network device 1006.
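  • A minimal sketch of the RIB-to-FIB relationship described above, assuming invented class names and illustrative administrative-distance/metric values: the RIB may hold several candidate routes per prefix, and only the preferred one is handed to the forwarding plane.

```python
class RIB:
    """Routing Information Base: may hold several candidate routes per prefix."""
    def __init__(self):
        self.routes = {}   # prefix -> list of (admin_distance, metric, next_hop)

    def add_route(self, prefix, admin_distance, metric, next_hop):
        self.routes.setdefault(prefix, []).append((admin_distance, metric, next_hop))

    def best_routes(self):
        # Lowest (admin_distance, metric) wins for each prefix.
        return {prefix: min(cands)[2] for prefix, cands in self.routes.items()}

def program_fib(rib):
    """Return the entries the control plane would push down to the forwarding plane."""
    return rib.best_routes()

rib = RIB()
rib.add_route("10.0.0.0/8", 110, 20, "192.0.2.1")   # e.g. learned via OSPF
rib.add_route("10.0.0.0/8", 200, 5, "192.0.2.9")    # e.g. learned via BGP
print(program_fib(rib))   # {'10.0.0.0/8': '192.0.2.1'}: the preferred route is programmed
```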
  • FIG. 10D illustrates a centralized approach 1074 (also known as software defined networking (SDN)) that decouples the system that makes decisions about where traffic is sent from the underlying systems that forward traffic to the selected destination.
  • the illustrated centralized approach 1074 has the responsibility for the generation of reachability and forwarding information in a centralized control plane 1076 (sometimes referred to as a SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity), and thus the process of neighbor discovery and topology discovery is centralized.
  • a centralized control plane 1076 sometimes referred to as a SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity
  • the centralized control plane 1076 has a south bound interface 1082 with a data plane 1080 (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane (which should not be confused with a ND forwarding plane)) that includes the NEs 1070A-H (sometimes referred to as switches, forwarding elements, data plane elements, or nodes).
  • the centralized control plane 1076 includes a network controller 1078, which includes a centralized reachability and forwarding information module 1079 that determines the reachability within the network and distributes the forwarding information to the NEs 1070A-H of the data plane 1080 over the south bound interface 1082 (which may use the OpenFlow protocol).
  • the network intelligence is centralized in the centralized control plane 1076 executing on electronic devices that are typically separate from the NDs.
  • the network controller 1078 further includes NonStop Routing Unit 1081, which enables the centralized control plane 1076 to perform operations described with reference to Figures 1-9.
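  • The centralized approach can be sketched, at a purely conceptual level, as a controller object that knows the topology and pushes per-NE forwarding entries over a southbound channel; this is not the OpenFlow protocol itself, and every name and message format below is invented for the sketch.

```python
class DataPlaneNE:
    def __init__(self, name):
        self.name = name
        self.forwarding_table = {}

    def receive_southbound(self, entries):
        # The NE only installs what the controller tells it; it runs no routing protocol itself.
        self.forwarding_table.update(entries)

class CentralizedController:
    def __init__(self):
        self.nes = {}
        self.topology = {}   # ne_name -> {neighbor_ne: out_port}

    def register(self, ne, neighbors):
        self.nes[ne.name] = ne
        self.topology[ne.name] = neighbors

    def push_forwarding_info(self, destination_prefix, path):
        """path is an ordered list of NE names toward the destination."""
        for hop, nxt in zip(path, path[1:]):
            out_port = self.topology[hop][nxt]
            self.nes[hop].receive_southbound({destination_prefix: out_port})

ne_a, ne_b = DataPlaneNE("1070A"), DataPlaneNE("1070B")
ctrl = CentralizedController()
ctrl.register(ne_a, {"1070B": "port1"})
ctrl.register(ne_b, {"1070A": "port1", "1070C": "port2"})
ctrl.push_forwarding_info("10.0.0.0/8", ["1070A", "1070B", "1070C"])
print(ne_a.forwarding_table, ne_b.forwarding_table)
```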
  • each of the control communication and configuration module(s) 1032A-R of the ND control plane 1024 typically includes a control agent that provides the VNE side of the south bound interface 1082.
  • the ND control plane 1024 (the compute resource(s) 1012 executing the control communication and configuration module(s) 1032A-R) performs its responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) through the control agent communicating with the centralized control plane 1076 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 1079 (it should be understood that in some embodiments of the invention, the control communication and configuration module(s) 1032A-R, in addition to communicating with the centralized control plane 1076, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach; such embodiments are generally considered to fall under the centralized approach 1074, but may also be considered a hybrid approach).
  • data e.g., packets
  • the control agent communicating with the centralized control plane 1076 to receive the forwarding information
  • the same centralized approach 1074 can be implemented with the general purpose network device 1004 (e.g., each of the VNE 1060A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by communicating with the centralized control plane 1076 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 1079; it should be understood that in some embodiments of the invention, the VNEs 1060A-R, in addition to communicating with the centralized control plane 1076, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach) and the hybrid network device 1006.
  • the general purpose network device 1004 e.g., each of the VNE 1060A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data)
  • NFV is able to support SDN by providing an infrastructure upon which the SDN software can be run
  • NFV and SDN both aim to make use of commodity server hardware and physical switches.
  • FIG. 10D also shows that the centralized control plane 1076 has a north bound interface 1084 to an application layer 1086, in which resides application(s) 1088.
  • the centralized control plane 1076 has the ability to form virtual networks 1092 (sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 1070A-H of the data plane 1080 being the underlay network)) for the application(s) 1088.
  • virtual networks 1092 sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 1070A-H of the data plane 1080 being the underlay network)
  • the centralized control plane 1076 maintains a global view of all NDs and configured NEs/VNEs, and it maps the virtual networks to the underlying NDs efficiently (including maintaining these mappings as the physical network changes either through hardware (ND, link, or ND component) failure, addition, or removal).
  • Figure 10D shows the distributed approach 1072 separate from the centralized approach 1074
  • the effort of network control may be distributed differently or the two combined in certain embodiments of the invention.
  • For example: 1) embodiments may generally use the centralized approach (SDN) 1074, but have certain functions delegated to the NEs (e.g., the distributed approach may be used to implement one or more of fault monitoring, performance monitoring, protection switching, and primitives for neighbor and/or topology discovery); or 2) embodiments of the invention may perform neighbor discovery and topology discovery via both the centralized control plane and the distributed protocols, and the results compared to raise exceptions where they do not agree.
  • SDN centralized approach
  • Such embodiments are generally considered to fall under the centralized approach 1074, but may also be considered a hybrid approach.
  • Figure 10D illustrates the simple case where each of the NDs 1000A-H implements a single NE 1070A-H
  • the network control approaches described with reference to Figure 10D also work for networks where one or more of the NDs 1000A-H implement multiple VNEs (e.g., VNEs 1030A-R, VNEs 1060A-R, those in the hybrid network device 1006).
  • the network controller 1078 may also emulate the implementation of multiple VNEs in a single ND.
  • the network controller 1078 may present the implementation of a VNE/NE in a single ND as multiple VNEs in the virtual networks 1092 (all in the same one of the virtual network(s) 1092, each in different ones of the virtual network(s) 1092, or some combination).
  • the network controller 1078 may cause an ND to implement a single VNE (a NE) in the underlay network, and then logically divide up the resources of that NE within the centralized control plane 1076 to present different VNEs in the virtual network(s) 1092 (where these different VNEs in the overlay networks are sharing the resources of the single VNE/NE implementation on the ND in the underlay network).
  • Figures 10E and 10F respectively illustrate exemplary abstractions of NEs and VNEs that the network controller 1078 may present as part of different ones of the virtual networks 1092.
  • Figure 10E illustrates the simple case of where each of the NDs 1000A-H implements a single NE 1070A-H (see Figure 10D), but the centralized control plane 1076 has abstracted multiple of the NEs in different NDs (the NEs 1070A-C and G-H) into (to represent) a single NE 1070I in one of the virtual network(s) 1092 of Figure 10D, according to some embodiments of the invention.
  • Figure 10E shows that in this virtual network, the NE 1070I is coupled to NE 1070D and 1070F, which are both still coupled to NE 1070E.
  • Figure 10F illustrates a case where multiple VNEs (VNE 1070A.1 and VNE 1070H.1) are implemented on different NDs (ND 1000A and ND 1000H) and are coupled to each other, and where the centralized control plane 1076 has abstracted these multiple VNEs such that they appear as a single VNE 1070T within one of the virtual networks 1092 of Figure 10D, according to some embodiments of the invention.
  • the abstraction of a NE or VNE can span multiple NDs.
  • the electronic device(s) running the centralized control plane 1076 may be implemented in a variety of ways (e.g., a special purpose device, a general-purpose (e.g., COTS) device, or hybrid device). These electronic device(s) would similarly include compute resource(s), a set of one or more physical NICs, and a non-transitory machine-readable storage medium having stored thereon the centralized control plane software.
  • FIG. 11 illustrates a general purpose control plane device 1104 including hardware 1140 comprising a set of one or more processor(s) 1142 (which are often COTS processors) and network interface controller(s) 1144 (NICs; also known as network interface cards) (which include physical NIs 1146), as well as non-transitory machine readable storage media 1148 having stored therein centralized control plane (CCP) software 1150.
  • processors which are often COTS processors
  • NICs network interface controller
  • CCP centralized control plane
  • the processor(s) 1142 typically execute software to instantiate a virtualization layer 1154 (e.g., in one embodiment the virtualization layer 1154 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 1162A-R called software containers (representing separate user spaces and also called virtualization engines, virtual private servers, or jails) that may each be used to execute a set of one or more applications; in another embodiment the virtualization layer 1154 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and an application is run on top of a guest operating system within an instance 1162A-R called a virtual machine (which in some cases may be considered a tightly isolated form of software container) that is run by the hypervisor; in another embodiment, an application is implemented as a unikernel, which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application, and which can run directly on the hardware 1140, directly on a hypervisor, or in a software container).
  • VMM virtual machine monitor
  • an instance of the CCP software 1150 (illustrated as CCP instance 1176A) is executed (e.g., within the instance 1162A) on the virtualization layer 1154.
  • the CCP instance 1176A is executed, as a unikernel or on top of a host operating system, on the "bare metal" general purpose control plane device 1104.
  • the instantiation of the CCP instance 1176A, as well as the virtualization layer 1154 and instances 1162A-R if implemented, are collectively referred to as software instance(s) 1152.
  • the CCP instance 1176A includes a network controller instance 1178.
  • the network controller instance 1178 includes a centralized reachability and forwarding information module instance 1179 (which is a middleware layer providing the context of the network controller 1078 to the operating system and communicating with the various NEs), and a CCP application layer 1180 (sometimes referred to as an application layer) over the middleware layer (providing the intelligence required for various network operations such as protocols, network situational awareness, and user interfaces).
  • this CCP application layer 1180 within the centralized control plane 1076 works with virtual network view(s) (logical view(s) of the network) and the middleware layer provides the conversion from the virtual networks to the physical view.
  • the network controller instance further includes NonStop Routing Instance 1181, which enables the control plane device 1104 to perform operations described with reference to Figures 1-9.
  • the centralized control plane 1076 transmits relevant messages to the data plane 1080 based on CCP application layer 1180 calculations and middleware layer mapping for each flow.
  • a flow may be defined as a set of packets whose headers match a given pattern of bits; in this sense, traditional IP forwarding is also flow-based forwarding where the flows are defined by the destination IP address for example; however, in other implementations, the given pattern of bits used for a flow definition may include more fields (e.g., 10 or more) in the packet headers.
  • Different NDs/NEs/VNEs of the data plane 1080 may receive different messages, and thus different forwarding information.
  • the data plane 1080 processes these messages and programs the appropriate flow information and corresponding actions in the forwarding tables (sometimes referred to as flow tables) of the appropriate NE/VNEs, and then the NEs/VNEs map incoming packets to flows represented in the forwarding tables and forward packets based on the matches in the forwarding tables.
  • Standards such as OpenFlow define the protocols used for the messages, as well as a model for processing the packets.
  • the model for processing packets includes header parsing, packet classification, and making forwarding decisions. Header parsing describes how to interpret a packet based upon a well-known set of protocols. Some protocol fields are used to build a match structure (or key) that will be used in packet classification (e.g., a first key field could be a source media access control (MAC) address, and a second key field could be a destination MAC address).
  • MAC media access control
  • Packet classification involves executing a lookup in memory to classify the packet by determining which entry (also referred to as a forwarding table entry or flow entry) in the forwarding tables best matches the packet based upon the match structure, or key, of the forwarding table entries. It is possible that many flows represented in the forwarding table entries can correspond/match to a packet; in this case the system is typically configured to determine one forwarding table entry from the many according to a defined scheme (e.g., selecting a first forwarding table entry that is matched).
  • Forwarding table entries include both a specific set of match criteria (a set of values or wildcards, or an indication of what portions of a packet should be compared to a particular value/values/wildcards, as defined by the matching capabilities - for specific fields in the packet header, or for some other packet content), and a set of one or more actions for the data plane to take on receiving a matching packet. For example, an action may be to push a header onto the packet, forward the packet using a particular port, flood the packet, or simply drop the packet.
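  • A sketch of such match-plus-actions entries, using a first-match selection scheme; the field names, the wildcard marker, and the action vocabulary are all assumptions made for this illustration rather than any standardized encoding.

```python
WILDCARD = object()   # matches any value for that field

def matches(entry_match, packet):
    return all(v is WILDCARD or packet.get(field) == v
               for field, v in entry_match.items())

def classify_and_act(packet, flow_table):
    """Pick the first matching entry (the defined scheme here) and apply its actions."""
    for entry in flow_table:
        if matches(entry["match"], packet):
            for action, arg in entry["actions"]:
                if action == "push_header":
                    packet = {**packet, "outer_header": arg}   # e.g. push an MPLS label
                if action == "output":
                    return ("forwarded", arg)
                if action == "drop":
                    return ("dropped", None)
    return ("dropped", None)   # table miss: drop in this sketch

flow_table = [
    {"match": {"dst_mac": "aa:bb:cc:dd:ee:ff", "vlan": WILDCARD},
     "actions": [("push_header", {"mpls_label": 16}), ("output", "port3")]},
    {"match": {"dst_mac": WILDCARD, "vlan": WILDCARD},
     "actions": [("drop", None)]},
]
print(classify_and_act({"dst_mac": "aa:bb:cc:dd:ee:ff", "vlan": 10}, flow_table))
```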
  • TCP transmission control protocol
  • a network interface may be physical or virtual; and in the context of IP, an interface address is an IP address assigned to a NI, be it a physical NI or virtual NI.
  • a virtual NI may be associated with a physical NI, with another virtual interface, or stand on its own (e.g., a loopback interface, a point-to-point protocol interface).
  • a NI physical or virtual
  • a loopback interface (and its loopback address) is a specific type of virtual NI (and IP address) of a NE/VNE (physical or virtual) often used for management purposes; where such an IP address is referred to as the nodal loopback address.
  • The IP address(es) assigned to the NI(s) of a ND are referred to as IP addresses of that ND; at a more granular level, the IP address(es) assigned to NI(s) assigned to a NE/VNE implemented on a ND can be referred to as IP addresses of that NE/VNE.
  • Next hop selection by the routing system for a given destination may resolve to one path (that is, a routing protocol may generate one next hop on a shortest path); but if the routing system determines there are multiple viable next hops (that is, the routing protocol generated forwarding solution offers more than one next hop on a shortest path - multiple equal cost next hops), some additional criteria are used - for instance, in a connectionless network, Equal Cost Multi Path (ECMP) (also known as Equal Cost Multi Pathing, multipath forwarding and IP multipath) may be used (e.g., typical implementations use as the criteria particular header fields to ensure that the packets of a particular packet flow are always forwarded on the same next hop to preserve packet flow ordering).
  • ECMP Equal Cost Multi Path
  • a packet flow is defined as a set of packets that share an ordering constraint.
  • the set of packets in a particular TCP transfer sequence need to arrive in order, else the TCP logic will interpret the out of order delivery as congestion and slow the TCP transfer rate down.
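  • A minimal sketch of the hashing idea behind ECMP, assuming an invented function name and an arbitrary choice of header fields: because the hash input is the same for every packet of a flow, all packets of that flow resolve to the same next hop and ordering is preserved.

```python
import hashlib

def ecmp_next_hop(packet, next_hops):
    """Pick one of several equal-cost next hops; the same 5-tuple always maps to the same hop."""
    key = "|".join(str(packet[f]) for f in
                   ("src_ip", "dst_ip", "protocol", "src_port", "dst_port"))
    digest = hashlib.sha256(key.encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

hops = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
flow = {"src_ip": "10.0.0.5", "dst_ip": "10.1.2.3",
        "protocol": 6, "src_port": 40000, "dst_port": 179}
assert ecmp_next_hop(flow, hops) == ecmp_next_hop(flow, hops)   # per-flow ordering preserved
print(ecmp_next_hop(flow, hops))
```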
  • Some NDs include functionality for authentication, authorization, and accounting (AAA) protocols (e.g., RADIUS (Remote Authentication Dial-In User Service), Diameter, and/or TACACS+ (Terminal Access Controller Access Control System Plus)).
  • AAA can be provided through a client/server model, where the AAA client is implemented on a ND and the AAA server can be implemented either locally on the ND or on a remote electronic device coupled with the ND.
  • Authentication is the process of identifying and verifying a subscriber. For instance, a subscriber might be identified by a combination of a username and a password or through a unique key.
  • Authorization determines what a subscriber can do after being authenticated, such as gaining access to certain electronic device information resources (e.g., through the use of access control policies). Accounting is recording user activity.
  • end user devices may be coupled (e.g., through an access network) through an edge ND (supporting AAA processing) coupled to core NDs coupled to electronic devices implementing servers of service/content providers.
  • AAA processing is performed to identify for a subscriber the subscriber record stored in the AAA server for that subscriber.
  • a subscriber record includes a set of attributes (e.g., subscriber name, password, authentication information, access control information, rate-limiting information, policing information) used during processing of that subscriber's traffic.
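  • Purely as an illustration of the bookkeeping involved, a subscriber record and the three AAA steps can be sketched as follows; the attribute names and values are invented examples, not a description of RADIUS, Diameter, or TACACS+ message formats.

```python
from dataclasses import dataclass, field

@dataclass
class SubscriberRecord:
    # Attributes of the kind listed above; the values used below are invented examples.
    name: str
    password_hash: str
    allowed_services: set = field(default_factory=set)
    rate_limit_kbps: int = 0
    accounted_bytes: int = 0

def authenticate(records, name, password_hash):
    """Authentication: identify and verify the subscriber."""
    rec = records.get(name)
    return rec if rec and rec.password_hash == password_hash else None

def authorize(record, service):
    """Authorization: decide what the authenticated subscriber may do."""
    return service in record.allowed_services

def account(record, nbytes):
    """Accounting: record user activity."""
    record.accounted_bytes += nbytes

records = {"alice": SubscriberRecord("alice", "1f3a...", {"internet"}, 10000)}
rec = authenticate(records, "alice", "1f3a...")
if rec and authorize(rec, "internet"):
    account(rec, 1500)
print(rec)
```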
  • Certain NDs internally represent end user devices (or sometimes customer premise equipment (CPE) such as a residential gateway (e.g., a router, modem)) using subscriber circuits.
  • CPE customer premise equipment
  • a subscriber circuit uniquely identifies within the ND a subscriber session and typically exists for the lifetime of the session.
  • a ND typically allocates a subscriber circuit when the subscriber connects to that ND, and correspondingly deallocates that subscriber circuit when that subscriber disconnects.
  • Each subscriber session represents a distinguishable flow of packets communicated between the ND and an end user device (or sometimes CPE such as a residential gateway or modem) using a protocol, such as the point-to-point protocol over another protocol (PPPoX) (e.g., where X is Ethernet or Asynchronous Transfer Mode (ATM)), Ethernet, 802.1Q Virtual LAN (VLAN), Internet Protocol, or ATM.
  • ATM Asynchronous Transfer Mode
  • a subscriber session can be initiated using a variety of mechanisms (e.g., manual provisioning, a dynamic host configuration protocol (DHCP), DHCP/client-less internet protocol service (CLIPS), or Media Access Control (MAC) address tracking).
  • DHCP dynamic host configuration protocol
  • CLIPS client-less internet protocol service
  • MAC Media Access Control
  • PPP point-to-point protocol
  • DSL digital subscriber line
  • When DHCP is used (e.g., for cable modem services), a username typically is not provided; but in such situations other information (e.g., information that includes the MAC address of the hardware in the end user device (or CPE)) is provided.
  • CPE end user device
  • a virtual circuit, synonymous with virtual connection and virtual channel, is a connection oriented communication service that is delivered by means of packet mode communication.
  • Virtual circuit communication resembles circuit switching, since both are connection oriented, meaning that in both cases data is delivered in correct order, and signaling overhead is required during a connection establishment phase.
  • Virtual circuits may exist at different layers. For example, at layer 4, a connection oriented transport layer protocol such as Transmission Control Protocol (TCP) may rely on a connectionless packet switching network layer protocol such as IP, where different packets may be routed over different paths, and thus be delivered out of order.
  • TCP Transmission Control Protocol
  • IP connectionless packet switching network layer protocol
  • the virtual circuit is identified by the source and destination network socket address pair, i.e. the sender and receiver IP address and port number.
  • TCP includes segment numbering and reordering on the receiver side to prevent out-of-order delivery.
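  • The receiver-side reordering idea can be shown with a toy sketch that numbers whole segments (real TCP tracks byte sequence numbers, windows, and retransmissions, none of which are modeled here):

```python
def reorder_segments(segments, expected_first=0):
    """Deliver segments to the application in order, buffering any that arrive early."""
    buffer = {}
    delivered = []
    next_seq = expected_first
    for seq, data in segments:            # segments in the order they arrive off the wire
        buffer[seq] = data
        while next_seq in buffer:         # deliver as soon as the gap is filled
            delivered.append(buffer.pop(next_seq))
            next_seq += 1
    return delivered

# Segments 1 and 2 took a different path and arrived before segment 0.
print(reorder_segments([(1, "B"), (2, "C"), (0, "A")]))   # ['A', 'B', 'C']
```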
  • Virtual circuits are also possible at Layer 3 (network layer) and Layer 2 (datalink layer); such virtual circuit protocols are based on connection oriented packet switching, meaning that data is always delivered along the same network path, i.e. through the same NEs/VNEs.
  • the packets are not routed individually and complete addressing information is not provided in the header of each data packet; only a small virtual channel identifier (VCI) is required in each packet; and routing information is transferred to the NEs/VNEs during the connection establishment phase;
  • VCI virtual channel identifier
  • ATM Asynchronous Transfer Mode
  • VPI virtual path identifier
  • GPRS General Packet Radio Service
  • MPLS Multiprotocol label switching
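  • A sketch of the connection-oriented forwarding described above, with invented names and identifier values: routing information is installed once per connection as a swap-table entry, after which each packet needs to carry only the small channel identifier.

```python
class VciSwitch:
    def __init__(self, name):
        self.name = name
        self.swap_table = {}   # (in_port, in_vci) -> (out_port, out_vci)

    def setup_connection(self, in_port, in_vci, out_port, out_vci):
        # Routing information is installed once, during the connection establishment phase.
        self.swap_table[(in_port, in_vci)] = (out_port, out_vci)

    def forward_cell(self, in_port, in_vci, payload):
        out_port, out_vci = self.swap_table[(in_port, in_vci)]
        return out_port, out_vci, payload   # only the small VCI travels with each packet

sw = VciSwitch("NE1")
sw.setup_connection(in_port="p0", in_vci=42, out_port="p1", out_vci=77)
print(sw.forward_cell("p0", 42, b"data"))   # ('p1', 77, b'data')
```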
  • Each VNE (e.g., a virtual router, a virtual bridge (which may act as a virtual switch instance in a Virtual Private LAN Service (VPLS))) is typically independently administrable.
  • each of the virtual routers may share system resources but is separate from the other virtual routers regarding its management domain, AAA (authentication, authorization, and accounting) name space, IP address, and routing database(s).
  • AAA authentication, authorization, and accounting
  • Multiple VNEs may be employed in an edge ND to provide direct network access and/or different classes of services for subscribers of service and/or content providers.
  • interfaces that are independent of physical NIs may be configured as part of the VNEs to provide higher-layer protocol and service information (e.g., Layer 3 addressing).
  • the subscriber records in the AAA server identify, in addition to the other subscriber configuration requirements, to which context (e.g., which of the VNEs/NEs) the corresponding subscribers should be bound within the ND.
  • a binding forms an association between a physical entity (e.g., physical NI, channel) or a logical entity (e.g., circuit such as a subscriber circuit or logical circuit (a set of one or more subscriber circuits)) and a context's interface over which network protocols (e.g., routing protocols, bridging protocols) are configured for that context. Subscriber data flows on the physical entity when some higher-layer protocol interface is configured and associated with that physical entity.
  • a physical entity e.g., physical NI, channel
  • a logical entity e.g., circuit such as a subscriber circuit or logical circuit (a set of one or more subscriber circuits)
  • network protocols e.g., routing protocols, bridging protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Methods and apparatuses for enabling nonstop routing are disclosed. One or more transport protocol packets are received (4a) at a first network element, NE, 101A from a peer network device 102, the transport protocol packet(s) containing a routing protocol message associated with a routing protocol session established between the first network element 101A and the peer network device 102. The routing protocol message is retrieved (4b) and a synchronization message is transmitted (5) to a second network element 101B, the second network element 101B and the first network element 101A being part of a redundant system 101. The synchronization message contains the retrieved routing protocol message, an identifier of a transport protocol session associated with the routing protocol session, and current transport protocol states associated with the transport protocol session. The flow then moves to operation (6a), where the NE 101B transmits an acknowledgement message to the active NE to confirm receipt of the synchronization message. The standby NE 101B updates (6b) routing protocol information and uses the mapping information stored at the NE to update the local transport protocol session associated with the transport protocol session of the active NE 101A identified in the synchronization message. Thus, instead of forwarding all the transport protocol packets (TCP segments) received at the active NE 101A, which then have to be processed by a transport stack (TCP stack) of the standby NE 101B in order to determine current transport states and/or updated routing protocol information (carried within the TCP segments), as is the case in prior art techniques, embodiments of the present invention process the transport packets at the active NE 101A and transmit only the relevant information that may be needed to perform a seamless transition of the TCP sockets from NE 101A to NE 101B when a switchover occurs in the redundant system.
PCT/IB2016/051951 2016-04-06 2016-04-06 Procédé et appareil d'activation de routage ininterrompu (snr) dans un réseau de transmission par paquets WO2017175033A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2016/051951 WO2017175033A1 (fr) 2016-04-06 2016-04-06 Procédé et appareil d'activation de routage ininterrompu (snr) dans un réseau de transmission par paquets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2016/051951 WO2017175033A1 (fr) 2016-04-06 2016-04-06 Procédé et appareil d'activation de routage ininterrompu (snr) dans un réseau de transmission par paquets

Publications (1)

Publication Number Publication Date
WO2017175033A1 true WO2017175033A1 (fr) 2017-10-12

Family

ID=55702046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2016/051951 WO2017175033A1 (fr) 2016-04-06 2016-04-06 Procédé et appareil d'activation de routage ininterrompu (snr) dans un réseau de transmission par paquets

Country Status (1)

Country Link
WO (1) WO2017175033A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3691207A1 (fr) * 2019-01-31 2020-08-05 Siemens Aktiengesellschaft Procédé de fonctionnement d'un système de communication doté de routeurs redondants et routeurs
CN113114641A (zh) * 2021-03-30 2021-07-13 烽火通信科技股份有限公司 一种单cpu实现协议nsr的方法及系统
WO2021244588A1 (fr) * 2020-06-04 2021-12-09 华为技术有限公司 Procédé de traitement d'un message de routage, dispositif de communication, support d'enregistrement et système
WO2022141440A1 (fr) * 2020-12-31 2022-07-07 Telefonaktiebolaget Lm Ericsson (Publ) Procédé et élément de réseau pour la redondance de réseau
WO2024092664A1 (fr) * 2022-11-03 2024-05-10 北京小米移动软件有限公司 Procédé et appareil de distinction de session
US12001888B2 (en) 2022-01-28 2024-06-04 Hewlett Packard Enterprise Development Lp Server instance allocation for execution of application instances

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1331769A1 (fr) * 2002-01-24 2003-07-30 Alcatel Canada Inc. Procédé et dispositif pour fournir des processus redondants de protocole dans un élément de réseau
US20130258839A1 (en) * 2012-03-28 2013-10-03 Dawei Wang Inter-chassis redundancy with coordinated traffic direction
US9021459B1 (en) * 2011-09-28 2015-04-28 Juniper Networks, Inc. High availability in-service software upgrade using virtual machine instances in dual control units of a network device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1331769A1 (fr) * 2002-01-24 2003-07-30 Alcatel Canada Inc. Procédé et dispositif pour fournir des processus redondants de protocole dans un élément de réseau
US9021459B1 (en) * 2011-09-28 2015-04-28 Juniper Networks, Inc. High availability in-service software upgrade using virtual machine instances in dual control units of a network device
US20130258839A1 (en) * 2012-03-28 2013-10-03 Dawei Wang Inter-chassis redundancy with coordinated traffic direction

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3691207A1 (fr) * 2019-01-31 2020-08-05 Siemens Aktiengesellschaft Procédé de fonctionnement d'un système de communication doté de routeurs redondants et routeurs
WO2020156746A1 (fr) * 2019-01-31 2020-08-06 Siemens Aktiengesellschaft Procédé destiné à faire fonctionner un système de communication à routeurs redondants et routeurs
CN113302886A (zh) * 2019-01-31 2021-08-24 西门子股份公司 用于运行具有冗余路由器的通信系统的方法和路由器
US11329915B2 (en) 2019-01-31 2022-05-10 Siemens Aktiengesellschaft Router and method for operating a communication system having redundant routers
CN113302886B (zh) * 2019-01-31 2023-04-07 西门子股份公司 用于运行具有冗余路由器的通信系统的方法和路由器
WO2021244588A1 (fr) * 2020-06-04 2021-12-09 华为技术有限公司 Procédé de traitement d'un message de routage, dispositif de communication, support d'enregistrement et système
WO2022141440A1 (fr) * 2020-12-31 2022-07-07 Telefonaktiebolaget Lm Ericsson (Publ) Procédé et élément de réseau pour la redondance de réseau
CN113114641A (zh) * 2021-03-30 2021-07-13 烽火通信科技股份有限公司 一种单cpu实现协议nsr的方法及系统
CN113114641B (zh) * 2021-03-30 2022-10-14 烽火通信科技股份有限公司 一种单cpu实现协议nsr的方法及系统
US12001888B2 (en) 2022-01-28 2024-06-04 Hewlett Packard Enterprise Development Lp Server instance allocation for execution of application instances
WO2024092664A1 (fr) * 2022-11-03 2024-05-10 北京小米移动软件有限公司 Procédé et appareil de distinction de session

Similar Documents

Publication Publication Date Title
EP3420708B1 (fr) Réacheminement dynamique dans le système redondant d'un réseau à commutation par paquets
EP3378193B1 (fr) Élection et réélection d'un forwarder désigné sur l'échec d'un fournisseur dans une topologie de redondance tout-active
US10581726B2 (en) Method and apparatus for supporting bidirectional forwarding (BFD) over multi-chassis link aggregation group (MC-LAG) in internet protocol (IP) multiprotocol label switching (MPLS) networks
EP3304812B1 (fr) Procédé et système pour la resynchronisation d'états de transfert dans un dispositif de transfert de réseau
US9774524B2 (en) Method and apparatus for fast reroute, control plane and forwarding plane synchronization
EP3488564B1 (fr) Procédé de convergence rapide dans un réseau de recouvrement de la couche 2 et support de stockage non transitoire lisible par ordinateur
US10841207B2 (en) Method and apparatus for supporting bidirectional forwarding (BFD) over multi-chassis link aggregation group (MC-LAG) in internet protocol (IP) networks
US20190286469A1 (en) Methods and apparatus for enabling live virtual machine (vm) migration in software-defined networking networks
US20160285753A1 (en) Lock free flow learning in a network device
US20160316011A1 (en) Sdn network element affinity based data partition and flexible migration schemes
US20160323179A1 (en) Bng subscribers inter-chassis redundancy using mc-lag
WO2017175033A1 (fr) Procédé et appareil d'activation de routage ininterrompu (snr) dans un réseau de transmission par paquets
WO2017221050A1 (fr) Gestion efficace de trafic multi-destination dans des réseaux privés virtuels ethernet à hébergements multiples (evpn)
US11343332B2 (en) Method for seamless migration of session authentication to a different stateful diameter authenticating peer
US20220247679A1 (en) Method and apparatus for layer 2 route calculation in a route reflector network device
US11451637B2 (en) Method for migration of session accounting to a different stateful accounting peer
US20230239235A1 (en) Transient loop prevention in ethernet virtual private network egress fast reroute
WO2017149364A1 (fr) Réacheminement de trafic coordonné dans un système avec redondance inter-châssis

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16715635

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16715635

Country of ref document: EP

Kind code of ref document: A1