WO2007139542A1 - Uninterrupted network control message generation during local node outages - Google Patents

Uninterrupted network control message generation during local node outages

Info

Publication number
WO2007139542A1
WO2007139542A1 (PCT/US2006/020681, US2006020681W)
Authority
WO
WIPO (PCT)
Prior art keywords
state machine
messages
cache
network
nodes
Prior art date
Application number
PCT/US2006/020681
Other languages
French (fr)
Inventor
Dieter Stoll
Georg Wenzel
Wolfgang Thomas
Original Assignee
Lucent Technologies Inc.
Lucent Technologies Network Systems Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lucent Technologies Inc., Lucent Technologies Network Systems Gmbh filed Critical Lucent Technologies Inc.
Priority to KR1020087029207A priority Critical patent/KR101017540B1/en
Priority to CNA2006800547591A priority patent/CN101461196A/en
Priority to PCT/US2006/020681 priority patent/WO2007139542A1/en
Priority to EP06771449A priority patent/EP2030378A4/en
Priority to JP2009513106A priority patent/JP2009539305A/en
Publication of WO2007139542A1 publication Critical patent/WO2007139542A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/58 Association of routers

Definitions

  • the present invention generally relates to computer networks.
  • the present invention relates to packet switching and control plane protocols.
  • Packet switching networks include control plane protocols, such as the spanning tree protocol (STP), the generic attribute registration protocol (GARP) and its version for virtual local area networks, the VLAN registration protocol (GVRP), the link aggregation control protocol (LACP), Y.1711 fast failure detection (FFD), and reservation protocol (RSVP) refresh.
  • Control protocols have the responsibility to, for example, control the topology and distribution of how layer 2 (L2) traffic flows through the network. These protocols are realized in the state machines running on each participating network element. Once a stable network configuration has been reached, the protocols tend to repeat the same messages they send to the network. Different messages usually result from an operator or defect driven change in the network.
  • a failure in participating in the protocol by a network element leads to traffic rearrangements once a timeout period ranging from a few milliseconds to a few seconds is exceeded.
  • traffic rearrangements involve the entire network.
  • the packet control protocols fall into one of three categories. They are (1) unprotected; (2) protected via proprietary communication with the neighbor network elements prior to control plane outages; or (3) protected by standardized graceful restart technology, which requires interaction with neighbor network elements shortly before or after a protocol outage.
  • In the unprotected case, the result will, in general, be that the traffic flow through the network is reconfigured. During the time of reconfiguration, traffic loss will occur in parts of the network that can be as large as the entire network domain.
  • Exemplary embodiments of the present invention prevent packet network reconfiguration and associated traffic loss by providing uninterrupted network control message generation during local node outages.
  • a message cache receives a number of sent messages from a protocol state machine for a local node and forwards them to other nodes in the network.
  • the message cache also receives messages from the nodes.
  • the message cache stores both the sent and received messages in a buffer.
  • Upon failure of the protocol state machine, the message cache sends messages to and receives messages from the nodes, so long as the buffer remains valid.
  • the messages may be sent periodically to the nodes.
  • the message cache may determine whether the buffer is valid based on the messages in the buffer and messages received from the nodes after the failure.
  • the method may also include switching to a standby protocol state machine, upon failure of the active protocol state machine, where the standby protocol state machine includes another buffer replicating the first buffer.
  • Another embodiment is a computer readable medium storing instructions for performing this method for providing uninterrupted network control message generation during local node outages.
  • Yet another embodiment is a system for providing uninterrupted network control message generation during local node outages, including a protocol state machine and a message cache.
  • the protocol state machine generates messages.
  • the message cache receives the messages from the protocol state machine and forwards them to nodes in the network.
  • the message cache stores both the sent and received messages in one or more buffers.
  • Upon failure of the protocol state machine, the message cache sends messages to and receives messages from the nodes, so long as the message cache remains valid.
  • the message cache may include a timer for sending periodic messages to the nodes and a status control determining whether the message cache is valid.
  • the system may include a worker node and a protection node, each having protocol state machines and message caches so that the protection node is able to become active when the worker node fails.
  • the protection message cache may replicate the worker message cache, while the worker protocol state machine is active.
  • Figure 1 is a block diagram illustrating an exemplary embodiment of a cache concept for a default case, when a state machine for a control plane protocol is active;
  • Figure 2 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 1 for a control plane failure case, when the protocol state machine is unavailable and the network state is stable;
  • Figure 3 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 1 for a control plane failure case, when the protocol state machine is unavailable and the network state is unstable;
  • Figure 4 is a block diagram illustrating an exemplary embodiment of a cache concept for a default case, when two instances of a state machine exist (worker and protection), the worker state machine being active, the protection state machine being standby, and each being associated with a cache;
  • Figure 5 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 4 for an intermediate state when the worker state machine was active and failed, the protection state machine in standby state is recovering (from standby to full operation), but the network state is stable;
  • Figure 6 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 4 when the protection state machine is active and the worker state machine is standby (after a switch over from worker to protection);
  • Figure 7 is a chart showing selected state transitions and events on a time line for the exemplary embodiment of the cache concept of Figure 4;
  • Figure 8 is a block diagram illustrating an exemplary embodiment of a distributed cache.
  • the network element should maintain a stable network if the only cause of instability is the equipment protection switch, i.e., for the case of a single failure (e.g., circuit pack defect) but also for the case of operator driven events such as manual switches.
  • the network element should minimize network impact in case a network is already undergoing a reconfiguration, e.g., due to a remote network element failure, while simultaneously the protection switch is required due to local defect (double failure) or operator commands.
  • Exemplary embodiments of the present invention achieve these goals not only for this L2 Ethernet example, but more broadly for any failure (e.g., hardware defect) causing a temporary unavailability of the local control plane of any network for many protocols.
  • the network element behavior may be described by three states.
  • In the first state, the state machine is fully operable and reacting to all requests.
  • In the second state, the state machine is not available, but the cache maintains PDU sending until a change in the network happens, which invalidates the cache, or the state machine becomes operable.
  • In the third state, both the state machine and the cache are not available, e.g., due to an ongoing reconfiguration in the network while the state machine is inoperable, or due to the protocol state machine and cache not being synchronized.
  • Exemplary embodiments of the caching concept are derived from the observation that in a stable network, the spanning tree protocol nodes distribute identical PDUs to their neighbors repeatedly. A network defect or network change is detected, if no PDUs have been received by a spanning tree node during three consecutive sending periods or the content of a PDU is different from the preceding PDU. Thus, in an otherwise stable network topology, the activity of a spanning tree protocol machine can be suspended for an indefinite amount of time, as long as the periodic sending of PDUs is maintained. Thus, the caching concept uses this fact so that the network demands for PDUs are satisfied from the cache, without the need for all of the configuration, protocol state machines, and the like being started and synchronized.
  • the caching concept relieves the demand regarding recovery speed of all software components, except the one operating the cache (which is in hot standby). There are certain times when the cache can be considered valid for PDU sending and other times when the cache needs to be invalidated. Note that within a stable network topology, to some extent, even new services can be established (e.g., forwarding traffic can be modified in terms of new quality of service (QoS) parameters, new customers (distinguished by C-VLANs) can be added to a service provider (802.1 ad) network, etc.).
  • QoS quality of service
  • a packet switched network is a network in which messages or fragments of messages (packets) are sent to their destination through the most expedient route, as determined by a routing algorithm.
  • a control plane is a virtual network function used to set up, maintain, and terminate data plane connections. It is virtual in the sense that it is distributed over network nodes that need to interoperate to realize the function.
  • a data plane is a virtual network path used to distribute data between nodes. Some networks may disaggregate control and forwarding planes as well.
  • the term cache refers to any storage managed to take advantage of locality of access.
  • a message cache stores messages. The message cache is instantiated and its messages are kept in a synchronous state with the messages that the control plane sends/receives to/from the network.
  • the cache satisfies the demands of the network by sending the cached messages. Once the control plane recovers, the cache again follows the control operation and keeps in sync.
  • Unstable networks are those where the traffic flow distribution has not reached a stable state, such as power on scenarios of a network element. Double failures are those scenarios where, in addition to a control plane outage in one network element, other network elements experience defects or operator driven reconfigurations.
  • FIG. 1 illustrates an exemplary embodiment of a cache concept 100 for a default case, when a state machine 102 for a control plane protocol is active.
  • the control plane protocol may be any kind of protocol, e.g., STP, VLAN registration protocol, LACP, Y.1711 FFD, or RSVP refresh.
  • the protocol state machine 102 communicates (via intermediate hardware layers) with the neighboring nodes 106 and the rest of the network 108.
  • this embodiment includes a message cache 104 interposed between the protocol state machine 102 and the network 108.
  • the protocol state machine 102 sends messages to the message cache 104, which then forwards those messages to the network 108.
  • the message cache 104 captures communication between the protocol state machine 102 and the network by storing both sent messages 110 and received messages 112 in buffers.
  • the message cache 104 also includes a timer 114 and a status control 116.
  • the state machine 102 may convey additional state information to the status control 116 (i.e., in addition to the messages exchanged), depending on the particular protocol to be supported.
  • the contents of the message cache 104 vary depending on the control plane protocol implemented.
  • the message cache 104 stores what is needed to temporarily serve the needs of the network 108 in the case of a failure of the state machine 102.
  • Figure 2 illustrates the exemplary embodiment of the cache concept 100 of Figure 1 for a control plane failure case, when the protocol state machine 102 is unavailable and the network state is stable.
  • the message cache 104 protects against situations where the protocol state machine is unavailable for any reason, by temporarily continuing to serve the network. For example, the processor holding the protocol state machine 102 may be rebooting.
  • the message cache 104 generally continues to send messages from the buffers so that neighboring nodes 106 in the network 108 do not become aware that the protocol state machine 102 is unavailable. Communication to the neighboring nodes 106 is mimicked based on information stored in the message cache 104.
  • the message cache 104 bridges at least a portion of the time that the protocol state machine 102 is unavailable.
  • Protocols that periodically send the same message (e.g., hello message, update message) to the neighboring nodes 106 can easily be mimicked.
  • the message cache 104 uses the timer 114 to send messages stored in the sent messages buffer 110 periodically in the same manner as the protocol state machine 102. As a result, the neighboring nodes 106 do not detect any change in the protocol state machine 102.
  • the message cache 104 receives messages from neighboring nodes 106 and stores them in the received message buffer 112.
  • the message cache 104 is able to detect any event or change (e.g., state change) in the network 108 that would make the message cache 104 invalid by examining the status control 116 and the received messages.
  • the status control 116 determines whether the message cache 104 is valid or invalid. When the message cache 104 becomes invalid, it ceases sending messages because it cannot properly react to the event or change in the network 108.
  • the message cache 104 is a simplified component to simulate at least a portion of the protocol state machine 102. An efficient implementation of the message cache 104 probably does not simulate the complete behavior of the control plane protocol.
  • the degree of simplicity or complexity of the message cache 104 may vary depending on the control plane protocol implemented.
  • the message cache may simulate transition between two or more states of the protocol state machine 102 with logic in the status control 116.
  • the message cache may be implemented in hardware, firmware, or software (e.g., field-programmable gate array (FPGA) or application-specific integrated circuit (ASIC)).
  • FPGA field-programmable gate array
  • ASIC application-specific integrated circuit
  • the message cache 104 continues to mimic the protocol state machine so long as it remains valid, which may be a short time or the entire time the protocol state machine is unavailable, depending on circumstances. Some protocols require updates in the milliseconds range, while others require updates in the seconds range. This embodiment is not limited to any particular protocol or degree of complexity of the status control logic 116.
  • Figure 3 illustrates the exemplary embodiment of the cache concept 100 of Figure 1 for a control plane failure case, when the protocol state machine 102 is unavailable and the network state is unstable.
  • the message cache 104 transitions into an invalid state.
  • the status control 116 determines that some event occurred, making the network state unstable so that simulation of the protocol state machine 102 by the message cache 104 must stop according to the particular protocol implemented.
  • the neighboring nodes 106 may become aware that the protocol state machine 102 is failed or otherwise unavailable, as if no message cache 104 were present.
  • Figure 4 illustrates an exemplary embodiment of a cache concept 400 for a default case, when two instances of a state machine exist (worker and protection), the worker state machine being active, the protection state machine being standby, and each being associated with a cache.
  • This embodiment is a particular realization of a control plane protocol in a particular context; however, the invention is not limited to any particular implementation. In this embodiment, network availability is improved by caching messages.
  • a blade server is a server chassis housing multiple thin, modular electronic circuit boards,
  • Each blade is a server on a card, containing processors, memory, integrated network controllers, and input/output (I/O) ports. Blade servers increasingly allow the inclusion of functions, such as network switches and routers as individual blades.
  • the state machines (SMs) for two such blades are shown in Figure 4: a worker state machine 406 for a worker packet switch (PS) 402 and a protection state machine 408 for a protection PS 404.
  • the worker state machine 406 is initially active and the protection state machine 408 is initially standby and soon to become active.
  • the two instances (active/standby) of the protocol state machine are located on different hardware (e.g., CPUs) but still within the same network node.
  • This embodiment illustrates the worker state machine 406 and the protection state machine 408 for a spanning tree protocol (STP); however, the invention is not limited to any particular protocol.
  • a spanning tree protocol provides a loop free topology for any bridged network.
  • the IEEE standard 802.1D defines STP.
  • the worker PS 402 and protection PS 404 each include a STP state machine 406, 408 for a specific independent bridge partition (IBP) (e.g., one Ethernet switch instance) and timers 416, 412.
  • IBP independent bridge partition
  • a network bridge (a/k/a network switch) connects multiple network segments (e.g., partitions, domains) and forwards traffic from one segment to another.
  • These state machines 406, 408 are in a control plane and create messages for sending to neighboring nodes 106 in the rest of the network 108.
  • a worker cache 410 is interposed between the worker state machine 406 and the network 108.
  • Figure 4 illustrates an initial state where the worker state machine 406 is active, sending/receiving messages to/from the network 108 and storing messages in the worker cache 410.
  • the worker cache 410 stores both the messages sent out 412 and the messages received 414.
  • Bridge protocol data units (BPDUs) are the frames that carry the STP information.
  • a switch sends a BPDU frame using a unique MAC address of a port itself as a source address and a destination address of the STP multicast address.
  • a protection cache 418 is synchronized with the worker cache 410 by cache replication for the protection state machine 408, which is in a warm standby state, waiting to be started.
  • Figure 5 illustrates the exemplary embodiment of the cache concept 400 of Figure 4 for an intermediate state when the worker state machine 406 was active and failed (e.g., software crash), the protection state machine 408 in standby state is recovering (from standby to full operation), but the network state is stable.
  • This intermediate state occurs because there is a delay between the time when the worker state machine 406 fails and the time when the protection state machine 408 is ready (i.e., started after boot-up) to serve the network 108.
  • the protection cache 418 is now the active cache and operates as described for Figure 2.
  • Figure 6 illustrates the exemplary embodiment of the cache concept of
  • Figure 4 when the protection state machine 408 is active and the worker state machine is standby (after a switch over from worker to protection). Comparing Figures 4 and 6, the protection state machine 408 in the scenario illustrated by Figure 6 behaves similarly to the worker state machine 406 in the scenario illustrated by Figure 4, i.e., behaving as the active state machine.
  • the protection cache 418 stores both the messages sent out 420 and the messages received 422 and, thus, operates in the same way as in Figure 4. While the protection state machine 408 is active, messages in the protection cache 418 are replicated to the worker cache 410.
  • Figure 7 is a chart showing selected state transitions and events on a time line for the worker state machine 406, protection state machine 408, and protection cache 418 of Figure 4.
  • Figure 7 illustrates various combinations of states when the protection cache 418 is valid and can be used temporarily to serve the needs of the network 108 and when the protection cache 418 is invalid and cannot be used.
  • Figure 7 illustrates several scenarios. The first scenario is from T1 to T5, the second is from T5 to T9, and the third is from T9 to T12.
  • the first scenario starts at T1.
  • At T1, when the worker state machine 406 is in an active state and the protection state machine 408 is in a synchronizing state, the protection cache 418 is invalid and replicates the worker cache 410.
  • the protection state machine 408 is initially in the synchronizing state, because the protection PS 404 blade has been added to the network element.
  • When synchronization is completed at T2, the protection state machine 408 transitions from synchronizing to standby and the protection cache 418 is ready and inactive.
  • At T4, the protection state machine 408 transitions from starting-up to active and the protection cache 418 is updating (i.e., taking a passive role by continuing to synchronize with the active protocol state machine 408).
  • the worker state machine 406 transitions from synchronizing to standby. After this is done, at T5, the protection state machine 408 is active and the worker state machine 406 is standby.
  • the second scenario starts at T5.
  • At T5, the worker state machine 406 is active, the protection state machine 408 is synchronizing, and the protection cache 418 is invalid.
  • At T6, the protection state machine 408 transitions from synchronizing to standby and the protection cache 418 is ready and inactive.
  • when a network reconfiguration occurs at T7 (e.g., a network element fails), the worker state machine 406 transitions from active to reconfiguring and the protection cache 418 becomes invalid at T7.
  • the worker state machine 406 handles changing state in the network during the interval from T7 to T8. After the network has stabilized at T8, the worker state machine 406 transitions from reconfiguring to active and the protection cache 418 becomes ready and inactive again.
  • the third scenario starts at T9 and differs from the second scenario in the ordering of the events.
  • At T9, the worker state machine 406 is active, the protection state machine 408 is synchronizing, and the protection cache 418 is invalid.
  • a network reconfiguration occurs during the interval from T9 to T11.
  • At T10, the worker state machine 406 transitions from active to reconfiguring.
  • At T11, the protection state machine 408 transitions from synchronizing to standby.
  • the protection cache 418 does not transition from invalid to ready, inactive, until T12, when the worker state machine 406 transitions from reconfiguring to active.
  • each independent bridge partition has its own cache implementation to guarantee independent operations and reconfigurations.
  • each port has a certain port state. Depending on the state of the bridge, PDUs are sent, received, or both.
  • the cache not only remembers the PDUs that are sent or received, but also that no PDUs have to be sent or received. Note that on some ports, PDU sending/receiving will stop at some point during the network convergence process, i.e., the cache is filled only after the network converges.
  • caches are kept in hot-standby mode.
  • caches carry a flag indicating whether they are valid for PDU generation.
  • Various situations may lead to invalidating the cache, e.g., ongoing reconfigurations in the network, provisioning which demands calculation of the spanning tree and changes in BPDUs, etc.
  • the cache on the active PS is updated by incoming and outgoing PDUs.
  • the cache on the standby PS is immediately invalidated in the following conditions: when PDUs provided by the network differ from the cache content, and when PDUs provided by the state machine differ from the cache content. Note that both differences indicate a change in the network, which can only be handled by a working spanning tree state machine. Any replication of outdated PDUs may lead to serious impact on customer traffic and convergence of the spanning tree. For example, loops could be created. Note that it is the cache on the protection (standby) PS that is invalidated in case of an active worker PS. In the case where the worker PS is failing and the protection PS is in transition from standby to active, the protection PS's cache is invalidated. Note that it may be necessary to change all port states to discarding when the cache is invalidated on a just recovering PS.
  • the cache may be declared valid only when the topology has converged.
  • For this, an active state machine is required. Note that the end of the network convergence period can either be signaled by the protocol state machine or be derived from a sufficiently long stable network state. This may require tracking changes in PDUs over several seconds. This adds to the time the system (network) is vulnerable to equipment protection switches, but only after a possibly traffic-affecting network reconfiguration has already happened. Note that after a switch-over and in a stable network, the PDUs generated by the state machine after its recovery will be unchanged relative to those in the cache; i.e., in this situation, the topology can be considered converged when both of the following hold: the cache was active and has been set to inactive by the first PDU sent from the state machine, and all PDUs in the cache have at least once been updated by PDUs from the state machine since the time the cache was deactivated.
  • the cache may be declared valid only when the standby PS is fully synchronized.
  • there is timer triggering of PDU generation from the cache. In the event that the protection PS status changes to active, PDUs are sent from the cache if it is flagged valid. To this end, an appropriate repetition timer (and distribution over the allowed period) is started.
  • the state in which PDUs are created from the cache starts with the activation status, provided the cache is flagged valid. It ends either when different PDUs are received from the network or when the state machine has fully recovered. This can be recognized by the fact that the state machine starts sending PDUs to the network.
  • the first PDU can be used as a trigger to stop the cache activity, because the state machine is capable of sending out all remaining PDUs in the required time interval (a minimal illustrative sketch of this start/stop logic follows at the end of this list).
  • FIG. 8 illustrates an exemplary embodiment of a distributed cache. This example shows how the message cache may be distributed within a system as opposed to a single message cache for a system.
  • the periodic message cache 810 is distributed on two input/output (I/O) packs 802. The number of I/O packs is, of course, not limited to two.
  • Each I/O pack 802 includes packet forwarding hardware 810 and a board controller 808.
  • a local node 804 includes packet forwarding hardware 812 and one or more central packet control plane processors 814.
  • the central packet control plane processor 814 sends updates to the periodic message caches 810 on the board controllers 808 of the I/O packs 802.
  • the periodic message cache 810 sends outgoing periodic messages via packet forwarding hardware 810 in the I/O pack 802.
  • the periodic message caches 810 simulate a control plane protocol, when the control plane state machine is unavailable or fails.
  • Application protocols include any protocols that have periodic outgoing messages with constant contents, such as (R)STP, GVRP, RSVP, open shortest path first (OSPF), intermediate system-to-intermediate system (IS-IS or ISIS), Y.1711 FFD, etc.
  • message caches may be implemented broadly in many other ways for many different system architectures.
  • For example, message caches may be on several hardware blades, on several central processing units (CPUs), on several threads within one CPU, in FPGAs, ASICs, and the like.
  • Embodiments of the present invention may be implemented in one or more computers in a network system.
  • Each computer comprises a processor as well as memory for storing various programs and data.
  • the memory may also store an operating system supporting the programs.
  • the processor cooperates with conventional support circuitry such as power supplies, clock circuits, cache memory, and the like as well as circuits that assist in executing the software routines stored in the memory.
  • the computer also contains input/output (I/O) circuitry that forms an interface between the various functional elements communicating with the computer.
  • Embodiments of the present invention may also be implemented in hardware or firmware, e.g., in FPGAs or ASICs.
  • the present invention may be implemented as a computer program product wherein computer instructions, when processed by a computer, adapt the operation of the computer such that the methods and/or techniques of the present invention are invoked or otherwise provided.
  • Instructions for invoking the inventive methods may be stored in fixed or removable media, transmitted via a data stream in a broadcast media or other signal-bearing medium, and/or stored within a working memory within a computing device operating according to the instructions.
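The start and stop conditions for PDU generation from the cache, described in the items above, can be summarized in the following Python sketch. It is illustrative only: the class and method names are hypothetical and are not taken from the patent, and the logic is reduced to a single validity flag and a single sending flag.

```python
# Illustrative sketch (hypothetical names) of the cache activation rules described
# above: PDU generation from the cache starts when the protection PS becomes
# active and the cache is flagged valid; it stops when a differing PDU arrives
# from the network or when the recovered state machine sends its first PDU.

class CacheActivation:
    def __init__(self, cache_valid: bool) -> None:
        self.cache_valid = cache_valid
        self.sending_from_cache = False

    def on_protection_ps_becomes_active(self) -> None:
        # Start the repetition timer only if the cache may speak for the state machine.
        self.sending_from_cache = self.cache_valid

    def on_pdu_from_network(self, received_pdu: bytes, cached_pdu: bytes) -> None:
        if received_pdu != cached_pdu:
            # A change in the network can only be handled by a working state machine.
            self.cache_valid = False
            self.sending_from_cache = False

    def on_first_pdu_from_state_machine(self) -> None:
        # The recovered state machine takes over PDU generation; stop cache activity.
        self.sending_from_cache = False
```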

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A caching mechanism is provided to prevent packet network reconfiguration and associated traffic loss in case of temporary control plane outages.

Description

UNINTERRUPTED NETWORK CONTROL MESSAGE GENERATION DURING LOCAL NODE OUTAGES
FIELD OF THE INVENTION The present invention generally relates to computer networks. In particular, the present invention relates to packet switching and control plane protocols.
BACKGROUND OF THE INVENTION Packet switching networks include control plane protocols, such as the spanning tree protocol (STP), the generic attribute registration protocol (GARP) and its version for virtual local area networks, the VLAN registration protocol (GVRP), the link aggregation control protocol (LACP), Y.1711 fast failure detection (FFD), and reservation protocol (RSVP) refresh. Control protocols have the responsibility to, for example, control the topology and distribution of how layer 2 (L2) traffic flows through the network. These protocols are realized in the state machines running on each participating network element. Once a stable network configuration has been reached, the protocols tend to repeat the same messages they send to the network. Different messages usually result from an operator or defect driven change in the network. A failure in participating in the protocol by a network element leads to traffic rearrangements once a timeout period ranging from a few milliseconds to a few seconds is exceeded. In some cases, traffic rearrangements involve the entire network. In current network elements, the packet control protocols fall into one of three categories. They are (1 ) unprotected; (2) protected via proprietary communication with the neighbor network elements prior to control plane outages; or (3) protected by standardized graceful restart technology, which requires interaction with neighbor network elements shortly before or after a protocol outage. In the unprotected case, the result will, in general, be that the traffic flow through the network is reconfigured. During the time of reconfiguration, traffic loss will occur in parts of the network that can be as large as the entire network domain. When the failed network element recovers, a
second reconfiguration will occur to re-establish the traffic flow distribution prior to the failure. Again, traffic loss will occur in a similar order of magnitude as before. In the proprietary implementation, there are two disadvantages. First, it covers only part of the problem scenarios, namely those that are voluntarily entered (e.g., in case of an operator driven software upgrade in a network element) and which allows the failing network element to inform its neighbors of the control plane failure to come. Second, it is restricted to interacting network elements that possess these capabilities, i.e., it will not function in general interworking scenarios with other equipment vendors. In the standardized graceful restart case, only a small set of protocols are covered. If time constraints are small for telling neighbor elements after the failure that a graceful restart is to be applied, then the likelihood of missing the constraint for unintended failures is high. Missing the time limit will result in traffic loss, as the neighbor elements will detect the control plane outage and trigger network reconfiguration.
Accordingly, there is a need for a mechanism to prevent packet network reconfiguration and associated traffic loss in case of temporary packet control plane outages.
SUMMARY
Exemplary embodiments of the present invention prevent packet network reconfiguration and associated traffic loss by providing uninterrupted network control message generation during local node outages.
One embodiment is a method for providing uninterrupted network control message generation during local node outages. A message cache receives a number of sent messages from a protocol state machine for a local node and forwards them to other nodes in the network. The message cache also receives messages from the nodes. The message cache stores both the sent and received messages in a buffer. Upon failure of the protocol state machine, the message cache sends messages to and receives messages from the nodes, so long as the buffer remains valid. The messages may be sent periodically to the nodes. The message cache may determine whether the buffer is valid based on the messages in the buffer and messages received from the nodes after the
failure. The method may also include switching to a standby protocol state machine, upon failure of the active protocol state machine, where the standby protocol state machine includes another buffer replicating the first buffer.
Another embodiment is a computer readable medium storing instructions for performing this method for providing uninterrupted network control message generation during local node outages.
Yet another embodiment is a system for providing uninterrupted network control message generation during local node outages, including a protocol state machine and a message cache. The protocol state machine generates messages. The message cache receives the messages from the protocol state machine and forwards them to nodes in the network. The message cache stores both the sent and received messages in one or more buffers. Upon failure of the protocol state machine, the message cache sends messages to and receives messages from the nodes, so long as the message cache remains valid. The message cache may include a timer for sending periodic messages to the nodes and a status control determining whether the message cache is valid. The system may include a worker node and a protection node, each having protocol state machines and message caches so that the protection node is able to become active when the worker node fails. The protection message cache may replicate the worker message cache, while the worker protocol state machine is active.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
Figure 1 is a block diagram illustrating an exemplary embodiment of a cache concept for a default case, when a state machine for a control plane protocol is active; Figure 2 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 1 for a control plane failure case, when the protocol state machine is unavailable and the network state is stable;
Figure 3 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 1 for a control plane failure case, when the protocol state machine is unavailable and the network state is unstable;
Figure 4 is a block diagram illustrating an exemplary embodiment of a cache concept for a default case, when two instances of a state machine exist (worker and protection), the worker state machine being active, the protection state machine being standby, and each being associated with a cache;
Figure 5 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 4 for an intermediate state when the worker state machine was active and failed, the protection state machine in standby state is recovering (from standby to full operation), but the network state is stable;
Figure 6 is a block diagram illustrating the exemplary embodiment of the cache concept of Figure 4 when the protection state machine is active and the worker state machine is standby (after a switch over from worker to protection); Figure 7 is a chart showing selected state transitions and events on a time line for the exemplary embodiment of the cache concept of Figure 4; and
Figure 8 is a block diagram illustrating an exemplary embodiment of a distributed cache.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
DETAILED DESCRIPTION OF THE INVENTION
The description of the present invention is primarily within a general context of packet switched networks and control plane protocols. However, those skilled in the art and informed by the teachings herein will realize that the main concept of the invention is generally applicable to computer networks and may be broadly applied to any network architecture and design, communication protocols, network software, network technologies, network services and applications, and network operations management. Accordingly, the general concepts of the present invention are broadly applicable and are not limited to any particular implementation.
Introduction - L2 Ethernet Example in Conjunction with Equipment Protection
There is a need to maintain a stable network configuration for L2 Ethernet services under the condition of equipment protection switches that will affect the L2 control plane, i.e., spanning tree protocols and link aggregation control protocols, generic attribute registration protocol (GARP) and variants of it, and other protocols. It is possible for a local protection switch to lead to a reconfiguration of the entire spanning tree in the network, if protocol data unit (PDU) distribution is interrupted for about three seconds. This may cause traffic outages of several tens of seconds, until the network converges to a stable state again. Therefore, immediately after a protection switch, it is desirable for a network element to do the following. First, the network element should maintain a stable network if the only cause of instability is the equipment protection switch, i.e., for the case of a single failure (e.g., circuit pack defect) but also for the case of operator driven events such as manual switches. Second, the network element should minimize network impact in case a network is already undergoing a reconfiguration, e.g., due to a remote network element failure, while simultaneously the protection switch is required due to local defect (double failure) or operator commands. Exemplary embodiments of the present invention achieve these goals not only for this L2 Ethernet example, but more broadly for any failure (e.g., hardware defect) causing a temporary unavailability of the local control plane of any network for many protocols.
High-Level Description of the Network Element Behavior
The network element behavior may be described by three states. In the first state, the state machine is fully operable and reacting to all requests. In the second state, the state machine is not available but the cache maintains PDU sending until a change in the network happens, which invalidates the cache, or the state machine becomes operable. In the third state, both the state machine and the cache are not available, e.g., due to an ongoing reconfiguration in the network while the state machine is inoperable, or due to the protocol state machine and cache not being synchronized.
High-Level Cache Concept - STP Example
Exemplary embodiments of the caching concept are derived from the observation that in a stable network, the spanning tree protocol nodes distribute identical PDUs to their neighbors repeatedly. A network defect or network change is detected, if no PDUs have been received by a spanning tree node during three consecutive sending periods or the content of a PDU is different from the preceding PDU. Thus, in an otherwise stable network topology, the activity of a spanning tree protocol machine can be suspended for an indefinite amount of time, as long as the periodic sending of PDUs is maintained. Thus, the caching concept uses this fact so that the network demands for PDUs are satisfied from the cache, without the need for all of the configuration, protocol state machines, and the like being started and synchronized. Thus, the caching concept relieves the demand regarding recovery speed of all software components, except the one operating the cache (which is in hot standby). There are certain times when the cache can be considered valid for PDU sending and other times when the cache needs to be invalidated. Note that within a stable network topology, to some extent, even new services can be established (e.g., forwarding traffic can be modified in terms of new quality of service (QoS) parameters, new customers (distinguished by C-VLANs) can be added to a service provider (802.1 ad) network, etc.).
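As an illustration of the rule just described, the following Python sketch encodes the two conditions under which a spanning tree node would infer a network defect or change. The function and constant names are hypothetical and not taken from the patent; the sketch assumes one cached BPDU per neighbor.

```python
# Illustrative sketch (not part of the original text) of the change-detection rule
# described above: a change is inferred either when no PDU arrives for three
# consecutive sending periods or when a received PDU differs from the preceding one.
from typing import Optional

HELLO_PERIODS_BEFORE_TIMEOUT = 3  # three consecutive sending periods, per the text

def network_change_detected(previous_bpdu: bytes,
                            new_bpdu: Optional[bytes],
                            periods_since_last_bpdu: int) -> bool:
    """Return True if the PDU stream no longer indicates a stable network."""
    if new_bpdu is None:
        # Silence: a change is inferred after three missed sending periods.
        return periods_since_last_bpdu >= HELLO_PERIODS_BEFORE_TIMEOUT
    # A PDU arrived: any difference in content indicates a topology change.
    return new_bpdu != previous_bpdu
```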
High-Level Cache Concept - General
One embodiment includes a control plane and a message cache in a packet switched network. A packet switched network is a network in which messages or fragments of messages (packets) are sent to their destination through the most expedient route, as determined by a routing algorithm. A control plane is a virtual network function used to set up, maintain, and terminate data plane connections. It is virtual in the sense that it is distributed over network nodes that need to interoperate to realize the function. A data plane is a virtual network path used to distribute data between nodes. Some networks may disaggregate control and forwarding planes as well. The term cache refers to any storage managed to take advantage of locality of access. A message cache stores messages. The message cache is instantiated and its
messages are kept in a synchronous state with the messages that the control plane sends/receives to/from the network. In case the control plane fails, the cache satisfies the demands of the network by sending the cached messages. Once the control plane recovers, the cache again follows the control operation and keeps in sync. The cache allows instances of the control plane state machines to fail while still transmitting all the traffic in the network. This concept works in most situations, except for unstable networks, double failures, and systems where the forwarding plane is not independent from the control plane. Unstable networks are those where the traffic flow distribution has not reached a stable state, such as power on scenarios of a network element. Double failures are those scenarios where, in addition to a control plane outage in one network element, other network elements experience defects or operator driven reconfigurations.
The present invention has many advantages, including significantly minimizing traffic loss in failure and software upgrade scenarios affecting the control plane. This gain is achieved locally if the network element supports a cache operation as described. A caching feature in a network element may be added to an existing network. Interoperability with other equipment is possible without the need for the other equipment to support a cache operation. Figure 1 illustrates an exemplary embodiment of a cache concept 100 for a default case, when a state machine 102 for a control plane protocol is active. The control plane protocol may be any kind of protocol, e.g., STP, VLAN registration protocol, LACP, Y.1711 FFD, or RSVP refresh. In a traditional network, the protocol state machine 102 communicates (via intermediate hardware layers) with the neighboring nodes 106 and the rest of the network 108. By contrast, this embodiment includes a message cache 104 interposed between the protocol state machine 102 and the network 108. The protocol state machine 102 sends messages to the message cache 104, which then forwards those messages to the network 108. The message cache 104 captures communication between the protocol state machine 102 and the network by storing both sent messages 110 and received messages 112 in buffers. The message cache 104 also includes a timer 114 and a status control 116. Optionally, the state machine 102 may convey additional state information
to the status control 116 (i.e., in addition to the messages exchanged), depending on the particular protocol to be supported. The contents of the message cache 104 vary depending on the control plane protocol implemented. The message cache 104 stores what is needed to temporarily serve the needs of the network 108 in the case of a failure of the state machine 102.
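One possible organization of the message cache of Figure 1 is sketched below in Python. The sketch is illustrative only: the class, attribute, and method names are hypothetical, and the reference numerals in the comments merely point back to the elements of Figure 1; the patent does not prescribe an implementation.

```python
# Minimal sketch of the message cache of Figure 1 (hypothetical names). The cache
# sits between the protocol state machine and the network and mirrors traffic in
# both directions.
from typing import Callable, Dict

class MessageCache:
    def __init__(self, send_to_network: Callable[[str, bytes], None],
                 resend_interval_s: float = 2.0) -> None:
        self.sent_messages: Dict[str, bytes] = {}      # buffer 110: last PDU sent per port
        self.received_messages: Dict[str, bytes] = {}  # buffer 112: last PDU received per port
        self.valid: bool = True                        # status control 116
        self.resend_interval_s = resend_interval_s     # timer 114 period
        self.send_to_network = send_to_network         # lower-layer transmit hook

    def forward_from_state_machine(self, port: str, pdu: bytes) -> None:
        """Default case (Figure 1): record the PDU and pass it on to the network."""
        self.sent_messages[port] = pdu
        self.send_to_network(port, pdu)

    def record_from_network(self, port: str, pdu: bytes) -> None:
        """Store incoming PDUs so the status control can compare later arrivals."""
        self.received_messages[port] = pdu
```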
Figure 2 illustrates the exemplary embodiment of the cache concept 100 of Figure 1 for a control plane failure case, when the protocol state machine 102 is unavailable and the network state is stable. The message cache 104 protects against situations where the protocol state machine is unavailable for any reason, by temporarily continuing to serve the network. For example, the processor holding the protocol state machine 102 may be rebooting. The message cache 104 generally continues to send messages from the buffers so that neighboring nodes 106 in the network 108 do not become aware that the protocol state machine 102 is unavailable. Communication to the neighboring nodes 106 is mimicked based on information stored in the message cache 104. Thus, the message cache 104 bridges at least a portion of the time that the protocol state machine 102 is unavailable. Protocols that periodically send the same message (e.g., hello message, update message) to the neighboring nodes 106 can easily be mimicked. The message cache 104 uses the timer 114 to send messages stored in the sent messages buffer 110 periodically in the same manner as the protocol state machine 102. As a result, the neighboring nodes 106 do not detect any change in the protocol state machine 102. The message cache 104 receives messages from neighboring nodes 106 and stores them in the received message buffer 112. The message cache 104 is able to detect any event or change (e.g., state change) in the network 108 that would make the message cache 104 invalid by examining the status control 116 and the received messages. The status control 116 determines whether the message cache 104 is valid or invalid. When the message cache 104 becomes invalid, it ceases sending messages because it cannot properly react to the event or change in the network 108.
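Building on the hypothetical MessageCache sketch above, the following sketch illustrates the failure-time behavior described for Figures 2 and 3: while the state machine is unavailable, the cache replays the last sent PDUs on its timer, and a received PDU that differs from the cached one marks the cache invalid so that it ceases sending. Scheduling is reduced to a blocking loop; the names are again assumptions, not the patent's own.

```python
# Sketch of the failure-case behavior (Figures 2 and 3), reusing the hypothetical
# MessageCache above. A real implementation would use the element's timer
# infrastructure instead of a blocking loop.
import time

def serve_network_while_state_machine_down(cache: "MessageCache") -> None:
    """Replay cached PDUs on the hello timer until the cache becomes invalid."""
    while cache.valid:
        for port, pdu in cache.sent_messages.items():
            cache.send_to_network(port, pdu)   # neighbors see no change (Figure 2)
        time.sleep(cache.resend_interval_s)

def on_pdu_received_while_down(cache: "MessageCache", port: str, pdu: bytes) -> None:
    """Status control: a differing PDU means the network changed (Figure 3)."""
    if cache.received_messages.get(port) != pdu:
        cache.valid = False  # cease sending; only a real state machine can react
    cache.received_messages[port] = pdu
```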
The message cache 104 is a simplified component to simulate at least a portion of the protocol state machine 102. An efficient implementation of the message cache 104 probably does not simulate the complete behavior of the
control plane protocol. The degree of simplicity or complexity of the message cache 104 may vary depending on the control plane protocol implemented. For example, the message cache may simulate transition between two or more states of the protocol state machine 102 with logic in the status control 116. The message cache may be implemented in hardware, firmware, or software (e.g., field-programmable gate array (FPGA) or application-specific integrated circuit (ASIC)). The message cache 104 continues to mimic the protocol state machine so long as it remains valid, which may be a short time or the entire time the protocol state machine is unavailable, depending on circumstances. Some protocols require updates in the milliseconds range, while others require updates in the seconds range. This embodiment is not limited to any particular protocol or degree of complexity of the status control logic 116.
Figure 3 illustrates the exemplary embodiment of the cache concept 100 of Figure 1 for a control plane failure case, when the protocol state machine 102 is unavailable and the network state is unstable. In this case, the message cache 104 transitions into an invalid state. Based on the received messages 112, the status control 116 determines that some event occurred, making the network state unstable so that simulation of the protocol state machine 102 by the message cache 104 must stop according to the particular protocol implemented. Once the message cache 104 stops simulating the protocol state machine 102, the neighboring nodes 106 may become aware that the protocol state machine 102 is failed or otherwise unavailable, as if no message cache 104 were present.
Figure 4 illustrates an exemplary embodiment of a cache concept 400 for a default case, when two instances of a state machine exist (worker and protection), the worker state machine being active, the protection state machine being standby, and each being associated with a cache. This embodiment is a particular realization of a control plane protocol in a particular context; however, the invention is not limited to any particular implementation. In this embodiment, network availability is improved by caching messages.
This embodiment is in the context of a blade server (not shown); however, the invention is not limited to any particular hardware. A blade server is a server chassis housing multiple thin, modular electronic circuit boards,
known as server blades. Each blade is a server on a card, containing processors, memory, integrated network controllers, and input/output (I/O) ports. Blade servers increasingly allow the inclusion of functions, such as network switches and routers, as individual blades. The state machines (SMs) for two such blades are shown in Figure 4: a worker state machine 406 for a worker packet switch (PS) 402 and a protection state machine 408 for a protection PS 404. The worker state machine 406 is initially active and the protection state machine 408 is initially standby and soon to become active. The two instances (active/standby) of the protocol state machine are located on different hardware (e.g., CPUs) but still within the same network node.
This embodiment illustrates the worker state machine 406 and the protection state machine 408 for a spanning tree protocol (STP); however, the invention is not limited to any particular protocol. A spanning tree protocol provides a loop free topology for any bridged network. The IEEE standard 802.1D defines STP. The worker PS 402 and protection PS 404 each include an STP state machine 406, 408 for a specific independent bridge partition (IBP) (e.g., one Ethernet switch instance) and timers 416, 412. A network bridge (a/k/a network switch) connects multiple network segments (e.g., partitions, domains) and forwards traffic from one segment to another. These state machines 406, 408 are in a control plane and create messages for sending to neighboring nodes 106 in the rest of the network 108.
In this embodiment, a worker cache 410 is interposed between the worker state machine 406 and the network 108. Figure 4 illustrates an initial state where the worker state machine 406 is active, sending/receiving messages to/from the network 108 and storing messages in the worker cache 410. The worker cache 410 stores both the messages sent out 412 and the messages received 414. Bridge protocol data units (BPDUs) are the frames that carry the STP information. A switch sends a BPDU frame using a unique MAC address of a port itself as a source address and a destination address of the STP multicast address. A protection cache 418 is synchronized with the worker cache 410 by cache replication for the protection state machine 408, which is in a warm standby state, waiting to be started.
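The following sketch, again using the hypothetical MessageCache above, illustrates how the worker cache of Figure 4 could be kept replicated to the protection cache while the worker is active, and how the replicated protection cache would then serve the network during the intermediate state of Figure 5. The replication transport between the two blades is abstracted away; the function names are assumptions.

```python
# Sketch of worker-to-protection cache replication (Figure 4) and takeover
# (Figure 5), using the hypothetical MessageCache above. In practice replication
# would run over an internal channel between the worker and protection CPUs.

def replicate(worker_cache: "MessageCache", protection_cache: "MessageCache") -> None:
    """Keep the standby cache in step with the active one (warm standby)."""
    protection_cache.sent_messages = dict(worker_cache.sent_messages)
    protection_cache.received_messages = dict(worker_cache.received_messages)
    protection_cache.valid = worker_cache.valid

def on_worker_state_machine_failure(protection_cache: "MessageCache") -> None:
    """Intermediate state of Figure 5: the replicated protection cache serves the
    network until the protection state machine has fully started."""
    if protection_cache.valid:
        serve_network_while_state_machine_down(protection_cache)
```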
Figure 5 illustrates the exemplary embodiment of the cache concept 400 of Figure 4 for an intermediate state when the worker state machine 406 was active and failed (e.g., software crash), the protection state machine 408 in standby state is recovering (from standby to full operation), but the network state is stable. This intermediate state occurs because there is a delay between the time when the worker state machine 406 fails and the time when the protection state machine 408 is ready (i.e., started after boot-up) to serve the network 108. During this intermediate state, the protection cache 418 is now the active cache and operates as described for Figure 2. Figure 6 illustrates the exemplary embodiment of the cache concept of
Figure 4 when the protection state machine 408 is active and the worker state machine is standby (after a switch over from worker to protection). Comparing Figures 4 and 6, the protection state machine 408 in the scenario illustrated by Figure 6 behaves similarly to the worker state machine 406 in the scenario illustrated by Figure 4, i.e., behaving as the active state machine. The protection cache 418 stores both the messages sent out 420 and the messages received 422 and, thus, operates in the same way as in Figure 4. While the protection state machine 408 is active, messages in the protection cache 418 are replicated to the worker cache 410. Figure 7 is a chart showing selected state transitions and events on a time line for the worker state machine 406, protection state machine 408, and protection cache 418 of Figure 4. (Table 1 below describes Figure 7 in tabular form.) Figure 7 illustrates various combinations of states when the protection cache 418 is valid and can be used temporarily to serve the needs of the network 108 and when the protection cache 418 is invalid and cannot be used. Figure 7 illustrates several scenarios. The first scenario is from T1 to T5, the second is from T5 to T9, and the third is from T9 to T12.
The first scenario starts at T1. At T1, when the worker state machine 406 is in an active state and the protection state machine 408 is in a synchronizing state, the protection cache 418 is invalid and replicates the worker cache 410. For example, the protection state machine 408 is initially in the synchronizing state, because the protection PS 404 blade has been added to the network element. When synchronization is completed at T2, the protection state
machine 408 transitions from synchronizing to standby and the protection cache 418 is ready and inactive. When a failure occurs at T3, the worker state machine 406 transitions from active to failed, the protection state machine 408 transitions from standby to starting-up (i.e., preparing to take over the active role), and the protection cache 418 is ready and sending (i.e., temporarily serving the needs of the network 108). During the interval from T3 onwards, the worker state machine 406 transitions from failed to synchronizing (e.g., as a consequence of a reboot). The exact times do not matter for the anticipated behavior of the network element. They depend on the implementation and, thus, are not shown explicitly. At T4, the protection state machine 408 transitions from starting-up to active and the protection cache 418 is updating (i.e., taking a passive role by continuing to synchronize with the active protocol state machine 408). During the interval from T3 onwards, the worker state machine 406 transitions from synchronizing to standby. After this is done, at T5, the protection state machine 408 is active and the worker state machine 406 is standby.
The second scenario starts at T5. At T5, the worker state machine 406 is active, the protection state machine 408 is synchronizing, and the protection cache 418 is invalid. At T6, the protection state machine 408 transitions from synchronizing to standby and the protection cache 418 is ready and inactive. When a network reconfiguration occurs at T7 (e.g., a network element fails), the worker state machine 406 transitions from active to reconfiguring and the protection cache 418 becomes invalid at T7. During the interval from T7 to T8, the worker state machine 406 handles changing state in the network. After the network has stabilized at T8, the worker state machine 406 transitions from reconfiguring to active and the protection cache 418 becomes ready and inactive again.
The third scenario starts at T9 and differs from the second scenario in the ordering of the events. At T9, the worker state machine 406 is active, the protection state machine 408 is synchronizing, and the protection cache 418 is invalid. A network reconfiguration occurs during the interval from T9 to T11. At T10, the worker state machine 406 transitions from active to reconfiguring. At T11, the protection state machine 408 transitions from synchronizing to standby.
The protection cache 418 does not transition from invalid to ready and inactive until T12, when the worker state machine 406 transitions from reconfiguring to active.
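The cache states traversed in the three Figure 7 scenarios can be summarized in a short Python sketch. The state names and event strings below are assumptions chosen for readability, not terms defined by the specification; the transition table simply mirrors the narrative above.

```python
from enum import Enum, auto


class CacheState(Enum):
    INVALID = auto()          # replicating only; not usable for PDU generation
    READY_INACTIVE = auto()   # synchronized and usable, but not sending
    READY_SENDING = auto()    # temporarily serving the network after a failure
    UPDATING = auto()         # passive again, resynchronizing with the active PS


def on_event(state: CacheState, event: str) -> CacheState:
    """Illustrative transitions matching the Figure 7 narrative."""
    transitions = {
        (CacheState.INVALID, "sync_complete"): CacheState.READY_INACTIVE,         # T2, T6
        (CacheState.READY_INACTIVE, "worker_failed"): CacheState.READY_SENDING,   # T3
        (CacheState.READY_SENDING, "protection_active"): CacheState.UPDATING,     # T4
        (CacheState.READY_INACTIVE, "network_reconfiguration"): CacheState.INVALID,  # T7
        (CacheState.INVALID, "network_stable"): CacheState.READY_INACTIVE,        # T8, T12
    }
    return transitions.get((state, event), state)
```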
Table 1. Description of PS state machine and cache states
In one embodiment, there is one cache instance per independent bridge partition. Each independent bridge partition has its own cache implementation to guarantee independent operations and reconfigurations.
In one embodiment, there are two cache entries per port: one for the incoming PDU and one for the outgoing PDU. Each port has a certain port state. Depending on the state of the bridge, PDUs are sent, received, or both. The cache not only remembers the PDUs that are sent or received, but also that no PDUs have to be sent or received. Note that on some ports PDU
sending/receiving will stop at some point during the network convergence process, i.e., the cache is filled only after the network converges. In one embodiment, caches are kept in hot-standby mode. In one embodiment, caches carry a flag indicating whether they are valid for PDU generation. Various situations may lead to invalidating the cache, e.g., ongoing reconfigurations in the network, provisioning which demands calculation of the spanning tree and changes in BPDUs, etc.
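A point worth illustrating is that the per-port entries must distinguish "no PDU is exchanged on this port" from "nothing has been learned yet". The following minimal Python sketch is one way to encode that distinction; the sentinel value and the is_complete helper are assumptions made for the example, not elements of the specification.

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel meaning "no PDU has to be sent/received on this port" -- deliberately
# distinct from None, which here means "nothing has been learned for this port yet".
NO_PDU = b""


@dataclass
class PortEntry:
    outgoing: Optional[bytes] = None   # BPDU bytes, NO_PDU, or not yet known
    incoming: Optional[bytes] = None

    def is_complete(self) -> bool:
        # The entry is usable once both directions have been decided, even if
        # the decision is that no PDU is sent or received on this port.
        return self.outgoing is not None and self.incoming is not None
```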
In one embodiment, the cache on the active PS is updated by incoming and outgoing PDUs. In one embodiment, the cache on the standby PS is immediately invalidated in the following conditions: when PDUs provided by the network differ from the cache content and when locally generated PDUs differ from the cache content. Note that both differences indicate a change in the network, which can only be handled by a working spanning tree state machine. Any replication of outdated PDUs may have a serious impact on customer traffic and on convergence of the spanning tree. For example, loops could be created. Note that it is the cache on the protection (standby) PS that is invalidated in the case of an active worker PS. In the case where the worker PS is failing and the protection PS is in transition from standby to active, the protection PS's cache is invalidated. Note that it may be necessary to change all port states to discarding when the cache is invalidated on a just-recovering PS.
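A minimal sketch of the invalidation rule, assuming a duck-typed cache object with a valid flag and a caller-supplied callback that forces a port to the discarding state; the function name and signature are illustrative only.

```python
def check_and_invalidate(cache, port, observed_pdu, cached_pdu, set_discarding):
    """Invalidate the cache when an observed PDU differs from its cached counterpart.

    Replaying outdated BPDUs could create loops, so any mismatch marks the
    cache invalid; on a just-recovering PS the port may also be blocked.
    """
    if cached_pdu is not None and observed_pdu != cached_pdu:
        cache.valid = False
        set_discarding(port)  # block traffic until the state machine reconverges
        return False
    return True
```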
In one embodiment, the cache may be declared valid only when the topology has converged. During the convergence process, an active state machine is required. Note that the end of the network convergence period can either be signaled by the protocol state machine or be derived from a sufficiently long stable network state. This may require tracking changes in PDUs over several seconds. This adds to the time the system (network) is vulnerable to equipment protection switches, but only after a possibly traffic-affecting network configuration has already happened. Note that after a switch-over and in a stable network, the PDUs generated by the state machine after its recovery will be unchanged from those in the cache; i.e., in this situation, the topology can be considered converged when both of the following hold: the cache was active and was set to inactive by the first PDU sent from the state machine, and all PDUs in the cache
have been updated at least once by PDUs from the state machine since the time the cache was deactivated.
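The "sufficiently long stable network state" criterion can be sketched as a small tracker. This is only an illustration: the 5-second hold time, the class name, and the per-port bookkeeping are assumptions, and in practice the end of convergence could instead be signaled directly by the protocol state machine.

```python
import time
from typing import Set


class ConvergenceTracker:
    """Derives the end of the convergence period from a stable network state.

    The cache is declared valid only once no PDU has changed for hold_time
    seconds and every port's cached PDU has been refreshed at least once.
    """

    def __init__(self, hold_time: float = 5.0):
        self.hold_time = hold_time
        self.last_change = time.monotonic()
        self.refreshed_ports: Set[int] = set()

    def pdu_observed(self, port: int, changed: bool) -> None:
        self.refreshed_ports.add(port)
        if changed:
            # Any change restarts the stability window and the refresh bookkeeping.
            self.last_change = time.monotonic()
            self.refreshed_ports.clear()

    def converged(self, all_ports: Set[int]) -> bool:
        stable = (time.monotonic() - self.last_change) >= self.hold_time
        refreshed = all_ports <= self.refreshed_ports
        return stable and refreshed
```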
In one embodiment, the cache may be declared valid only when the standby PS is fully synchronized. In one embodiment, there is timer triggering of PDU generation from the cache. In the event that the protection PS status changes to active, PDUs are sent from the cache if it is flagged valid. To this end, an appropriate repetition timer (and distribution over the allowed period) is started. The state in which PDUs are created from the cache starts with the activation status, provided the cache is flagged valid. It ends either when different PDUs are received from the network or when the state machine has fully recovered. This can be recognized by the fact that the state machine starts sending PDUs to the network. The first PDU can be used as a trigger to stop the cache activity, because the state machine is capable of sending out all remaining PDUs in the required time interval.
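The timer-triggered replay can be sketched as follows. This is a simplified illustration: the cache is assumed to expose valid and active flags and the per-port entries from the earlier sketch, the send callback stands in for the packet forwarding path, and the 2-second default interval merely reflects the common STP hello time rather than anything mandated by the specification.

```python
import threading


def start_cache_replay(cache, send, interval: float = 2.0) -> None:
    """Periodically resend cached outgoing PDUs while no state machine is active.

    Replay stops when the cache is invalidated (differing PDUs arrive from the
    network) or when the recovered state machine sends its first PDU, at which
    point the caller clears cache.active instead of rearming the timer.
    """
    def tick() -> None:
        if not (cache.valid and cache.active):
            return  # stop condition reached; do not rearm the timer
        for port, entry in cache.ports.items():
            if entry.sent_pdu is not None:
                send(port, entry.sent_pdu)
        threading.Timer(interval, tick).start()

    tick()
```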
Figure 8 illustrates an exemplary embodiment of a distributed cache. This example shows how the message cache may be distributed within a system as opposed to a single message cache for a system. In this example, the periodic message cache 810 is distributed on two input/output (I/O) packs 802. The number of I/O packs is, of course, not limited to two. Each I/O pack 802 includes packet forwarding hardware 810 and a board controller 808. A local node 804 includes packet forwarding hardware 812 and one or more central packet control plane processors 814. The central packet control plane processor 814 sends updates to the periodic message caches 810 on the board controllers 808 of the I/O packs 802. The periodic message cache 810 sends outgoing periodic messages via packet forwarding hardware 810 in the I/O pack 802. In this way, the periodic message caches 810 simulate a control plane protocol when the control plane state machine is unavailable or fails. Application protocols include any protocols that have periodic outgoing messages with constant contents, such as (R)STP, GVRP, RSVP, open shortest path first (OSPF), intermediate system-to-intermediate system (IS-IS or ISIS), Y.1711 FFD, etc. Of course, message caches may be implemented broadly in many other ways for many different system architectures. For
example, message caches may be on several hardware blades, on several computer processing units (CPUs), on several threads within one CPU, in FPGAs, ASICs and the like.
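The distribution of cache updates from the central control plane processor to the per-pack caches can be sketched as below. The class names and the direct method call are assumptions made to keep the example self-contained; in an actual system the update would travel over an IPC or backplane channel to the board controllers.

```python
from typing import Dict, List


class BoardCache:
    """Periodic message cache held on an I/O pack's board controller."""

    def __init__(self) -> None:
        self.outgoing: Dict[int, bytes] = {}  # port -> last periodic PDU to replay

    def apply_update(self, port: int, pdu: bytes) -> None:
        self.outgoing[port] = pdu


class CentralControlPlane:
    """Central packet control plane processor pushing cache updates to packs."""

    def __init__(self, packs: List[BoardCache]) -> None:
        self.packs = packs

    def publish(self, pack_index: int, port: int, pdu: bytes) -> None:
        # Direct call stands in for the backplane message to the board controller.
        self.packs[pack_index].apply_update(port, pdu)
```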
Embodiments of the present invention may be implemented in one or more computers in a network system. Each computer comprises a processor as well as memory for storing various programs and data. The memory may also store an operating system supporting the programs. The processor cooperates with conventional support circuitry such as power supplies, clock circuits, cache memory, and the like as well as circuits that assist in executing the software routines stored in the memory. As such, it is contemplated that some of the steps discussed herein as software methods may be implemented within hardware, for example, as circuitry that cooperates with the processor to perform various method steps. The computer also contains input/output (I/O) circuitry that forms an interface between the various functional elements communicating with the computer. Embodiments of the present invention may also be implemented in hardware or firmware, e.g., in FPGAs or ASICs.
The present invention may be implemented as a computer program product wherein computer instructions, when processed by a computer, adapt the operation of the computer such that the methods and/or techniques of the present invention are invoked or otherwise provided. Instructions for invoking the inventive methods may be stored in fixed or removable media, transmitted via a data stream in a broadcast media or other signal-bearing medium, and/or stored within a working memory within a computing device operating according to the instructions. While the foregoing is directed to various embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. As such, the appropriate scope of the invention is to be determined according to the claims, which follow.

Claims

What is claimed is:
1. A method for providing uninterrupted network control message generation during local node outages, comprising:
receiving a plurality of sent messages from a protocol state machine;
forwarding the sent messages to a plurality of nodes in a network;
receiving a plurality of received messages from the nodes;
storing the sent and received messages in a buffer; and
sending messages to and receiving messages from the nodes, upon failure of the protocol state machine, so long as the buffer remains valid.
2. The method of claim 1, wherein the messages are sent periodically to the nodes.
3. The method of claim 1, further comprising:
determining whether the buffer is valid based on the sent and received messages in the buffer and messages received from the nodes after the failure.
4. The method of claim 1, further comprising:
switching to a standby protocol state machine, upon failure of the protocol state machine, the standby protocol state machine including another buffer including replicas of the sent and received messages.
5. A system for providing uninterrupted network control message generation during local node outages, comprising:
a protocol state machine for generating a plurality of messages;
a message cache for receiving the messages from the protocol state machine and forwarding them to a plurality of nodes in a network, the message cache storing both messages sent to the nodes and messages received from the nodes in at least one buffer;
wherein the message cache sends messages to and receives messages from the nodes, upon failure of the protocol state machine, so long as the message cache remains valid.
6. The system of claim 5, wherein the message cache includes a timer for sending periodic messages to the nodes.
7. The system of claim 5, wherein the message cache includes a status control for determining whether the message cache is valid.
8. The system of claim 7, wherein the protocol state machine is a worker protocol state machine, the message cache is a worker message cache, and a worker node includes the worker protocol state machine and the worker message cache; and further comprising:
a protection node including a protection protocol state machine and a protection message cache;
wherein the protection state machine is able to become active upon failure of the worker protocol state machine.
9. The system of claim 8, wherein the protection message cache replicates the worker message cache, while the worker protocol state machine is active.
10. A computer readable medium storing instructions for performing a method for providing uninterrupted network control message generation during local node outages, the method comprising:
receiving a plurality of sent messages from a protocol state machine;
forwarding the sent messages to a plurality of nodes in a network;
receiving a plurality of received messages from the nodes;
storing the sent and received messages in a buffer; and
sending messages to and receiving messages from the nodes, upon failure of the protocol state machine, so long as the message cache remains valid.
PCT/US2006/020681 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages WO2007139542A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020087029207A KR101017540B1 (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages
CNA2006800547591A CN101461196A (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages
PCT/US2006/020681 WO2007139542A1 (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages
EP06771449A EP2030378A4 (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages
JP2009513106A JP2009539305A (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2006/020681 WO2007139542A1 (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages

Publications (1)

Publication Number Publication Date
WO2007139542A1 true WO2007139542A1 (en) 2007-12-06

Family

ID=38778944

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/020681 WO2007139542A1 (en) 2006-05-30 2006-05-30 Uninterrupted network control message generation during local node outages

Country Status (5)

Country Link
EP (1) EP2030378A4 (en)
JP (1) JP2009539305A (en)
KR (1) KR101017540B1 (en)
CN (1) CN101461196A (en)
WO (1) WO2007139542A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120102482A1 (en) * 2009-06-25 2012-04-26 Zte Corporation Method for Communication System Service Upgrade and Upgrade Container Device Thereof
EP2798801A4 (en) * 2011-12-28 2015-05-20 Hangzhou H3C Tech Co Ltd Graceful restart (gr) methods and devices
WO2015106822A1 (en) * 2014-01-17 2015-07-23 Nokia Solutions And Networks Management International Gmbh Controlling of communication network comprising virtualized network functions
US9860336B2 (en) 2015-10-29 2018-01-02 International Business Machines Corporation Mitigating service disruptions using mobile prefetching based on predicted dead spots
CN109889367A (en) * 2019-01-04 2019-06-14 烽火通信科技股份有限公司 The method and system of LACP NSR are realized in the distributed apparatus for not supporting NSR
US10534598B2 (en) 2017-01-04 2020-01-14 International Business Machines Corporation Rolling upgrades in disaggregated systems
US11153164B2 (en) 2017-01-04 2021-10-19 International Business Machines Corporation Live, in-line hardware component upgrades in disaggregated systems

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5728783B2 (en) * 2011-04-25 2015-06-03 株式会社オー・エフ・ネットワークス Transmission apparatus and transmission system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020089990A1 (en) 2001-01-11 2002-07-11 Alcatel Routing system providing continuity of service for the interfaces associated with neighboring networks
US20040087304A1 (en) * 2002-10-21 2004-05-06 Buddhikot Milind M. Integrated web cache
US6757248B1 (en) * 2000-06-14 2004-06-29 Nokia Internet Communications Inc. Performance enhancement of transmission control protocol (TCP) for wireless network applications
US20050201375A1 (en) 2003-01-14 2005-09-15 Yoshihide Komatsu Uninterrupted transfer method in IP network in the event of line failure
US20050243722A1 (en) * 2004-04-30 2005-11-03 Zhen Liu Method and apparatus for group communication with end-to-end reliability
US7050187B1 (en) * 2000-04-28 2006-05-23 Texas Instruments Incorporated Real time fax-over-packet packet loss compensation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11136258A (en) * 1997-11-04 1999-05-21 Fujitsu Ltd Cell read synchronization control method
JP4021841B2 (en) * 2003-10-29 2007-12-12 富士通株式会社 Control packet processing apparatus and method in spanning tree protocol
JP3932994B2 (en) * 2002-06-25 2007-06-20 株式会社日立製作所 Server handover system and method
JP2005341282A (en) * 2004-05-27 2005-12-08 Nec Corp System changeover system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7050187B1 (en) * 2000-04-28 2006-05-23 Texas Instruments Incorporated Real time fax-over-packet packet loss compensation
US6757248B1 (en) * 2000-06-14 2004-06-29 Nokia Internet Communications Inc. Performance enhancement of transmission control protocol (TCP) for wireless network applications
US20020089990A1 (en) 2001-01-11 2002-07-11 Alcatel Routing system providing continuity of service for the interfaces associated with neighboring networks
US20040087304A1 (en) * 2002-10-21 2004-05-06 Buddhikot Milind M. Integrated web cache
US20050201375A1 (en) 2003-01-14 2005-09-15 Yoshihide Komatsu Uninterrupted transfer method in IP network in the event of line failure
US20050243722A1 (en) * 2004-04-30 2005-11-03 Zhen Liu Method and apparatus for group communication with end-to-end reliability

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2030378A4

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2448175A1 (en) * 2009-06-25 2012-05-02 ZTE Corporation Method for communication system service upgrade and upgrade container device thereof
EP2448175A4 (en) * 2009-06-25 2012-11-28 Zte Corp Method for communication system service upgrade and upgrade container device thereof
US20120102482A1 (en) * 2009-06-25 2012-04-26 Zte Corporation Method for Communication System Service Upgrade and Upgrade Container Device Thereof
US9225590B2 (en) 2011-12-28 2015-12-29 Hangzhou H3C Technologies Co., Ltd. Graceful restart (GR) methods and devices
EP2798801A4 (en) * 2011-12-28 2015-05-20 Hangzhou H3C Tech Co Ltd Graceful restart (gr) methods and devices
KR101954314B1 (en) 2014-01-17 2019-03-05 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
WO2015106822A1 (en) * 2014-01-17 2015-07-23 Nokia Solutions And Networks Management International Gmbh Controlling of communication network comprising virtualized network functions
US20160344587A1 (en) 2014-01-17 2016-11-24 Nokia Solutions And Networks Management International Gmbh Controlling of communication network comprising virtualized network functions
US10581677B2 (en) 2014-01-17 2020-03-03 Nokia Solutions And Networks Gmbh & Co. Kg Controlling of communication network comprising virtualized network functions
KR20180023068A (en) * 2014-01-17 2018-03-06 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
KR20180023069A (en) * 2014-01-17 2018-03-06 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
KR101868918B1 (en) * 2014-01-17 2018-07-20 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
US10652088B2 (en) 2014-01-17 2020-05-12 Nokia Solutions And Networks Gmbh & Co. Kg Controlling of communication network comprising virtualized network functions
KR101954310B1 (en) * 2014-01-17 2019-03-05 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
US10652089B2 (en) 2014-01-17 2020-05-12 Nokia Solutions And Networks Gmbh & Co. Kg Controlling of communication network comprising virtualized network functions
US10432458B2 (en) 2014-01-17 2019-10-01 Nokia Solutions And Networks Gmbh & Co. Kg Controlling of communication network comprising virtualized network functions
KR102061655B1 (en) * 2014-01-17 2020-01-02 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
KR102061661B1 (en) 2014-01-17 2020-01-02 노키아 솔루션스 앤드 네트웍스 게엠베하 운트 코. 카게 Controlling of communication network comprising virtualized network functions
KR20160110476A (en) * 2014-01-17 2016-09-21 노키아 솔루션스 앤드 네트웍스 매니지먼트 인터내셔널 게엠베하 Controlling of communication network comprising virtualized network functions
US9860336B2 (en) 2015-10-29 2018-01-02 International Business Machines Corporation Mitigating service disruptions using mobile prefetching based on predicted dead spots
US10534598B2 (en) 2017-01-04 2020-01-14 International Business Machines Corporation Rolling upgrades in disaggregated systems
US10970061B2 (en) 2017-01-04 2021-04-06 International Business Machines Corporation Rolling upgrades in disaggregated systems
US11153164B2 (en) 2017-01-04 2021-10-19 International Business Machines Corporation Live, in-line hardware component upgrades in disaggregated systems
CN109889367A (en) * 2019-01-04 2019-06-14 烽火通信科技股份有限公司 The method and system of LACP NSR are realized in the distributed apparatus for not supporting NSR
CN109889367B (en) * 2019-01-04 2021-08-03 烽火通信科技股份有限公司 Method and system for realizing LACP NSR in distributed equipment not supporting NSR

Also Published As

Publication number Publication date
KR101017540B1 (en) 2011-02-28
EP2030378A4 (en) 2010-01-27
EP2030378A1 (en) 2009-03-04
KR20090016676A (en) 2009-02-17
CN101461196A (en) 2009-06-17
JP2009539305A (en) 2009-11-12

Similar Documents

Publication Publication Date Title
KR101099822B1 (en) Redundant routing capabilities for a network node cluster
US7304940B2 (en) Network switch assembly, network switching device, and method
US7453797B2 (en) Method to provide high availability in network elements using distributed architectures
US8873377B2 (en) Method and apparatus for hitless failover in networking systems using single database
US6941487B1 (en) Method, system, and computer program product for providing failure protection in a network node
US7269133B2 (en) IS-IS high availability design
WO2007139542A1 (en) Uninterrupted network control message generation during local node outages
US7417947B1 (en) Routing protocol failover between control units within a network router
JP4021841B2 (en) Control packet processing apparatus and method in spanning tree protocol
US20110134931A1 (en) Virtual router migration
US20050050136A1 (en) Distributed and disjoint forwarding and routing system and method
JPH11154979A (en) Multiplexed router
JP2005503055A (en) Method and system for implementing OSPF redundancy
JP5941404B2 (en) Communication system, path switching method, and communication apparatus
WO2011157151A2 (en) Method, device and system for realizing disaster-tolerant backup
WO2011120423A1 (en) System and method for communications system routing component level high availability
JP2006246152A (en) Packet transfer apparatus, packet transfer network, and method for transferring packet
US7184394B2 (en) Routing system providing continuity of service for the interfaces associated with neighboring networks
CN113992571B (en) Multipath service convergence method, device and storage medium in SDN network
US11979286B1 (en) In-service software upgrade in a virtual switching stack
KR100917603B1 (en) Routing system with distributed structure and control method for non-stop forwarding thereof
JP2015138987A (en) Communication system and service restoration method in communication system

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200680054759.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06771449

Country of ref document: EP

Kind code of ref document: A1

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2006771449

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2009513106

Country of ref document: JP

Ref document number: 1020087029207

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: DE