US20220200885A1 - High availability router switchover decision using monitoring and policies - Google Patents
- Publication number
- US20220200885A1 (application US17/127,946)
- Authority
- US
- United States
- Prior art keywords
- node
- network
- standby
- active
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/22—Alternate routing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0813—Configuration setting characterised by the conditions triggering a change of settings
- H04L41/082—Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0894—Policy-based network configuration management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/02—Topology update or discovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/44—Distributed routing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/40—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities
Definitions
- the embodiments herein relate to data communications through networks and switching traffic over to a different node in the event of a network failure.
- Network data communications have moved beyond asynchronous text and file transfer traffic to time-sensitive traffic such as streaming and interactive communications.
- VNF: Virtual Network Function
- a Virtual Router Redundancy Protocol (VRRP) is used to assign routes through virtual routers, denoting some virtual routers as masters and others as backups.
- High Availability (HA) techniques are intended to provide uninterrupted internet data communications service in the event of failures. HA attempts to compensate for the design of the Internet to provide reliability similar to that provided by telephony service at the lower cost of the Internet.
- In HA techniques, the states of an active node of a wide area network (WAN) or metropolitan area network (MAN) are replicated on a standby node that is coupled to the same network in order to provide a seamless switchover to the standby node in the event of a failure at the active node.
- a switchover from the active node to a standby node is made based on the health of the processes running on the active node or based on a failure of network interfaces coupled to the active node. After the switchover, the active node becomes a standby node and the standby node takes the place of the active node, becoming active and maintaining the state of the active node.
- the new active node carries the data communications traffic instead of the original active node.
- High availability router switchover decisions are described using monitoring and policies.
- available network routing parameters are monitored.
- a change of one of the network routing parameters is detected.
- a parameter matrix of network routing parameters is updated in response to the detected change.
- the changed network routing parameters are applied to a policy in response to updating the parameter matrix and a switchover request is sent to a standby network node when the policy is met by the changed network routing parameter.
- Some embodiments include generating an interrupt at a system monitor of the active node and sending the interrupt to a high availability module of the active node when the change in a network routing parameter is detected and wherein applying the changed network routing parameter to the policy is performed by the high availability module.
- Some embodiments include applying network routing parameters of the standby network node to a policy at the standby network node and sending a switchover request acknowledgment to the active network node from the standby network node when the standby network node policy is not met by the standby network node network routing parameters.
- Some embodiments include sending a portion of the parameter matrix as a capability matrix to the standby network node for comparison with a capability matrix of the standby node.
- the switchover request is rejected by the standby network node based on the comparison.
- the comparison includes comparing network capabilities of the active network node to the capabilities of the standby network node.
- applying comprises applying a count of the changed network routing parameter to a rule with a comparison operator to determine if the comparison is met. In some embodiments, applying comprises applying a plurality of counts of the parameter matrix to a sequence of rules having comparison operators. In some embodiments, the rules of the sequence are conditional upon meeting a preceding rule of the sequence.
- the plurality of network routing parameters comprises a status of the active network node as a VRRP (Virtual Router Redundancy Protocol) master or backup node and wherein detecting a change comprises detecting when the status of the active network node changes.
- the plurality of network routing parameters comprises availability of a connected routing peer.
- monitoring the availability of the connected routing peer comprises sending a ping to the connected routing peer.
- the connected routing peer comprises a next hop border gateway protocol neighbor router.
- a non-transitory computer-readable storage medium containing program instructions, wherein execution of the program instructions by the computer causes the computer to perform operations comprising monitoring a plurality of network routing parameters that are available to an active network node, detecting a change of one of the plurality of network routing parameters, updating a parameter matrix of network routing parameters in response to the detected change, applying the changed network routing parameter to a policy in response to updating the parameter matrix, and sending a switchover request to a standby network node when the policy is met by the changed network routing parameter.
- applying comprises applying a count of the changed network routing parameter to a rule with a comparison operator to determine if the comparison is met. In some embodiments, applying comprises applying a plurality of counts of the parameter matrix to a sequence of rules having comparison operators.
- an active network node includes a system monitor configured to monitor a plurality of network routing parameters that are available to the active network node and to detect a change of one of the plurality of network routing parameters, and a high availability module configured to update a parameter matrix of network routing parameters in response to the detected change, to apply the changed network routing parameter to a policy in response to updating the parameter matrix, and to send a switchover request to a standby network node when the policy is met by the changed network routing parameter.
- the system monitor further generates an interrupt and sends the interrupt to the high availability module when the change in a network routing parameter is detected.
- the high availability node further sends a portion of the parameter matrix as a capability matrix to the standby network node for comparison with a capability matrix of the standby node.
- FIG. 1 is a block diagram of a network suitable for use with the present invention.
- FIG. 2 is a process flow diagram of making a switchover decision using policies and optionally a matrix according to embodiments of the present invention.
- FIG. 3 is a sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down according to embodiments of the invention.
- FIG. 4 is an alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down according to embodiments of the invention.
- FIG. 5 is another alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down according to embodiments of the invention.
- FIG. 6 is a sequence diagram of two nodes performing a switchover in response to a connected BGP peer route being withdrawn according to embodiments of the invention.
- FIG. 7 is a sequence diagram of two nodes performing a switchover in response to a VRRP status change according to embodiments of the invention.
- FIG. 8 is a block diagram of a network node according to embodiments of the invention.
- SD-WAN: Software Defined Wide Area Network
- hub node for each of the branch nodes and each hub node potentially acting as a gateway to a plurality of branch nodes.
- branch nodes themselves may have direct access to the Internet through one or more WAN links.
- embodiments disclosed herein can be applied in non-software-defined WANs and for applications hosted within the network, e.g., within a LAN (Local Area Network).
- a High Availability (HA) system tracks multiple circumstances regarding the data communications traffic through the network. These circumstances include the health of the running processes on the active node, the VRRP (Virtual Router Redundancy Protocol) state, and the availability of other resources that may be used to send or receive traffic.
- the availabilities may include the availability of network interfaces, the availability of connected routing peers, for example BGP (Border Gateway Protocol) neighbor peers, the availability of routes, and the availability of remote hosts.
- a missing route can lead to brownouts of an active VNF (Virtual Network Function) when the route is withdrawn. Failed interfaces can lead to a blackout.
- a failed remote host can cause a brownout for an active VNF.
- a stateful HA may track the availability of BGP neighbor peers, VRRP state, and other factors and decide whether to remain active or transition to a standby role by comparing these factors to those of a standby node. The comparison may be initiated by the active node, for example a VNF, negotiating with a standby node, for example another VNF, to transition to active when the standby node has better data communication metrics.
- VRRP tracks the configured routes and network interface availability to decide whether to transition a node to backup or to stay in the master role.
- FIG. 1 is a simplified diagram of a wired or wireless network in which multiple nodes, for example routers, have redundant paths between multiple clients and the cloud or the Internet.
- network communication devices are referred to as nodes.
- a node may be a router or it may be another device that receives data traffic and sends the data traffic to another node.
- nodes are intended to include “routers” and also a variety of other physical or virtual devices to which the techniques and structures herein may apply.
- a first node 102 is an active VNF that is coupled to a second node 104 that is a standby VNF through a high availability (HA) link 106 .
- the first and the second nodes 102 , 104 may be physical nodes or virtualized resources.
- the first and the second nodes 102 , 104 also communicate on the link through VRRP and have a status relationship in which the first node 102 is a VRRP master and the second node 104 is a VRRP backup.
- the first and the second nodes are both coupled to a client network 108 that includes multiple clients C 1 , C 2 , C 3 .
- the first and the second nodes are both coupled to an AR (Access Router) 112 through E-BGP (External Border Gateway Protocol) 110 .
- the AR 112 is coupled to a second client network 114 with clients C 4 , C 5 .
- the first and the second nodes 102 , 104 are coupled to a first PE (Provider Edge) 122 and a second PE 124 .
- the first and second PEs 122 , 124 are coupled through a WAN (Wide Area Network) 120 to further external resources, for example the Internet 128 .
- the first and second PEs 122 , 124 are both connected to each of the first and second nodes 102 , 104 .
- the first node 102 is coupled to both the first and the second PE 122 , 124 using E-BGP.
- the second node 104 is also coupled to both the first and the second PE 122 , 124 using E-BGP.
- the simplified diagram shows how the first and the second nodes 102 , 104 are connected so that if either one fails, then the other one can make all of the same connections. The same is true of the first and the second PEs 122 , 124 . In addition, if there is a failure in any of the links used by one of the nodes then there is likely to be an alternative link coupled to the other one of the nodes that can be used as an alternative. While the first and the second nodes 102 , 104 are indicated as being configured as VNFs, alternative configurations may be used to suit different implementations.
- FIG. 2 is a process flow diagram of making a switchover decision using policies and optionally a matrix.
- a process begins at 202 with configuration of one or more switchover policies. These policies may be adapted to suit the configuration of the network and the availability of protocols, routes, neighbor routers and other resources.
- the policies at 202 are designated for a particular node and different nodes in a network may have different policies. In the present example, the policies are configured for the first node 102 and identical policies are configured for the second node 104 . At least some of the policies have an input and produce an output based on the input.
- the policy inputs are monitored. These inputs may include multiple characteristics of the network that affect the ability of the node to support traffic through the network from and to any other nodes. In the present example, four such network characteristics are monitored but embodiments may have more or fewer or different characteristics.
- the first characteristic is the availability of a BGP or OSPF (Open Shortest Path First) neighbor or next hop router. In some embodiments, multiple neighbors may be monitored. As examples, the AR 112 and the first and second PEs 122 , 124 may be monitored. A status update on any monitored BGP or OSPF neighbor is tested at 206 . If the route to the neighbor has been removed, then the process updates a parameter matrix at 220 that is maintained by or for the node.
- a second characteristic is the VRRP status of the node.
- the primary traffic should be routed through a VRRP master node. This status is monitored so that when the status of the node is changed to backup, the policy input monitor generates an update and the status is tested at 208 . If the status is backup then the process updates the parameter matrix at 220 .
- a third characteristic is the status of the network interfaces to which the node connects.
- the monitor policy inputs operation 204 sends messages to interfaces of connected nodes and monitors their status.
- the parameter matrix is updated at 220 .
- the interface may be down due to a broken connection or a restart of the connection or of the other node.
- the status of one or more interfaces may be monitored depending on the network configuration.
- a fourth characteristic is the status of a remote host, for example, an intermediate router, or any other connected router. As with the other characteristics there may be one or more remote hosts that are monitored.
- the monitor policy inputs operation 204 pings the remote host at time intervals. When the ping indicates that the remote host is no longer available then a test at 212 indicates that the host is down and the parameter matrix is updated at 220 .
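The remote host check described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `RemoteHostMonitor`, the injected `probe` callable (a stand-in for a real ping), and the host names are all hypothetical, and the caller is assumed to invoke `poll()` at its own time intervals.

```python
from typing import Callable, Dict, List


class RemoteHostMonitor:
    """Tracks which remote hosts answer a probe and reports status changes."""

    def __init__(self, hosts: List[str], probe: Callable[[str], bool]) -> None:
        # `probe` is a hypothetical stand-in for a real ping; it returns True
        # when the host replies.
        self._probe = probe
        # Assumption: hosts are treated as reachable until a probe fails.
        self._up: Dict[str, bool] = {host: True for host in hosts}

    def poll(self) -> List[str]:
        """Probe every host once and return the hosts whose status changed."""
        changed = []
        for host, was_up in self._up.items():
            is_up = self._probe(host)
            if is_up != was_up:
                self._up[host] = is_up  # value update only, keys are unchanged
                changed.append(host)
        return changed

    def up_count(self) -> int:
        """Number of monitored hosts currently considered reachable."""
        return sum(self._up.values())


# Simulated reachability table used instead of sending real pings.
reachable = {"H1": True, "H2": True}
monitor = RemoteHostMonitor(["H1", "H2"], probe=lambda host: reachable[host])

first = monitor.poll()    # both hosts reply, so nothing changed
reachable["H1"] = False   # H1, or the connection to it, fails
second = monitor.poll()   # the change for H1 is detected
```

In this sketch, a non-empty result from `poll()` is the point at which the parameter matrix would be updated with the new value of `up_count()`.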
- the node maintains a parameter matrix of one or more of the characteristics shown herein. These characteristics are also referred to as network routing parameters, however the parameter matrix may also contain other information of similar and different kinds.
- the values in the parameter matrix may be applied to one or more policies at 222 .
- the policies may be in the form of testing a parameter matrix count against a rule. If the rule is not satisfied, then the policy is not met and the process returns to monitor the policy inputs.
- a policy may have multiple rules that are applied in a specific sequence or that are conditional on other rules.
- the process optionally goes to a capability matrix comparison at 224 at which the process determines if a capability matrix of a standby node is better than a capability matrix at this active node.
- the capability matrix contains information about the availability of network resources and may have the same information as the parameter matrix or less information. The policies in this process are met when there is a failure or reduction in the available resources. If the standby node capability matrix shows that the standby node has experienced the same or more failures or resource reduction, then at 228 the active node will stay active and there is no switchover.
- the capability matrix shows the network capabilities of the respective node.
- the capability matrix comparison allows capabilities of the nodes to be compared using a convenient matrix configuration. However, a matrix is used herein for ease of understanding. The capabilities of an active node may be compared to the capabilities of a standby node using data configurations other than, or in addition to, a matrix.
- the standby node has a better capability matrix in that it has access to more or better network resources.
- a switchover to the standby node is performed.
- the traffic that was carried by the active node is moved over to the standby node.
- the standby node becomes the active node and the active node becomes a standby node.
- the master becomes the backup and the backup is changed to the master.
- the switchover is performed with a transfer of state but without any restart. This improves availability because the nodes stay active through the process.
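As a rough illustration of a role swap with state transfer and no restart, the following Python sketch swaps the roles of two in-memory node objects. The `Node` type, its fields, and the node names are assumptions made for the example; real state replication would happen over the HA link rather than by copying a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    role: str  # "active" or "standby"
    session_state: Dict[str, str] = field(default_factory=dict)


def switch_over(active: Node, standby: Node) -> None:
    """Swap roles in place; neither node object is recreated (no restart)."""
    if active.role != "active" or standby.role != "standby":
        raise ValueError("switch_over expects an active and a standby node")
    # Transfer (here: copy) the state so that existing flows can continue.
    standby.session_state = dict(active.session_state)
    active.role, standby.role = "standby", "active"


node_a = Node("node-102", "active", {"flow-1": "established"})
node_b = Node("node-104", "standby")
switch_over(node_a, node_b)
```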
- FIG. 2 is directed to a process that is performed at the active or master node.
- the capability matrix is compared to a capability matrix that was maintained by an associated standby or backup node.
- the standby or backup node performs the same operations as described for the active or master node as shown in FIG. 2 so that its capability matrix is maintained and ready for a valid comparison.
- the monitor policy inputs operation 204 is performed for the network characteristics that are important to the operation of the standby node as a standby for the active node.
- the configured policies at 202 may be the same or different to suit the network configuration. The interaction between the two nodes is described in more detail below.
- Table 1 is a simplified example of a parameter matrix for a first active node and Table 2 is a simplified example of a parameter matrix for a second standby node. These nodes may correspond to the active VNF 102 and standby VNF 104 of FIG. 1 as well as to the Active Node 302 and Standby Node 304 of FIG. 3 and the other sequence diagrams.
- the parameter matrix is presented as a two-dimensional table for ease of understanding but may take any of a variety of other forms, including unstructured text strings, metadata, and configuration registers. There may be more or fewer entries to suit different network configurations. In this example, the rows are selected to align with the policy inputs of FIG. 2 . More or fewer policy inputs may be used.
- the values in Table 1 were determined by the first node and will be applied to the policies configured for the first node, an active node.
- the values in Table 2 were determined by the second node and will be applied to the policies configured for the second node which is a VRRP backup node with respect to the first node but a VRRP Master for other purposes.
- Table 1 shows values as counts for network interfaces, routes, monitored remote hosts and VRRP group status. Each time one of the counts changes, the parameter matrix is updated as at block 220 and the policies are applied as at block 222 .
- the nodes may have any of a wide range of different policies active for managing operations including the network interfaces, routes, remote hosts, and groupings, for example VRRP groups, mentioned above.
- a particular set of policies may be configured to support switchover for high availability and, while a few examples are provided, more, fewer, and different policies may be used.
- the switchover policies are based around the characteristics discussed above of interfaces, routes, remote host monitoring and VRRP group status.
- An example policy would be that if the VRRP master count is less than three then go to a switchover. In Table 1, the VRRP master count is 2 so, upon updating the count in the parameter matrix from 3 to 2, the policy would be met and a switchover would be invoked. Similarly, an interface count of less than 3 may invoke a switchover.
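The relationship between a matrix update and policy evaluation can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ParameterMatrix` class, the key names, and the interface count of 5 are hypothetical, while the VRRP master count changing from 3 to 2 follows the example in the text.

```python
from typing import Callable, Dict, List

Policy = Callable[[Dict[str, int]], bool]


class ParameterMatrix:
    """Holds counts and re-applies the policies whenever a count changes."""

    def __init__(self, counts: Dict[str, int], policies: List[Policy]) -> None:
        self.counts = dict(counts)
        self.policies = policies

    def update(self, name: str, value: int) -> bool:
        """Store a new count; return True if any policy is now met."""
        if self.counts.get(name) == value:
            return False  # no change detected, so policies are not re-applied
        self.counts[name] = value
        return any(policy(self.counts) for policy in self.policies)


def vrrp_master_policy(counts: Dict[str, int]) -> bool:
    # Example policy from the text: go to a switchover if the VRRP master
    # count is less than three.
    return counts["vrrp_master"] < 3


matrix = ParameterMatrix({"interfaces": 5, "vrrp_master": 3}, [vrrp_master_policy])
unchanged = matrix.update("vrrp_master", 3)  # same value, nothing to evaluate
triggered = matrix.update("vrrp_master", 2)  # count drops from 3 to 2
```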
- the below example considers only the count for the number of active network interfaces that are available to the node. If there are not enough network interfaces, then the traffic is switched over to a node that has more network interfaces.
- the tracked network characteristics are first named and a tracking function is established. This tracking applies to all of the examples and in some embodiments more characteristics are tracked.
- the rule applies a comparison operator, less than, to the interface count. If the rule is not met, then the switchover is rejected. The rule may be reversed to apply a greater than or equal comparison operator to the count that results in a switchover rejection. The rule may also be written to use different comparison operators.
Track-interfaces > Inf-1, Inf-2, Inf-3, Inf-4, Inf-5
Track remote host monitors > H1, H2, H3, H4, H5
switchover policy begin
    rule-1 begin
        if interface count less than 3; then
            switchover
        endif
    rule-1 end
- This example policy has a single rule and the rule is to switch over if there are less than three active network interfaces.
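A single rule with a comparison operator, and its reversed form, might be modeled as in the Python sketch below. The `Rule` type and the operator spellings in `OPERATORS` are hypothetical names modeled on the policy text, not a defined configuration syntax.

```python
import operator
from dataclasses import dataclass
from typing import Dict

# Assumption: operator names are illustrative spellings, not a real syntax.
OPERATORS = {
    "less-than": operator.lt,
    "greater-than-or-equal": operator.ge,
}


@dataclass(frozen=True)
class Rule:
    parameter: str
    op: str
    threshold: int

    def is_met(self, counts: Dict[str, int]) -> bool:
        """Apply the comparison operator to the tracked count."""
        return OPERATORS[self.op](counts[self.parameter], self.threshold)


# Rule as stated: switch over if fewer than three interfaces are active.
switch_rule = Rule("interface-count", "less-than", 3)
# Reversed form: reject the switchover if three or more interfaces are active.
reject_rule = Rule("interface-count", "greater-than-or-equal", 3)


def decide(counts: Dict[str, int]) -> str:
    return "switchover" if switch_rule.is_met(counts) else "stay-active"


two_up = decide({"interface-count": 2})
three_up = decide({"interface-count": 3})
```

For any count, exactly one of `switch_rule` and `reject_rule` is met, which is why either form can express the same policy.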
- the below example considers the number of active network interfaces and the number of active remote hosts detected by the remote host monitor.
- the rules are sequential in that rule 2 is not assessed unless rule 1 is not satisfied.
- This policy may be stated as first, a switchover is declared if there are less than three out of the five total network interfaces in an UP state (i.e., any two interfaces go down). Second, if the first rule is not satisfied, then a switchover is declared if any one interface goes down and any two of the five total tracked host monitors go down. The first rule applies a less than comparison operator to the network interface count and the second rule applies a less than comparison operator to the remote host monitor count. If the rule is not met then switchover is rejected.
- the first rule may be made a condition precedent of the second rule. In other words, the second rule may be conditional on the first rule being met.
Track-interfaces > Inf-1, Inf-2, Inf-3, Inf-4, Inf-5
Track remote host monitors > H1, H2, H3, H4, H5
switchover-policy begin
    rule-1 begin
        if interface-count less-than 3; then
            switch-over
        endif
    rule-1 end
    rule-2 begin
        if interface-count less-than 4 and remote-host-monitor-count less-than 3; then
            switch-over
        endif
    rule-2 end
switchover policy end
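The sequential evaluation of the two rules can be sketched in Python as below. The sketch takes the thresholds literally from the policy listing (`less-than 3`, `less-than 4`) and assumes that evaluation stops at the first rule that is met; the function and key names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Counts = Dict[str, int]


def rule_1(counts: Counts) -> bool:
    return counts["interface-count"] < 3


def rule_2(counts: Counts) -> bool:
    return (counts["interface-count"] < 4
            and counts["remote-host-monitor-count"] < 3)


# Order matters: rule-2 is only assessed when rule-1 is not satisfied.
RULES: List[Tuple[str, Callable[[Counts], bool]]] = [
    ("rule-1", rule_1),
    ("rule-2", rule_2),
]


def first_matching_rule(counts: Counts) -> Optional[str]:
    """Return the name of the first rule that is met, or None to stay active."""
    for name, rule in RULES:
        if rule(counts):
            return name
    return None


by_interfaces = first_matching_rule(
    {"interface-count": 2, "remote-host-monitor-count": 5})
by_both = first_matching_rule(
    {"interface-count": 3, "remote-host-monitor-count": 2})
no_match = first_matching_rule(
    {"interface-count": 3, "remote-host-monitor-count": 3})
```

A result of `None` corresponds to the switchover being rejected and the node returning to monitoring its policy inputs.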
- a third example considers only the number of VRRP groups that are in Master state at the active node.
- This third example policy may be invoked.
- This policy has a single rule that if any one VRRP group transitions to Backup from the three Master VRRP groups being tracked, then a switchover process is started. This rule applies a less than comparison operator to the VRRP group master count.
- policies may also be combined in any particular sequence as shown, for example, in the second example. There may be more than two rules in any policy. While the three example policies relate to VRRP status, network interface status and remote host monitor, other monitored status events may be used as inputs. One such input is a BGP/OSPF route status or event but there may be many others.
- the process at the active node may optionally go to a capability matrix comparison 224 .
- the capability matrix may have many more or fewer entries than are necessary to evaluate the policies and may have configurations that are different from a parameter matrix.
- a capability matrix is presented here for ease of understanding. For comparison only a portion of the parameter matrix of Table 1 and Table 2 is needed. Table 3 is a portion of the parameter matrix of Table 1 maintained by the active node and Table 4 is a portion of the parameter matrix of Table 2 maintained by the standby node.
- the values are the same except that Table 3 corresponding to the active node has more active routes than Table 4 corresponding to the standby node.
- the standby node has worse accessibility and a switchover from the active node to the standby node will degrade traffic availability.
- the operation of the process of FIG. 2 will be to reject the switchover because the standby node capability matrix is not better and for the current node to stay in the active role at 228 .
- the values used for the capability matrix comparison may be modified to suit particular node and network implementations.
- the VRRP status is not used for comparison because it is not relevant to a node's availability to process traffic.
- the VRRP group state will always differ between the active node and the standby node because one would be in the Master state and the other in the Backup state. In such an event, a comparison is not useful.
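One possible way to express the capability matrix comparison is sketched below in Python. The text does not define "better" precisely, so the sketch assumes that the standby node is better only when it is at least as good in every compared field and strictly better in at least one. The field names and all numeric values are hypothetical, and the VRRP count is deliberately left out of the compared fields.

```python
from typing import Dict

# Assumption: hypothetical field names; the VRRP status is excluded because
# it always differs between the active and the standby node.
COMPARED_FIELDS = ("interfaces", "routes", "remote_hosts")


def standby_is_better(active: Dict[str, int], standby: Dict[str, int]) -> bool:
    """True only if the standby is no worse anywhere and better somewhere."""
    no_worse = all(standby[f] >= active[f] for f in COMPARED_FIELDS)
    better_somewhere = any(standby[f] > active[f] for f in COMPARED_FIELDS)
    return no_worse and better_somewhere


# The active node has more routes, so a switchover would degrade availability.
active = {"interfaces": 4, "routes": 10, "remote_hosts": 5, "vrrp_master": 3}
standby = {"interfaces": 4, "routes": 8, "remote_hosts": 5, "vrrp_master": 0}
reject = standby_is_better(active, standby)

# After the active node loses routes and an interface, the standby is better.
degraded = {"interfaces": 3, "routes": 8, "remote_hosts": 5, "vrrp_master": 3}
accept = standby_is_better(degraded, standby)
```

With equal matrices the function returns `False`, matching the case in which the standby node has experienced the same failures and the active node stays active.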
- FIG. 3 is a sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down.
- the sequence diagram shows an example of the operations of FIG. 2 in a particular example. These operations include monitoring network interfaces and remote hosts, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and performing a switchover to a standby node.
- FIG. 3 has an active node 302 , a standby node 304 , and a remote host 306 , for example a next hop BGP router, all connected through a network, for example the network of FIG. 1 or any other suitable data communications network.
- the active node includes a High Availability (HA) module 312 , a VRRP module 314 , a routing module 316 , and a system monitor 318 .
- the standby node 304 includes a HA module 322 , a VRRP module 324 , a routing module 326 , and a system monitor 328 .
- the modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems.
- the sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- the VRRP module 314 of the active node 302 sends VRRP advertisements 330 to the standby node 304 VRRP module 324 . In the same way advertisements may be sent to many other modules (not shown).
- the first trigger in the sequence occurs when the system monitor 318 detects 331 that a network interface is down.
- the active node 302 system monitor 318 sends a notification 332 to the routing module 316 , a notification 333 to the VRRP module 314 and a notification 334 to the HA module 312 .
- the system monitor 318 is performing an operation 204 of monitoring the policy inputs and determining whether there are updates or status changes 206 , 208 , 210 , 212 as shown in FIG. 2 .
- the system monitor operates as a background application, for example a daemon, and generates and sends interrupts or alerts to the modules 312 , 314 , 316 of the active node 302 in which it operates.
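The notification fan-out from the system monitor to the other modules resembles a simple publish/subscribe arrangement, sketched below in Python. The `SystemMonitor` class, the event name `"interface-down"`, and the module labels are hypothetical; a real daemon would deliver interrupts or alerts through an operating system or IPC mechanism rather than direct function calls.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, str], None]


class SystemMonitor:
    """Delivers each detected event to every subscribed module."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Handler] = {}

    def subscribe(self, module_name: str, handler: Handler) -> None:
        self._subscribers[module_name] = handler

    def report(self, event: str, subject: str) -> None:
        # Dicts preserve insertion order, so modules are notified in the
        # order in which they subscribed.
        for handler in self._subscribers.values():
            handler(event, subject)


received: List[Tuple[str, str, str]] = []
system_monitor = SystemMonitor()
for module in ("routing", "vrrp", "ha"):
    # `m=module` binds the current module name for each handler.
    system_monitor.subscribe(
        module,
        lambda event, subject, m=module: received.append((m, event, subject)),
    )

system_monitor.report("interface-down", "Inf-1")
```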
- in response to receiving an interface down notification 333, the VRRP module 314 changes the corresponding VRRP group state of the active node 302 from the master state to the standby state.
- the HA module 312 of the active node 302 updates 335 the active node 302 parameter matrix with the new network interface value. In this case, the value is reduced by one.
- the HA module 312 evaluates 336 the policies using the new value for network interfaces as input to the policies.
- the rules do not match. The policies are not met and no switchover is taken. The active node 302 stays active. Alternatively, if the rules do match, then a switchover may be requested as shown, for example, at 347 .
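The update and evaluation steps 335 and 336 can be illustrated with a minimal sketch. The field names, the starting counts, and the threshold of three interfaces are assumptions made only for illustration; the description does not specify them.

```python
import operator

# Hypothetical parameter matrix: counts of healthy resources per monitored input.
parameter_matrix = {"interfaces_up": 4, "remote_hosts_up": 1, "bgp_routes": 3}

# Comparison operators that a rule may use.
OPERATORS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
             ">=": operator.ge, ">": operator.gt}

def rule_matches(matrix, field, op, threshold):
    """Return True when the count stored in `field` satisfies the comparison."""
    return OPERATORS[op](matrix[field], threshold)

def on_interface_down(matrix):
    """Reduce the interface count by one, then re-evaluate the (assumed) rule."""
    matrix["interfaces_up"] -= 1
    # Assumed rule: request a switchover when fewer than 3 interfaces are up.
    return rule_matches(matrix, "interfaces_up", "<", 3)

# One interface goes down: 4 -> 3, the rule does not match, the node stays active.
switchover_needed = on_interface_down(parameter_matrix)
```

With these assumed values a single interface failure leaves the count at the threshold, so the rule does not match, which mirrors the outcome described for evaluation 336.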
- the system monitor 318 of the active node 302 pings the remote host 306 . While only one remote host 306 is shown, there may be multiple remote hosts and the same or a similar process may be applied to each.
- the remote host replies and so the status has not changed and there is no action taken. After this first ping 341 , the remote host 306 or the connection to the remote host 306 fails 343 .
- the system monitor 318 of the active node 302 pings the same remote host 306 again. However, at 343 the remote host 306 is down and does not send a reply.
- the system monitor 318 of the active node 302 detects the change in the host monitor status, generates an interrupt or alert and sends 344 the interrupt or alert to the HA module 312 of the active node 302 that the remote host 306 is down.
- the HA module 312 updates 345 the remote host field of the parameter matrix and evaluates 346 the policies by applying the parameter matrix update as a new input to the policies.
- the rules match at 347 and the HA module 312 starts 347 a switchover process to the standby node 304 . If the rules do not provide a match, then no switchover is requested.
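The remote host monitoring described above, in which an alert is raised only when the ping result changes, might look like the following sketch. The class name, the callback signatures, and the simulated ping replies are hypothetical; a real monitor would send actual probes rather than read from a list.

```python
class HostMonitor:
    """Polls remote hosts and reports only changes in reachability."""

    def __init__(self, probe, on_change):
        self.probe = probe          # callable(host) -> bool, True if the host replied
        self.on_change = on_change  # callable(host, is_up), e.g. an alert to the HA module
        self.last_status = {}

    def poll(self, host):
        is_up = self.probe(host)
        # Assumption: a host is treated as up until a probe shows otherwise.
        previous = self.last_status.get(host, True)
        self.last_status[host] = is_up
        if is_up != previous:
            self.on_change(host, is_up)

alerts = []
# Simulated replies: the first ping is answered, then the host is down.
replies = iter([True, False, False])
monitor = HostMonitor(probe=lambda host: next(replies),
                      on_change=lambda host, is_up: alerts.append((host, is_up)))
for _ in range(3):
    monitor.poll("198.51.100.1")
```

Only the second poll produces an alert, because the first reply leaves the status unchanged and the third repeats an already reported failure.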
- the switchover works only if a capability matrix comparison indicates a switchover.
- the HA module 312 of the active node 302 sends 348 a capability matrix to a suitable switchover candidate, in this case the illustrated HA module 322 of the standby node 304 .
- the HA module 322 of the standby node 304 receives the capability matrix and compares 349 the received capability matrix to the capability matrix of the standby node 304 . If the standby node 304 capability matrix is better, then the standby node 304 HA module 322 sends a switchover acknowledgement 350 back to the active node 302 HA module 312 .
- If the standby node 304 capability matrix is not better, a negative acknowledgment may be sent instead.
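The comparison 349 can be sketched as follows. The description does not define what makes one capability matrix "better" than another, so the ordering used here (at least equal in every field and strictly higher in at least one) is an assumption, as are the field names.

```python
def is_better(candidate, reference):
    """Assumed ordering: `candidate` is better when it is at least equal to
    `reference` in every field and strictly higher in at least one field."""
    fields = reference.keys()
    at_least_equal = all(candidate.get(f, 0) >= reference[f] for f in fields)
    strictly_higher = any(candidate.get(f, 0) > reference[f] for f in fields)
    return at_least_equal and strictly_higher

def handle_capability_matrix(received, own):
    """Standby side: acknowledge only if the standby's own matrix is better."""
    return "ACK" if is_better(own, received) else "NACK"

active_matrix = {"interfaces_up": 3, "remote_hosts_up": 0}
standby_matrix = {"interfaces_up": 4, "remote_hosts_up": 1}
reply = handle_capability_matrix(received=active_matrix, own=standby_matrix)
```

Under this assumed ordering, two nodes with identical matrices produce a NACK, since neither node would improve on the other.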
- the active node 302 HA module 312 answers with an HA switchover signal 351 and also transitions 352 to a standby state.
- the standby node 304 HA module 322 upon receiving the switchover signal 351 transitions 353 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete.
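The final exchange (acknowledgement 350, switchover signal 351, and transitions 352 and 353) amounts to a small state change on each node. The sketch below is illustrative only; the `Node` class and the function names are hypothetical and the in-process method call stands in for a network message.

```python
class Node:
    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active" or "standby"

    def on_switchover_signal(self):
        # Standby side: become active on receiving the switchover signal.
        self.state = "active"

def complete_switchover(active, standby, reply):
    """Active side: on ACK, signal the standby and step down.
    On any other reply, leave both nodes unchanged."""
    if reply != "ACK":
        return False
    standby.on_switchover_signal()  # stands in for sending the switchover signal
    active.state = "standby"
    return True

node_a = Node("node-a", "active")
node_b = Node("node-b", "standby")
switched = complete_switchover(node_a, node_b, reply="ACK")
```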
- FIG. 4 is an alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down.
- the sequence diagram shows an example of the operations of FIG. 2 in a particular example. These operations include monitoring network interfaces and remote hosts, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and performing a switchover to a standby node.
- FIG. 4 has an active node 402 , a standby node 404 , and a remote host 406 all connected through a network, for example the network of FIG. 1 or any other suitable data communications network.
- the active node includes a High Availability (HA) module 412 , a VRRP module 414 , a routing module 416 , and a system monitor 418 .
- the standby node 404 includes a HA module 422 , a VRRP module 424 , a routing module 426 , and a system monitor 428 .
- the modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems.
- the sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- the VRRP module 414 of the active node 402 sends VRRP advertisements 430 to the standby node 404 VRRP module 424 .
- advertisements may be sent to many other modules (not shown).
- the first trigger in the sequence occurs when the system monitor 418 detects 431 that a network interface is down.
- the active node 402 system monitor 418 detects this condition as a network interface status change at 431 and sends a notification 432 to the routing module 416 , a notification 433 to the VRRP module 414 and a notification 434 to the HA module 412 .
- the system monitor 418 is performing an operation 204 of monitoring the policy inputs and determining whether there are updates or status changes 206 , 208 , 210 , 212 as shown in FIG. 2 .
- the system monitor operates as a background application, for example a daemon, and generates and sends interrupts or alerts to the modules 412 , 414 , 416 of the active node 402 in which it operates.
- In response to receiving an interface down notification 433, the VRRP module 414 changes the corresponding VRRP group state of the active node 402 from the master state to the standby state.
- the HA module 412 of the active node 402 updates 435 the active node 402 parameter matrix with the new network interface value. In this case, the value is reduced by one.
- the HA module 412 evaluates 436 the policies using the new value for network interfaces as input to the policies.
- the rules do not match. The policies are not met and no switchover is taken.
- the active node 402 stays active.
- Alternatively, if the rules do match, then a switchover may be requested using a switchover request signal 448. In this case, the switchover may occur before the remote host is down at 443.
- the system monitor 418 of the active node 402 pings the remote host 406 . While only one remote host 406 is shown, there may be multiple remote hosts and the same or a similar process may be applied to each.
- the remote host replies and so the status has not changed and there is no action taken. After this first ping 441 , the remote host 406 or the connection to the remote host 406 fails 443 .
- the system monitor 418 of the active node 402 pings the same remote host 406 again. However, the remote host 406 is down and does not send a reply.
- the system monitor 418 of the active node 402 detects the change in the status of the host monitor, generates an interrupt and sends 444 the interrupt or alert to the HA module 412 of the active node 402 that the remote host 406 is down.
- the HA module 412 updates 445 the remote host field of the parameter matrix and evaluates 446 the policies by applying the parameter matrix update as a new input to the policies.
- the rules match at 447 and the HA module 412 starts a switchover process to the standby node 404 . If the rules do not match then there is no switchover request.
- the switchover works only if the standby node 404 policy evaluation indicates a switchover.
- the HA module 412 of the active node 402 sends a switchover request signal 448 to a suitable switchover candidate, in this case the illustrated HA module 422 of the standby node 404 .
- the HA module 422 of the standby node 404 receives the switchover request signal 448 and then evaluates 449 its own policies against its own capability matrix.
- If the standby node 404 policy evaluation does not indicate a switchover, i.e., the standby node switchover policies are not met, then the standby node 404 HA module 422 sends a switchover acknowledgement 450 back to the active node 402 HA module 412.
- the active node 402 HA module 412 answers with an HA switchover signal 451 and also transitions 452 to a standby state.
- the standby node 404 HA module 422 upon receiving the switchover signal 451 transitions 453 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete.
- the policy evaluation at the standby node 404 prevents an immediate switchover back to the originating node.
- Without this policy evaluation, the standby node 404, after becoming active, might evaluate its own policies and then request a switchover back to the previously active node 402. This node would then become active and again request a switchover, and so on, so that the traffic routing does not stabilize.
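The standby-side check of FIG. 4 can be sketched as below. The rule format (a field name and a minimum count) and the example values are assumptions; the point is only that the standby declines when its own policies would immediately trigger a switchover back.

```python
def policies_met(matrix, rules):
    """True when any rule (field, minimum) sees a count below its minimum,
    i.e. when this node's own policies would call for a switchover."""
    return any(matrix[field] < minimum for field, minimum in rules)

def handle_switchover_request(own_matrix, own_rules):
    # If the standby's own policies are met, accepting the request would only
    # bounce the active role back again, so the standby declines.
    return "NACK" if policies_met(own_matrix, own_rules) else "ACK"

rules = [("interfaces_up", 3), ("remote_hosts_up", 1)]
healthy_standby = {"interfaces_up": 4, "remote_hosts_up": 1}
degraded_standby = {"interfaces_up": 4, "remote_hosts_up": 0}

reply_healthy = handle_switchover_request(healthy_standby, rules)
reply_degraded = handle_switchover_request(degraded_standby, rules)
```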
- FIG. 5 is a second alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down. This example combines the capability matrix comparison of FIG. 3 and the policy evaluation of FIG. 4 into the same switchover process.
- FIG. 5 has an active node 502 , a standby node 504 , and a remote host 506 all connected through a network, for example the network of FIG. 1 or any other suitable data communications network.
- the active node includes a High Availability (HA) module 512 , a VRRP module 514 , a routing module 516 , and a system monitor 518 .
- the standby node 504 includes a HA module 522 , a VRRP module 524 , a routing module 526 , and a system monitor 528 .
- the VRRP module 514 of the active node 502 sends VRRP advertisements 530 to the standby node 504 VRRP module 524 . In the same way advertisements may be sent to many other modules (not shown).
- the first trigger in the sequence occurs when the system monitor 518 detects 531 that a network interface is down.
- the active node 502 system monitor 518 detects this condition as a network interface status change at 531 and sends a notification 532 to the routing module 516 , a notification 533 to the VRRP module 514 and a notification 534 to the HA module 512 .
- In response to receiving an interface down notification 533, the VRRP module 514 changes the corresponding VRRP group state of the active node 502 from the master state to the standby state.
- the HA module 512 of the active node 502 updates 535 the active node 502 parameter matrix with the new network interface value.
- the HA module 512 evaluates 536 the policies using the new value for network interfaces as input to the policies.
- the rules do not match. The policies are not met and no switchover is taken.
- the active node 502 stays active.
- the system monitor 518 of the active node 502 pings the remote host 506 .
- the remote host replies and so the status has not changed and there is no action taken.
- the remote host 506 or the connection to the remote host 506 fails 543 .
- the system monitor 518 of the active node 502 pings the same remote host 506 again. However, the remote host 506 is down and does not send a reply. Accordingly, the system monitor 518 of the active node 502 detects the change in the status of the host monitor, generates an interrupt and sends 544 the interrupt to the HA module 512 of the active node 502 that the remote host 506 is down.
- the HA module 512 updates 545 the remote host field of the parameter matrix and evaluates 546 the policies by applying the parameter matrix update as a new input to the policies.
- the rules match at 547 and the HA module 512 starts 547 a switchover process to the standby node 504 .
- the switchover works only if the standby capability matrix is better and the standby node 504 policy evaluation indicates no switchover from the standby node to the active node or another node.
- the HA module 512 of the active node 502 sends 548 a capability matrix to the HA module 522 of the standby node 504 .
- the HA module 522 of the standby node 504 receives the capability matrix and compares 549 the received capability matrix to the capability matrix of the standby node 504 . If the standby node 504 capability matrix is better, then the HA module 522 of the standby node 504 evaluates 550 its own policies against its own capability matrix.
- the two tests 549 , 550 are sequential and conditional in that the first test must be met before the second test is performed. The particular sequence and relationships of the tests may be modified to suit different circumstances. If the standby node 504 policy evaluation does not indicate a switchover, then the standby node 504 HA module 522 sends a switchover acknowledgement 551 back to the active node 502 HA module 512 .
- the active node 502 HA module 512 answers with an HA switchover signal 552 and also transitions 553 to a standby state.
- the standby node 504 HA module 522 upon receiving the switchover signal 552 transitions 554 to an active state.
- Alternatively, the policy evaluation may be performed before the capability matrix comparison.
- the standby node may send a NACK instead of the ACK 551 as shown rejecting the switchover request and no switchover is performed.
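The sequential, conditional relationship between the two tests 549 and 550, and the option of reordering them, can be sketched with an ordered list of checks. The function name and the placeholder test results are hypothetical; each test would in practice be a capability matrix comparison or a policy evaluation.

```python
def run_tests_in_order(tests):
    """Run (name, test) pairs in order and stop at the first failure.
    Returns the reply and the names of the tests that were actually run."""
    ran = []
    for name, test in tests:
        ran.append(name)
        if not test():
            return "NACK", ran
    return "ACK", ran

# Placeholder outcomes: the standby matrix is not better, its policies are fine.
standby_matrix_is_better = lambda: False
standby_policies_not_met = lambda: True

# FIG. 5 order: capability comparison first, then policy evaluation.
reply, ran = run_tests_in_order([
    ("capability_comparison", standby_matrix_is_better),
    ("policy_evaluation", standby_policies_not_met),
])

# Alternative order: policy evaluation first.
reply_alt, ran_alt = run_tests_in_order([
    ("policy_evaluation", standby_policies_not_met),
    ("capability_comparison", standby_matrix_is_better),
])
```

Both orders give the same reply here; the order only changes which test is skipped when an earlier one fails.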
- FIG. 6 is a sequence diagram of two nodes performing a switchover in response to a connected BGP peer route being withdrawn.
- the sequence diagram shows another example of the operations of FIG. 2 in a particular example. These operations include monitoring VRRP states and BGP/OSPF updates, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and not performing a switchover to a standby node.
- FIG. 6 has an active node 602 , a standby node 604 , and a connected remote BGP peer 606 all connected through a network, for example the network of FIG. 1 or any other suitable data communications network.
- the remote BGP peer may be a neighbor router, a next hop router, or a more remote router.
- the active node includes a High Availability (HA) module 612 , a VRRP module 614 , a routing module 616 , and a system monitor 618 .
- the standby node 604 includes a HA module 622 , a VRRP module 624 , a routing module 626 , and a system monitor 628 .
- the modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems.
- the sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- the VRRP module 614 of the active node 602 sends VRRP advertisements 630 to the standby node 604 VRRP module 624 .
- the connected remote BGP peer 606 connection fails and then there is a failed message 634 from the disconnected remote BGP peer 606 that fails to reach the routing module 616 of the active node 602 .
- the route to the BGP peer 606 may be withdrawn due to a failure of the BGP peer, the active node or any other node along the route.
- the routing module 616 of the active node 602 will not receive the failed message 634 .
- any message (not shown) from the routing module 616 of the active node 602 will not reach the BGP peer 606 , and the routing module 616 will not receive any acknowledgements from the BGP peer 606 for sent messages.
- the active node 602 routing module 616 detects the change in the BGP peer status, generates a BGP route update in light of this condition, and sends a route withdrawn notification 636 to the HA module 612.
- the system monitor 618 operates as a background application, for example a daemon, and sends interrupts or alerts to the modules 612 , 614 , 616 of the active node 602 in which it operates.
- the HA module 612 of the active node 602 updates 637 the active node 602 parameter matrix with the new route availability value. In this case, the value is reduced by one.
- the HA module 612 evaluates 638 the policies using the new value for routes as input to the policies.
- the rules match and the switchover process is started.
- the HA module 612 at the active node 602 sends a switchover request signal 642 including its capability matrix to the HA module 622 of the standby node 604 to start a switchover to the standby node.
- the routing module 626 of the standby node 604 also fails to receive a failed message 635 from the remote BGP peer 606 .
- the routing module will also fail to receive acknowledgements of messages that are attempted to be sent to the remote BGP peer 606 .
- the routing module 626 of the standby node 604 sends a route withdrawn interrupt or alert 640 to the HA module 622 of the standby node 604 that the remote BGP peer 606 is down or at least the route to the remote BGP peer is withdrawn.
- the HA module 622 updates 641 the BGP route field of the standby node 604 parameter matrix.
- the HA module 612 of the active node 602 sends a switchover request signal that includes 642 a capability matrix to a suitable switchover candidate, in this case the illustrated HA module 622 of the standby node 604 .
- the HA module 622 of the standby node 604 receives the capability matrix and compares 643 the received capability matrix to the capability matrix of the standby node 604 . If the standby node 604 capability matrix is better, then the standby node 604 HA module 622 sends a switchover acknowledgement back to the active node 602 HA module 612 .
- the standby node 604 capability matrix is not better and so a negative acknowledgement (NACK) 644 is sent from the standby node 604 HA module 622 to the active node 602 HA module 612 rejecting the switchover request. No switchover is made.
- the active node 602 HA module 612 receives the NACK 644 as a rejection and remains active or sends a switchover request to a different standby node (not shown).
- the standby node 604 HA module 622 after sending the NACK 644 remains in the standby status.
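The active node's handling of a NACK, either remaining active or trying a different standby node, can be sketched as a loop over candidates. The candidate names and the `send_request` callable are hypothetical stand-ins for real HA modules and network messages.

```python
def request_switchover(candidates, send_request):
    """Ask each candidate standby in turn. Return the first candidate that
    replies ACK, or None if every candidate rejects the request."""
    for candidate in candidates:
        if send_request(candidate) == "ACK":
            return candidate
    return None

# Simulated replies: the first standby lost the same route and rejects.
simulated_replies = {"standby-a": "NACK", "standby-b": "ACK"}
chosen = request_switchover(["standby-a", "standby-b"], simulated_replies.get)

# With only the rejecting standby available, the active node stays active.
nobody = request_switchover(["standby-a"], simulated_replies.get)
active_state = "standby" if nobody is not None else "active"
```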
- FIG. 7 is a sequence diagram of two nodes performing a switchover in response to a VRRP status changing.
- the sequence diagram shows an example of the operations of FIG. 2 in a particular example. These operations include monitoring network interfaces and VRRP status, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and performing a switchover to a standby node.
- FIG. 7 has an active node 702 and a standby node 704 connected through a network, for example the network of FIG. 1 or any other suitable data communications network.
- the active node includes a High Availability (HA) module 712 , a VRRP module 714 , a routing module 716 , and a system monitor 718 .
- the standby node 704 includes a HA module 722 , a VRRP module 724 , a routing module 726 , and a system monitor 728 .
- the modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems.
- the sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- the VRRP module 714 of the active node 702 sends VRRP advertisements 730 to the standby node 704 VRRP module 724 .
- the first trigger in the sequence occurs when the system monitor 718 detects 731 that a network interface is down.
- the active node 702 system monitor 718 detects this condition as a network interface status change at 731 and sends a notification 732 to the routing module 716 , a notification 733 to the HA module 712 and a notification 734 to the VRRP module 714 .
- the VRRP module responds by transitioning 735 the active node 702 to a VRRP backup status.
- In response to the state change from VRRP master to backup at the active node 702, the VRRP module 714 notifies 739 the HA module 712 of the state change.
- the HA module 712 of the active node 702 updates 740 the active node 702 parameter matrix with the new VRRP status value. In this case, the value is reduced by one.
- the HA module 712 evaluates 741 the policies using the new value for VRRP status as input to the policies.
- the rules match and a switchover request signal 743 is sent to the standby node 704 with the active node 702 capability matrix.
- In response to the state change from VRRP backup to master at the standby node 704, the VRRP module 724 notifies 744 the HA module 722 of the state change.
- the HA module 722 of the standby node 704 updates 745 the standby node 704 parameter matrix with the new VRRP status value. In this case, the value is increased by one.
- the HA module 722 may then evaluate the policies and perform other operations not shown.
- the HA module 712 of the active node 702 which is now a VRRP backup node, sends a switchover request signal 743 with a capability matrix to a suitable switchover candidate, in this case the illustrated HA module 722 of the standby node 704 , which is now a VRRP master node.
- the HA module 722 of the standby node 704 receives the capability matrix and compares 746 the received capability matrix to the capability matrix of the standby node 704 . If the standby node 704 capability matrix is better, then the standby node 704 HA module 722 sends a switchover acknowledgement 747 back to the active node 702 HA module 712 .
- the active node 702 HA module 712 answers with an HA switchover signal 750 and also transitions 751 to a standby state.
- the standby node 704 HA module 722 upon receiving the switchover signal 750 transitions 752 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete.
- FIG. 8 is a block diagram of a network node 802 , which may be an active node, an inactive node or a remote or peer host, according to an embodiment herein.
- the node includes a processor 810 , memory 812 , and a communications interface 804 connected together through a bus 820 .
- the processor 810 may include a multifunction processor and/or an application-specific processor.
- the memory 812 within the node may include volatile and non-volatile memory, for example a non-transitory storage medium such as read only memory (ROM), flash memory, random access memory (RAM), and a large capacity permanent storage device such as a hard disk drive.
- the communications interface 804 enables data communications with high availability as described above via local and wide area connections using one or more different protocols including BGP and VRRP.
- the node executes computer readable instructions stored in the storage medium to implement various tasks as described above.
- the node 802 further includes a traffic cache module 814 coupled to the bus 820 with various caches (e.g., application cache, domain application cache, client route cache, and application route cache) to store mapping information and other traffic communication data.
- the node 802 further includes a configuration monitor 806 to monitor policy input as described above including BGP/OSPF updates, VRRP state updates, network interface state updates, and remote monitor updates, among others.
- the configuration monitor 806 generates alerts or interrupts and updates a parameter matrix 808 when there are changes to any of the monitored policy inputs.
- the processor 810 may alternatively be configured to update the parameter matrix as well as apply policies to the updates, compare matrices, and generate switchover requests, acknowledgements, and negative acknowledgments, among other tasks.
- a control interface 816 may be provided for node management and configuration purposes as an interface to a computer monitor or flat panel display but may include any output device.
- the control interface 816 may include an interface to a computer keyboard and/or pointing device such as a computer mouse, computer track pad, touch screen, or the like, that allows a user to provide inputs and receive outputs including a GUI (graphical user interface).
- a GUI can be responsive to user inputs and typically displays images and data.
- the control interface 816 can be provided as a web page served via a communication to a remote device for display to a user and for receiving inputs from the user.
- each of the modules may be implemented through computer-readable instructions that are executed on a physical processor of a computing system that supports the node.
- the embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the network elements.
- the network elements shown in FIG. 1 include blocks which can be at least one of a hardware device, or a combination of hardware device and software module. It is understood that the scope of the protection for systems and methods disclosed herein is extended to such a program and in addition to a computer readable means having a message therein, such computer readable storage means contain program code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device.
- the functionality described above is performed by a computer device that executes computer readable instructions (software).
- Although the operations of the method(s) herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations.
- instructions or sub-operations of distinct operations may be implemented in an intermittent and/or alternating manner.
Abstract
Description
- The embodiments herein relate to data communications through networks and switching traffic over to a different node in the event of a network failure.
- Network data communications have moved beyond asynchronous text and file transfer traffic to time-sensitive traffic such as streaming and interactive communications. At the same time many of the resources being used to carry the data are virtualized. A VNF (Virtual Network Function) may take the place of a hardware router. A VRRP (Virtual Router Redundancy Protocol) is used to assign routes through virtual routers denoting some virtual routers as masters and others as backups.
- High Availability (HA) techniques are intended to provide uninterrupted internet data communications service in the event of failures. HA attempts to compensate for the design of the Internet to provide reliability similar to that provided by telephony service at the lower cost of the Internet. In HA techniques, the states of an active node of a wide area network (WAN) or metropolitan area network (MAN) are replicated on a standby node that is coupled to the same network in order to provide a seamless switchover to the standby node in the event of a failure at the active node. A switchover from the active node to a standby node is made based on the health of the processes running on the active node or based on a failure of network interfaces coupled to the active node. After the switchover, the active node becomes a standby node and the standby node takes the place of the active node, becoming active and maintaining the state of the active node. The new active node carries the data communications traffic instead of the original active node.
- High availability router switchover decisions are described using monitoring and policies. In one example, available network routing parameters are monitored. A change of one of the network routing parameters is detected. A parameter matrix of network routing parameters is updated in response to the detected change. The changed network routing parameter is applied to a policy in response to updating the parameter matrix, and a switchover request is sent to a standby network node when the policy is met by the changed network routing parameter.
- Some embodiments include generating an interrupt at a system monitor of the active node and sending the interrupt to a high availability module of the active node when the change in a network routing parameter is detected and wherein applying the changed network routing parameter to the policy is performed by the high availability module.
- Some embodiments include applying network routing parameters of the standby network node to a policy at the standby network node and sending a switchover request acknowledgment to the active network node from the standby network node when the standby network node policy is not met by the standby network node network routing parameters.
- Some embodiments include sending a portion of the parameter matrix as a capability matrix to the standby network node for comparison with a capability matrix of the standby node. In some embodiments, the switchover request is rejected by the standby network node based on the comparison. In some embodiments, the comparison includes comparing network capabilities of the active network node to the capabilities of the standby network node.
- In some embodiments, applying comprises applying a count of the changed network routing parameter to a rule with a comparison operator to determine if the comparison is met. In some embodiments, applying comprises applying a plurality of counts of the parameter matrix to a sequence of rules having comparison operators. In some embodiments, the rules of the sequence are conditional upon meeting a preceding rule of the sequence.
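A sequence of rules with comparison operators, where each rule is reached only if the preceding rule is met, can be sketched as follows. The rule tuples, field names, and thresholds are hypothetical examples and are not taken from the description.

```python
import operator

OPERATORS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
             ">=": operator.ge, ">": operator.gt}

def evaluate_sequence(matrix, rules):
    """Each rule is (field, op, threshold). A rule is evaluated only if every
    preceding rule was met; the policy is met when all rules are met."""
    for field, op, threshold in rules:
        if not OPERATORS[op](matrix[field], threshold):
            return False
    return True

# Assumed counts from a parameter matrix.
matrix = {"vrrp_master": 0, "interfaces_up": 2, "bgp_routes": 3}

# Assumed policy: no longer VRRP master AND fewer than 3 interfaces up.
policy = [("vrrp_master", "==", 0), ("interfaces_up", "<", 3)]
policy_met = evaluate_sequence(matrix, policy)

# A policy whose first rule fails never reaches its second rule.
other_policy = [("bgp_routes", "<", 1), ("interfaces_up", "<", 3)]
other_met = evaluate_sequence(matrix, other_policy)
```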
- In some embodiments, the plurality of network routing parameters comprises a status of the active network node as a VRRP (Virtual Router Redundancy Protocol) master or backup node and wherein detecting a change comprises detecting when the status of the active network node changes. In some embodiments, the plurality of network routing parameters comprises availability of a connected routing peer. In some embodiments, monitoring the availability of the connected routing peer comprises sending a ping to the connected routing peer. In some embodiments, the connected routing peer comprises a next hop border gateway protocol neighbor router.
- In another example, a non-transitory computer-readable storage medium containing program instructions, wherein execution of the program instructions by the computer causes the computer to perform operations comprising monitoring a plurality of network routing parameters that are available to an active network node, detecting a change of one of the plurality of network routing parameters, updating a parameter matrix of network routing parameters in response to the detected change, applying the changed network routing parameter to a policy in response to updating the parameter matrix, and sending a switchover request to a standby network node when the policy is met by the changed network routing parameter.
- In some embodiments, applying comprises applying a count of the changed network routing parameter to a rule with a comparison operator to determine if the comparison is met. In some embodiments, applying comprises applying a plurality of counts of the parameter matrix to a sequence of rules having comparison operators.
- In another example, an active network node includes a system monitor configured to monitor a plurality of network routing parameters that are available to the active network node and to detect a change of one of the plurality of network routing parameters, and a high availability module configured to update a parameter matrix of network routing parameters in response to the detected change, to apply the changed network routing parameter to a policy in response to updating the parameter matrix, and to send a switchover request to a standby network node when the policy is met by the changed network routing parameter.
- In some embodiments, the system monitor further generates an interrupt and sends the interrupt to the high availability module when the change in a network routing parameter is detected. In some embodiments, the high availability module further sends a portion of the parameter matrix as a capability matrix to the standby network node for comparison with a capability matrix of the standby node.
- The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
-
FIG. 1 is a block diagram of a network suitable for use with the present invention; -
FIG. 2 is a process flow diagram of making a switchover decision using policies and optionally a matrix according to embodiments of the present invention; -
FIG. 3 is a sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down according to embodiments of the invention; -
FIG. 4 is an alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down according to embodiments of the invention; -
FIG. 5 is another alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down according to embodiments of the invention; -
FIG. 6 is a sequence diagram of two nodes performing a switchover in response to a connected BGP peer route being withdrawn according to embodiments of the invention; -
FIG. 7 is a sequence diagram of two nodes performing a switchover in response to a VRRP status change according to embodiments of the invention; -
FIG. 8 is a block diagram of a network node according to embodiments of the invention; - The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
- The embodiments herein are described in the context of a Software Defined Wide Area Network (SD-WAN or SDWAN) where there is at least one designated hub node for each of the branch nodes and each hub node potentially acting as a gateway to a plurality of branch nodes. Further, branch nodes themselves may have direct access to the Internet through one or more WAN links. However, embodiments disclosed herein can be applied in non-software-defined WANs and for applications hosted within the network, e.g., within a LAN (Local Area Network).
- As described herein, a High Availability (HA) system tracks multiple circumstances regarding the data communications traffic through the network. These circumstances include the health of the running processes on the active node, the VRRP (Virtual Router Redundancy Protocol) state, and the availability of other resources that may be used to send or receive traffic. The availabilities may include the availability of network interfaces, the availability of connected routing peers, for example BGP (Border Gateway Protocol) neighbor peers, the availability of routes, and the availability of remote hosts. A missing route can lead to brownouts of an active VNF (Virtual Network Function) when the route is withdrawn. Failed interfaces can lead to a blackout. A failed remote host can cause a brownout for an active VNF.
- When an event occurs related to any of these circumstances then the event may be applied to policies to determine whether a switchover should be attempted. A stateful HA may track the availability of BGP neighbor peers, VRRP state, and other factors and decide whether to remain active or transition to a standby role by comparing these factors to those of a standby node. The comparison may be initiated by the active node, for example a VNF, negotiating with a standby node, for example another VNF, to transition to active when the standby node has better data communication metrics.
- For a switchover with VRRP, the active node will initially be in a VRRP Master state but may change its status to a Backup state after the switchover and not remain in a VRRP master state. Without such a change in status, the traffic may be punted between active and backup nodes thereby increasing the load on the overall system. In some embodiments, VRRP tracks the configured routes and network interface availability to decide whether to transition a node to backup or to stay in the master role.
-
FIG. 1 is a simplified diagram of a wired or wireless network in which multiple nodes, for example routers, have redundant paths between multiple clients and the cloud or the Internet. As used herein network communication devices are referred to as nodes. A node may be a router or it may be another device that receives data traffic and sends the data traffic to another node. As such, “nodes” are intended to include “routers” and also a variety of other physical or virtual devices to which the techniques and structures herein may apply. A first node 102 is an active VNF that is coupled to a second node 104 that is a standby VNF through a high availability (HA) link 106. Of the first and the second nodes 102, 104, the first node 102 is a VRRP master and the second node 104 is a VRRP backup. The first and the second nodes are both coupled to a client network 108 that includes multiple clients C1, C2, C3. The first and the second nodes are both coupled to an AR (Access Router) 112 through E-BGP (External Border Gateway Protocol) 110. The AR 112 is coupled to a second client network 114 with clients C4, C5.
- On the northbound side and opposite from the clients on the southbound side of the diagram, the first and the second nodes 102, 104 are coupled to a first PE and a second PE 124. The first and second PEs are coupled to the Internet 128. The first node 102 is coupled to both the first and the second PE. The second node 104 is also coupled to both the first and the second PE.
-
FIG. 2 is a process flow diagram of making a switchover decision using policies and optionally a matrix. A process begins at 202 with configuration of one or more switchover policies. These policies may be adapted to suit the configuration of the network and the availability of protocols, routes, neighbor routers and other resources. The policies at 202 are designated for a particular node and different nodes in a network may have different policies. In the present example, the policies are configured for the first node 102 and identical policies are configured for the second node 104. At least some of the policies have an input and produce an output based on the input.
- At 204 the policy inputs are monitored. These inputs may include multiple characteristics of the network that affect the ability of the node to support traffic through the network from and to any other nodes. In the present example, four such network characteristics are monitored but embodiments may have more or fewer or different characteristics. The first characteristic is the availability of a BGP or OSPF (Open Shortest Path First) neighbor or next hop router. In some embodiments, multiple neighbors may be monitored. As examples, the
AR 112 and the first and second PEs may be monitored as neighbors.
- A second characteristic is the VRRP status of the node. In a VRRP system the primary traffic should be routed through a VRRP master node. This status is monitored so that when the status of the node is changed to backup, the policy input monitor generates an update and the status is tested at 208. If the status is backup then the process updates the parameter matrix at 220.
- A third characteristic is the status of the network interfaces to which the node connects. The monitor
policy inputs operation 204 sends messages to interfaces of connected nodes and monitors their status. When a node interface is down at 210, then the parameter matrix is updated at 220. The interface may be down due to a broken connection or a restart of the connection or of the other node. The status of one or more interfaces may be monitored depending on the network configuration. - A fourth characteristic is the status of a remote host, for example, an intermediate router, or any other connected router. As with the other characteristics there may be one or more remote hosts that are monitored. In some embodiments, the monitor
policy inputs operation 204 pings the remote host at time intervals. When the ping indicates that the remote host is no longer available then a test at 212 indicates that the host is down and the parameter matrix is updated at 220. - The node maintains a parameter matrix of one or more of the characteristics shown herein. These characteristics are also referred to as network routing parameters, however the parameter matrix may also contain other information of similar and different kinds. When the parameter matrix is changed, the values in the parameter matrix may be applied to one or more policies at 222. The policies may be in the form of testing a parameter matrix count against a rule. If the rule is not satisfied, then the policy is not met and the process returns to monitor the policy inputs. A policy may have multiple rules that are applied in a specific sequence or that are conditional on other rules. If the one or more rules are satisfied and the policy is met, then the process optionally goes to a capability matrix comparison at 224 at which the process determines if a capability matrix of a standby node is better than a capability matrix at this active node. The capability matrix contains information about the availability of network resources and may have the same information as the parameter matrix or less information. The policies in this process are met when there is a failure or reduction in the available resources. If the standby node capability matrix shows that the standby node has experienced the same or more failures or resource reduction, then at 228 that active node will stay active and there is no switchover. The capability matrix shows the network capabilities of the respective node. The capability matrix comparison allows capabilities of the nodes to be compared using a convenient matrix configuration. However, a matrix is used herein for ease of understanding. 
The capabilities of an active node may be compared to the capabilities of a standby node using data configurations other than or, in addition to, a matrix.
- On the other hand, if the standby node has a better capability matrix in that it has access to more or better network resources, then at 226 a switchover to the standby node is performed. The traffic that was carried by the active node is moved over to the standby node. The standby node becomes the active node and the active node becomes a standby node. In a VRRP context the master becomes the backup and the backup is changed to the master. The switchover is performed with a transfer of state but without any restart. This improves availability because the nodes stay active through the process.
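The event handling of FIG. 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation described in this document: the class name `HAModule`, the method `on_change`, the dictionary keys, and the starting counts are all hypothetical. The sketch shows a changed count updating the parameter matrix (block 220) and the updated matrix being applied to a policy (block 222).

```python
from dataclasses import dataclass
from typing import Callable, Dict

Matrix = Dict[str, int]


@dataclass
class HAModule:
    """Hypothetical HA module: holds counts and one switchover policy."""
    matrix: Matrix
    policy: Callable[[Matrix], bool]  # returns True when the policy is met

    def on_change(self, obj: str, new_count: int) -> bool:
        """Handle a monitor notification; True means 'request a switchover'."""
        if self.matrix.get(obj) == new_count:
            return False                 # no change detected, keep monitoring (204)
        self.matrix[obj] = new_count     # update the parameter matrix (220)
        return self.policy(self.matrix)  # apply the policy (222)


# Assumed starting state: five interfaces up, five remote hosts reachable.
ha = HAModule(
    matrix={"interface": 5, "remote-host-monitor": 5},
    policy=lambda m: m["interface"] < 3,
)
first = ha.on_change("interface", 4)   # one interface down: policy not met
second = ha.on_change("interface", 2)  # three interfaces down: policy met
```

Here a met policy only signals that a switchover should be requested; the optional capability matrix comparison at 224 would still decide whether the switchover actually happens.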
-
FIG. 2 is directed to a process that is performed at the active or master node. At 224, the capability matrix is compared to a capability matrix that was maintained by an associated standby or backup node. The standby or backup node performs the same operations as described for the active or master node as shown in FIG. 2 so that its capability matrix is maintained and ready for a valid comparison. In other words, even if the backup node is not actively communicating data traffic or is actively communicating other unrelated data traffic, the monitor policy inputs operation 204 is performed for the network characteristics that are important to the operation of the standby node as a standby for the active node. The configured policies at 202 may be the same or different to suit the network configuration. The interaction between the two nodes is described in more detail below.
- Table 1 is a simplified example of a parameter matrix for a first active node and Table 2 is a simplified example of a parameter matrix for a second standby node. These nodes may correspond to the active VNF 102 and standby VNF 104 of FIG. 1 as well as to the active node 302 and standby node 304 of FIG. 3 and the other sequence diagrams. The parameter matrix is presented as a two-dimensional table for ease of understanding but may take any of a variety of other forms, including unstructured text strings, metadata, and configuration registers. There may be more or fewer entries to suit different network configurations. In this example, the rows are selected to align with the policy inputs of FIG. 2. More or fewer policy inputs may be used.
-
TABLE 1
Object                 Count
Interface              3 (Up)
Routes                 5 (Installed)
Remote Host Monitor    3 (Reachability is Up)
VRRP Group             2 (Master)
-
TABLE 2
Object                 Count
Interface              3 (Up)
Routes                 4 (Installed)
Remote Host Monitor    3 (Reachability is Up)
VRRP Group             1 (Master)
- The values in Table 1 are values that were determined by the first node and will be applied to the policies configured for the first node, an active node. The values in Table 2 were determined by the second node and will be applied to the policies configured for the second node which is a VRRP backup node with respect to the first node but a VRRP Master for other purposes. Table 1 shows values as counts for network interfaces, routes, monitored remote hosts and VRRP group status. Each time one of the counts changes, the parameter matrix is updated as at
block 220 and the policies are applied as at block 222.
- The nodes may have any of a wide range of different policies active for managing operations including the network interfaces, routes, remote hosts, and groupings, for example VRRP groups, mentioned above. A particular set of policies may be configured to support switchover for high availability. While a few examples are provided, more, fewer, and different policies may be used. In some embodiments, the switchover policies are based around the characteristics discussed above of interfaces, routes, remote host monitoring and VRRP group status. An example policy would be that if the VRRP master count is less than three then go to a switchover. In Table 1, the VRRP master count is 2 so, upon updating the count in the parameter matrix from 3 to 2, the policy would be met and a switchover would be invoked. Similarly, an interface count of less than 3 may invoke a switchover. This policy is not met by Table 1, in which the interface count is 3. Another example policy would be first to determine if the interface count is less than 4 based on an update to the parameter matrix, then determine if the remote host monitor count is less than 3. If both conditions are met, then a switchover is invoked. For this example, the first condition is met by Table 1 but the second condition is not. Accordingly, switchover would not be invoked. These policies may be described another way using a pseudocode representation as follows.
- The below example considers only the count for the number of active network interfaces that are available to the node. If there are not enough network interfaces, then the traffic is switched over to a node that has more network interfaces. The tracked network characteristics are first named and a tracking function is established. This tracking applies to all of the examples and in some embodiments more characteristics are tracked. The rule applies a comparison operator, less than, to the interface count. If the rule is not met, then the switchover is rejected. The rule may be reversed to apply a greater than or equal comparison operator to the count that results in a switchover rejection. The rule may also be written to use different comparison operators.
-
Track-interfaces=>Inf-1, Inf-2, Inf-3, Inf-4, Inf-5
Track remote host monitors=>H1, H2, H3, H4, H5
switchover policy begin
  rule-1 begin
    if interface count less than 3; then
      switchover
    endif
  rule-1 end
switchover policy end
- This example policy has a single rule and the rule is to switch over if there are fewer than three active network interfaces.
- The below example considers the number of active network interfaces and the number of active remote hosts detected by the remote host monitor. The rules are sequential in that
rule 2 is not assessed unless rule 1 is not satisfied. This policy may be stated as follows. First, a switchover is declared if fewer than three of the five total network interfaces are in an UP state (i.e., the UP count falls to two or lower). Second, if the first rule is not satisfied, then a switchover is declared if fewer than four of the five network interfaces are in an UP state and fewer than three of the five total tracked remote host monitors report reachability. The first rule applies a less than comparison operator to the network interface count and the second rule applies a less than comparison operator to both the network interface count and the remote host monitor count. If neither rule is met, then the switchover is rejected. The first rule may be made a condition precedent of the second rule. In other words, the second rule may be conditional on the first rule being met.
-
Track-interfaces=>Inf-1, Inf-2, Inf-3, Inf-4, Inf-5
Track remote host monitors=>H1, H2, H3, H4, H5
switchover-policy begin
  rule-1 begin
    if interface-count less-than 3; then
      switch-over
    endif
  rule-1 end
  rule-2 begin
    if interface-count less-than 4 and remote-host-monitor-count less-than 3; then
      switch-over
    endif
  rule-2 end
switchover-policy end
- A third example considers only the number of VRRP groups that are in Master state at the active node. When an event or interrupt is received from a VRRP module at the active node then this third example policy may be invoked. This policy has a single rule that if any one VRRP group transitions to Backup from the three Master VRRP groups being tracked, then a switchover process is started. This rule applies a less than comparison operator to the VRRP group master count.
-
Track-vrrp-groups=> VR1, VR2, VR3
switchover-policy begin
  rule-1 begin
    if vrrp-group-master-count less-than 3; then
      switch-over
    endif
  rule-1 end
switchover-policy end
- While only three example policies are shown, more or fewer may be used. The policies may also be combined in any particular sequence as shown, for example, in the second example. There may be more than two rules in any policy. While the three example policies relate to VRRP status, network interface status and remote host monitor, other monitored status events may be used as inputs. One such input is a BGP/OSPF route status or event but there may be many others.
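The example policies can also be expressed as data and evaluated mechanically. The following Python sketch is illustrative only: the names `OPS`, `rule_met`, `policy_met`, `POLICY_2`, and `POLICY_3`, and the dictionary keys, are assumptions made for this example and do not come from the pseudocode above. It encodes the second and third example policies and evaluates them against the counts of Table 1.

```python
import operator
from typing import Dict, List, Tuple

# A condition is (object name, comparison operator name, threshold).
Condition = Tuple[str, str, int]
# A rule is a list of conditions that must all hold (logical AND).
Rule = List[Condition]

OPS = {
    "less-than": operator.lt,
    "greater-than-or-equal": operator.ge,
}


def rule_met(rule: Rule, matrix: Dict[str, int]) -> bool:
    """True when every condition of the rule holds for the matrix counts."""
    return all(OPS[op](matrix[obj], limit) for obj, op, limit in rule)


def policy_met(rules: List[Rule], matrix: Dict[str, int]) -> bool:
    """Rules are tried in sequence; the first satisfied rule meets the policy."""
    return any(rule_met(rule, matrix) for rule in rules)


# Counts taken from Table 1.
TABLE_1 = {"interface": 3, "routes": 5,
           "remote-host-monitor": 3, "vrrp-group-master": 2}

# Second example policy: rule-1, then rule-2.
POLICY_2 = [
    [("interface", "less-than", 3)],
    [("interface", "less-than", 4), ("remote-host-monitor", "less-than", 3)],
]
# Third example policy: a single rule on the VRRP master count.
POLICY_3 = [[("vrrp-group-master", "less-than", 3)]]

print(policy_met(POLICY_2, TABLE_1))  # False: neither rule is satisfied
print(policy_met(POLICY_3, TABLE_1))  # True: 2 is less than 3
```

Because `any` stops at the first satisfied rule, a later rule is only assessed when the earlier rules are not satisfied, which mirrors the sequential behavior described for the second example.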
- Upon determining that a policy is met at 222 of
FIG. 2, as for example when the network interface count is below 3, the process at the active node may optionally go to a capability matrix comparison 224. As mentioned above, the capability matrix may have many more or fewer entries than are necessary to evaluate the policies and may have configurations that are different from a parameter matrix. A capability matrix is presented here for ease of understanding. For comparison only a portion of the parameter matrix of Table 1 and Table 2 is needed. Table 3 is a portion of the parameter matrix of Table 1 maintained by the active node and Table 4 is a portion of the parameter matrix of Table 2 maintained by the standby node. In a matrix comparison, the values are the same except that Table 3 corresponding to the active node has more installed routes than Table 4 corresponding to the standby node. As a result, the standby node has worse accessibility and a switchover from the active node to the standby node would degrade traffic availability. The process of FIG. 2 therefore rejects the switchover because the standby node capability matrix is not better, and the current node stays in the active role at 228.
-
TABLE 3
Object                 Count
Interface              3 (Up)
Routes                 5 (Installed)
Remote Host Monitor    3 (Reachability is Up)
-
TABLE 4
Object                 Count
Interface              3 (Up)
Routes                 4 (Installed)
Remote Host Monitor    3 (Reachability is Up)
- The values used for the capability matrix comparison may be modified to suit particular node and network implementations. In the example of Tables 3 and 4 the VRRP status is not used for comparison because it is not relevant to a node's availability to process traffic. The VRRP group state will always differ between the active node and the standby node because one would be in the Master state and the other in the Backup state. In such an event, a comparison is not useful.
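The comparison of Tables 3 and 4 can be illustrated with a short Python sketch. The text does not define precisely how "better" is computed across several counts, so the rule used here is an assumption: the standby node counts as better only when none of its counts is lower than the active node's and at least one is higher. The function name `standby_is_better` and the dictionary keys are likewise hypothetical.

```python
from typing import Dict


def standby_is_better(active: Dict[str, int], standby: Dict[str, int]) -> bool:
    """Assumed rule: no count is worse and at least one count is higher."""
    keys = active.keys() & standby.keys()  # compare only shared objects
    no_worse = all(standby[k] >= active[k] for k in keys)
    strictly_better = any(standby[k] > active[k] for k in keys)
    return no_worse and strictly_better


# Counts taken from Table 3 (active node) and Table 4 (standby node).
TABLE_3 = {"interface": 3, "routes": 5, "remote-host-monitor": 3}
TABLE_4 = {"interface": 3, "routes": 4, "remote-host-monitor": 3}

print(standby_is_better(TABLE_3, TABLE_4))  # False: standby has fewer routes
```

Under this assumed rule, two identical matrices also yield no switchover, which matches the statement that the active node stays active when the standby node has experienced the same or more failures.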
-
FIG. 3 is a sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down. The sequence diagram shows the operations of FIG. 2 in a particular example. These operations include monitoring network interfaces and remote hosts, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and performing a switchover to a standby node. FIG. 3 has an active node 302, a standby node 304, and a remote host 306, for example a next hop BGP router, all connected through a network, for example the network of FIG. 1 or any other suitable data communications network. The active node includes a High Availability (HA) module 312, a VRRP module 314, a routing module 316, and a system monitor 318. Similarly, the standby node 304 includes an HA module 322, a VRRP module 324, a routing module 326, and a system monitor 328. The modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems. The sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- At 330, as a part of normal network interface operations, the
VRRP module 314 of the active node 302 sends VRRP advertisements 330 to the standby node 304 VRRP module 324. In the same way advertisements may be sent to many other modules (not shown). The first trigger in the sequence occurs when the system monitor 318 detects 331 that a network interface is down. In response, the active node 302 system monitor 318 sends a notification 332 to the routing module 316, a notification 333 to the VRRP module 314 and a notification 334 to the HA module 312. In this way, the system monitor 318 is performing an operation 204 of monitoring the policy inputs and determining whether there are updates or status changes 206, 208, 210, 212 as shown in FIG. 2. In some embodiments, the system monitor operates as a background application, for example a daemon, and generates and sends interrupts or alerts to the modules 312, 314, 316 of the active node 302 in which it operates.
- In response to receiving an interface down
notification 333, the VRRP module 314 changes the corresponding VRRP group state of the active node 302 from the master state to the backup state.
- In response to receiving an interface down
notification 334, the HA module 312 of the active node 302 updates 335 the active node 302 parameter matrix with the new network interface value. In this case, the value is reduced by one. The HA module 312 evaluates 336 the policies using the new value for network interfaces as input to the policies. At 337, the rules do not match. The policies are not met and no switchover is taken. The active node 302 stays active. Alternatively, if the rules do match, then a switchover may be requested as shown, for example, at 347.
- In a separate process, at 340, the system monitor 318 of the
active node 302 pings the remote host 306. While only one remote host 306 is shown, there may be multiple remote hosts and the same or a similar process may be applied to each. At 341, the remote host replies and so the status has not changed and there is no action taken. After this first ping 341, the remote host 306 or the connection to the remote host 306 fails 343. At 342, the system monitor 318 of the active node 302 pings the same remote host 306 again. However, at 343 the remote host 306 is down and does not send a reply. Accordingly, the system monitor 318 of the active node 302 detects the change in the host monitor status, generates an interrupt or alert and sends 344 the interrupt or alert to the HA module 312 of the active node 302 indicating that the remote host 306 is down. The HA module 312 updates 345 the remote host field of the parameter matrix and evaluates 346 the policies by applying the parameter matrix update as a new input to the policies. In this example, the rules match at 347 and the HA module 312 starts 347 a switchover process to the standby node 304. If the rules do not provide a match, then no switchover is requested.
- In this embodiment, the switchover works only if a capability matrix comparison indicates a switchover. For a switchover, the
HA module 312 of the active node 302 sends 348 a capability matrix to a suitable switchover candidate, in this case the illustrated HA module 322 of the standby node 304. The HA module 322 of the standby node 304 receives the capability matrix and compares 349 the received capability matrix to the capability matrix of the standby node 304. If the standby node 304 capability matrix is better, then the standby node 304 HA module 322 sends a switchover acknowledgement 350 back to the active node 302 HA module 312. Alternatively, if the capability matrix is not better, then a negative acknowledgement (NACK) may be sent instead. The active node 302 HA module 312 answers the acknowledgement with an HA switchover signal 351 and also transitions 352 to a standby state. The standby node 304 HA module 322, upon receiving the switchover signal 351, transitions 353 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete.
-
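The exchange between the two HA modules in FIG. 3 can be sketched as follows. This is a simplified, in-process illustration rather than a network protocol implementation: the `Node` and `Role` classes, the string replies, and the placeholder criterion `more_hosts_reachable` are all assumptions made for the example. The "better" test is passed in as a function because the text leaves its exact form open.

```python
from enum import Enum
from typing import Callable, Dict

Matrix = Dict[str, int]
Better = Callable[[Matrix, Matrix], bool]


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class Node:
    """Hypothetical node holding a role and a capability matrix."""

    def __init__(self, role: Role, capability: Matrix) -> None:
        self.role = role
        self.capability = capability

    def on_capability_matrix(self, peer: Matrix, better: Better) -> str:
        """Standby side: compare (349) and answer ACK (350) or NACK."""
        return "ACK" if better(self.capability, peer) else "NACK"


def request_switchover(active: Node, standby: Node, better: Better) -> bool:
    """Active side: send the matrix (348) and act on the reply."""
    reply = standby.on_capability_matrix(active.capability, better)
    if reply != "ACK":
        return False             # NACK: the active node stays active
    active.role = Role.STANDBY   # switchover signal 351, transition 352
    standby.role = Role.ACTIVE   # transition 353
    return True


def more_hosts_reachable(mine: Matrix, peer: Matrix) -> bool:
    """Placeholder criterion (assumption): compare a single count."""
    return mine["remote-host-monitor"] > peer["remote-host-monitor"]


node_a = Node(Role.ACTIVE, {"remote-host-monitor": 2})   # lost a remote host
node_b = Node(Role.STANDBY, {"remote-host-monitor": 3})
switched = request_switchover(node_a, node_b, more_hosts_reachable)
```

In this run the standby node reaches more remote hosts, so it acknowledges and the roles are exchanged without either node restarting.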
FIG. 4 is an alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down. The sequence diagram shows the operations of FIG. 2 in a particular example. These operations include monitoring network interfaces and remote hosts, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and performing a switchover to a standby node. FIG. 4 has an active node 402, a standby node 404, and a remote host 406 all connected through a network, for example the network of FIG. 1 or any other suitable data communications network. The active node includes a High Availability (HA) module 412, a VRRP module 414, a routing module 416, and a system monitor 418. Similarly, the standby node 404 includes an HA module 422, a VRRP module 424, a routing module 426, and a system monitor 428. The modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems. The sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- At 430, as a part of normal network interface operations, the
VRRP module 414 of the active node 402 sends VRRP advertisements 430 to the standby node 404 VRRP module 424. In the same way advertisements may be sent to many other modules (not shown). The first trigger in the sequence occurs when the system monitor 418 detects 431 that a network interface is down. In response, the active node 402 system monitor 418 treats this condition as a network interface status change at 431 and sends a notification 432 to the routing module 416, a notification 433 to the VRRP module 414 and a notification 434 to the HA module 412. In this way, the system monitor 418 is performing an operation 204 of monitoring the policy inputs and determining whether there are updates or status changes 206, 208, 210, 212 as shown in FIG. 2. In some embodiments, the system monitor operates as a background application, for example a daemon, and generates and sends interrupts or alerts to the modules 412, 414, 416 of the active node 402 in which it operates.
- In response to receiving an interface down
notification 433, the VRRP module 414 changes the corresponding VRRP group state of the active node 402 from the master state to the backup state.
- In response to receiving an interface down
notification 434, the HA module 412 of the active node 402 updates 435 the active node 402 parameter matrix with the new network interface value. In this case, the value is reduced by one. The HA module 412 evaluates 436 the policies using the new value for network interfaces as input to the policies. At 437, the rules do not match. The policies are not met and no switchover is taken. The active node 402 stays active. Alternatively, if the rules do match, a switchover may be requested using a switchover request signal 448. In this case, the switchover may occur before the remote host is down at 443.
- In a separate process, at 440, the system monitor 418 of the
active node 402 pings the remote host 406. While only one remote host 406 is shown, there may be multiple remote hosts and the same or a similar process may be applied to each. At 441, the remote host replies and so the status has not changed and there is no action taken. After this first ping 441, the remote host 406 or the connection to the remote host 406 fails 443. At 442, the system monitor 418 of the active node 402 pings the same remote host 406 again. However, the remote host 406 is down and does not send a reply. Accordingly, upon not receiving the reply, the system monitor 418 of the active node 402 detects the change in the status of the host monitor, generates an interrupt or alert and sends 444 the interrupt or alert to the HA module 412 of the active node 402 indicating that the remote host 406 is down. The HA module 412 updates 445 the remote host field of the parameter matrix and evaluates 446 the policies by applying the parameter matrix update as a new input to the policies. In this example, the rules match at 447 and the HA module 412 starts a switchover process to the standby node 404. If the rules do not match then there is no switchover request.
- In this embodiment, the switchover works only if the
standby node 404 policy evaluation indicates that the standby node would not itself request a switchover. For a switchover, the HA module 412 of the active node 402 sends a switchover request signal 448 to a suitable switchover candidate, in this case the illustrated HA module 422 of the standby node 404. The HA module 422 of the standby node 404 receives the switchover request signal 448 and then evaluates 449 its own policies against its own capability matrix. If the standby node 404 policy evaluation does not indicate a switchover, i.e., the standby node switchover policies are not met, then the standby node 404 HA module 422 sends a switchover acknowledgement 450 back to the active node 402 HA module 412. The active node 402 HA module 412 answers with an HA switchover signal 451 and also transitions 452 to a standby state. The standby node 404 HA module 422, upon receiving the switchover signal 451, transitions 453 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete. The policy evaluation at the standby node 404 prevents an immediate switchover back to the originating node. Without the policy evaluation, the standby node 404, after becoming active, might evaluate its own policies and then request a switchover back to the previously active node 402. This node would then become active and again request a switchover, and so on, so that the traffic routing does not stabilize.
-
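The stabilizing effect of the standby-side policy evaluation can be shown with a small simulation. The sketch below is hypothetical throughout: `standby_accepts`, `count_switchovers`, the `max_rounds` cap, and the single-count policy are assumptions chosen to illustrate the idea, and both nodes are assumed to use the same policy.

```python
from typing import Callable, Dict

Matrix = Dict[str, int]
Policy = Callable[[Matrix], bool]  # True when a switchover policy is met


def standby_accepts(standby: Matrix, policy: Policy) -> bool:
    """Standby acknowledges only if its own switchover policy is not met."""
    return not policy(standby)


def count_switchovers(a: Matrix, b: Matrix, policy: Policy,
                      check_standby: bool, max_rounds: int = 6) -> int:
    """Count role exchanges between two nodes; node a starts as active."""
    active, standby = a, b
    swaps = 0
    for _ in range(max_rounds):
        if not policy(active):
            break  # active node is healthy enough, routing is stable
        if check_standby and not standby_accepts(standby, policy):
            break  # standby would hand traffic straight back, so stay put
        active, standby = standby, active
        swaps += 1
    return swaps


def policy(matrix: Matrix) -> bool:
    """Assumed policy: switch over when fewer than three interfaces are up."""
    return matrix["interface"] < 3


degraded_a = {"interface": 2}
degraded_b = {"interface": 2}
unchecked = count_switchovers(degraded_a, degraded_b, policy, check_standby=False)
checked = count_switchovers(degraded_a, degraded_b, policy, check_standby=True)
```

When both nodes are degraded, the unchecked variant keeps exchanging roles until the round cap is reached, while the checked variant performs no switchover at all. When only the active node is degraded, both variants perform exactly one switchover.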
FIG. 5 is a second alternative sequence diagram of two nodes performing a switchover in response to a network interface and remote host interface going down. This example combines the capability matrix comparison of FIG. 3 and the policy evaluation of FIG. 4 into the same switchover process. FIG. 5 has an active node 502, a standby node 504, and a remote host 506 all connected through a network, for example the network of FIG. 1 or any other suitable data communications network. The active node includes a High Availability (HA) module 512, a VRRP module 514, a routing module 516, and a system monitor 518. Similarly, the standby node 504 includes an HA module 522, a VRRP module 524, a routing module 526, and a system monitor 528. At 530, the VRRP module 514 of the active node 502 sends VRRP advertisements 530 to the standby node 504 VRRP module 524. In the same way advertisements may be sent to many other modules (not shown). The first trigger in the sequence occurs when the system monitor 518 detects 531 that a network interface is down. In response, the active node 502 system monitor 518 treats this condition as a network interface status change at 531 and sends a notification 532 to the routing module 516, a notification 533 to the VRRP module 514 and a notification 534 to the HA module 512.
- In response to receiving an interface down
notification 533, the VRRP module 514 changes the corresponding VRRP group state of the active node 502 from the master state to the standby state.
- In response to receiving an interface down
notification 534, the HA module 512 of the active node 502 updates 535 the active node 502 parameter matrix with the new network interface value. The HA module 512 evaluates 536 the policies using the new value for network interfaces as input to the policies. At 537, the rules do not match. The policies are not met and no switchover is performed. The active node 502 stays active.
- In a separate process, at 540, the system monitor 518 of the
active node 502 pings the remote host 506. At 541, the remote host replies, so the status has not changed and no action is taken. After this first ping 541, the remote host 506 or the connection to the remote host 506 fails 543. At 542, the system monitor 518 of the active node 502 pings the same remote host 506 again. However, the remote host 506 is down and does not send a reply. Accordingly, the system monitor 518 of the active node 502 detects the change in the status of the host monitor, generates an interrupt, and sends 544 the interrupt to the HA module 512 of the active node 502, indicating that the remote host 506 is down. The HA module 512 updates 545 the remote host field of the parameter matrix and evaluates 546 the policies by applying the parameter matrix update as a new input to the policies. In this example, the rules match at 547 and the HA module 512 starts 547 a switchover process to the standby node 504. In this embodiment, the switchover proceeds only if the standby capability matrix is better and the standby node 504 policy evaluation indicates no switchover from the standby node to the active node or another node.
- For a switchover, the
HA module 512 of the active node 502 sends 548 a capability matrix to the HA module 522 of the standby node 504. The HA module 522 of the standby node 504 receives the capability matrix and compares 549 the received capability matrix to the capability matrix of the standby node 504. If the standby node 504 capability matrix is better, then the HA module 522 of the standby node 504 evaluates 550 its own policies against its own capability matrix. In this embodiment, the two tests 549, 550 are both applied. If the standby node 504 policy evaluation does not indicate a switchover, then the standby node 504 HA module 522 sends a switchover acknowledgement 551 back to the active node 502 HA module 512.
- The
active node 502 HA module 512 answers with an HA switchover signal 552 and also transitions 553 to a standby state. The standby node 504 HA module 522, upon receiving the switchover signal 552, transitions 554 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete. Alternatively, the policy evaluation may be performed before the capability matrix comparison. On the other hand, if either the standby capability matrix is not better or the standby policy evaluation indicates a switchover, then the standby node may send a NACK instead of the ACK 551 as shown, rejecting the switchover request, and no switchover is performed.
-
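The FIG. 5 flow (monitor event, parameter matrix update, policy evaluation, then the two standby-side tests) might be sketched as below. The field names, the two policy rules, and the use of a simple sum as the "better" comparison are assumptions made for illustration; the description above does not fix any of them.

```python
from typing import Callable, Dict, List

ParameterMatrix = Dict[str, int]
Policy = Callable[[ParameterMatrix], bool]


def policies_match(params: ParameterMatrix, policies: List[Policy]) -> bool:
    """True if any policy rule matches, i.e. a switchover is indicated."""
    return any(rule(params) for rule in policies)


def capability_score(params: ParameterMatrix) -> int:
    """Assumed comparison metric: sum of all fields (higher is better)."""
    return sum(params.values())


def standby_decision(active_caps: ParameterMatrix,
                     standby_params: ParameterMatrix,
                     standby_policies: List[Policy]) -> str:
    """Apply both tests: capability comparison first, then the policy evaluation."""
    if capability_score(standby_params) <= capability_score(active_caps):
        return "NACK"  # standby capability matrix is not better
    if policies_match(standby_params, standby_policies):
        return "NACK"  # standby would itself want to switch over
    return "ACK"


# Assumed policies: switch over when all interfaces or all remote hosts are lost.
policies: List[Policy] = [
    lambda m: m["interfaces_up"] == 0,
    lambda m: m["remote_hosts_up"] == 0,
]

active = {"interfaces_up": 2, "remote_hosts_up": 1}
standby = {"interfaces_up": 2, "remote_hosts_up": 1}

active["interfaces_up"] -= 1                # first event: one interface goes down
first = policies_match(active, policies)    # rules do not match, node stays active

active["remote_hosts_up"] -= 1              # second event: remote host stops replying
second = policies_match(active, policies)   # rules match, switchover is requested

reply = standby_decision(active, standby, policies) if second else None
print(first, second, reply)
```

Swapping the order of the two checks inside `standby_decision` corresponds to the alternative noted above, in which the policy evaluation is performed before the capability matrix comparison.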
FIG. 6 is a sequence diagram of two nodes performing a switchover in response to a connected BGP peer route being withdrawn. The sequence diagram shows another example of the operations of FIG. 2 in a particular example. These operations include monitoring VRRP states and BGP/OSPF updates, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and not performing a switchover to a standby node. FIG. 6 has an active node 602, a standby node 604, and a connected remote BGP peer 606 all connected through a network, for example the network of FIG. 1 or any other suitable data communications network. The remote BGP peer may be a neighbor router, a next hop router, or a more remote router. The same techniques and messages may also apply in the case of an OSPF peer or neighbor. The active node includes a High Availability (HA) module 612, a VRRP module 614, a routing module 616, and a system monitor 618. Similarly, the standby node 604 includes an HA module 622, a VRRP module 624, a routing module 626, and a system monitor 628. The modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems. The sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- At 630, as a part of normal network interface operations, the
VRRP module 614 of the active node 602 sends VRRP advertisements 630 to the standby node 604 VRRP module 624. As shown, there are BGP messages 632 between the connected remote BGP peer 606 and the routing module 616 of the active node 602. There are also BGP messages 633 between the connected remote BGP peer 606 and the routing module 626 of the standby node 604. At some later time, the connected remote BGP peer 606 connection fails and then there is a failed message 634 from the disconnected remote BGP peer 606 that fails to reach the routing module 616 of the active node 602. The route to the BGP peer 606 may be withdrawn due to a failure of the BGP peer, the active node, or any other node along the route. The routing module 616 of the active node 602 will not receive the failed message 634. In addition, any message (not shown) from the routing module 616 of the active node 602 will not reach the BGP peer 606, and the routing module 616 will not receive any acknowledgements from the BGP peer 606 for sent messages. Accordingly, the active node 602 routing module 616 detects the change in the BGP peer status, generates a BGP route update in light of this condition, and sends a route withdrawn notification 636 to the HA module 612. In some embodiments, the system monitor 618 operates as a background application, for example a daemon, and sends interrupts or alerts to the modules of the active node 602 in which it operates.
- In response to receiving the route withdrawn
notification 636, the HA module 612 of the active node 602 updates 637 the active node 602 parameter matrix with the new route availability value. In this case, the value is reduced by one. The HA module 612 evaluates 638 the policies using the new value for routes as input to the policies. At 639, the rules match and the switchover process is started. The HA module 612 at the active node 602 sends a switchover request signal 642 including its capability matrix to the HA module 622 of the standby node 604 to start a switchover to the standby node.
- In a separate process, at 635, the
routing module 626 of the standby node 604 also fails to receive a failed message 635 from the remote BGP peer 606. The routing module will also fail to receive acknowledgements of messages that it attempts to send to the remote BGP peer 606. At 640, the routing module 626 of the standby node 604 sends a route withdrawn interrupt or alert 640 to the HA module 622 of the standby node 604, indicating that the remote BGP peer 606 is down or at least that the route to the remote BGP peer is withdrawn. The HA module 622 updates 641 the BGP route field of the standby node 604 parameter matrix.
- For a switchover from the
active node 602, the HA module 612 of the active node 602 sends a switchover request signal 642 that includes a capability matrix to a suitable switchover candidate, in this case the illustrated HA module 622 of the standby node 604. The HA module 622 of the standby node 604 receives the capability matrix and compares 643 the received capability matrix to the capability matrix of the standby node 604. If the standby node 604 capability matrix is better, then the standby node 604 HA module 622 sends a switchover acknowledgement back to the active node 602 HA module 612. In this embodiment, the standby node 604 capability matrix is not better and so a negative acknowledgement (NACK) 644 is sent from the standby node 604 HA module 622 to the active node 602 HA module 612, rejecting the switchover request. No switchover is made. The active node 602 HA module 612 receives the NACK 644 as a rejection and remains active or sends a switchover request to a different standby node (not shown). Similarly, the standby node 604 HA module 622, after sending the NACK 644, remains in the standby status.
-
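The FIG. 6 outcome, where the standby rejects the request and the active node either stays active or tries another candidate, can be illustrated with a short sketch. The `is_better` rule used here (at least as good in every field and strictly better in one) is an assumption; the description does not define how "better" is computed, and the node names and field names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CapabilityMatrix = Dict[str, int]


def is_better(candidate: CapabilityMatrix, current: CapabilityMatrix) -> bool:
    """Assumed rule: no field is worse and at least one field is strictly better."""
    keys = set(candidate) | set(current)
    no_worse = all(candidate.get(k, 0) >= current.get(k, 0) for k in keys)
    some_better = any(candidate.get(k, 0) > current.get(k, 0) for k in keys)
    return no_worse and some_better


def request_switchover(active_caps: CapabilityMatrix,
                       standbys: List[Tuple[str, CapabilityMatrix]]) -> Optional[str]:
    """Ask each candidate in turn; return the first that would ACK, else None
    (the active node remains active)."""
    for name, caps in standbys:
        reply = "ACK" if is_better(caps, active_caps) else "NACK"
        if reply == "ACK":
            return name
    return None


# Both nodes lost the same BGP route, so the standby is not better and NACKs.
active = {"bgp_routes": 3, "interfaces_up": 2}
same_loss = ("standby-604", {"bgp_routes": 3, "interfaces_up": 2})
print(request_switchover(active, [same_loss]))  # None
```

Because the route withdrawal is seen by both nodes, their matrices degrade equally and the comparison correctly reports that moving traffic would not help.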
FIG. 7 is a sequence diagram of two nodes performing a switchover in response to a VRRP status changing. The sequence diagram shows an example of the operations of FIG. 2 in a particular example. These operations include monitoring network interfaces and VRRP status, updating the parameter matrix to reflect changes, applying the changes as policy inputs, comparing matrices, and performing a switchover to a standby node. FIG. 7 has an active node 702 and a standby node 704 connected through a network, for example the network of FIG. 1 or any other suitable data communications network. The active node includes a High Availability (HA) module 712, a VRRP module 714, a routing module 716, and a system monitor 718. Similarly, the standby node 704 includes an HA module 722, a VRRP module 724, a routing module 726, and a system monitor 728. The modules may have dedicated physical hardware resources, dedicated virtual resources, or may exist as portions of larger systems. The sequence diagram shows only certain example signals to illustrate particular parts of the system operation. Many more signals may be sent and received before, during, and after the signals described herein.
- At 730, the
VRRP module 714 of the active node 702 sends VRRP advertisements 730 to the standby node 704 VRRP module 724. The first trigger in the sequence occurs when the system monitor 718 detects 731 that a network interface is down. In response, the active node 702 system monitor 718 detects this condition as a network interface status change at 731 and sends a notification 732 to the routing module 716, a notification 733 to the HA module 712, and a notification 734 to the VRRP module 714. The VRRP module responds by transitioning 735 the active node 702 to a VRRP backup status. It also attempts to send VRRP advertisements 736 to the VRRP module 724 of the standby node 704 after the transition 735 from VRRP master to VRRP backup. However, with the connectivity broken, the standby node 704 VRRP module 724 does not receive the advertisements 736. The broken connectivity will be discovered by the system monitor 728, if not by the VRRP module 724. In response, the VRRP module 724 of the standby node 704 similarly transitions 737 the standby node 704 from VRRP backup to VRRP master.
- In response to the state changes from VRRP master to backup at the
active node 702, the VRRP module 714 notifies 739 the HA module 712 of the state change. The HA module 712 of the active node 702 updates 740 the active node 702 parameter matrix with the new VRRP status value. In this case, the value is reduced by one. The HA module 712 evaluates 741 the policies using the new value for VRRP status as input to the policies. At 742, the rules match and a switchover request signal 743 is sent to the standby node 704 with the active node 702 capability matrix.
- In response to the state changes from VRRP backup to master at the
standby node 704, the VRRP module 724 notifies 744 the HA module 722 of the state change. The HA module 722 of the standby node 704 updates 745 the standby node 704 parameter matrix with the new VRRP status value. In this case, the value is increased by one. The HA module 722 may then evaluate the policies and perform other operations not shown.
- For a switchover, the
HA module 712 of the active node 702, which is now a VRRP backup node, sends a switchover request signal 743 with a capability matrix to a suitable switchover candidate, in this case the illustrated HA module 722 of the standby node 704, which is now a VRRP master node. The HA module 722 of the standby node 704 receives the capability matrix and compares 746 the received capability matrix to the capability matrix of the standby node 704. If the standby node 704 capability matrix is better, then the standby node 704 HA module 722 sends a switchover acknowledgement 747 back to the active node 702 HA module 712. The active node 702 HA module 712 answers with an HA switchover signal 750 and also transitions 751 to a standby state. The standby node 704 HA module 722, upon receiving the switchover signal 750, transitions 752 to an active state. There may be a parameter matrix update operation (not shown) after the switchover operation is complete.
-
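The FIG. 7 handshake (VRRP state change, matrix update by one, request, acknowledgement, role transition) can be modeled as a small state machine. This is a sketch under assumptions: the class and method names, the single `vrrp_master_groups` field, and the "strictly greater" comparison are illustrative choices, not details taken from the description.

```python
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class HAModule:
    """Hypothetical HA module holding a one-field parameter matrix."""

    def __init__(self, name: str, state: HAState, vrrp_master_groups: int) -> None:
        self.name = name
        self.state = state
        self.params = {"vrrp_master_groups": vrrp_master_groups}

    def on_vrrp_change(self, became_master: bool) -> None:
        # The VRRP status value goes up by one on backup -> master
        # and down by one on master -> backup.
        self.params["vrrp_master_groups"] += 1 if became_master else -1

    def on_switchover_request(self, peer_caps: dict) -> str:
        # Assumed comparison: more VRRP master groups means a better matrix.
        better = self.params["vrrp_master_groups"] > peer_caps["vrrp_master_groups"]
        return "ACK" if better else "NACK"


def run_switchover(active: HAModule, standby: HAModule) -> bool:
    """Request -> ACK/NACK -> switchover signal -> role transitions."""
    reply = standby.on_switchover_request(dict(active.params))
    if reply != "ACK":
        return False                   # rejected: both nodes keep their roles
    active.state = HAState.STANDBY     # old active node becomes standby
    standby.state = HAState.ACTIVE     # standby becomes active on the switchover signal
    return True


node_a = HAModule("node-702", HAState.ACTIVE, vrrp_master_groups=1)
node_b = HAModule("node-704", HAState.STANDBY, vrrp_master_groups=0)
node_a.on_vrrp_change(became_master=False)   # interface down: master -> backup
node_b.on_vrrp_change(became_master=True)    # advertisements lost: backup -> master
switched = run_switchover(node_a, node_b)
print(switched, node_a.state.value, node_b.state.value)
```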
FIG. 8 is a block diagram of a network node 802, which may be an active node, an inactive node, or a remote or peer host, according to an embodiment herein. The node includes a processor 810, memory 812, and a communications interface 804 connected together through a bus 820. The processor 810 may include a multifunction processor and/or an application-specific processor. The memory 812 within the node may include volatile and non-volatile memory, for example a non-transitory storage medium such as read only memory (ROM), flash memory, RAM, and a large capacity permanent storage device such as a hard disk drive. The communications interface 804 enables data communications with high availability as described above via local and wide area connections using one or more different protocols including BGP and VRRP. The node executes computer readable instructions stored in the storage medium to implement various tasks as described above. The node 802 further includes a traffic cache module 814 coupled to the bus 820 with various caches (e.g., application cache, domain application cache, client route cache, and application route cache) to store mapping information and other traffic communication data.
- The
node 802 further includes a configuration monitor 806 to monitor policy input as described above including BGP/OSPF updates, VRRP state updates, network interface state updates, and remote monitor updates, among others. The configuration monitor 806 generates alerts or interrupts and updates a parameter matrix 808 when there are changes to any of the monitored policy inputs. The processor 810 may alternatively be configured to update the parameter matrix as well as apply policies to the updates, compare matrices, and generate switchover requests, acknowledgements, and negative acknowledgments, among other tasks.
- A
control interface 816 may be provided for node management and configuration purposes as an interface to a computer monitor or flat panel display but may include any output device. In addition, the control interface 816 may include an interface to a computer keyboard and/or pointing device such as a computer mouse, computer track pad, touch screen, or the like, that allows a user to provide inputs and receive outputs including a GUI (graphical user interface). A GUI can be responsive to user inputs and typically displays images and data. The control interface 816 can be provided as a web page served via a communication to a remote device for display to a user and for receiving inputs from the user. Additionally, each of the modules may be implemented through computer-readable instructions that are executed on a physical processor of a computing system that supports the node.
- The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the network elements. The network elements shown in
FIG. 1 include blocks which can be at least one of a hardware device, or a combination of hardware device and software module. It is understood that the scope of the protection for systems and methods disclosed herein is extended to such a program and in addition to a computer readable means having a message therein, such computer readable storage means contain program code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device.
- In an embodiment, the functionality described above is performed by a computer device that executes computer readable instructions (software). Although the operations of the method(s) herein are shown and described in a particular order, the order of the operations of each method may be altered so that certain operations may be performed in an inverse order or so that certain operations may be performed, at least in part, concurrently with other operations. In another embodiment, instructions or sub-operations of distinct operations may be implemented in an intermittent and/or alternating manner.
- The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the claims as described herein.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/127,946 US11374849B1 (en) | 2020-12-18 | 2020-12-18 | High availability router switchover decision using monitoring and policies |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/127,946 US11374849B1 (en) | 2020-12-18 | 2020-12-18 | High availability router switchover decision using monitoring and policies |
Publications (2)
Publication Number | Publication Date |
---|---|
US20220200885A1 (en) | 2022-06-23 |
US11374849B1 (en) | 2022-06-28 |
Family
ID=82022676
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/127,946 Active US11374849B1 (en) | 2020-12-18 | 2020-12-18 | High availability router switchover decision using monitoring and policies |
Country Status (1)
Country | Link |
---|---|
US (1) | US11374849B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230121682A1 (en) * | 2021-10-14 | 2023-04-20 | Arista Networks, Inc. | Determining readiness for switchover operations for network devices |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6973034B1 (en) * | 1999-06-29 | 2005-12-06 | Cisco Technology, Inc. | Technique for collecting operating information from network elements, and for controlling network element behavior in a feedback-based, adaptive data network |
US6505244B1 (en) * | 1999-06-29 | 2003-01-07 | Cisco Technology Inc. | Policy engine which supports application specific plug-ins for enforcing policies in a feedback-based, adaptive data network |
US6584502B1 (en) * | 1999-06-29 | 2003-06-24 | Cisco Technology, Inc. | Technique for providing automatic event notification of changing network conditions to network elements in an adaptive, feedback-based data network |
JP4103816B2 (en) * | 2003-02-12 | 2008-06-18 | 松下電器産業株式会社 | Router setting method and router apparatus |
JP4134916B2 (en) * | 2003-02-14 | 2008-08-20 | 松下電器産業株式会社 | Network connection device and network connection switching method |
US7463654B2 (en) * | 2003-12-22 | 2008-12-09 | 3Com Corporation | Stackable routers employing a routing protocol |
CN1980230B (en) * | 2005-11-30 | 2011-06-01 | 华为技术有限公司 | Method for managing VRRP group |
EP1969768B1 (en) * | 2005-12-28 | 2013-08-28 | Telecom Italia S.p.A. | Method and system for providing user access to communication services, and related computer program product |
US8077709B2 (en) * | 2007-09-19 | 2011-12-13 | Cisco Technology, Inc. | Redundancy at a virtual provider edge node that faces a tunneling protocol core network for virtual private local area network (LAN) service (VPLS) |
US20090303990A1 (en) * | 2008-06-06 | 2009-12-10 | Emulex Design & Manufacturing Corporation | Off-Chip Interface for External Routing |
US8886834B2 (en) * | 2010-12-14 | 2014-11-11 | Cisco Technology, Inc. | Hot standby neighbor discovery protocol for internet protocol version 6 |
US9389968B2 (en) * | 2014-04-30 | 2016-07-12 | Netapp, Inc. | Preventing non-detectable data loss during site switchover |
US10075329B2 (en) * | 2014-06-25 | 2018-09-11 | A 10 Networks, Incorporated | Customizable high availability switchover control of application delivery controllers |
US9853882B2 (en) * | 2014-07-23 | 2017-12-26 | Cisco Technology, Inc. | Dynamic path switchover decision override based on flow characteristics |
US20160080249A1 (en) * | 2014-09-17 | 2016-03-17 | Telefonaktiebolaget L M Ericsson (Publ) | Prevent vrrp master / master split in active / standby icr system |
US10250562B1 (en) * | 2015-03-31 | 2019-04-02 | Juniper Networks, Inc. | Route signaling driven service management |
US9985875B1 (en) * | 2015-03-31 | 2018-05-29 | Juniper Networks, Inc. | Route signalling based resilient application overlay network |
CN105187249B (en) * | 2015-09-22 | 2018-12-07 | 华为技术有限公司 | A kind of fault recovery method and device |
US10558767B1 (en) * | 2017-03-16 | 2020-02-11 | Amazon Technologies, Inc. | Analytical derivative-based ARMA model estimation |
EP3632090A1 (en) * | 2017-05-31 | 2020-04-08 | Affirmed Networks, Inc. | Decoupled control and data plane synchronization for ipsec geographic redundancy |
US10681091B2 (en) * | 2018-07-31 | 2020-06-09 | Juniper Networks, Inc. | N:1 stateful application gateway redundancy model |
US11323310B2 (en) * | 2019-06-24 | 2022-05-03 | Allot Ltd. | Method, device, and system for providing hot reservation for in-line deployed network functions with multiple network interfaces |
KR102384685B1 (en) * | 2019-11-20 | 2022-04-11 | 한국전자통신연구원 | Centralized scheduling apparatus and method considering non-uniform traffic |
US11265240B1 (en) * | 2020-08-19 | 2022-03-01 | Cisco Technology, Inc. | Systems and methods for determining FHRP switchover |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230121682A1 (en) * | 2021-10-14 | 2023-04-20 | Arista Networks, Inc. | Determining readiness for switchover operations for network devices |
US11770291B2 (en) * | 2021-10-14 | 2023-09-26 | Arista Networks, Inc. | Determining readiness for switchover operations for network devices |
Also Published As
Publication number | Publication date |
---|---|
US11374849B1 (en) | 2022-06-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: VERSA NETWORKS, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAJAJ, KAPIL;MEHTA, APURVA;REEL/FRAME:054701/0931 Effective date: 20201217 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
AS | Assignment |
Owner name: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:VERSA NETWORKS, INC.;REEL/FRAME:059423/0028 Effective date: 20220328 Owner name: SILICON VALLEY BANK, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:VERSA NETWORKS, INC.;REEL/FRAME:059423/0004 Effective date: 20220328 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:VERSA NETWORKS, INC.;REEL/FRAME:065289/0303 Effective date: 20231020 |