WO2016122495A1 - Network switching node with state machine replication - Google Patents

Network switching node with state machine replication

Info

Publication number
WO2016122495A1
WO2016122495A1
Authority
WO
WIPO (PCT)
Prior art keywords
state
network switching
network
node
switching nodes
Prior art date
Application number
PCT/US2015/013324
Other languages
French (fr)
Inventor
Diego DOMPE
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Priority to PCT/US2015/013324
Publication of WO2016122495A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/25 Routing or path finding in a switch fabric

Abstract

In an example, a network switching node stores a routing table and a state log. A hardware routing circuit routes packets in a network via ports, according to the routing table. A state machine replication system receives state messages and stores a state of the system in the state log based on the state messages. Command logic determines whether an action is to be performed at the network switching node in response to a state message.

Description

NETWORK SWITCHING NODE WITH STATE MACHINE REPLICATION
BACKGROUND
[0001] Network switches are used to route packets towards their destination in a network. Current modular hardware architecture of network switches often includes multiple nodes distributed across a chassis of a network switch. Each node includes ports and hardware to make packet routing decisions at line rates, such as 10 gigabits per second (Gb/s), 100 Gb/s, etc. Switching information, such as information for a forwarding information base (FIB) or a routing information base (RIB), may be programmed on the nodes by a control plane processor in the switch. Each node may have a partial view of switching information for making packet routing decisions locally and moving packets via a switch fabric to other nodes where the packets can be further processed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, and in which:
[0003] Figure 1 shows a system including network switching nodes, according to an example of the present disclosure;
[0004] Figure 2 shows a network switching node, according to an example of the present disclosure; and
[0005] Figure 3 shows a method of managing system state, according to an example of the present disclosure.
DETAILED DESCRIPTION
[0006] For simplicity and illustrative purposes, the present disclosure is described by referring mainly to an example thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure. In the present disclosure, the term "includes" means includes but is not limited thereto, and the term "including" means including but not limited thereto. The term "based on" means based at least in part on. In addition, the terms "a" and "an" are intended to denote at least one of a particular element.
[0007] According to examples of the present disclosure, a network switch includes multiple network switching nodes that may be programmed by a controller. Some programming tasks may be offloaded to a state machine replication system and command logic local to each of the network switching nodes. The state machine replication system maintains the state of the system at each network switching node. The state, for example, is the programming state of the network switching nodes, and may include programming commands or instructions from the controller for controlling packet forwarding, as well as other events that occur in the system that may impact packet forwarding. The same state may be stored in each network switching node in a state log.
[0008] In an example, a leader node is selected from the network switching nodes to implement the programming of the network switching nodes. For example, the leader node receives commands from the controller and broadcasts state messages, including the commands, to the network switching nodes via an internal network. By utilizing the leader node, the controller does not need to communicate with each network switching node to program the state.
[0009] Existing network switches may have processors that are capable of executing programming tasks. However, these processors typically sit idle for periods of time and are underutilized. The system described herein may offload programming tasks to these processors. Also, the system described herein may include a decentralized data plane, which increases fault tolerance, since the system is able to operate even without the controller or without the controller's knowledge of a network switching node. Also, communication of state messages for programming and maximizing availability of nodes is offloaded to the distributed network switching nodes, such as by utilizing the leader node. This allows a developer or programmer to focus on proper programming of the network switching nodes rather than communication. Also, the system may include a framework for handling network switching nodes joining or leaving the system, which can also be used for error recovery and handling. This simplifies testing and verification of new routing solutions implemented in the system.
[0010] Figure 1 illustrates a network switch 100 according to an example. The network switch 100 includes network switching nodes 110a-n connected via an internal network 120. The network switch 100 may be provided as a single device, and the network switching nodes 110a-n, controller 160 and internal network 120 are housed in the device. The controller 160 coordinates the internal network switching nodes 110a-n. The controller 160 sends commands to the network switching nodes 110a-n to program the network switching nodes 110a-n via the internal network 120. The internal network 120 may include a communication fabric that connects the network switching nodes 110a-n and may also connect the controller 160 to the network switching nodes 110a-n. The internal network 120 may be a bus. In one example, the internal network is an Internet Protocol (IP)-based private network for communication between the controller 160 and the network switching nodes 110a-n. The network switching nodes 110a-n may be nodes on a single chassis. For example, the network switching nodes 110a-n may include chips housed on the same chassis. The controller 160 may be housed on the same chassis as the network switching nodes 110a-n or provided on a different device. Network switching nodes 110a-n are shown, but the system 100 may include more or fewer network switching nodes than shown.
[0011] The network switching nodes 110a-n in the network switch 100 route packets in the network 150. Although not shown, multiple network switches may be connected to the network 150 to route packets between sources and destinations. The network switching nodes 110a-n include hardware routing circuits 111a-n to route packets in the network 150 via ports, according to a routing table (e.g., FIB, RIB). The packet routing may include layer 2 packet routing (e.g., based on a media access control (MAC) address) or layer 3 packet routing (e.g., based on an IP address) in the network 150. By way of example, the hardware routing circuits 111a-n may include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other customized circuits that are designed to make routing decisions to route packets towards their destinations at line rates or near line rates.
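To illustrate the routing decision the hardware circuits implement, the following is a minimal software sketch of a longest-prefix-match lookup over FIB-style entries. It is an illustration only: the fib map, prefixes and port numbers are assumptions, and a hardware circuit would use TCAM or trie structures rather than a linear scan.
```go
package main

import (
	"fmt"
	"net/netip"
)

// lookup performs a longest-prefix match over FIB-style entries, i.e. the
// decision the hardware routing circuits make at line rate. This linear scan
// is for clarity only; real circuits use TCAMs or trie structures.
func lookup(fib map[netip.Prefix]int, dst netip.Addr) (port int, ok bool) {
	best := -1
	for p, outPort := range fib {
		if p.Contains(dst) && p.Bits() > best {
			best, port, ok = p.Bits(), outPort, true
		}
	}
	return port, ok
}

func main() {
	// Hypothetical FIB entries mapping prefixes to egress ports.
	fib := map[netip.Prefix]int{
		netip.MustParsePrefix("10.0.0.0/8"):  1,
		netip.MustParsePrefix("10.1.0.0/16"): 2,
	}
	port, _ := lookup(fib, netip.MustParseAddr("10.1.2.3"))
	fmt.Println("egress port:", port) // prints 2: the /16 is the longest match
}
```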
[0012] The network switching nodes 110a-n are able to execute control-plane-type decision making and programming at the network switching nodes 110a-n rather than at the controller 160. The network switching nodes 110a-n include state machine replication systems 112a-n and command logic 113a-n to make such decisions.
[0013] The state machine replication systems 112a-n execute protocols to maintain a state of the system 100. The state machine replication systems 112a-n provide state machine replication in the network switch 100. Events related to programming of the network switching nodes 110a-n and the order in which the events occur represent the state of the system 100. For example, the state may include state messages related to programming the network switching nodes 110a-n and an order for consuming the state messages. Consuming the state messages may include reading the state messages and executing an action if needed based on a state message. The state is stored in a state log in each of the network switching nodes 110a-n, and the state log may store the state messages and an order for consuming the state messages. A state log may be copied to a new node entering the system 100 to replicate the state in the new node. In an example, all the network switching nodes 110a-n maintain the same state, and thus store the same state messages and order. Also, the network switching nodes 110a-n may store a same set of routing table entries in each of their routing tables.
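As a concrete illustration of the state log described above, the following Go sketch shows one plausible in-memory representation; the StateMessage and StateLog types and their fields are assumptions made for this example, not structures defined by the disclosure.
```go
package main

import (
	"fmt"
	"sync"
)

// StateMessage is a hypothetical entry in the replicated state: the message
// payload plus the consensus-assigned index that fixes the consumption order.
type StateMessage struct {
	Index   uint64 // order of consumption, assigned via the leader/consensus
	Kind    string // e.g. "routing-command" or "event"
	Payload string // e.g. "add route 10.0.0.0/24 via port 3"
}

// StateLog holds the ordered state messages stored at every node.
type StateLog struct {
	mu      sync.Mutex
	entries []StateMessage
}

// Append records a state message; every node appends the same messages in
// the same order, so all state logs stay identical.
func (l *StateLog) Append(m StateMessage) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries = append(l.entries, m)
}

func main() {
	var log StateLog
	log.Append(StateMessage{Index: 1, Kind: "routing-command", Payload: "add route 10.0.0.0/24 via port 3"})
	log.Append(StateMessage{Index: 2, Kind: "event", Payload: "transceiver inserted, ports 48-52"})
	fmt.Printf("%+v\n", log.entries)
}
```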
[0014] The state messages, for example, include routing command messages from the controller 160 and event messages identifying events occurring at the network switching nodes 110a-n. The routing command messages from the controller 160 may include routing programming instructions, such as adding, modifying or deleting routing entries in the routing table (e.g., FIB and/or RIB), adding a new virtual local area network (VLAN), modifying VLAN membership, etc. The command logics 113a-n determine whether an action is to be performed at the network switching node in response to a state message, and execute the action if the action is to be performed. For example, the command logics 113a-n execute the instructions in the routing command messages to program the network switching nodes 110a-n. The event messages identify events occurring at the network switching nodes 110a-n that may impact the state of the system 100, and the event messages provide notification of the events to the other nodes 110a-n and the controller 160. For example, a new transceiver with ports is plugged into a chassis for the network switch 100, and the network switching node with the new transceiver sends an event message to the other nodes and the controller 160 indicating the event occurred. The controller 160 may subsequently send instructions to add the ports to a VLAN.
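The sketch below illustrates the command-logic side: decoding a routing command message and programming a node-local routing table. The RoutingCommand type and its fields are illustrative assumptions; the disclosure does not specify a message format.
```go
package main

import "fmt"

// RoutingCommand is a hypothetical decoded routing command message from the
// controller, carrying one FIB/RIB programming instruction.
type RoutingCommand struct {
	Op     string // "add", "modify" or "delete"
	Prefix string // e.g. "10.0.0.0/24"
	Port   int    // egress port for the entry
}

// applyRoutingCommand stands in for the command logic 113a-n: it executes the
// instruction against a node-local routing table (here a simple map).
func applyRoutingCommand(table map[string]int, c RoutingCommand) {
	switch c.Op {
	case "add", "modify":
		table[c.Prefix] = c.Port
	case "delete":
		delete(table, c.Prefix)
	}
}

func main() {
	fib := map[string]int{}
	applyRoutingCommand(fib, RoutingCommand{Op: "add", Prefix: "10.0.0.0/24", Port: 3})
	applyRoutingCommand(fib, RoutingCommand{Op: "modify", Prefix: "10.0.0.0/24", Port: 7})
	fmt.Println(fib) // map[10.0.0.0/24:7]
}
```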
[0015] To program, maintain and replicate the state among the network switching nodes 110a-n, the state machine replication systems 112a-n may select a leader node from the network switching nodes 110a-n, and the leader node maintains the state and replicates the state to the other network switching nodes.
[0016] To communicate with each other and the controller 160, the network switching nodes 110a-n may use an application program interface (API). The API may be asynchronous, and no responses are given for commands provided by the controller 160 to the network switching nodes 110a-n. However, the leader node may trigger notifications to the controller 160 in response to events broadcasted by the network switching nodes 110a-n reacting to commands from the controller 160.
[0017] The controller 160 can address state messages, such as routing command messages, to any of the network switching nodes 110a-n, and the state messages are redirected to the leader node by the other network switching nodes. In an example, only the leader node triggers sending of programming commands to the network switching nodes 110a-n, such as in response to requests from the controller 160 or to events from the network switching nodes 110a-n that are to be broadcasted to other nodes. For example, a request including a command to add a routing table entry is sent from the controller 160 to any of the network switching nodes 110a-n. The request is routed to the leader node. The leader node generates a routing command message, including the command from the controller 160, and broadcasts the message to the other network switching nodes via the network 120. The network switching nodes receive the routing command message from the leader node, determine whether they need to perform an action in response to the message, such as adding the routing table entry, and execute the action if needed. Also, the command is stored in the state log. The order is also stored. The order may be determined through a consensus function. The state machine protocol may include the consensus function such that specific messages are sent between the nodes and the current leader to determine the order. For example, the order may be indicated through numeric values or another indicator, such as command number 16 out of 16 commands that were sent. Also, the network switching nodes may respond to state messages from the leader node to indicate to the leader node that the state messages were processed.
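A minimal sketch of this redirect-and-broadcast flow follows, with all nodes in one process for brevity; a real implementation would carry these messages over the internal network 120, and the node type and its fields are assumptions made for the example.
```go
package main

import "fmt"

// node is an in-process stand-in for a network switching node.
type node struct {
	isLeader bool
	leader   *node   // where non-leaders redirect controller requests
	peers    []*node // the other network switching nodes
	nextIdx  uint64  // next position in the consumption order
	log      []string
}

// handleControllerRequest redirects to the leader if this node is not the
// leader; only the leader assigns the order and broadcasts the state message.
func (n *node) handleControllerRequest(cmd string) {
	if !n.isLeader {
		n.leader.handleControllerRequest(cmd)
		return
	}
	n.nextIdx++
	msg := fmt.Sprintf("#%d %s", n.nextIdx, cmd)
	n.log = append(n.log, msg) // the leader stores the command too
	for _, p := range n.peers {
		p.log = append(p.log, msg) // broadcast: every node stores the same state
	}
}

func main() {
	leader := &node{isLeader: true}
	follower := &node{leader: leader}
	leader.peers = []*node{follower}
	// The controller may address any node; the command still reaches the leader.
	follower.handleControllerRequest("add routing table entry 10.0.0.0/24 -> port 3")
	fmt.Println(leader.log, follower.log)
}
```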
[0018] A network switching node that receives a command in a state message from the leader node may execute an action, if required, based on the command, and the command is locally stored in the state log of the network switching node. The network switching node may trigger new commands to the leader node, and the leader node may send notifications about an event, such as a change in hardware, or broadcast the current state for the other nodes.
[0019] Besides behaving as a network switching node and executing leader node functions, such as replicating state, the leader node may perform additional operations. For example, the leader node may notify the controller 160 of events upon broadcasting of such events to the other network switching nodes. The leader node can offload processing tasks from the controller 160. For example, the leader node may provide fault tolerance by selecting nodes to take over for failed nodes, perform load balancing, etc. Network switching nodes may execute additional functions, such as log compaction to minimize the amount of memory used to store the state log.
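One simple form of the log compaction mentioned above is dropping superseded commands, sketched below under the assumption that each log entry can be keyed by the object it programs (e.g., a route prefix); Raft-style snapshotting is another option. The entry type is hypothetical.
```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical state-log record keyed by the object it programs,
// so that earlier commands overwritten by later ones can be discarded.
type entry struct {
	index uint64 // consumption order
	key   string // e.g. the route prefix or VLAN the command touches
	cmd   string
}

// compact keeps only the latest command per key, preserving the original
// consumption order among the survivors.
func compact(log []entry) []entry {
	latest := map[string]entry{}
	for _, e := range log {
		latest[e.key] = e // later entries supersede earlier ones
	}
	out := make([]entry, 0, len(latest))
	for _, e := range latest {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].index < out[j].index })
	return out
}

func main() {
	log := []entry{
		{1, "10.0.0.0/24", "add via port 3"},
		{2, "10.1.0.0/24", "add via port 5"},
		{3, "10.0.0.0/24", "modify via port 7"},
	}
	fmt.Println(compact(log)) // two entries: the first command is superseded
}
```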
[0020] If a new network switching node is added to the system 100, the leader node may request one of the network switching nodes to perform a state log transfer to the new network switching node. The state log is copied, e.g., transmitted, to the new network switching node. The new network switching node does not process any new commands until its state log is up-to-date as determined by the order specified in the transferred state log. The new network switching node may execute commands in the state log according to the specified order and then may execute new commands.
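The catch-up step for a joining node might look like the sketch below: the transferred log is replayed in the specified order before any new command is consumed. The callback-based catchUp helper is an assumption for illustration.
```go
package main

import "fmt"

// catchUp replays a transferred state log, in order, on a newly joined node.
// The node does not process new commands until this replay completes.
func catchUp(transferred []string, apply func(string)) {
	for _, cmd := range transferred {
		apply(cmd) // consume each logged command in the specified order
	}
}

func main() {
	var replayed []string
	transferred := []string{"add route 10.0.0.0/24", "add VLAN 20", "add port 7 to VLAN 20"}
	catchUp(transferred, func(cmd string) {
		replayed = append(replayed, cmd) // store and act on the command locally
	})
	fmt.Println("up to date after", len(replayed), "replayed commands")
}
```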
[0021] Consensus functions allow a collection of machines to work as a coherent group that can survive the failures of some of its members, and thus provide a reliable mechanism to build a multi-node system. The system 100 may use a consensus function for managing and replicating the state log. In one example, the system 100 uses the Raft consensus function described in "In Search of an Understandable Consensus Algorithm" by Ongaro et al. to implement key elements of consensus, including state log replication and leader node election. The network switching nodes 110a-n may use heartbeats for leader node election. The network switching nodes receive heartbeats from a leader node. If no heartbeat is received over a period of time called the election timeout, then a network switching node assumes there is no viable leader and begins an election to choose a new leader. To begin an election, a network switching node may send a RequestVote message to all other network switching nodes in the system 100 and vote for itself. The network switching node wins an election if it receives votes from a majority of the other network switching nodes in the system 100.
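The follower side of this election mechanism can be sketched as below: heartbeats reset a randomized election timer, and an expired timer triggers an election. The channel-based structure and the 150-300 ms timeout range (the Raft paper's example values) are assumptions for illustration.
```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// runElectionTimer models a follower: each heartbeat re-arms the timer, and
// if the randomized election timeout expires first, an election begins.
func runElectionTimer(heartbeats <-chan struct{}, startElection func()) {
	for {
		// Randomizing the timeout makes split votes unlikely (as in Raft).
		timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
		select {
		case <-heartbeats:
			// The leader is alive; loop and re-arm the timer.
		case <-time.After(timeout):
			startElection() // no viable leader: send RequestVote to all peers
			return
		}
	}
}

func main() {
	heartbeats := make(chan struct{})
	done := make(chan struct{})
	go runElectionTimer(heartbeats, func() {
		fmt.Println("election started: voting for self, requesting votes from peers")
		close(done)
	})
	heartbeats <- struct{}{} // one heartbeat arrives, then the leader goes silent
	<-done
}
```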
[0022] Figure 2 shows a network switching node 200 that may be used in the network switch 100 shown in figure 1. The network switching node 200 may be implemented as any of the network switching nodes 110a-n shown in figure 1. The network switching node 200 includes a hardware routing circuit 211, e.g., an ASIC, FPGA, etc., that routes packets in the network 150 via ports 205, according to a routing table. The hardware routing circuit 211 is designed to make routing decisions to route packets towards their destinations at line rates or near line rates.
[0023] The network switching node 200 includes a processor 201, an input/output interface 202, a data storage 206, a state machine replication module 212 and a command logic module 213. The processor 201 may include a microprocessor operable to execute machine readable instructions to perform programmed operations. The data storage 206 may include volatile and/or nonvolatile data storage, such as random access memory, memristors, flash memory, and the like. Machine readable instructions, tables, and any information used by the network switching node 200 may be stored on the data storage 206.
[0024] The input/output interface 202 may include a network interface or another interface to connect to the internal network 120, which is connected to other network switching nodes and the controller 160.
[0025] The state machine replication module 212 may perform the functions of the state machine replication system 112 described with respect to figure 1. The command logic module 213 may perform the functions of the command logic 113 described with respect to figure 1. The modules 212 and 213 may comprise machine readable instructions stored in the data storage 206 and executed by the processor 201.
[0026] The data storage 206 may also store a state log 220 and a routing table 221. The state log 220 includes the commands, events and order derived from state messages, for example, from the leader node. The routing table 221 stores routing table entries used by the hardware routing circuit 211 to route packets in the network 150. The routing table 221, e.g., FIB, RIB, etc., may be stored in data storage at the hardware routing circuit 211 for line-rate routing performed by the hardware routing circuit 211.
[0027] Figure 3 illustrates a method 300 that may be executed by one or more of the network switching nodes 110a-n shown in figure 1. The method 300 is a method for managing a state for the network switch 100 shown in figure 1. At 301 shown in figure 3, a state message is received from a leader node of the network switching nodes 110a-n at a network switching node of the nodes 110a-n. For example, assume the network switching node 110a shown in figure 1 is the leader node. The leader node sends state messages to the network switching nodes 110b-n. The state messages may include routing commands from the controller 160, such as commands to add or modify routing table entries, VLAN membership, and other commands. The state messages may include events occurring in a network switching node that are to be notified to other network switching nodes and/or the controller 160. The events may include a hardware change event or other events.
[0028] At 302, the network switching node receiving the state message stores the state in a state log. The state log may include the received state messages, including the commands, events, etc., in the state messages, and the order of consuming the state messages as specified by the leader node.
[0029] At 303, the network switching node determines whether an action is to be performed at the network switching node in response to the received state message. For example, the state message may include a command to add a port to a VLAN. If the network switching node does not include the port specified in the state message, then no action may be taken. If the network switching node includes the port, then the network switching node adds the port to the specified VLAN. Accordingly, different network switching nodes receiving the same state message may perform different actions or perform no action, but the state is the same across the network switching nodes, because the state includes the command from the state message. At 304, the network switching node performs the action if it determines the action is to be taken; otherwise at 305, no action is taken.
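To make 303-305 concrete, the sketch below shows two nodes consuming the same "add port to VLAN" state message: both record it, but only the node that owns the port acts. The vlanCommand type and the port-ownership map are assumptions for this example.
```go
package main

import "fmt"

// vlanCommand is a hypothetical decoded "add port to VLAN" state message.
type vlanCommand struct {
	vlan, port int
}

// handle records the command in the node's state log and acts on it only if
// the named port is local, so different nodes may take different actions
// while their replicated state stays identical.
func handle(nodeID int, localPorts map[int]bool, log *[]vlanCommand, c vlanCommand) {
	*log = append(*log, c) // every node stores the same state (302)
	if !localPorts[c.port] {
		fmt.Printf("node %d: port %d not local, no action\n", nodeID, c.port) // 305
		return
	}
	fmt.Printf("node %d: adding port %d to VLAN %d\n", nodeID, c.port, c.vlan) // 304
}

func main() {
	cmd := vlanCommand{vlan: 20, port: 7}
	var logA, logB []vlanCommand
	handle(1, map[int]bool{7: true}, &logA, cmd)  // owns port 7: performs the action
	handle(2, map[int]bool{12: true}, &logB, cmd) // does not: stores state only
}
```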
[0030] What has been described and illustrated herein are examples of the disclosure along with some variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the scope of the disclosure, which is intended to be defined by the following claims, and their equivalents, in which all terms are meant in their broadest reasonable sense unless otherwise indicated.

Claims

What is claimed is:
1. A network switching node in a network switch including a plurality of network switching nodes, the network switching node comprising:
data storage to store a routing table and a state log;
a hardware routing circuit to route packets in a network via ports, according to the routing table;
a state machine replication system to receive a plurality of state messages for programming the network switching node and store a state of the system in the state log, wherein the state includes the plurality of state messages and an order of consuming the plurality of state messages; and
command logic to determine whether an action is to be performed at the network switching node in response to a state message of the plurality of state messages, and execute the action if the action is to be performed.
2. The network switching node of claim 1, wherein the plurality of state messages include a routing command message from a controller programming the plurality of network switching nodes, and an event message identifying an event occurring at another of the plurality of network switching nodes.
3. The network switching node of claim 2, wherein the command logic determines an action to be performed in response to the routing command message, and executes the action, and wherein another of the plurality of network switching nodes receiving the routing command message performs a different action in response to the routing command message or performs no action in response to the routing command message.
4. The network switching node of claim 1, wherein the plurality of network switching nodes store the same state, including the plurality of state messages and the order.
5. The network switching node of claim 1, wherein the state machine replication system copies the state log to a new switching node included in the system to replicate the state of the system in the new switching node.
6. A network switch comprising:
an internal network;
a plurality of network switching nodes connected via the internal network, wherein each of the plurality of network switching nodes comprises:
a hardware routing circuit to route packets in an external network via ports, according to a routing table;
a state machine replication system to receive a plurality of state messages via the internal network and store a state of the system in a state log, wherein the state includes the plurality of state messages and an order of consuming the plurality of state messages, and
command logic to execute commands from a controller, wherein a leader node is selected from the plurality of network switching nodes, and the leader node receives commands from the controller and broadcasts the plurality of state messages, including the commands, to the plurality of network switching nodes via the internal network.
7. The network switch of claim 6, wherein the broadcasted state messages from the leader node include events received from one of the plurality of network switching nodes reacting to at least one of the commands.
8. The network switch of claim 7, wherein a message, from the controller, requesting execution of a programming command, is received by one of the plurality of network switching nodes and is forwarded by the network switching node to the leader node, and
the leader node generates and broadcasts a routing command message via the internal network to the plurality of network switching nodes as a state message.
9. The network switch of claim 6, wherein the state machine replication system of one of the plurality of network switching nodes copies the state log via the internal network to a new switching node joining the network switch to replicate the state of the system in the new switching node.
10. The network switch of claim 6, wherein the command logic determines an action to be performed in response to receiving a routing command message, and executes the action, and another of the plurality of network switching nodes receiving the routing command message performs a different action in response to the routing command message or performs no action in response to the routing command message.
11. The network switch of claim 6, wherein the plurality of network switching nodes store the same state, including the plurality of state messages and the order.
12. The network switch of claim 6, wherein the leader node sends events to the controller in response to broadcasting the events to the plurality of network switching nodes as state messages.
13. The network switch of claim 6, wherein the leader node selects one of the plurality of network switching nodes as a backup leader node that takes over as leader node in response to the leader node failing.
14. The network switch of claim 6, wherein each of the plurality of network switching nodes periodically compresses the state log stored locally.
15. A method of state management performed by a network switching node in a network switch, the method comprising:
receiving state messages from a leader node of the plurality of network switching nodes, wherein the state messages include programming commands from a controller and notification of an event occurring at a network switching node of the plurality of network switching nodes;
storing a state of the system in a state log at the network switching node, wherein the state includes the plurality of state messages and an order of consuming the plurality of state messages;
determining whether an action is to be performed at the network switching node in response to a state message of the plurality of state messages; and
executing the action if the action is to be performed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/013324 WO2016122495A1 (en) 2015-01-28 2015-01-28 Network switching node with state machine replication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/013324 WO2016122495A1 (en) 2015-01-28 2015-01-28 Network switching node with state machine replication

Publications (1)

Publication Number Publication Date
WO2016122495A1 true WO2016122495A1 (en) 2016-08-04

Family

ID=56543921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/013324 WO2016122495A1 (en) 2015-01-28 2015-01-28 Network switching node with state machine replication

Country Status (1)

Country Link
WO (1) WO2016122495A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7864769B1 (en) * 2002-08-14 2011-01-04 Juniper Networks, Inc. Multicast packet replication
US20130142202A1 (en) * 2011-12-06 2013-06-06 International Business Machines Corporation Distributing functions in a distributed and embedded environment
WO2013184846A1 (en) * 2012-06-06 2013-12-12 Juniper Networks, Inc. Physical path determination for virtual network packet flows
US20140310240A1 (en) * 2013-04-15 2014-10-16 International Business Machines Corporation Executing distributed globally-ordered transactional workloads in replicated state machines
US20140369186A1 * 2013-06-17 2014-12-18 Telefonaktiebolaget L M Ericsson (publ) Methods and systems with enhanced robustness for multi-chassis link aggregation group

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15880394

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15880394

Country of ref document: EP

Kind code of ref document: A1