WO2021232190A1 - Forward path planning method in massive data center networks - Google Patents

Forward path planning method in massive data center networks

Info

Publication number
WO2021232190A1
Authority
WO
WIPO (PCT)
Prior art keywords
forward path
computing node
data
data packet
destination device
Prior art date
Application number
PCT/CN2020/090827
Other languages
French (fr)
Inventor
Jianguo Liang
ChenChen QI
Haiyang ZHENG
Xuemei SHI
Original Assignee
Alibaba Group Holding Limited
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Limited filed Critical Alibaba Group Holding Limited
Priority to CN202080100357.0A priority Critical patent/CN115462049B/en
Priority to PCT/CN2020/090827 priority patent/WO2021232190A1/en
Publication of WO2021232190A1 publication Critical patent/WO2021232190A1/en

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L45/00 Routing or path finding of packets in data switching networks
            • H04L45/12 Shortest path evaluation
              • H04L45/121 Shortest path evaluation by minimising delays
              • H04L45/123 Evaluation of link metrics
            • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
            • H04L45/74 Address processing for routing
              • H04L45/745 Address table lookup; Address filtering
                • H04L45/7453 Address table lookup; Address filtering using hashing
          • H04L47/00 Traffic control in data switching networks
            • H04L47/10 Flow control; Congestion control
              • H04L47/11 Identifying congestion
              • H04L47/12 Avoiding congestion; Recovering from congestion
                • H04L47/122 Avoiding congestion; Recovering from congestion by diverting traffic away from congested entities
          • H04L49/00 Packet switching elements
            • H04L49/15 Interconnection of switching modules
              • H04L49/1553 Interconnection of ATM switching modules, e.g. ATM switching fabrics
                • H04L49/1569 Clos switching fabrics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D30/00 Reducing energy consumption in communication networks

Definitions

  • Equal Cost Multipath (ECMP) path planning algorithm enables the usage of multiple equal cost paths from the source node to the destination node in the network. The advantage of using this algorithm is that the data flows can be split more uniformly across the whole network, thus avoiding congestion and increasing bandwidth utilization.
  • multiple servers connect to the network via a first tier switch, i.e., a leaf switch or an access switch.
  • the server forwards the data packets to various paths in the network to their respective destinations.
  • the data packet forwarding may be determined based on the “server-version” topology information and pre-calculated values associated with the various paths.
  • each of the leaf switches performs dynamic path planning to distribute the data packets based on the “switch-version” topology information and the path planning algorithm.
  • the “server-version” topology information may not represent the real-time network topology information.
  • the server cannot determine the source of the network congestion and respond in a timely manner to re-route the data flow.
  • the present disclosure implements the dynamic path planning capability of a switch (i.e., a leaf switch or an access switch of a two-tier or three-tier Clos network) on a computing node (i.e., a server device) that accesses the network via the switch.
  • the computing node communicates with one or more switches to obtain information related to the path planning algorithm or the routing algorithm used by the switch and the network topology associated with the switch through various protocols, such as Link Layer Discovery Protocol (LLDP) .
  • the computing node further configures the path planning algorithm or the routing algorithm used therein based on the information related to the path planning algorithm or the routing algorithm used by the switch.
  • the path planning algorithm or the routing algorithm used by the switch may include an equal cost multipath (ECMP) planning algorithm.
  • the computing node further synchronizes the network topology associated with the computing node with the network topology associated with the switch.
  • the present disclosure enables the computing node to perform the dynamic path planning for received data packets ahead of the switch in massive data center networks, hence effectively avoiding data flow collisions in the network. Further, the computing node according to the present disclosure can efficiently detect network congestion and respond in a timely manner by re-routing the data flow so as to bypass the network congestion.
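  • As a minimal illustrative sketch of this host-side flow (in Python; the helper objects switch_agent and local_topology and their methods are hypothetical placeholders for the LLDP/topology machinery, not part of the disclosure):

```python
import hashlib

def plan_and_send(packet, local_topology, switch_agent):
    # Mirror the leaf switch's hash configuration and topology view
    # (switch_agent is a hypothetical wrapper around LLDP/GRPC exchanges).
    hash_cfg = switch_agent.get_hash_config()          # e.g. {"algo": "md5", "seed": 0}
    local_topology.update(switch_agent.get_topology())

    # Identify the flow by its five-tuple.
    five_tuple = (packet.src_ip, packet.dst_ip,
                  packet.src_port, packet.dst_port, packet.protocol)

    # Enumerate equal-cost paths and pick one the same way the switch would.
    paths = local_topology.equal_cost_paths(packet.src_ip, packet.dst_ip)
    digest = hashlib.new(hash_cfg["algo"],
                         repr((hash_cfg["seed"],) + five_tuple).encode()).digest()
    forward_path = paths[int.from_bytes(digest[:4], "big") % len(paths)]

    # Transmit the packet along the selected forward path.
    switch_agent.send(packet, forward_path)
    return forward_path
```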
  • FIG. 1 illustrates an example environment in which a forward path planning system may be used in accordance with an embodiment of the present disclosure.
  • FIG. 2 illustrates an example network architecture of massive data center networks in accordance with an embodiment of the present disclosure.
  • FIG. 3 illustrates example failures occurring in the massive data center networks in accordance with an embodiment of the present disclosure.
  • FIG. 4 illustrates an example configuration of a computing node for implementing the forward path planning method in accordance with an embodiment of the present disclosure.
  • FIG. 5 illustrates an example forward path planning in accordance with an embodiment of the present disclosure.
  • FIG. 6 illustrates another example forward path planning in accordance with an embodiment of the present disclosure.
  • FIG. 7 illustrates an example equal cost multipath (ECMP) planning in accordance with an embodiment of the present disclosure.
  • FIG. 8 illustrates an example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • FIG. 9 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • FIG. 10 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • FIG. 11 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • the application describes multiple and varied embodiments and implementations.
  • the following section describes an example framework that is suitable for practicing various implementations.
  • the application describes example systems, devices, and processes for implementing a distributed training system.
  • FIG. 1 illustrates an example environment in which a forward path planning system may be used in accordance with an embodiment of the present disclosure.
  • the environment 100 may include a data center network 102.
  • the data center network 102 may include a plurality of computing nodes or servers 104-1, 104-2, ..., 104-K (which are collectively called hereinafter as computing nodes 104), where K is a positive integer greater than one.
  • the plurality of computing nodes 104 may communicate data with each other via a communication network 106.
  • the computing node 104 may be implemented as any of a variety of computing devices having computing/processing and communication capabilities, which may include, but not limited to, a server, a desktop computer, a notebook or portable computer, a handheld device, a netbook, an Internet appliance, a tablet computer, a mobile device (e.g., a mobile phone, a personal digital assistant, a smart phone, etc. ) , etc., or a combination thereof.
  • the communication network 106 may be a wireless or a wired network, or a combination thereof.
  • the network 106 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet). Examples of such individual networks include, but are not limited to, telephone networks, cable networks, Local Area Networks (LANs), Wide Area Networks (WANs), and Metropolitan Area Networks (MANs). Further, the individual networks may be wireless or wired networks, or a combination thereof. Wired networks may include an electrical carrier connection (such as a communication cable, etc.) and/or an optical carrier or connection (such as an optical fiber connection, etc.).
  • Wireless networks may include, for example, a WiFi network, other radio frequency networks (e.g., Zigbee, etc. ) , etc.
  • the communication network 106 may include a plurality of inter-node interconnects or switches 108-1, 108-2, ..., 108-L (which are collectively called hereinafter as inter-node switches 108) for providing connections between the computing nodes 104, where L is a positive integer greater than one.
  • the environment 100 may further include a plurality of client devices 110-1, 110-2, ..., 110-N (which are collectively called hereinafter as client devices 110), where N is a positive integer greater than one.
  • client devices 110 may communicate with each other via the communication network 106, or access online resources and services. These online resources and services may be implemented at the computing nodes 104.
  • Data flows generated by users of the client devices 110 may be distributed to a plurality of routing paths and routed to a destination device through one or more of the plurality of paths.
  • the destination device may include another client device 110 or a computing node 104.
  • each of the plurality of routing paths may include one or more computing nodes 104 and switches 108 inter ⁇ connected by physical links.
  • FIG. 2 illustrates an example network architecture of massive data center networks in accordance with an embodiment of the present disclosure.
  • the network architecture of massive data center networks 200 may provide a detailed view of the environment in which a forward path planning system may be used.
  • the network architecture of massive data center networks is a three-tier Clos network architecture in a full-mesh topology.
  • a first tier corresponds to a tier of leaf switches 206, also called access switches or top of rack (ToR) switches.
  • the computing nodes 208 are directly connected to the leaf switches 206, with each computing node 208 being connected to at least two leaf switches 206.
  • a computing node 208 may include one or more network interface controllers (e.g., four network interface controllers) which are connected to one or more ports (e.g., four ports) of a leaf switch 206. In implementations, the number of network interface controllers in each computing node 208 may or may not be the same.
  • a second tier corresponds to a tier of aggregation switches 204, also called spine switches 204, that are connected to one or more leaf switches 206.
  • a number of computing nodes 208, the interconnected leaf switches 206, and the interconnected aggregation switches 204 may form a point of delivery (PoD) unit, for example, PoD-1 and PoD-2, as illustrated.
  • a third tier corresponds to a tier of core switches 202 that are connected to one or more aggregation switches 204.
  • the core switches 202 are at the top of the cloud data center network pyramid and may include a wide area network (WAN) connection to the outside carrier network.
  • a data packet that is transmitted between the two processing units or processes can be made to flow through a specified aggregation switch by setting an appropriate combination of source and destination ports in the data packet.
  • the routing management for congestion avoidance may aim at enabling data flows from a same leaf switch to different destination leaf switches to pass through different aggregation switches, and/or data flows from different source leaf switches to a same destination leaf switch to pass through different aggregation switches, thus avoiding collisions between the data flows and preventing network congestion at the aggregation switches.
  • a processing unit or process in a computing node 208 may send/receive data to/from a processing unit or process in another computing node through a network interface controller (NIC) .
  • a processing unit or process in a computing node 208 may be associated with a single network interface controller or multiple network interface controllers for transmitting data to processing units or processes in other computing nodes. Additionally, or alternatively, multiple processing units or processes may be associated with a single network interface controller and employ that network interface controller for transmitting data to processing units or processes in other computing nodes.
  • a plurality of rules for data packet forwarding/routing may be implemented on the computing nodes 208.
  • the plurality of rules may include, but are not limited to, priorities for a processing unit or process in a first computing node to select a neighboring processing unit or process, conditions for a network interface controller in a first computing node to send or receive data, conditions for a network interface controller in a first computing node to route data to/from a network interface controller in a second computing node, etc.
  • the routing management may assign network interface controller (NIC) identifiers to each network interface controller that is connected or linked to a same leaf switch.
  • when a network interface controller of the processing unit or process and a network interface controller of a next processing unit or process are located in a same computing node or are directly connected or linked to a same leaf switch, a routing identifier may be determined as a default value or identifier. This default routing identifier indicates that data is either routed within a computing node or through a leaf switch, without passing through any aggregation switch in the communication network. Otherwise, the routing identifier may be determined to be equal to a NIC identifier of that processing unit or process, or other predefined value.
  • an aggregation identifier may be determined based on the determined routing identifier.
  • the mapping relationship between routing identifiers and aggregation identifiers may be predetermined using a probing-based routing mechanism (e.g., sending probing data packets between computing nodes as described in the foregoing description), for example.
  • data flows between processing units (or processes) that are included in different computing nodes and whose network interface controllers are connected to different leaf switches will pass through a designated aggregation switch based on a predetermined mapping relationship, thus enabling routing control and management of data flows and distributing the data flows to different aggregation switches to avoid network congestion.
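  • As an illustration of the identifier rules above, a minimal Python sketch follows; the NIC descriptor objects and the mapping table are hypothetical examples rather than values defined by the disclosure:

```python
DEFAULT_ROUTING_ID = 0   # data stays within one computing node or one leaf switch

def routing_identifier(src_nic, dst_nic):
    """Apply the rule described above to two NIC descriptors (hypothetical objects
    with node_id, leaf_switch, and nic_id attributes)."""
    if src_nic.node_id == dst_nic.node_id or src_nic.leaf_switch == dst_nic.leaf_switch:
        # Same computing node, or both NICs attach to the same leaf switch:
        # no aggregation switch needs to be traversed.
        return DEFAULT_ROUTING_ID
    return src_nic.nic_id   # otherwise, use the NIC identifier of the sender

def aggregation_identifier(routing_id, mapping):
    """Look up the aggregation switch for a routing identifier in a predetermined
    mapping, e.g. one learned by sending probing data packets."""
    return mapping.get(routing_id)   # e.g. {1: "agg-1", 2: "agg-2"}
```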
  • the three ⁇ tier Clos network as illustrated in FIG. 2 is merely an example network architecture of the massive data center network.
  • Other network architectures including, but not limited to, two-tier Clos networks may also be adopted to construct the massive data center networks.
  • FIG. 3 illustrates example failures occurring in the massive data center networks in accordance with an embodiment of the present disclosure.
  • network anomalies including, but not limited to, link failures (e.g., failures 312, 316, and 320), computing node failure (e.g., failure 310), leaf switch failure (e.g., failure 318), aggregation switch failure (e.g., failure 314), or core switch failure (e.g., failure 322) may occur, causing data packet loss and congestion in certain routing paths.
  • detection technology such as network quality analyzer (NQA) Track may be introduced.
  • one or more computing nodes in each point of delivery (PoD) unit may implement the NQA track scheme to detect whether another computing node in the same or different PoD unit, a leaf switch in the same or different PoD unit, an aggregation switch in the same or different PoD unit, or a core switch is unreachable, etc.
  • An example detection approach is to utilize one computing node in a point of delivery (PoD) unit as a detection source.
  • computing node 308-1 in PoD-1 may be assigned as a detection source.
  • the computing node 308-1 may periodically ping other computing nodes, leaf switches, aggregation switches, and core switches, by sending an Internet Control Message Protocol (ICMP) echo request packet.
  • the computing node 308-1 waits for an ICMP echo reply from each of those nodes and switches. After a pre-set period, also called a Time to Live (TTL) period, if no echo reply is received from either the computing node or the switch, the computing node 308-1 determines that the computing node or the switch is unreachable.
  • when the packet loss occurs in a large number of computing nodes connected to a same leaf switch, the detection source (i.e., the designated computing node for anomaly detection) may further determine that the leaf switch may be in failure. In another example, when the packet loss only occurs in sporadic computing nodes connected to a same leaf switch, the detection source may further determine that these sporadic computing nodes may experience overload or the corresponding ports in the leaf switch may be full. In yet another example, when packet loss occurs in a large number of computing nodes located in a different point of delivery (PoD) unit, the detection source may further determine that failures may occur in one or more corresponding aggregation switches located therein and/or one or more corresponding core switches connected thereto.
  • the detection source may further determine that these sporadic computing nodes may experience overload or the corresponding ports in the aggregation switch and/or the core switch may be full.
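  • A simplified sketch of such a ping-based detection source is shown below (Python, using the system ping utility with Linux-style flags; the threshold and classification labels are illustrative assumptions):

```python
import subprocess

def is_reachable(host, timeout_s=1):
    """Send a single ICMP echo request via the system ping utility."""
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def classify_leaf(hosts_behind_leaf, loss_threshold=0.8):
    """Rough inference: widespread loss behind one leaf switch suggests a leaf
    failure; sporadic loss suggests overloaded hosts or full switch ports."""
    unreachable = [h for h in hosts_behind_leaf if not is_reachable(h)]
    ratio = len(unreachable) / len(hosts_behind_leaf)
    if ratio >= loss_threshold:
        return "suspected leaf switch failure"
    if unreachable:
        return "suspected host overload or congested leaf ports"
    return "healthy"
```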
  • Another example detection approach is to coordinate various computing nodes in different locations as detection sources and deploy different detection strategies among those detection sources.
  • Each of the detection sources may implement an agent program capable of additionally detecting the anomaly associated with OSI layer 4 through layer 7.
  • the detection sources may accept input control commands to dynamically configure the detection strategies.
  • This approach may establish TCP connections to further detect one or more parameters associated with the transport layer (i.e., OSI layer 4) , for example, transmission delay or transmission rate. With the network topology information, the transmission delay or transmission rate, and the data packet loss rate, the detection approach may further learn the exact location of the anomaly.
  • the detection source may determine that the computing nodes are operating normally but the first leaf switch may experience a certain anomaly.
  • the detection source may determine that the first leaf switch is operating normally but the ports in the first leaf switch that correspond to the first number of computing nodes may be congested.
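  • A minimal transport-layer probe of the kind described above can be sketched as follows (the port and timeout values are illustrative assumptions):

```python
import socket
import time

def tcp_connect_delay(host, port=80, timeout_s=2.0):
    """Measure OSI layer-4 reachability and connection delay to a target.
    Returns the delay in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        return None
```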
  • congestion detection and notification mechanisms referenced herein include random early detection (RED), weighted random early detection (WRED), robust random early detection (RRED), explicit congestion notification (ECN), and backward ECN (BECN).
  • FIG. 4 illustrates an example configuration of a computing node for implementing the forward path planning method in accordance with an embodiment of the present disclosure.
  • the example configuration 400 of the computing node 402 may include, but is not limited to, one or more processing units 404, one or more network interfaces 406, an input/output (I/O) interface 408, and memory 412.
  • the computing node 402 may further include one or more intra-node interconnects or switches 410.
  • the processing units 404 may be configured to execute instructions that are stored in the memory 412, and/or received from the input/output interface 408, and/or the network interface 406.
  • the processing units 404 may be implemented as one or more hardware processors including, for example, a microprocessor, an application-specific instruction-set processor, a physics processing unit (PPU), a central processing unit (CPU), a graphics processing unit, a digital signal processor, a tensor processing unit, etc. Additionally, or alternatively, the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • examples of hardware logic components include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), and complex programmable logic devices (CPLDs).
  • the memory 412 may include machine readable media in a form of volatile memory, such as Random Access Memory (RAM), and/or non-volatile memory, such as read only memory (ROM) or flash RAM.
  • the machine readable media may include a volatile or non-volatile type, a removable or non-removable media, which may achieve storage of information using any method or technology.
  • the information may include a machine readable instruction, a data structure, a program module or other data.
  • Examples of machine readable media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), quick flash memory or other internal storage technology, compact disk read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which may be used to store information that may be accessed by a computing node.
  • the machine readable media does not include any transitory media, such as modulated data signals and carrier waves.
  • the network interfaces 406 may be configured to connect the computing node 402 to other computing nodes via the communication network 106.
  • the network interfaces 406 may be established through a network interface controller (NIC) , which may employ both hardware and software in connecting the computing node 402 to the communication network 106.
  • each type of NIC may use a different type of fabric or connector to connect to a physical medium associated with the communication network 106. Examples of types of fabrics or connectors may be found in the IEEE 802 specifications, and may include, for example, Ethernet (which is defined in 802.3), Token Ring (which is defined in 802.5), wireless networking (which is defined in 802.11), InfiniBand, etc.
  • the intra-node switches 410 may include various types of interconnects or switches, which may include, but are not limited to, a high-speed serial computer expansion bus (such as PCIe, etc.), a serial multi-lane near-range communication link (such as NVLink, which is a wire-based serial multi-lane near-range communication protocol, for example), a switch chip with a plurality of ports (e.g., an NVSwitch, etc.), a point-to-point processor interconnect (such as an Intel QPI/UPI, etc.), etc.
  • the computing node 402 may further include other hardware components and/or other software components, such as program modules 414 to execute instructions stored in the memory 412 for performing various operations, and program data 416 for storing data received for path planning, anomaly detection, etc.
  • the program modules 414 may include a topology awareness module 418, a path planning module 420, and an anomaly detection module 422.
  • the topology awareness module 418 may be configured to maintain topology data associated with the network 106.
  • the topology data may be generated and implemented on each element of the network 106 when the network architecture is delivered.
  • the topology data includes arrangements of the elements of a network, computing nodes, leaf switches, aggregation switches, core switches, etc., and indicates the connections/links among these elements.
  • the topology data may be represented as a non-directional graph and stored as an adjacency list. All paths that route the data packets from a source node to a destination node may be configured to have equal cost. Hence, data flow from the source node to the destination node may be evenly distributed to all paths.
  • one or more paths from the source node to the destination node may be configured as reserved bandwidth, and the data flow may be forwarded only to the available paths excluding the reserved paths.
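  • A small hypothetical fragment of such topology data, stored as an adjacency list of a non-directional graph, might look like the following (node names are illustrative):

```python
# Computing nodes <-> leaf switches <-> an aggregation switch; each link appears
# under both endpoints because the graph is non-directional.
topology = {
    "node-1": ["leaf-1", "leaf-2"],
    "node-2": ["leaf-3", "leaf-4"],
    "leaf-1": ["node-1", "agg-1"],
    "leaf-2": ["node-1", "agg-1"],
    "leaf-3": ["node-2", "agg-1"],
    "leaf-4": ["node-2", "agg-1"],
    "agg-1":  ["leaf-1", "leaf-2", "leaf-3", "leaf-4"],
}
# All links are treated as equal cost; paths configured as reserved bandwidth can
# simply be excluded from the candidate set before forwarding.
reserved_paths = set()
```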
  • the topology awareness module 418 may communicate with one or more switches in the massive data center network (e.g., leaf switch 306) in real-time, periodically, or at a pre-set time period to obtain topology data associated with the network 106.
  • the topology data associated with the network 106 may be stored in one or more separate storage devices.
  • the topology awareness module 418 may communicate with the one or more separate storage devices to obtain the real-time topology data.
  • the topology data associated with the network 106 may be dynamically updated to the topology awareness module 418 in response to a notification of network congestion.
  • the communication and topology data exchange between the computing node and the leaf switches may be achieved by using protocols including, but not limited to, Link Layer Discovery Protocol (LLDP), Link Aggregation Control Protocol (LACP), general-purpose Remote Procedure Calls (GRPC), etc.
  • the communication and topology data exchange between the computing node and the leaf switches may be achieved by remote software control.
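  • However the exchange is carried out (LLDP, LACP, GRPC, or remote software control), the host-side effect can be sketched as a simple synchronization step over the cached adjacency list; the function name and return convention below are assumptions:

```python
def sync_topology(local_topology, switch_topology):
    """Replace the locally cached adjacency list with the view reported by the
    leaf switch; returns True when anything changed so that cached routing
    paths can be invalidated and recomputed."""
    changed = local_topology != switch_topology
    local_topology.clear()
    local_topology.update({node: list(neighbors)
                           for node, neighbors in switch_topology.items()})
    return changed
```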
  • the path planning module 420 may be configured to determine the routing paths to forward data packets and distribute the data packets to the routing paths to balance the data flows in the network 106.
  • the path planning module 420 may obtain a source address, a destination address, and a protocol from an IP header of a TCP/IP data packet, and a source port and a destination port from the TCP packet.
  • the source address, the destination address, the source port, the destination port, and the protocol may form a so-called five-tuple (or 5-tuple).
  • the five-tuple may uniquely indicate a data flow, in which all data packets have exactly the same five-tuple.
  • the path planning module 420 may determine all possible routing paths from the source node to the destination node. A data flow, in which all data packets have exactly the same five-tuple, may use one of all the possible routing paths at one time.
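  • For illustration, extracting the five-tuple from a raw IPv4 packet carrying TCP or UDP can be sketched as follows (a minimal parser that assumes the transport header directly follows the IPv4 header):

```python
import struct

def extract_five_tuple(ip_packet: bytes):
    """Return (source address, destination address, source port, destination port,
    protocol) from a raw IPv4 packet carrying TCP or UDP."""
    ihl = (ip_packet[0] & 0x0F) * 4                       # IPv4 header length in bytes
    protocol = ip_packet[9]                               # 6 = TCP, 17 = UDP
    src_ip = ".".join(str(b) for b in ip_packet[12:16])
    dst_ip = ".".join(str(b) for b in ip_packet[16:20])
    src_port, dst_port = struct.unpack("!HH", ip_packet[ihl:ihl + 4])
    return (src_ip, dst_ip, src_port, dst_port, protocol)
```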
  • the network topology data changes due to the anomaly.
  • the topology awareness module 418 may update the network topology data stored in the program data 416 to reflect the changes.
  • the network topology data associated with the computing node 402 may be stored separately from the program data 416 and may be updated in response to the topology data changes due to the anomaly.
  • the path planning module 420 may recompute using the Hash algorithm based on the updated network topology data and select a different path by using another source port. It should be appreciated that the five-tuple (or 5-tuple) that uniquely indicates a data flow described above is merely for the purpose of illustration. The present disclosure is not intended to be limiting.
  • the path planning module 420 may construct a three-tuple (or 3-tuple) including the source IP address, destination IP address, and ICMP Identifier that uniquely identifies an ICMP Query session to indicate a data flow.
  • the path planning module 420 may determine all possible routing paths from the source node to the destination node.
  • the routing paths may be determined based on various path finding algorithms including, but not limited to, shortest path algorithms. Examples of shortest path algorithms may include, but are not limited to, Dijkstra's algorithm, the Viterbi algorithm, the Floyd–Warshall algorithm, the Bellman–Ford algorithm, etc.
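  • Because all hops are treated as equal cost, a plain breadth-first search over the adjacency list is enough to enumerate every minimum-hop path; a minimal sketch:

```python
from collections import deque

def all_shortest_paths(topology, src, dst):
    """Return every minimum-hop path from src to dst in an unweighted,
    non-directional adjacency list (all links assumed equal cost)."""
    paths, best_len = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best_len is not None and len(path) > best_len:
            break                                  # only longer paths remain
        node = path[-1]
        if node == dst:
            best_len = len(path)
            paths.append(path)
            continue
        for neighbor in topology.get(node, []):
            if neighbor not in path:               # avoid revisiting nodes
                queue.append(path + [neighbor])
    return paths
```

  With the hypothetical adjacency-list fragment sketched earlier, for example, this yields the four equal-cost paths from node-1 to node-2 through agg-1.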
  • the path planning module 420 may further perform a Hash operation on the five-tuple to obtain corresponding five-tuple hash values and determine a routing path from all possible shortest routing paths according to the five-tuple hash values.
  • the Hash operation maps a five-tuple to a unique path
  • data packets that all have the same five-tuple may be routed through the same path.
  • Various Hash algorithms may be implemented by the path planning module 420 including, but not limited to, Message Digest (MD, MD2, MD4, MD5 and MD6), RIPEMD (RIPEMD, RIPEMD-128, and RIPEMD-160), Whirlpool (Whirlpool-0, Whirlpool-T, and Whirlpool), or Secure Hash Function (SHA-0, SHA-1, SHA-2, and SHA-3).
  • the computing node forwards data packets based on pre-computed Hash values corresponding to all possible routing paths, respectively, and the switches (i.e., the leaf switches) perform path planning based on the dynamic network topology and a Hash algorithm.
  • the topology data associated with the computing node and the leaf switches are not synchronized, and the Hash algorithms implemented on the computing node and the leaf switches have different configurations.
  • the five-tuple Hash values calculated by the computing node and the switches may direct different data flows to a same routing path, causing possible congestion in the network.
  • the path planning module 420 of the computing node 402 may obtain information associated with the Hash algorithm implemented on one or more leaf switches and configure the Hash algorithm implemented on the computing node 402 using the obtained information.
  • the information may include one or more parameters configured for the Hash algorithm implemented on the one or more leaf switches.
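  • A minimal sketch of adopting such parameters on the computing node follows; the parameter names "algorithm" and "seed" are assumptions, and hardware switches do not literally run hashlib, so this only models the idea of matching configurations:

```python
import hashlib

def flow_hash_from_switch_params(switch_params):
    """Build a five-tuple hash function whose algorithm and seed mirror the
    parameters reported by the leaf switch."""
    algo = switch_params.get("algorithm", "md5")   # e.g. "md5", "sha1", "sha256"
    seed = switch_params.get("seed", 0)

    def flow_hash(five_tuple):
        data = repr((seed,) + tuple(five_tuple)).encode()
        return int.from_bytes(hashlib.new(algo, data).digest()[:8], "big")

    return flow_hash
```

  In this simplified model, identical parameters on the host and the switch map a given five-tuple to the same value, which is what keeps host-side planning consistent with switch-side ECMP hashing.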
  • the path planning module 420 of the computing node 402 may further obtain network topology data stored in association with the one or more leaf switches and update its network topology data according to the obtained network topology data.
  • the communication and data exchange between the computing node and the leaf switches may be achieved by using protocols including, but not limited to, Link Layer Discovery Protocol (LLDP), Link Aggregation Control Protocol (LACP), general-purpose Remote Procedure Calls (GRPC), etc.
  • the collisions of mapping different five-tuple hash values to a same path may be reduced and possible flow congestion may be avoided.
  • the computing node may effectively determine the element in the network that is involved in the anomaly and re-plan the forward paths for the data flow.
  • the anomaly detection module 422 may be configured to detect anomalies occurring in the network 106.
  • the anomaly detection module 422 may implement the detection approaches described above with respect to FIG. 3, and hence these approaches are not described in detail herein.
  • the program data 416 may be configured to store topology information 424, configuration information 426, and routing information 428.
  • the topology information may include the network elements and the connection status of the network elements.
  • the topology information may be dynamically updated according to the path planning and data exchange between the computing node 402 and the leaf switches.
  • the configuration information 426 may include versions and parameters of the algorithms implemented on the computing node 402, for example, routing algorithms and Hash algorithms.
  • the routing information 428 may include all possible routing paths between a source node and a destination node.
  • the routing information 428 may further include mappings between the five-tuple hash values and the corresponding forward paths.
  • FIG. 5 illustrates an example forward path planning in accordance with an embodiment of the present disclosure.
  • the example forward path planning 500 is illustrated among various computing nodes and leaf switches, ultimately connected to a single aggregation switch. Data packets from computing node 506-1 to computing node 506-2 are distributed to two data flows in two routing paths.
  • Path A includes four hops: Path A-1, Path A-2, Path A-3, and Path A-4 and goes through the computing node 506-1, the leaf switch 504-1, the aggregation switch 502, the leaf switch 504-2, and the computing node 506-2.
  • Path B includes four hops: Path B-1, Path B-2, Path B-3, and Path B-4 and goes through the computing node 506-1, the leaf switch 504-1, the aggregation switch 502, the leaf switch 504-3, and the computing node 506-2.
  • the computing node 506-1 detects an anomaly in Path A and further determines that Path A-3 and Path A-4 are involved in the anomaly.
  • the anomaly may be associated with the leaf switch 504-3 and/or the ports of the leaf switch 504-3.
  • the network topology data associated with the computing node 506-1 may be updated to reflect the dynamic changes of the network caused by the anomaly.
  • the computing node 506-1 may recompute using the Hash algorithm based on the updated network topology data and select a different path by using another source port.
  • the computing node 506-1 may transmit the data flow using the different path that goes through the leaf switch 504-4, including Path A-1, Path A-2, Path A-3’, and Path A-4’.
  • FIG. 6 illustrates another example forward path planning in accordance with an embodiment of the present disclosure.
  • the example forward path planning 600 is illustrated among various computing nodes and leaf switches, ultimately connected to two aggregation switches. Data packets from computing node 606-1 to computing node 606-2 are distributed to two data flows in two routing paths.
  • Path A includes four hops: Path A-1, Path A-2, Path A-3, and Path A-4 and goes through the computing node 606-1, the leaf switch 604-3, the aggregation switch 602-2, the leaf switch 604-4, and the computing node 606-2.
  • Path B includes four hops: Path B-1, Path B-2, Path B-3, and Path B-4 and goes through the computing node 606-1, the leaf switch 604-1, the aggregation switch 602-1, the leaf switch 604-2, and the computing node 606-2.
  • the computing node 606-1 detects an anomaly in Path A and further determines that the leaf switch 604-4 is involved in the anomaly.
  • the network topology data associated with the computing node 606-1 may be updated to reflect the dynamic changes of the network caused by the anomaly. Based on the updated network topology data, the computing node 606-1 may recompute using the Hash algorithm and select a different path by using another source port.
  • the computing node 606-1 may transmit the data flow using the different path that goes through the leaf switch 604-2, including Path A-1, Path A-2, Path A-3’, and Path A-4’.
  • FIG. 7 illustrates an example equal cost multipath (ECMP) planning in accordance with an embodiment of the present disclosure.
  • ECMP load balancing refers to distributing data flows evenly by using a load balancing algorithm to identify flows and distribute the data flows to different routing paths.
  • in the ECMP planning 700, four equal cost paths, Path A, Path B, Path C, and Path D, are available to route the data packets from computing node 706-1 to computing node 706-2.
  • by applying the Hash algorithm on the five-tuple data for routing, all four paths are utilized to route the data packets from computing node 706-1 to computing node 706-2.
  • this facilitates distributing data flows evenly across the network and reducing possible congestion.
  • the data flows can be distributed among the other three paths.
  • the five-tuple and the ECMP load balancing shown in FIG. 7 are merely for the purpose of illustration. The present disclosure is not intended to be limiting.
  • a three-tuple including the source IP address, destination IP address, and ICMP Identifier that uniquely identifies an ICMP Query session may be adopted to indicate a data flow.
  • one or more paths may be set as reserved bandwidth, and hence, the data flow is distributed to only the available paths excluding the reserved paths.
  • FIG. 8 illustrates an example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • FIG. 9 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • FIG. 10 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • FIG. 11 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
  • the methods described in FIGs. 8-11 may be implemented in the environment of FIG. 1 and/or the network architecture of FIG. 2. However, the present disclosure is not intended to be limiting. The methods described in FIGs. 8-11 may alternatively be implemented in other environments and/or network architectures.
  • machine-executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types.
  • each of the example methods is illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof.
  • the order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or alternate methods. Additionally, individual blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein.
  • the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations.
  • some or all of the blocks may represent application specific integrated circuits (ASICs) or other physical components that perform the recited operations.
  • a first computing node (e.g., the computing node 104) may obtain information associated with an algorithm implemented by at least a second computing node.
  • the first computing node may implement a same algorithm as the at least one second computing node.
  • the algorithm implemented on the first computing node may be configured differently from the algorithm implemented on the second computing node.
  • the algorithm may include various Hash algorithms used for routing path planning based on the five-tuple associated with the data packets in the data flow.
  • the at least one second computing node may be a leaf switch in a three-tier Clos network, through which the first computing node connects to the data center network.
  • the first computing node may obtain network topology data stored in association with the at least one second computing node.
  • the first computing node may obtain the information associated with the algorithm implemented by the at least one second computing node described at block 802 and the network topology data associated with the at least one second computing node described at block 804 via various network protocols, for example, Link Layer Discovery Protocol (LLDP).
  • the network topology data may be represented in a non-directional graph illustrating the elements of the network and the connection status associated therewith.
  • the first computing node may receive a first data packet from a source device to be forwarded to a destination device.
  • the source device and the destination device may refer to the client devices 110 of FIG. 1.
  • the data packet is generated when a user operates the client devices 110 to communicate with another user operating a different client device.
  • the data packet is generated when a user visits an online source or uses an online service.
  • the computing nodes 104 may upload or download data from cloud storage spaces, thus generating a flow of data packets.
  • the first computing node (e.g., the computing node 104) may determine a set of values associated with the first data packet according to the information associated with the algorithm.
  • the set of values associated with the first data packet may include a five-tuple extracted from the first data packet.
  • the set of values may include a source IP address, a destination IP address, a source port number, a destination port number, and a protocol used for communication.
  • the set of values, i.e., the five-tuple, may be hashed using a Hash algorithm implemented by the first computing node.
  • a three-tuple including the source IP address, destination IP address, and ICMP Identifier that uniquely identifies an ICMP Query session may be adopted to indicate a data flow.
  • the first computing node (e.g., the computing node 104) may determine a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data.
  • the first computing node may update the network topology data associated therewith using the network topology data obtained from a storage associated with the at least one second computing node.
  • the first computing node may determine all best shortest routing paths according to the updated network topology data and select the forward path from the best shortest routing paths according to the set of values, for example, the five-tuple hash values.
  • the first computing node may transmit the first data packet to the destination device through the forward path.
  • a first computing node may determine five-tuple data associated with the first data packet, the five-tuple data including a source IP address, a source port number, a destination IP address, a destination port number, and a protocol.
  • the first computing node may extract the source IP address, the destination IP address, and the protocol from the IP header of the data packet and further extract the source port number and the destination port number from the TCP portion.
  • the protocol may include any type of IP protocol including, but not limited to, IPv4 and IPv6.
  • the first computing node may extract the source IP address, the destination IP address, and the ICMP identifier to generate a three-tuple (or 3-tuple) to indicate the data flow.
  • the first computing node may update the configuration of a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node.
  • the information associated with the algorithm implemented by the at least one second computing node may include versions of the algorithms, one or more parameter configurations of the algorithms, etc.
  • the first computing node may compute five-tuple hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
  • when the first computing node generates a three-tuple to identify a data flow, the first computing node computes three-tuple hash values corresponding to the three-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
  • a first computing node may determine one or more paths from the source device to the destination device according to the network topology data.
  • the one or more paths from the source device to the destination device may include one or more shortest paths.
  • the first computing node may implement various shortest path algorithms, for example, Dijkstra's algorithm, the Viterbi algorithm, the Floyd–Warshall algorithm, and the Bellman–Ford algorithm.
  • the first computing node may perform a modulus operation on the five-tuple hash values with respect to the one or more paths.
  • the modulus operation may generate one or more distinct modulus values corresponding to the one or more paths, respectively.
  • Data packets having all the same five-tuple may form a data flow.
  • Data packets directed from a source IP address to a destination IP address may be distributed to different data flows depending on the source ports the data packets are transmitted through.
  • Data flows may be distributed to the one or more paths according to the one or more distinct modulus values that distinctly correspond to the one or more paths.
  • the first computing node may determine a forward path from the one or more paths according to the results of the modulus operation.
  • the first computing node may select one path that maps to a data flow represented by the five-tuple as the forward path.
  • the arriving data packets that have the same five-tuple may use the same forward path.
  • the first computing node may designate one of the one or more paths as the forward path based on the traffic on these paths.
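  • The modulus-based spreading can be illustrated with a small experiment (Python's built-in tuple hash is used here purely for brevity in place of the configured Hash algorithm, and the flow values are randomly generated):

```python
import random

def demo_flow_distribution(n_flows=10_000, n_paths=4, seed=7):
    """Count how many randomly generated flows land on each of n_paths when
    selecting paths[hash(five_tuple) % n_paths]."""
    rng = random.Random(seed)
    counts = [0] * n_paths
    for _ in range(n_flows):
        five_tuple = (rng.getrandbits(32),            # source address
                      rng.getrandbits(32),            # destination address
                      rng.randrange(1024, 65536),     # source port
                      443,                            # destination port
                      6)                              # protocol (TCP)
        counts[hash(five_tuple) % n_paths] += 1
    return counts

print(demo_flow_distribution())   # four roughly equal counts: flows spread evenly
```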
  • a first computing node may determine at least a first forward path and a second forward path from the one or more paths.
  • the first computing node may implement the equal cost multipath (ECMP) algorithm to determine all possible paths between a source device and a destination device.
  • the one or more paths may be sorted based on the associated one or more distinct modulus values.
  • the first computing node may select one path that maps to a data flow represented by the five-tuple to forward the data flow. Alternatively, the first computing node may designate more than one path to forward the data flow.
  • the first computing node may distribute the data packets from the source device to the destination device to different data flows to be transmitted to all possible paths between the source device and the destination device. In other implementations, the first computing node may distribute the data flows to a set of all possible paths.
  • the first computing node may receive a plurality of second data packets from the source device to be forwarded to the destination device.
  • the plurality of second data packets may have the same or different five-tuples.
  • the data packets that have the same five-tuple may be transmitted through one path of the first forward path and the second forward path as a data flow at one time.
  • the data packets that have different five-tuples may form different data flows that go through different forward paths.
  • the second data packet is transmitted through the same forward path generated according to the embodiment illustrated in FIG. 8.
  • the first computing node may distribute the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries a portion of the plurality of second data packets.
  • the first computing node may evenly distribute the data flows to all possible paths (i.e., the first forward path and the second forward path ) between the source device and the destination device based on the Hash computation.
  • the data flow carried by the first forward path and the second forward path may be uneven.
  • the first computing node may determine an anomaly in one of the first forward path and the second forward path.
  • the anomaly may be associated with a computing node, a switch, port of a computing node, port of a switch node, etc., causing network congestion.
  • the first computing node may detect the anomaly using the detection approaches described above with respect to FIG. 3.
  • the first computing node may generate one or more sessions corresponding to one or more forward paths, respectively.
  • the first computing node may detect the anomaly when a session timeout occurs in one of the one or more sessions.
  • the first computing node may determine a third forward path from the source device to the destination device to reroute data flow, i.e., the portion of the plurality of second data packets that is involved in the abnormality.
  • the first computing node may recompute using the Hash algorithm based on the updated network topology data and select a different path by using another source port, as sketched below.
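  • A minimal re-routing sketch under these assumptions; find_paths is any equal-cost path finder such as the BFS sketched earlier, and flow_hash is the switch-aligned hash function, both passed in as parameters:

```python
def reroute(topology, failed_element, src, dst, five_tuple, flow_hash, find_paths):
    """Drop the failed element from the cached adjacency list and recompute the
    forward path; changing the flow's source port would likewise move it to a
    different equal-cost path because the five-tuple hash changes."""
    topology.pop(failed_element, None)
    for neighbors in topology.values():
        if failed_element in neighbors:
            neighbors.remove(failed_element)
    paths = find_paths(topology, src, dst)
    if not paths:
        return None                    # destination unreachable after the failure
    return paths[flow_hash(five_tuple) % len(paths)]
```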
  • a method implemented by a first computing node comprising: obtaining, via a network, information associated with an algorithm implemented by at least one second computing node; obtaining, via the network, network topology data stored in association with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.
  • determining a forward path from the source device to the destination device according to the information associated with the first data packet and the network topology data further comprises: determining one or more paths from the source device to the destination device according to the network topology data; and determining the forward path from the one or more paths according to the five-tuple hash values.
  • determining a forward path from the source device to the destination device according to the information associated with the first data packet and the network topology data further comprises: performing a modulus operation on the five-tuple Hash values with respect to the one or more paths; and determining the forward path from the one or more paths according to results of the modulus operation.
  • the forward path from the source device to the destination device includes at least a first forward path and a second forward path
  • the method further comprises: receiving a plurality of second data packets from the source device to be forwarded to the destination device; distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets; detecting an abnormality that occurred in one of the first forward path and the second forward path; and determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
  • One or more machine readable media storing machine readable instructions that, when executed by a first computing node, cause the first computing node to perform acts comprising: obtaining, via a network, information associated with an algorithm implemented by at least one second computing node; obtaining, via the network, network topology data stored in association with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.
  • the one or more machine readable media as recited in paragraph K, the acts further comprising: performing a modulus operation on the five-tuple Hash values with respect to the one or more paths; and determining the forward path from the one or more paths according to results of the modulus operation.
  • The one or more machine readable media as recited in paragraph H, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal cost multipath (ECMP) planning algorithm.
  • a first computing node comprising: one or more processing units; and memory storing machine executable instructions that, when executed by one or more processing units, cause the one or more processing units to perform acts comprising: obtaining, via a network, information associated with an algorithm implemented by at least one second computing node; obtaining, via the network, network topology data stored associated with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.
  • the first computing node as recited in paragraph O, wherein the information associated with the first data packet includes a set of values, and the acts further comprise: determining five-tuple data associated with the first data packet, the five-tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network; updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
  • the first computing node as recited in paragraph Q, wherein the acts further comprise: performing a modulus operation on the five-tuple Hash values with respect to the one or more paths; and determining the forward path from the one or more paths according to results of the modulus operation.
  • the first computing node as recited in paragraph O, wherein the acts further comprise: receiving a plurality of second data packets from the source device to be forwarded to the destination device; distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets; detecting an abnormality that occurred in one of the first forward path and the second forward path; and determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by the one of the first forward path and the second forward path that is involved with the abnormality.

Abstract

A method for planning a forward path in massive data center networks is provided. The method is implemented by a first computing node in a network. The method comprises obtaining information associated with an algorithm implemented by at least one second computing node; obtaining network topology data stored associated with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.

Description

[Corrected under Rule 26, 03.06.2020]
FORWARD PATH PLANNING METHOD IN MASSIVE DATA CENTER NETWORKS
BACKGROUND
Data center networks often use compactly interconnected network topologies to deliver high bandwidth for internal data exchange. In such networks, it is imperative to employ effective load balancing schemes so that all the available bandwidth resources can be utilized. In order to utilize all the available bandwidth, data flows need to be routed across the network instead of overloading a single path. The Equal Cost Multipath (ECMP) path planning algorithm enables the use of multiple equal-cost paths from the source node to the destination node in the network. The advantage of using this algorithm is that the data flows can be split more uniformly across the whole network, thus avoiding congestion and improving bandwidth utilization.
In an existing data center network, for example, a two-tier or a three-tier Clos network, multiple servers connect to the network via a first-tier switch, i.e., a leaf switch or an access switch. When a data flow arrives, the server forwards the data packets along various paths in the network to their respective destinations. The forwarding of the data packets may be determined based on the “server-version” topology information and pre-calculated values associated with the various paths. As the data packets arrive at the next-hop leaf switches, each of the leaf switches performs dynamic path planning to distribute the data packets based on the “switch-version” topology information and the path planning algorithm. As the server is not configured to perform the dynamic path planning, the “server-version” topology information may not represent the real-time network topology information. When network congestion occurs, the server cannot determine the source of the congestion and respond in a timely manner to re-route the data flow.
Methods and systems for dynamic path planning in massive data center networks are provided. The present disclosure implements the dynamic path planning capability of a switch (i.e., a leaf switch or an access switch of a two-tier or three-tier Clos network) on a computing node (i.e., a server device) that accesses the network via the switch. The computing node communicates with one or more switches to obtain information related to the path planning algorithm or the routing algorithm used by the switch and the network topology associated with the switch through various protocols, such as the Link Layer Discovery Protocol (LLDP). The computing node further configures the path planning algorithm or the routing algorithm used therein based on the information related to the path planning algorithm or the routing algorithm used by the switch. The path planning algorithm or the routing algorithm used by the switch may include an equal cost multipath (ECMP) planning algorithm. The computing node further synchronizes the network topology associated with the computing node with the network topology associated with the switch. The present disclosure enables the computing node to perform the dynamic path planning for received data packets ahead of the switch in massive data center networks, hence effectively avoiding data flow collisions in the network. Further, the computing node according to the present disclosure can efficiently detect network congestion and respond in a timely manner by re-routing the data flow so as to bypass the congestion.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left‐most digit (s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 illustrates an example environment in which a forward path planning system may be used in accordance with an embodiment of the present disclosure.
FIG. 2 illustrates an example network architecture of massive data center networks in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates example failures occurring in the massive data center networks in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates an example configuration of a computing node for implementing the forward path planning method in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates an example forward path planning in accordance with an embodiment of the present disclosure.
FIG. 6 illustrates another example forward path planning in accordance  with an embodiment of the present disclosure.
FIG. 7 illustrates an example equal cost multipath (ECMP) planning in accordance with an embodiment of the present disclosure.
FIG. 8 illustrates an example forward path planning algorithm in accordance with an embodiment of the present disclosure.
FIG. 9 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
FIG. 10 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
FIG. 11 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure.
DETAILED DESCRIPTION
The application describes multiple and varied embodiments and implementations. The following section describes an example framework that is suitable for practicing various implementations. Next, the application describes example systems, devices, and processes for implementing a forward path planning system.
FIG. 1 illustrates an example environment in which a forward path planning system may be used in accordance with an embodiment of the present disclosure. The environment 100 may include a data center network 102. In this example, the data center network 102 may include a plurality of computing nodes or  servers 104‐1, 104‐2, …, 104‐K (which are collectively called hereinafter as computing nodes 104) , where K is a positive integer greater than one. In implementations, the plurality of computing nodes 104 may communicate data with each other via a communication network 106.
The computing node 104 may be implemented as any of a variety of computing devices having computing/processing and communication capabilities, which may include, but not limited to, a server, a desktop computer, a notebook or portable computer, a handheld device, a netbook, an Internet appliance, a tablet computer, a mobile device (e.g., a mobile phone, a personal digital assistant, a smart phone, etc. ) , etc., or a combination thereof.
The communication network 106 may be a wireless or a wired network, or a combination thereof. The network 106 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet). Examples of such individual networks include, but are not limited to, telephone networks, cable networks, Local Area Networks (LANs), Wide Area Networks (WANs), and Metropolitan Area Networks (MANs). Further, the individual networks may be wireless or wired networks, or a combination thereof. Wired networks may include an electrical carrier connection (such as a communication cable, etc.) and/or an optical carrier or connection (such as an optical fiber connection, etc.). Wireless networks may include, for example, a WiFi network, other radio frequency networks (e.g., Zigbee, etc.), etc. In implementations, the communication network 106 may include a plurality of inter-node interconnects or switches 108-1, 108-2, …, 108-L (which are collectively called hereinafter as inter-node switches 108) for providing connections between the computing nodes 104, where L is a positive integer greater than one.
In implementations, the environment 100 may further include a plurality of client devices 110‐1, 110‐2, …, 110‐N (which are collectively called hereinafter as client devices 110) , where N is a positive integer greater than one. In implementations, users of the client devices 110 may communicate with each other via the communication network 106, or access online resources and services. These online resources and services may be implemented at the computing nodes 104. Data flows generated by users of the client devices 110 may be distributed to a plurality of routing paths and routed to a destination device through one or more of the plurality of paths. In implementations, the destination device may include another client device 110 or a computing node 104. In implementations, each of the plurality of routing paths may include one or more computing nodes 104 and switches 108 inter‐connected by physical links.
FIG. 2 illustrates an example network architecture of massive data center networks in accordance with an embodiment of the present disclosure. The network architecture of massive data center networks 200 may provide a detailed view of the environment in which a forward path planning system may be used. In implementations, the network architecture of massive data center networks is a three-tier Clos network architecture in a full-mesh topology. A first tier corresponds to a tier of leaf switches 206, also called access switches or top of rack (ToR) switches. The computing nodes 208 are directly connected to the leaf switches 206, with each computing node 208 being connected to at least two leaf switches 206. In implementations, a computing node 208 may include one or more network interface controllers (e.g., four network interface controllers) which are connected to one or more ports (e.g., four ports) of a leaf switch 206. In implementations, the number of network interface controllers in each computing node 208 may or may not be the same. A second tier corresponds to a tier of aggregation switches 204, also called spine switches 204, that are connected to one or more leaf switches 206. In implementations, a number of computing nodes 208, the interconnected leaf switches 206, and the interconnected aggregation switches 204 may form a point of delivery (PoD) unit, for example, PoD-1 and PoD-2, as illustrated. A third tier corresponds to a tier of core switches 202 that are connected to one or more aggregation switches 204. The core switches 202 are at the top of the cloud data center network pyramid and may include a wide area network (WAN) connection to the outside carrier network.
In implementations, if two processing units or processes included in different computing nodes 208 are connected under a same leaf switch 206, data packets that are transmitted between the two processing units or processes will pass through that same leaf switch 206 without passing through any of the aggregation switches 204 or core switches 202. Alternatively, if two processing units or processes included in different computing nodes are connected under different leaf switches, data packets that are transmitted between the two processing units or processes will pass through one of the aggregation switches. In implementations, a data packet that is transmitted between the two processing units or processes can be made to flow through a specified aggregation switch by setting an appropriate combination of source and destination ports in the data packet. The routing management for congestion avoidance may aim at enabling data flows from a same leaf switch to different destination leaf switches to pass through different aggregation switches, and/or data flows from different source leaf switches to a same destination leaf switch to pass through different aggregation switches, thus avoiding collisions between the data flows and leading to no network congestion at the aggregation switches.
In implementations, a processing unit or process in a computing node 208 may send/receive data to/from a processing unit or process in another computing node through a network interface controller (NIC). In implementations, a processing unit or process in a computing node 208 may be associated with a single network interface controller or multiple network interface controllers for transmitting data to processing units or processes in other computing nodes. Additionally, or alternatively, multiple processing units or processes may be associated with a single network interface controller and employ that network interface controller for transmitting data to processing units or processes in other computing nodes. In implementations, a plurality of rules for data packet forwarding/routing may be implemented on the computing nodes 208. The plurality of rules may include, but are not limited to, priorities for a processing unit or process in a first computing node to select a neighboring processing unit or process, conditions for a network interface controller in a first computing node to send or receive data, conditions for a network interface controller in a first computing node to route data to/from a network interface controller in a second computing node, etc.
In implementations, the routing management may assign network interface controller (NIC) identifiers to each network interface controller that is connected or linked to a same leaf switch. In some examples, if a network interface controller of the processing unit or process and a network interface controller of a next processing unit or process are located in a same computing node or are directly connected or linked to a same leaf switch, a routing identifier may be determined as a default value or identifier. This default routing identifier indicates that data is either routed within a computing node or through a leaf switch, without passing through any aggregation switch in the communication network. Otherwise, the routing identifier may be determined to be equal to a NIC identifier of that processing unit or process, or another predefined value. Based on a mapping relationship between routing identifiers and aggregation identifiers, an aggregation identifier may be determined based on the determined routing identifier. In implementations, the mapping relationship between routing identifiers and aggregation identifiers may be determined in advance using a probing-based routing mechanism (e.g., sending probing data packets between computing nodes as described in the foregoing description), for example. In other words, data flows between processing units (or processes) which are included in a same computing node or whose network interface controllers are connected to a same leaf switch will not go through any aggregation switch in the communication network. On the other hand, data flows between processing units (or processes) which are included in different computing nodes and whose network interface controllers are connected to different leaf switches will pass through a designated aggregation switch based on a predetermined mapping relationship, thus enabling routing control and management of data flows and distributing the data flows to different aggregation switches to avoid network congestion.
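By way of illustration only, the following minimal sketch shows the identifier-based decision described above. The mappings (nic_to_leaf, routing_to_aggregation) and all names are assumptions introduced for this example and are not structures defined by the present disclosure.

```python
# Sketch of the routing-identifier / aggregation-identifier lookup described above.
from typing import Optional

DEFAULT_ROUTING_ID = 0  # data stays within a node or behind a single leaf switch

nic_to_leaf = {"nic-a1": "leaf-1", "nic-a2": "leaf-1", "nic-b1": "leaf-2"}
routing_to_aggregation = {1: "agg-1", 2: "agg-2"}  # mapping assumed to be learned by probing


def routing_identifier(src_nic: str, dst_nic: str, src_nic_id: int,
                       same_node: bool) -> int:
    """Use the default identifier when no aggregation switch is needed;
    otherwise fall back to the sender's NIC identifier."""
    if same_node or nic_to_leaf[src_nic] == nic_to_leaf[dst_nic]:
        return DEFAULT_ROUTING_ID
    return src_nic_id


def aggregation_switch(routing_id: int) -> Optional[str]:
    """Map a routing identifier to a designated aggregation switch, or None
    when the flow never leaves the leaf tier."""
    if routing_id == DEFAULT_ROUTING_ID:
        return None
    return routing_to_aggregation[routing_id]


# Two NICs behind different leaf switches: the flow is pinned to "agg-1".
print(aggregation_switch(routing_identifier("nic-a1", "nic-b1", 1, same_node=False)))
```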
It should be appreciated that the three‐tier Clos network as illustrated in FIG. 2 is merely an example network architecture of the massive data center network. Other network architectures including, but not limited to, two‐tier Clos networks may also be adopted to construct the massive data center networks.
FIG. 3 illustrates example failures occurring in the massive data center networks in accordance with an embodiment of the present disclosure. After the configuration of a data center network 300 is delivered, network anomalies, including, but not limited to, a link failure (e.g., failures 312, 316, and 320), a computing node failure (e.g., failure 310), a leaf switch failure (e.g., failure 318), an aggregation switch failure (e.g., failure 314), or a core switch failure (e.g., failure 322), may occur, causing data packet loss and congestion in certain routing paths. To detect such anomalies in the data center network, detection technology such as network quality analyzer (NQA) Track may be introduced. In implementations, one or more computing nodes in each point of delivery (PoD) unit may implement the NQA Track scheme to detect whether another computing node in the same or a different PoD unit, a leaf switch in the same or a different PoD unit, an aggregation switch in the same or a different PoD unit, or a core switch is unreachable, etc.
An example detection approach is to utilize one computing node in a point of delivery (PoD) unit as a detection source. By way of example and not limitation, computing node 308-1 in PoD-1 may be assigned as a detection source. The computing node 308-1 may periodically ping other computing nodes, leaf switches, aggregation switches, and core switches by sending an Internet Control Message Protocol (ICMP) echo request packet. The computing node 308-1 waits for an ICMP echo reply from each of those nodes and switches. After a pre-set period, also called a Time to Live (TTL) period, if no echo reply is received from either the computing node or the switch, the computing node 308-1 determines that the computing node or the switch is unreachable. In one example, when the packet loss occurs in a large number of computing nodes connected to a same leaf switch, the detection source (i.e., the designated computing node for anomaly detection) may further determine that the leaf switch may have failed. In another example, when the packet loss only occurs in sporadic computing nodes connected to a same leaf switch, the detection source may further determine that these sporadic computing nodes may experience overload or the corresponding ports in the leaf switch may be full. In yet another example, when packet loss occurs in a large number of computing nodes located in a different point of delivery (PoD) unit, the detection source may further determine that failures may occur in one or more corresponding aggregation switches located therein and/or one or more corresponding core switches connected thereto. In yet another example, when the packet loss only occurs in sporadic computing nodes located in a different point of delivery (PoD) unit, the detection source may further determine that these sporadic computing nodes may experience overload or the corresponding ports in the aggregation switch and/or the core switch may be full.
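As a rough illustration of the echo-request probing described above, the sketch below shells out to the system ping utility (Linux-style -c/-W flags) rather than implementing NQA Track or raw ICMP; the target names and timeout value are assumptions made for the example.

```python
# Reachability sweep sketch: one echo request per target with a short timeout.
import subprocess

TARGETS = ["leaf-1", "leaf-2", "agg-1", "core-1", "node-2"]  # illustrative names
TIMEOUT_SECONDS = 1


def is_reachable(host: str) -> bool:
    """Return True if a single ICMP echo reply arrives before the timeout."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(TIMEOUT_SECONDS), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


unreachable = [target for target in TARGETS if not is_reachable(target)]
if unreachable:
    print("possible failures around:", unreachable)
```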
Another example detection approach is to coordinate various computing nodes in different locations as detection sources and deploy different detection strategies among those detection sources. Each of the detection sources may implement an agent program capable of additionally detecting anomalies associated with OSI layer 4 through layer 7. The detection sources may accept input control commands to dynamically configure the detection strategies. This approach may establish TCP connections to further detect one or more parameters associated with the transport layer (i.e., OSI layer 4), for example, transmission delay or transmission rate. With the network topology information, the transmission delay or transmission rate, and the data packet loss rate, the detection approach may further learn the exact location of the anomaly. In one example, when the data packets routed by a first leaf switch to a number of computing nodes are experiencing a high packet loss rate but the data packets to the number of computing nodes routed by a different leaf switch are received with no significant delays, the detection source may determine that the number of computing nodes are operating normally but the first leaf switch may experience a certain anomaly. In another example, when the data packets to a first number of computing nodes routed by the first leaf switch experience a long delay but the data packets to a second number of computing nodes routed by the same leaf switch have no delay, the detection source may determine that the first leaf switch is operating but the ports in the first leaf switch that correspond to the first number of computing nodes may be congested.
It should be appreciated that the network failure detection approaches described above are merely for illustration purposes. Other approaches including, but not limited to, random early detection (RED), weighted random early detection (WRED), robust random early detection (RRED), explicit congestion notification (ECN), and backward ECN (BECN) may also be implemented to detect network congestion.
FIG. 4 illustrates an example configuration of a computing node for implementing the forward path planning method in accordance with an embodiment of the present disclosure. In implementations, the example configuration 400 of the computing node 402 may include, but is not limited to, one or more processing units 404, one or more network interfaces 406, an input/output (I/O) interface 408, and memory 412. In implementations, the computing node 402 may further include one or more intra‐node interconnects or switches 410.
In implementations, the processing units 404 may be configured to execute instructions that are stored in the memory 412, and/or received from the input/output interface 408, and/or the network interface 406. In implementations, the processing units 404 may be implemented as one or more hardware processors including, for example, a microprocessor, an application‐specific instruction‐set processor, a physics processing unit (PPU) , a central processing unit (CPU) , a graphics processing unit, a digital signal processor, a tensor processing unit, etc. Additionally, or alternatively, the functionality described herein can be performed, at least in part,  by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include field‐programmable gate arrays (FPGAs) , application‐specific integrated circuits (ASICs) , application‐specific standard products (ASSPs) , system‐on‐a‐chip systems (SOCs) , complex programmable logic devices (CPLDs) , etc.
The memory 412 may include machine readable media in a form of volatile memory, such as Random Access Memory (RAM) and/or non‐volatile memory, such as read only memory (ROM) or flash RAM. The memory 412 is an example of machine readable media.
The machine readable media may include a volatile or non‐volatile type, a removable or non‐removable media, which may achieve storage of information using any method or technology. The information may include a machine readable instruction, a data structure, a program module or other data. Examples of machine readable media include, but not limited to, phase‐change memory (PRAM) , static random access memory (SRAM) , dynamic random access memory (DRAM) , other types of random‐access memory (RAM) , read‐only memory (ROM) , electronically erasable programmable read‐only memory (EEPROM) , quick flash memory or other internal storage technology, compact disk read‐only memory (CD‐ROM) , digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non‐transmission media, which may be used to store information that may be accessed by a computing node. As defined herein, the machine readable media does not include any transitory media,  such as modulated data signals and carrier waves.
In implementations, the network interfaces 406 may be configured to connect the computing node 402 to other computing nodes via the communication network 106. In implementations, the network interfaces 406 may be established through a network interface controller (NIC) , which may employ both hardware and software in connecting the computing node 402 to the communication network 106. In implementations, each type of NIC may use a different type of fabric or connector to connect to a physical medium associated with the communication network 106. Examples of types of fabrics or connectors may be found in the IEEE 802 specifications, and may include, for example, Ethernet (which is defined in 802.3) , Token Ring (which is defined in 802.5) , and wireless networking (which is defined in 802.11) , an InfiniBand, etc.
In implementations, the intra-node switches 410 may include various types of interconnects or switches, which may include, but are not limited to, a high-speed serial computer expansion bus (such as PCIe, etc.), a serial multi-lane near-range communication link (such as NVLink, which is a wire-based serial multi-lane near-range communication link, for example), a switch chip with a plurality of ports (e.g., an NVSwitch, etc.), a point-to-point processor interconnect (such as an Intel QPI/UPI, etc.), etc.
In implementations, the computing node 402 may further include other hardware components and/or other software components, such as program modules 414 to execute instructions stored in the memory 412 for performing various  operations, and program data 416 for storing data received for path planning, anomaly detection, etc. In implementations, the program modules 414 may include a topology awareness module 418, a path planning module 420, and an anomaly detection module 422.
The topology awareness module 418 may be configured to maintain topology data associated with the network 106. The topology data may be generated and implemented on each element of the network 106 when the network architecture is delivered. The topology data includes arrangements of the elements of a network, such as computing nodes, leaf switches, aggregation switches, core switches, etc., and indicates the connections/links among these elements. In the equal cost multipath (ECMP) algorithm, the topology data may be represented as a non-directional (undirected) graph and stored as an adjacency list. All paths that route the data packets from a source node to a destination node may be configured to have equal cost. Hence, data flow from the source node to the destination node may be evenly distributed to all paths. In implementations, one or more available paths from the source node to the destination node may be configured as reserved bandwidth and the data flow may be forwarded only to available paths. The topology awareness module 418 may communicate with one or more switches in the massive data center network (e.g., leaf switch 306) in real time, periodically, or at a pre-set time period to obtain topology data associated with the network 106. In implementations, the topology data associated with the network 106 may be stored in one or more separate storage devices. The topology awareness module 418 may communicate with the one or more separate storage devices to obtain the real-time topology data. In implementations, the topology data associated with the network 106 may be dynamically updated to the topology awareness module 418 in response to a notification of network congestion. In implementations, the communication and topology data exchange between the computing node and the leaf switches may be achieved by using protocols including, but not limited to, the Link Layer Discovery Protocol (LLDP), the Link Aggregation Control Protocol (LACP), general-purpose Remote Procedure Calls (GRPC), etc. In implementations, the communication and topology data exchange between the computing node and the leaf switches may be achieved by remote software control.
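By way of illustration only, the sketch below shows one way such a topology store could look: an undirected graph kept as an adjacency list, with a breadth-first search that enumerates every equal-cost (minimum-hop) path between two endpoints. The node names and the small topology are assumptions made for the example.

```python
# Topology kept as an undirected adjacency list; equal-cost paths found by BFS.
from collections import deque

topology = {
    "node-1": ["leaf-1", "leaf-2"],
    "node-2": ["leaf-3", "leaf-4"],
    "leaf-1": ["node-1", "agg-1"],
    "leaf-2": ["node-1", "agg-2"],
    "leaf-3": ["node-2", "agg-1"],
    "leaf-4": ["node-2", "agg-2"],
    "agg-1": ["leaf-1", "leaf-3"],
    "agg-2": ["leaf-2", "leaf-4"],
}


def equal_cost_paths(graph, src, dst):
    """Breadth-first search that keeps every path of minimum hop count."""
    best, paths = None, []
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break
        if path[-1] == dst:
            best = len(path)
            paths.append(path)
            continue
        for neighbor in graph[path[-1]]:
            if neighbor not in path:
                queue.append(path + [neighbor])
    return paths


print(equal_cost_paths(topology, "node-1", "node-2"))
# -> [['node-1', 'leaf-1', 'agg-1', 'leaf-3', 'node-2'],
#     ['node-1', 'leaf-2', 'agg-2', 'leaf-4', 'node-2']]
```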
The path planning module 420 may be configured to determine the routing paths to forward data packets and distribute the data packets to the routing paths to balance the data flows in the network 106. In implementations, when a data packet arrives, the path planning module 420 may obtain a source address, a destination address, and a protocol from an IP header of a TCP/IP data packet, and a source port and a destination port from the TCP packet. The source address, the destination address, the source port, the destination port, and the protocol may form a so-called five-tuple (or 5-tuple). The five-tuple may uniquely indicate a data flow, in which all data packets have exactly the same five-tuple. The path planning module 420 may determine all possible routing paths from the source node to the destination node. A data flow, in which all data packets have exactly the same five-tuple, may use one of all the possible routing paths at one time. In implementations, when it is expected to select a different path, for example, when a network anomaly occurs, the network topology data changes due to the anomaly. The topology awareness module 418 may update the network topology data stored in the program data 416 to reflect the changes. In implementations, the network topology data associated with the computing node 402 may be stored separately from the program data 416 and may be updated in response to the topology data changes due to the anomaly. The path planning module 420 may recompute using the Hash algorithm based on the updated network topology data and select a different path by using another source port. It should be appreciated that the five-tuple (or 5-tuple) that uniquely indicates a data flow described above is merely for the purpose of illustration. The present disclosure is not intended to be limiting. The path planning module 420 may construct a three-tuple (or 3-tuple) including the source IP address, the destination IP address, and an ICMP Identifier that uniquely identifies an ICMP Query session to indicate a data flow.
In implementations, the path planning module 420 may determine all possible routing paths from the source node to the destination node. The routing paths may be determined based on various path finding algorithms including, but not limited to, shortest path algorithms. Examples of shortest path algorithms may include, but are not limited to, Dijkstra's algorithm, the Viterbi algorithm, the Floyd–Warshall algorithm, the Bellman–Ford algorithm, etc. The path planning module 420 may further perform a Hash operation on the five-tuple to obtain corresponding five-tuple hash values and determine a routing path from all possible shortest routing paths according to the five-tuple hash values. As the Hash operation maps a five-tuple to a unique path, data packets that all have the same five-tuple may be routed through the same path. Various Hash algorithms may be implemented by the path planning module 420 including, but not limited to, Message Digest (MD, MD2, MD4, MD5 and MD6), RIPEMD (RIPEMD, RIPEMD-128, and RIPEMD-160), Whirlpool (Whirlpool-0, Whirlpool-T, and Whirlpool), or Secure Hash Function (SHA-0, SHA-1, SHA-2, and SHA-3). By implementing the Hash operation on the five-tuple data, different data flows can be distributed evenly to all possible routing paths between a source node and a destination node to avoid network congestion.
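A minimal sketch of mapping a five-tuple onto one of the candidate paths is shown below, using MD5 from the hash families named above. The seed parameter, path labels, and flow values are assumptions made for the example; a real switch would apply its own hash configuration.

```python
# Five-tuple hashing sketch: same five-tuple -> same digest -> same path.
import hashlib


def five_tuple_hash(src_ip, src_port, dst_ip, dst_port, protocol, seed=0):
    key = f"{seed}|{src_ip}|{src_port}|{dst_ip}|{dst_port}|{protocol}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big")


def select_path(paths, flow, seed=0):
    """Map every packet of a flow (identical five-tuple) to the same path."""
    return paths[five_tuple_hash(*flow, seed=seed) % len(paths)]


paths = ["Path A", "Path B", "Path C", "Path D"]
flow = ("10.0.0.1", 40001, "10.0.1.9", 443, "TCP")
print(select_path(paths, flow))   # always the same path for this flow
```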
In current path planning methods, the computing node forwards data packets based on pre-computed Hash values corresponding to all possible routing paths, respectively, and the switches (i.e., the leaf switches) perform path planning based on the dynamic network topology and a Hash algorithm. Quite often, the topology data associated with the computing node and the leaf switches are not synchronized, and the Hash algorithms implemented on the computing node and the leaf switches have different configurations. Hence, the five-tuple Hash values calculated by the computing node and the switches (i.e., leaf switches or aggregation switches) may direct different data flows to a same routing path, causing possible congestion in the network. In the present implementation, the path planning module 420 of the computing node 402 may obtain information associated with the Hash algorithm implemented on one or more leaf switches and configure the Hash algorithm implemented on the computing node 402 using the obtained information. The information may include one or more parameters configured for the Hash algorithm implemented on the one or more leaf switches. The path planning module 420 of the computing node 402 may further obtain network topology data stored associated with the one or more leaf switches and update the network topology data according to the obtained network topology data. In implementations, the communication and data exchange between the computing node and the leaf switches may be achieved by using protocols including, but not limited to, the Link Layer Discovery Protocol (LLDP), the Link Aggregation Control Protocol (LACP), general-purpose Remote Procedure Calls (GRPC), etc. As the topology data and the Hash algorithm configuration are synchronized between the computing node and the leaf switch, the collisions of mapping different five-tuple hash values to a same path may be reduced and possible flow congestion may be avoided. Further, when a network anomaly occurs, as the computing node maintains an updated topology from the view of the leaf switch and active sessions of the data flow, the computing node may effectively determine the element in the network that is involved in the anomaly and re-plan the forward paths for the data flow.
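The sketch below illustrates the synchronization idea at a high level. The LLDP/LACP/GRPC exchange itself is not implemented; the switch_advertisement dictionary merely stands in for whatever the leaf switch reports, and the field names are assumptions of this example.

```python
# Keeping the host's hash parameters and topology view in step with the switch.
from dataclasses import dataclass, field


@dataclass
class PathPlannerConfig:
    hash_algorithm: str = "md5"
    hash_seed: int = 0
    topology: dict = field(default_factory=dict)

    def sync_from_switch(self, switch_advertisement: dict) -> None:
        """Adopt the switch's hash parameters and topology so that both ends
        map the same five-tuple to the same ECMP member."""
        self.hash_algorithm = switch_advertisement["hash_algorithm"]
        self.hash_seed = switch_advertisement["hash_seed"]
        self.topology = switch_advertisement["topology"]


config = PathPlannerConfig()
config.sync_from_switch({
    "hash_algorithm": "crc32",
    "hash_seed": 0x5EED,
    "topology": {"leaf-1": ["agg-1", "agg-2"]},
})
print(config.hash_algorithm, hex(config.hash_seed))
```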
The anomaly detection module 422 may be configured to detect anomalies occurring in the network 106. The anomaly detection module 422 may implement the detection approaches described above with respect to FIG. 3, and hence, they are not described in detail herein.
The program data 416 may be configured to store topology information 424, configuration information 426, and routing information 428. The topology information may include the network elements and the connection status of the  network elements. The topology information may be dynamically updated according to the path planning and data exchange between the computing node 402 and the leaf switches. The configuration information 426 may include versions and parameters of the algorithms implemented on the computing node 402, routing algorithms, Hash algorithms, for example. The routing information 428 may include all possible routing paths between a source node and a destination node. The routing information 428 may further include mappings between the five‐tuple hash values and the corresponding forward path.
FIG. 5 illustrates an example forward path planning in accordance with an embodiment of the present disclosure. The example forward path planning 500 is illustrated among various computing nodes and leaf switches, ultimately connected to a single aggregation switch. Data packets from computing node 506-1 to computing node 506-2 are distributed to two data flows in two routing paths. Path A includes four hops: Path A-1, Path A-2, Path A-3, and Path A-4, and goes through the computing node 506-1, the leaf switch 504-1, the aggregation switch 502, the leaf switch 504-2, and the computing node 506-2. Path B includes four hops: Path B-1, Path B-2, Path B-3, and Path B-4, and goes through the computing node 506-1, the leaf switch 504-1, the aggregation switch 502, the leaf switch 504-3, and the computing node 506-2. In the transmission of the data flow, the computing node 506-1 detects an anomaly in Path A and further determines that Path A-3 and Path A-4 are involved in the anomaly. The anomaly may be associated with the leaf switch 504-3 and/or the ports of the leaf switch 504-3. The network topology data associated with the computing node 506-1 may be updated to reflect the dynamic changes of the network caused by the anomaly. Based on the updated network topology data, the computing node 506-1 may recompute using the Hash algorithm and select a different path by using another source port. The computing node 506-1 may transmit the data flow using the different path that goes through the leaf switch 504-4, including Path A-1, Path A-2, Path A-3’, and Path A-4’.
FIG. 6 illustrates another example forward path planning in accordance with an embodiment of the present disclosure. The example forward path planning 600 is illustrated among various computing nodes and leaf switches, ultimately connected to two aggregation switches. Data packets from computing node 606-1 to computing node 606-2 are distributed to two data flows in two routing paths. Path A includes four hops: Path A-1, Path A-2, Path A-3, and Path A-4, and goes through the computing node 606-1, the leaf switch 604-3, the aggregation switch 602-2, the leaf switch 604-4, and the computing node 606-2. Path B includes four hops: Path B-1, Path B-2, Path B-3, and Path B-4, and goes through the computing node 606-1, the leaf switch 604-1, the aggregation switch 602-1, the leaf switch 604-2, and the computing node 606-2. In the transmission of the data flow, the computing node 606-1 detects an anomaly in Path A and further determines that the leaf switch 604-4 is involved in the anomaly. The network topology data associated with the computing node 606-1 may be updated to reflect the dynamic changes of the network caused by the anomaly. Based on the updated network topology data, the computing node 606-1 may recompute using the Hash algorithm and select a different path by using another source port. The computing node 606-1 may transmit the data flow using the different path that goes through the leaf switch 604-2, including Path A-1, Path A-2, Path A-3’, and Path A-4’.
FIG. 7 illustrates an example equal cost multipath (ECMP) planning in accordance with an embodiment of the present disclosure. ECMP load balancing refers to distributing data flows evenly by using a load balancing algorithm to identify flows and distribute the data flows to different routing paths. As illustrated in the ECMP planning 700, four equal cost paths, Path A, Path B, Path C, and Path D, are available to route the data packets from computing node 706-1 to computing node 706-2. With the use of the Hash algorithm on the five-tuple data for routing, all four paths are utilized to route the data packets from computing node 706-1 to computing node 706-2. At the initial path planning state, this facilitates distributing data flows evenly across the network and reducing possible congestion. Further, when one of the four paths fails, the data flows can be distributed among the other three paths. It should be appreciated that the five-tuple and the ECMP load balancing shown in FIG. 7 are merely for the purpose of illustration. The present disclosure is not intended to be limiting. In implementations, a three-tuple including the source IP address, the destination IP address, and an ICMP Identifier that uniquely identifies an ICMP Query session may be adopted to indicate a data flow. Further, one or more paths may be set as reserved bandwidth, and hence, the data flow is distributed to only the available paths excluding the reserved paths.
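As an illustration of the ECMP behavior described for FIG. 7, the sketch below spreads several flows over the four equal-cost paths by hash-modulo and re-maps all flows over the surviving paths when one path fails. All flow and path values are made up for the example.

```python
# ECMP distribution sketch with a simple failover over the remaining paths.
import hashlib
from collections import Counter


def pick(paths, flow):
    digest = hashlib.md5("|".join(map(str, flow)).encode()).digest()
    return paths[int.from_bytes(digest[:8], "big") % len(paths)]


paths = ["Path A", "Path B", "Path C", "Path D"]
flows = [("10.0.0.1", port, "10.0.1.9", 443, "TCP") for port in range(40000, 40016)]

print(Counter(pick(paths, flow) for flow in flows))       # initial spread over 4 paths

healthy = [path for path in paths if path != "Path C"]     # Path C fails
print(Counter(pick(healthy, flow) for flow in flows))      # redistributed over 3 paths
```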
FIG. 8 illustrates an example forward path planning algorithm in  accordance with an embodiment of the present disclosure. FIG. 9 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure. FIG. 10 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure. FIG. 11 illustrates another example forward path planning algorithm in accordance with an embodiment of the present disclosure. The methods described in FIGs. 8‐11 may be implemented in the environment of FIG. 1 and/or the network architecture of FIG. 2. However, the present disclosure is not intended to be limiting. The methods described in FIGs. 8‐11 may alternatively be implemented in other environments and/or network architectures.
The methods described in FIGs. 8‐11 are described in the general context of machine‐executable instructions. Generally, machine‐executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. Furthermore, each of the example methods are illustrated as a collection of blocks in a logical flow graph representing a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof. The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or alternate methods. Additionally, individual blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein. In the context of software, the  blocks represent computer instructions that, when executed by one or more processors, perform the recited operations. In the context of hardware, some or all of the blocks may represent application specific integrated circuits (ASICs) or other physical components that perform the recited operations.
Referring back to the method 800 described in FIG. 8, at block 802, a first computing node (e.g., the computing node 104) may obtain information associated with an algorithm implemented by at least a second computing node.
In implementations, the first computing node may implement a same algorithm as the at least one second computing node. The algorithm implemented on the first computing node may be configured differently from the algorithm implemented on the second computing node. The algorithm may include various Hash algorithms used for routing path planning based on the five-tuples associated with the data packets in the data flow. In implementations, the at least one second computing node may be a leaf switch in a three-tier Clos network, through which the first computing node connects to the data center network.
At block 804, the first computing node (e.g., the computing node 104) may obtain network topology data stored associated with the at least one second computing node.
In implementations, the first computing node may obtain the information associated with an algorithm implemented by at least a second computing node described at block 802 and the network topology data associated with the at least one second computing node described at block 804 via various  network protocols, Link Layer Discovery Protocol (LLDP) , for example. The network topology data may be represented in a non‐directional graph illustrating the elements of the network and the connection status associated therewith.
At block 806, the first computing node (e.g., the computing node 104) may receive a first data packet from a source device to be forwarded to a destination device.
In implementations, the source device and the destination device may refer to the client devices 110 of FIG. 1. In an example, the data packet is generated when a user operates the client device 110 to communicate with another user operating a different client device. In another example, the data packet is generated when a user visits an online resource or uses an online service. In yet another example, the computing nodes 104 may upload or download data from cloud storage spaces, thus generating a flow of data packets.
At block 808, the first computing node (e.g., the computing node 104) may determine a set of values associated with the first data packet according to the information associated with the algorithm.
In implementations, the set of values associated with the first data packet may include a five‐tuple extracted from the first data packet. The set of values may include a source IP address, a destination IP address, a source port number, a destination port number, and a protocol used for communication. The set of values, i.e., the five‐tuple, may be hashed using a Hash algorithm implemented by the first computing node. In implementations, a three‐tuple including the source IP address,  destination IP address, and ICMP Identifier that uniquely identifies an ICMP Query session may be adopted to indicate a data flow.
At block 810, the first computing node (e.g., the computing node 104) may determine a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data.
In implementations, the first computing node may update the network topology data associated therewith using the network topology data obtained from a storage associated with the at least one second computing node. The first computing node may determine all shortest routing paths according to the updated network topology data and select the forward path from the shortest routing paths according to the set of values, for example, the five-tuple hash values.
At block 812, the first computing node (e.g., the computing node 104) may transmit the first data packet to the destination device through the forward path.
Referring back to the method 900 described in FIG. 9, at block 902, a first computing node (e.g., the computing node 104) may determine five‐tuple data associated with the first data packet, the five‐tuple data including a source IP address, a source port number, a destination IP address, a destination port number, and a protocol.
In implementations, the first computing node may extract the source IP address, the destination IP address, and the protocol from the IP header of the data packet and further extract the source port number and the destination port number from the TCP portion. The protocol may include any type of IP protocol including, but not limited to, IPv4 and IPv6. In other implementations, the first computing node may extract the source IP address, the destination IP address, and the ICMP identifier to generate a three-tuple (or 3-tuple) to indicate the data flow.
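By way of illustration only, the sketch below parses the five-tuple out of a plain IPv4 + TCP packet. The sample packet bytes are fabricated for the demonstration, and IPv6 or options-heavy packets are out of scope for this example.

```python
# Five-tuple extraction sketch for a plain IPv4 + TCP packet.
import socket
import struct


def extract_five_tuple(packet: bytes):
    ihl = (packet[0] & 0x0F) * 4                 # IPv4 header length in bytes
    protocol = packet[9]                         # 6 = TCP
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return src_ip, src_port, dst_ip, dst_port, protocol


# Minimal fabricated IPv4 header (20 bytes) followed by the TCP port fields.
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0, 0, 64, 6, 0,                 # version/IHL ... TTL, protocol=6, checksum
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.1.9"),
)
tcp_ports = struct.pack("!HH", 40001, 443)
print(extract_five_tuple(ip_header + tcp_ports))
# -> ('10.0.0.1', 40001, '10.0.1.9', 443, 6)
```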
At block 904, the first computing node (e.g., the computing node 104) may update the configuration of a Hash algorithm implemented by a first computing node using information associated with the algorithm implemented by at least one second computing node. In implementations, the information associated with the algorithm implemented by at least one second computing node may include versions of the algorithms, one or more parameters configuration of the algorithms, etc.
At block 906, the first computing node (e.g., the computing node 104) may compute five-tuple hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm. In implementations, when the first computing node generates a three-tuple to identify a data flow, the first computing node computes three-tuple hash values corresponding to the three-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
Referring back to the method 1000 described in FIG. 10, at block 1002, a first computing node (e.g., the computing node 104) may determine one or more paths from the source device to the destination device according to the network topology data. In implementations, the one or more paths from the source device to the destination device may include one or more shortest paths. The first computing node may implement various shortest path algorithms, such as Dijkstra's algorithm, the Viterbi algorithm, the Floyd–Warshall algorithm, or the Bellman–Ford algorithm.
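As a brief illustration of one of the shortest-path algorithms mentioned above, the sketch below runs a compact Dijkstra search over a weighted adjacency list. The link weights and node names are assumptions; in an equal-cost deployment every link would simply carry the same weight.

```python
# Compact Dijkstra sketch over a weighted adjacency list.
import heapq

weighted = {
    "node-1": {"leaf-1": 1, "leaf-2": 1},
    "leaf-1": {"node-1": 1, "agg-1": 1},
    "leaf-2": {"node-1": 1, "agg-2": 2},
    "agg-1": {"leaf-1": 1, "leaf-3": 1},
    "agg-2": {"leaf-2": 2, "leaf-3": 1},
    "leaf-3": {"agg-1": 1, "agg-2": 1, "node-2": 1},
    "node-2": {"leaf-3": 1},
}


def dijkstra(graph, src, dst):
    """Return (cost, path) of the cheapest route from src to dst."""
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in seen:
                heapq.heappush(heap, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


print(dijkstra(weighted, "node-1", "node-2"))
# -> (4, ['node-1', 'leaf-1', 'agg-1', 'leaf-3', 'node-2'])
```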
At block 1004, the first computing node (e.g., the computing node 104) may perform a modulus operation on the five-tuple hash values with respect to the one or more paths. In implementations, the modulus operation may generate one or more distinct modulus values corresponding to the one or more paths, respectively. Data packets having the same five-tuple may form a data flow. Data packets directed from a source IP address to a destination IP address may be distributed to different data flows depending on the source ports through which the data packets are transmitted. Data flows may be distributed to the one or more paths according to the one or more distinct modulus values that distinctly correspond to the one or more paths.
At block 1006, the first computing node (e.g., the computing node 104) may determine a forward path from the one or more paths according to the results of the modulus operation. In implementations, the first computing node may select one path that maps to a data flow represented by the five‐tuple as the forward path. The arriving data packets that have the same five‐tuple may use the same forward path. In other implementations, the first computing node may designate one of the one or more paths as the forward path based on the traffic on these paths.
Referring back to the method 1100 described in FIG. 11, at block 1102, a first computing node (e.g., the computing node 104) may determine at least a first forward path and a second forward path from the one or more paths. The first computing node may implement the equal cost multipaths (ECMP) algorithms to  determine all possible paths between a source device and a destination device. In implementations, the one or more paths may be sorted based on the associated one or more distinct modulus values. The first computing node may select one path that maps to a data flow represented by the five‐tuple to forward the data flow. Alternatively, the first computing node may designate more than one path to forward the data flow. In implementations, the first computing node may distribute the data packets from the source device to the destination device to different data flows to be transmitted to all possible paths between the source device and the destination device. In other implementations, the first computing node may distribute the data flows to a set of all possible paths.
At block 1104, the first computing node (e.g., the computing node 104) may receive a plurality of second data packets from the source device to be forwarded to the destination device. The plurality of second data packets may have the same or different five-tuples. The data packets that have the same five-tuple may be transmitted through one path of the first forward path and the second forward path as a data flow at one time. The data packets that have different five-tuples may form different data flows that go through different forward paths. In implementations, when one of the plurality of second data packets has the same five-tuple as the first data packet described in FIG. 8, the second data packet is transmitted through the same forward path generated according to the embodiment illustrated in FIG. 8.
At block 1106, the first computing node (e.g., the computing node 104) may distribute the plurality of second data packets to the first forward path and the  second forward path, each of the first forward path and the second forward path carries a portion of the plurality of second data packets. In implementations, the first computing node may evenly distribute the data flows to all possible paths (i.e., the first forward path and the second forward path ) between the source device and the destination device based on the Hash computation. In other implementations, the data flow carried by the first forward path and the second forward path may be uneven.
At block 1108, the first computing node (e.g., the computing node 104) may determine an anomaly in one of the first forward path and the second forward path. The anomaly may be associated with a computing node, a switch, a port of a computing node, a port of a switch, etc., causing network congestion. The first computing node may detect the anomaly using the detection approaches described above with respect to FIG. 3. In implementations, the first computing node may generate one or more sessions corresponding to the one or more forward paths, respectively. The first computing node may detect the anomaly when a session timeout occurs in one of the one or more sessions.
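The sketch below illustrates the session-timeout check described above: one session is tracked per forward path, and a path whose session has been silent longer than the timeout is flagged. The timeout value, timestamps, and path names are assumptions made for the example.

```python
# Session-timeout sketch: flag forward paths whose sessions have gone silent.
import time

SESSION_TIMEOUT = 3.0  # seconds without traffic before a path is suspect

sessions = {
    "first forward path": {"last_seen": time.monotonic()},
    "second forward path": {"last_seen": time.monotonic() - 10.0},
}


def paths_with_anomaly(sessions, now=None):
    now = time.monotonic() if now is None else now
    return [path for path, session in sessions.items()
            if now - session["last_seen"] > SESSION_TIMEOUT]


print(paths_with_anomaly(sessions))   # -> ['second forward path']
```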
At block 1110, the first computing node (e.g., the computing node 104) may determine a third forward path from the source device to the destination device to reroute the data flow, i.e., the portion of the plurality of second data packets that is involved in the abnormality. In implementations, the first computing node may recompute using the Hash algorithm based on the updated network topology data and select a different path by using another source port.
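By way of illustration only, the sketch below shows the reroute step of block 1110: the five-tuple hash is recomputed with alternative source ports until it lands on a path that is not involved in the anomaly. The hash helper and port range are assumptions introduced for the example.

```python
# Reroute sketch: probe alternative source ports until the hash avoids the failed path.
import hashlib


def path_index(flow, path_count):
    digest = hashlib.md5("|".join(map(str, flow)).encode()).digest()
    return int.from_bytes(digest[:8], "big") % path_count


def reroute(paths, failed_path, src_ip, dst_ip, dst_port, protocol,
            port_range=range(40000, 40100)):
    """Pick a new source port whose five-tuple hashes to a healthy path."""
    for src_port in port_range:
        flow = (src_ip, src_port, dst_ip, dst_port, protocol)
        candidate = paths[path_index(flow, len(paths))]
        if candidate != failed_path:
            return src_port, candidate
    raise RuntimeError("no healthy path found in the probed port range")


paths = ["Path A", "Path B", "Path C", "Path D"]
print(reroute(paths, "Path A", "10.0.0.1", "10.0.1.9", 443, "TCP"))
```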
Although the above method blocks are described to be executed in a particular order, in some implementations, some or all of the method blocks can be executed in other orders, or in parallel.
Although implementations have been described in language specific to structural features and/or methodological acts, it is to be understood that the claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed subject matter. Additionally, or alternatively, some or all of the operations may be implemented by one or more ASICs, FPGAs, or other hardware.
EXAMPLE CLAUSES
A. A method implemented by a first computing node, the method comprising: obtaining, via a network, information associated with an algorithm implemented by at least one second computing node; obtaining, via the network, network topology data stored associated with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.
B. The method as recited in paragraph A, wherein the information associated with the first data packet includes a set of values, and determining, at the first computing node, a set of values associated with the first data packet according to the information associated with the algorithm further comprises: executing a Hash algorithm implemented by the first computing node; applying at least the information associated with the algorithm implemented by the at least one second computing node to the Hash algorithm; and computing the set of values associated with the first data packet using the Hash algorithm.
C. The method as recited in paragraph A, wherein the information associated with the first data packet includes a set of values, and determining, at the first computing node, a set of values associated with the first data packet according to the information associated with the algorithm further comprises: determining five-tuple data associated with the first data packet, the five-tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network; updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
D. The method as recited in paragraph C, wherein determining a forward path from the source device to the destination device according to the information associated with the first data packet and the network topology data further comprises: determining one or more paths from the source device to the destination device according to the network topology data; and determining the forward path from the one or more paths according to the five‐tuple hash values.
E. The method as recited in paragraph D, wherein determining a forward path from the source device to the destination device according to the information associated with the first data packet and the network topology data further comprises: performing a modulus operation on the five‐tuple Hash values with respect to the one or more paths; and determining the forward path from the one or more paths according to results of the modulus operation.
F. The method as recited in paragraph A, wherein the forward path from the source device to the destination device includes at least a first forward path and a second forward path, and the method further comprises: receiving a plurality of second data packets from the source device to be forwarded to the destination device; distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets; detecting an abnormality occurring in one of the first forward path and the second forward path; and determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
G. The method as recited in paragraph A, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal-cost multipath (ECMP) planning algorithm.
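To make the path selection summarized in clauses C through G concrete, a minimal sketch of five-tuple hashing followed by a modulus over the candidate paths is given below. The CRC32 hash, the path labels, and the flow values are illustrative assumptions; the only point carried over from the disclosure is that using the same Hash algorithm as the second computing nodes lets the first computing node predict which equal-cost path a given five-tuple will take.

```python
import zlib
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip src_port dst_ip dst_port proto")


def five_tuple_hash(ft):
    # Placeholder for the (updated) Hash algorithm shared with the switches.
    key = f"{ft.src_ip}|{ft.src_port}|{ft.dst_ip}|{ft.dst_port}|{ft.proto}"
    return zlib.crc32(key.encode())


def select_forward_path(ft, candidate_paths):
    """Pick one equal-cost path via hash modulo the number of paths (ECMP)."""
    return candidate_paths[five_tuple_hash(ft) % len(candidate_paths)]


# Illustrative usage: the flow below maps deterministically to one of three paths.
paths = ["via-spine-1", "via-spine-2", "via-spine-3"]
flow = FiveTuple("10.0.0.1", 40000, "10.0.1.9", 443, "TCP")
print(select_forward_path(flow, paths))
```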
H. One or more machine readable media storing machine readable instructions that, when executed by a first computing node, cause the first computing node to perform acts comprising: obtaining, via a network, information associated with an algorithm implemented by at least one second computing node; obtaining, via the network, network topology data stored in association with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.
I. The one or more machine readable media as recited in paragraph H, wherein the information associated with the first data packet includes a set of values, and the acts further comprise: executing a Hash algorithm implemented by the first computing node; applying at least the information associated with the algorithm implemented by the at least one second computing node to the Hash algorithm; and computing the set of values associated with the first data packet using the Hash algorithm.
J. The one or more machine readable media as recited in paragraph H, wherein the information associated with the first data packet includes a set of values, and the acts further comprise: determining five-tuple data associated with the first data packet, the five-tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network; updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
K. The one or more machine readable media as recited in paragraph J, the acts further comprising: determining one or more paths from the source device to the destination device according to the network topology data; and determining the forward path from the one or more paths according to the five‐tuple hash values.
L. The one or more machine readable media as recited in paragraph K, the acts further comprising: performing a modulus operation on the five‐tuple Hash values with respect to the one or more paths; and determining the forward path from the one or more paths according to results of the modulus operation.
M. The one or more machine readable media as recited in paragraph H, the acts further comprising: receiving a plurality of second data packets from the source device to be forwarded to the destination device; distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets; detecting an abnormality occurring in one of the first forward path and the second forward path; and determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
N. The one or more machine readable media as recited in paragraph H, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal-cost multipath (ECMP) planning algorithm.
O. A first computing node comprising: one or more processing units; and memory storing machine executable instructions that, when executed by one or more processing units, cause the one or more processing units to perform acts comprising: obtaining, via a network, information associated with an algorithm implemented by at least one second computing node; obtaining, via the network, network topology data stored in association with the at least one second computing node; receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device; determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and transmitting the first data packet to the destination device through the forward path.
P. The first computing node as recited in paragraph O, wherein the information associated with the first data packet includes a set of values, and the acts further comprise: determining five-tuple data associated with the first data packet, the five-tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network; updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
Q. The first computing node as recited in paragraph P, wherein the information associated with the first data packet includes a set of values, and the acts further comprise: determining one or more paths from the source device to the destination device according to the network topology data; and determining the forward path from the one or more paths according to the five-tuple hash values.
R. The first computing node as recited in paragraph Q, the acts further comprising: performing a modulus operation on the five‐tuple Hash values with respect to the one or more paths; and determining the forward path from the one or more paths according to results of the modulus operation.
S. The first computing node as recited in paragraph O, the acts further comprising: receiving a plurality of second data packets from the source device to be forwarded to the destination device; distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets; detecting an abnormality occurring in one of the first forward path and the second forward path; and determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
T. The first computing node as recited in paragraph O, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal-cost multipath (ECMP) planning algorithm.

Claims (20)

  1. A method implemented by a first computing node, the method comprising:
    obtaining, via a network, information associated with an algorithm implemented by at least one second computing node;
    obtaining, via the network, network topology data stored in association with the at least one second computing node;
    receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device;
    determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and
    transmitting the first data packet to the destination device through the forward path.
  2. The method of claim 1, wherein the information associated with the first data packet includes a set of values and determining, at the first computing node, a set of values associated with the first data packet according to the information associated with the algorithm further comprises:
    executing a Hash algorithm implemented by the first computing node;
    applying at least the information associated with the algorithm implemented by the at least one second computing node to the Hash algorithm; and
    computing the set of values associated with the first data packet using the Hash algorithm.
  3. The method of claim 1, wherein the information associated with the first data packet includes a set of values, and determining, at the first computing node, a set of values associated with the first data packet according to the information associated with the algorithm further comprises:
    determining five‐tuple data associated with the first data packet, the five‐tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network;
    updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and
    computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
  4. The method of claim 3, wherein determining a forward path from the source device to the destination device according to the information associated with the first data packet and the network topology data further comprises:
    determining one or more paths from the source device to the destination device according to the network topology data; and
    determining the forward path from the one or more paths according to the five‐tuple hash values.
  5. The method of claim 4, wherein determining a forward path from the source device to the destination device according to the information associated with the first data packet and the network topology data further comprises:
    performing a modulus operation on the five‐tuple Hash values with respect to the one or more paths; and
    determining the forward path from the one or more paths according to results of the modulus operation.
  6. The method of claim 1, wherein the forward path from the source device to the destination device includes at least a first forward path and a second forward path, and the method further comprises:
    receiving a plurality of second data packets from the source device to be forwarded to the destination device;
    distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets;
    detecting an abnormality occurring in one of the first forward path and the second forward path; and
    determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
  7. The method of claim 1, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal-cost multipath (ECMP) planning algorithm.
  8. One or more machine readable media storing machine readable instructions that, when executed by a first computing node, cause the first computing node to perform acts comprising:
    obtaining, via a network, information associated with an algorithm implemented by at least one second computing node;
    obtaining, via the network, network topology data stored in association with the at least one second computing node;
    receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device;
    determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and
    transmitting the first data packet to the destination device through the forward path.
  9. The one or more machine readable media of claim 8, wherein the information associated with the first data packet includes a set of values, and the acts further comprise:
    executing a Hash algorithm implemented by the first computing node;
    applying at least the information associated with the algorithm implemented by the at least one second computing node to the Hash algorithm; and
    computing the set of values associated with the first data packet using the Hash algorithm.
  10. The one or more machine readable media of claim 8, wherein the information associated with the first data packet includes a set of values, and the acts further comprise:
    determining five‐tuple data associated with the first data packet, the five‐tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network;
    updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and
    computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
  11. The one or more machine readable media of claim 10, the acts further comprising:
    determining one or more paths from the source device to the destination device according to the network topology data; and
    determining the forward path from the one or more paths according to the five‐tuple hash values.
  12. The one or more machine readable media of claim 11, the acts further comprising:
    performing a modulus operation on the five‐tuple Hash values with respect to the one or more paths; and
    determining the forward path from the one or more paths according to results of the modulus operation.
  13. The one or more machine readable media of claim 8, the acts further comprising:
    receiving a plurality of second data packets from the source device to be forwarded to the destination device;
    distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets;
    detecting an abnormality occurring in one of the first forward path and the second forward path; and
    determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
  14. The one or more machine readable media of claim 8, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal-cost multipath (ECMP) planning algorithm.
  15. A first computing node comprising:
    one or more processing units; and
    memory storing machine executable instructions that, when executed by one or more processing units, cause the one or more processing units to perform acts comprising:
    obtaining, via a network, information associated with an algorithm implemented by at least one second computing node;
    obtaining, via the network, network topology data stored in association with the at least one second computing node;
    receiving, at the first computing node, a first data packet from a source device to be forwarded to a destination device;
    determining a forward path from the source device to the destination device according to information associated with the first data packet and the network topology data; and
    transmitting the first data packet to the destination device through the forward path.
  16. The first computing node of claim 15, wherein the information associated with the first data packet includes a set of values, and the acts further comprise:
    determining five‐tuple data associated with the first data packet, the five‐tuple data including a source IP address associated with the source device, a source port number associated with the source device, a destination IP address associated with the destination device, a destination port number associated with the destination device, and a protocol for communication in the network;
    updating a Hash algorithm implemented by the first computing node using the information associated with the algorithm implemented by the at least one second computing node; and
    computing five-tuple Hash values corresponding to the five-tuple data as the set of values associated with the first data packet using the updated Hash algorithm.
  17. The first computing node of claim 16, wherein the information associated with the first data packet includes a set of values, and the acts further comprise:
    determining one or more paths from the source device to the destination device according to the network topology data; and
    determining the forward path from the one or more paths according to the five‐tuple hash values.
  18. The first computing node of claim 17, the acts further comprising:
    performing a modulus operation on the five‐tuple Hash values with respect to the one or more paths; and
    determining the forward path from the one or more paths according to results of the modulus operation.
  19. The first computing node of claim 15, the acts further comprising:
    receiving a plurality of second data packets from the source device to be forwarded to the destination device;
    distributing the plurality of second data packets to the first forward path and the second forward path, wherein each of the first forward path and the second forward path carries at least a portion of the plurality of second data packets;
    detecting an abnormality occurring in one of the first forward path and the second forward path; and
    determining a third forward path from the source device to the destination device to reroute the portion of the plurality of second data packets carried by one of the first forward path and the second forward path that is involved with the abnormality.
  20. The first computing node of claim 15, wherein the determining of a forward path from the source device to the destination device according to the set of values associated with the first data packet and the network topology data is based on an equal-cost multipath (ECMP) planning algorithm.
PCT/CN2020/090827 2020-05-18 2020-05-18 Forward path planning method in massive data center networks WO2021232190A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080100357.0A CN115462049B (en) 2020-05-18 2020-05-18 Forwarding path planning method for large-scale data network center
PCT/CN2020/090827 WO2021232190A1 (en) 2020-05-18 2020-05-18 Forward path planning method in massive data center networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/090827 WO2021232190A1 (en) 2020-05-18 2020-05-18 Forward path planning method in massive data center networks

Publications (1)

Publication Number Publication Date
WO2021232190A1 true WO2021232190A1 (en) 2021-11-25

Family

ID=78708969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090827 WO2021232190A1 (en) 2020-05-18 2020-05-18 Forward path planning method in massive data center networks

Country Status (2)

Country Link
CN (1) CN115462049B (en)
WO (1) WO2021232190A1 (en)

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6898183B1 (en) * 2000-03-14 2005-05-24 Cisco Technology, Inc. Method of determining a data link path in a managed network
GB2398699A (en) * 2003-02-18 2004-08-25 Motorola Inc Determining a maximum transmission unit which may be transmitted over a particular route through a network
US7965642B2 (en) * 2007-09-06 2011-06-21 Cisco Technology, Inc. Computing path information to a destination node in a data communication network
CN101645850B (en) * 2009-09-25 2013-01-30 杭州华三通信技术有限公司 Forwarding route determining method and equipment
CN102801614B (en) * 2012-07-17 2016-04-27 杭州华三通信技术有限公司 A kind of convergence method of equal-cost route and the network equipment
US10218629B1 (en) * 2014-12-23 2019-02-26 Juniper Networks, Inc. Moving packet flows between network paths
CN106559324A (en) * 2015-09-24 2017-04-05 华为技术有限公司 A kind of method E-Packeted based on equal cost multipath and the network equipment
CN107786440B (en) * 2016-08-26 2021-05-11 华为技术有限公司 Method and device for forwarding data message
US10924352B2 (en) * 2018-01-17 2021-02-16 Nicira, Inc. Data center network topology discovery
CN110391982B (en) * 2018-04-20 2022-03-11 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for transmitting data
CN109039919B (en) * 2018-10-11 2021-09-21 平安科技(深圳)有限公司 Forwarding path determining method, device, system, computer equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140347975A1 (en) * 2013-05-22 2014-11-27 Fujitsu Limited Data transmitting device, data transmitting method and non-transitory computer-readable storage medium
US20160248657A1 (en) * 2015-02-19 2016-08-25 Arista Networks, Inc. System and method of processing in-place adjacency updates
US20170324664A1 (en) * 2016-05-05 2017-11-09 City University Of Hong Kong System and method for load balancing in a data network
CN106357547A (en) * 2016-09-08 2017-01-25 重庆邮电大学 Software-defined network congestion control algorithm based on stream segmentation
US20190386921A1 (en) * 2016-12-21 2019-12-19 Cisco Technology, Inc. MACHINE LEARNING-DERIVED ENTROPY PATH GRAPH FROM IN-SITU OAM (iOAM) DATA
CN108390820A (en) * 2018-04-13 2018-08-10 华为技术有限公司 Method, equipment and the system of load balancing

Also Published As

Publication number Publication date
CN115462049B (en) 2024-03-08
CN115462049A (en) 2022-12-09

Similar Documents

Publication Publication Date Title
US11695699B2 (en) Fault tolerant and load balanced routing
JP7417825B2 (en) slice-based routing
US20240022515A1 (en) Congestion-aware load balancing in data center networks
JP6608545B2 (en) Service traffic distribution method and apparatus
JP6369698B2 (en) Traffic switching method, device, and system
EP2928137B1 (en) System and method for software defined routing of traffic within and between autonomous systems with enhanced flow routing, scalability and security
US9680751B2 (en) Methods and devices for providing service insertion in a TRILL network
US8879397B2 (en) Balancing load in a network, such as a data center network, using flow based routing
KR101546734B1 (en) Data center interconnect and traffic engineering
EP3399703B1 (en) Method for implementing load balancing, apparatus, and network system
US9014201B2 (en) System and method for providing deadlock free routing between switches in a fat-tree topology
US9596094B2 (en) Managing multicast distribution using multicast trees
US8630297B2 (en) Method and apparatus for the distribution of network traffic
US20130003549A1 (en) Resilient Hashing for Load Balancing of Traffic Flows
US10931530B1 (en) Managing routing resources of a network
US20120201241A1 (en) Method & apparatus for the distribution of network traffic
US10110421B2 (en) Methods, systems, and computer readable media for using link aggregation group (LAG) status information
US10205661B1 (en) Control messages for scalable satellite device clustering control in a campus network
CN108259205B (en) Route publishing method and network equipment
US11070472B1 (en) Dynamically mapping hash indices to member interfaces
WO2021232190A1 (en) Forward path planning method in massive data center networks
KR20200059299A (en) Direct Interconnect Gateway
CN116192721A (en) Path perception method, device and system
US10284468B1 (en) E-channel identifiers (ECIDS) for scalable satellite device clustering control in a campus network
US10397061B1 (en) Link bandwidth adjustment for border gateway protocol

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20936116

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20936116

Country of ref document: EP

Kind code of ref document: A1
