WO2023050874A1 - Message forwarding method and device, and dragonfly network - Google Patents

Message forwarding method and device, and dragonfly network

Info

Publication number
WO2023050874A1
Authority
WO
WIPO (PCT)
Prior art keywords
network device
group
path
network
shortest path
Prior art date
Application number
PCT/CN2022/098019
Other languages
English (en)
French (fr)
Inventor
郝卫国
李军
温华锋
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP22874272.2A (EP4333380A1)
Publication of WO2023050874A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1886 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast with traffic restrictions for efficiency improvement, e.g. involving subnets or subdomains
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L45/121 Shortest path evaluation by minimising delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L45/122 Shortest path evaluation by minimising distances, e.g. by selecting a route with minimum of number of hops
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/22 Alternate routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/122 Avoiding congestion; Recovering from congestion by diverting traffic away from congested entities

Definitions

  • the present application relates to the technical field of communications, and in particular to a message forwarding method and device, and a dragonfly network.
  • the dragonfly network is a commonly used high-efficiency communication system in the field of high performance computing (HPC). Compared with networking topologies such as fat tree networks, the Dragonfly network has the characteristics of short communication paths, low latency, and large-scale networking, and can achieve low-latency and high-throughput communications.
  • the Dragonfly network includes multiple device groups, and each device group includes multiple network devices.
  • Each network device includes a global interface, a local interface and an access interface.
  • the global interface is used for inter-group interconnection
  • the local interface is used for intra-group interconnection
  • the access interface is used for connection to terminal devices such as servers, virtual machines (VMs), and storage devices.
  • The network devices in each device group are directly connected to one another in a full mesh through their local interfaces.
  • a device group can be regarded as a logical device, and different device groups are directly connected to each other through the global interface.
  • the Dragonfly network usually uses adaptive routing to forward packets in order to balance the two goals of low latency and high throughput.
  • if the inter-group shortest path is not congested, it is preferentially selected for message forwarding to achieve low latency; if the inter-group shortest path is congested, a non-congested inter-group non-shortest path is selected for forwarding to achieve high throughput.
  • the inter-group shortest path refers to a message forwarding path that passes only one inter-group interconnection link.
  • the inter-group non-shortest path refers to a message forwarding path that passes through exactly one intermediate device group and therefore two inter-group interconnection links.
  • the inter-group shortest path may take several forms: it includes only one inter-group interconnection link; it includes one inter-group interconnection link and one intra-group interconnection link in the source device group or the destination device group; or it includes an intra-group interconnection link in the source device group, an inter-group interconnection link, and an intra-group interconnection link in the destination device group.
  • at present, adaptive routing can only ensure that the inter-group shortest path is preferentially used and does not distinguish among these forms, so the control precision of the message forwarding path is low.
  • the present application provides a message forwarding method and device, and a dragonfly network, which can solve the current problem of low control precision of message forwarding paths.
  • a packet forwarding method is provided.
  • the method is applied to a dragonfly network, and the dragonfly network includes multiple device groups, and multiple inter-group interconnection links are provided between different device groups.
  • the first network device receives the first packet sent by the first terminal device connected to the first network device.
  • the destination address of the first message is the Internet Protocol (IP) address of the second terminal device connected to the second network device, the first network device belongs to the first device group, and the second network device belongs to the second device group.
  • when a first inter-group interconnection link exists between the first network device and the second device group and that link is not congested, the first network device sends the first packet to the second network device through the first inter-group interconnection link.
  • the first network device preferentially selects the inter-group interconnection link between the first network device and the second device group to send messages to the second network device, so that the number of forwarding hops is as small as possible. This achieves low latency for inter-group communication and at the same time improves the control precision of the message forwarding path.
  • the first network device determines whether the first inter-group interconnection link is congested according to the congestion state of the global interface connected to the second device group on the first network device.
  • the first network device sequentially performs congestion judgment on the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path between the first network device and the second network device until the target forwarding path is obtained.
  • the first network device sends the second packet obtained based on the first packet to the second network device through the target forwarding path.
  • the intra-group detour shortest path includes the intra-group interconnection link between the first network device and the third network device in the first device group and the inter-group interconnection link between the third network device and the second device group
  • the local direct non-shortest path includes an inter-group interconnection link between the first network device and the third device group and an inter-group interconnection link between the third device group and the second device group
  • the intra-group detour non-shortest path includes the intra-group interconnection link between the first network device and the fourth network device in the first device group, the inter-group interconnection link between the fourth network device and the fourth device group, and the inter-group interconnection link between the fourth device group and the second device group.
  • when the first network device selects a path, it follows the principle of selecting an inter-group shortest path first and selecting an inter-group non-shortest path only when the inter-group shortest paths are congested.
  • when selecting among inter-group shortest paths, the local direct shortest path is preferred; when selecting among inter-group non-shortest paths, the local direct non-shortest path is preferred.
  • on the one hand, the packet forwarding delay can be minimized; on the other hand, because local direct paths are preferred, the first network device can better perceive the local congestion situation in real time.
  • the first network device can subsequently judge whether to switch paths according to the local congestion situation, so that the timing of path switching can be determined more accurately.
  • the first network device sequentially performing congestion judgment on the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path between the first network device and the second network device until the target forwarding path is obtained includes the following steps.
  • the first network device performs congestion judgment on the intra-group detour shortest paths between the first network device and the second network device. When a non-congested intra-group detour shortest path exists between the first network device and the second network device, the first network device uses any non-congested intra-group detour shortest path as the target forwarding path.
  • when no non-congested intra-group detour shortest path exists between the first network device and the second network device, the first network device performs congestion judgment on the local direct non-shortest paths between the first network device and the second network device. When a non-congested local direct non-shortest path exists, the first network device uses any non-congested local direct non-shortest path as the target forwarding path.
  • when no non-congested local direct non-shortest path exists between the first network device and the second network device, the first network device performs congestion judgment on the intra-group detour non-shortest paths between the first network device and the second network device. When a non-congested intra-group detour non-shortest path exists, the first network device uses any non-congested intra-group detour non-shortest path as the target forwarding path.
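The sequential judgment described above can be sketched as a priority scan over candidate paths. This is a minimal illustration, not the claimed implementation: the `PathType` enum, the string path labels, and the `is_congested` callback are hypothetical names, and returning `None` when every candidate is congested is an assumption, since the text does not say what happens in that case.

```python
from enum import IntEnum
from typing import Callable, Iterable, Optional, Tuple


class PathType(IntEnum):
    # Lower value = higher priority, following the order given in the text.
    LOCAL_DIRECT_SHORTEST = 0
    INTRA_GROUP_DETOUR_SHORTEST = 1
    LOCAL_DIRECT_NON_SHORTEST = 2
    INTRA_GROUP_DETOUR_NON_SHORTEST = 3


# A candidate path is (type, label); the label format is hypothetical.
Path = Tuple[PathType, str]


def select_target_path(
    candidates: Iterable[Path],
    is_congested: Callable[[Path], bool],
) -> Optional[Path]:
    """Return the first non-congested path, scanning path types by priority."""
    candidates = list(candidates)
    for path_type in PathType:  # IntEnum iterates in definition order
        for path in candidates:
            if path[0] == path_type and not is_congested(path):
                return path
    return None  # assumption: behaviour when all paths are congested is unspecified
```

Any non-congested path of the current type is acceptable according to the text; the sketch simply takes the first one it finds.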
  • the first network device performing congestion judgment on the intra-group detour shortest path between the first network device and the second network device includes: the first network device determines whether the intra-group detour shortest path that passes through the third network device is congested according to the queue depth of the first outbound interface queue of the local interface on the first network device that is connected to the third network device and the congestion state of the global interface on the third network device that is connected to the second device group. The first outbound interface queue is used to forward the packets that the first network device forwards over the intra-group detour shortest path.
  • the first network device performing congestion judgment on the local direct non-shortest path between the first network device and the second network device includes: the first network device determines whether the local direct non-shortest path that passes through the third device group is congested according to the congestion state of the global interface on the first network device that is connected to the third device group.
  • the first network device performing congestion judgment on the intra-group detour non-shortest path between the first network device and the second network device includes: the first network device determines whether the intra-group detour non-shortest path that passes through the fourth network device and the fourth device group is congested according to the queue depth of the second outbound interface queue of the local interface on the first network device that is connected to the fourth network device and the congestion state of the global interface on the fourth network device that is connected to the fourth device group. The second outbound interface queue is used to forward the packets that the first network device forwards over the intra-group detour non-shortest path.
  • whether the first network device selects an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward the message through another network device in the first device group. If both path types pass through the same network device in the first device group, the local interface used to send the packets is the same. It is therefore necessary to distinguish, among the packets sent from the same local interface, the packets that take the intra-group detour shortest path from the packets that take the intra-group detour non-shortest path, so that the congestion of each path type can be judged separately.
  • the outbound interface queue of the local interface is divided into a first outbound interface queue and a second outbound interface queue.
  • the first outbound interface queue is used to forward the packets that the first network device forwards over the intra-group detour shortest path
  • the second outbound interface queue is used to forward the packets that the first network device forwards over the intra-group detour non-shortest path. The congestion of the intra-group detour shortest path is then determined based on the queue depth of the first outbound interface queue, and the congestion of the intra-group detour non-shortest path is determined based on the queue depth of the second outbound interface queue.
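A minimal sketch of the two-queue arrangement, assuming a detour path counts as congested when either the relevant local queue exceeds a threshold or the remote global interface is congested. The text names both inputs but does not state how they are combined, so the `or` rule, the `threshold` parameter, and all class and function names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LocalInterface:
    # Depth of the first outbound interface queue (detour shortest traffic).
    first_queue_depth: int
    # Depth of the second outbound interface queue (detour non-shortest traffic).
    second_queue_depth: int


def detour_shortest_congested(
    local: LocalInterface, remote_global_congested: bool, threshold: int
) -> bool:
    """Judge an intra-group detour shortest path from the first queue only."""
    return local.first_queue_depth > threshold or remote_global_congested


def detour_non_shortest_congested(
    local: LocalInterface, remote_global_congested: bool, threshold: int
) -> bool:
    """Judge an intra-group detour non-shortest path from the second queue only."""
    return local.second_queue_depth > threshold or remote_global_congested
```

Keeping the two queues separate means heavy non-shortest traffic on a local interface does not by itself mark the shortest detour through the same neighbour as congested.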
  • the first network device sending the second message obtained based on the first message to the second network device through the target forwarding path includes: when the target forwarding path is an intra-group detour shortest path, the first network device adds a first indication to the first message to obtain the second message, and sends the second message to the second network device through the target forwarding path. The first indication is used to indicate that the forwarding path type is the intra-group detour shortest path.
  • when the target forwarding path is an intra-group detour non-shortest path, the first network device adds a second indication to the first message to obtain the second message, and sends the second message to the second network device through the target forwarding path. The second indication is used to indicate that the forwarding path type is the intra-group detour non-shortest path.
  • whether the first network device selects an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward the message through another network device in the first device group. If both path types pass through the same network device in the first device group, that network device receives the packets on the same local interface. Among the packets received from that local interface, it must therefore distinguish which packets follow the intra-group detour shortest path and which follow the intra-group detour non-shortest path, so that each packet is forwarded over the corresponding forwarding path.
  • the source network device adds the first indication to messages that take the intra-group detour shortest path and the second indication to messages that take the intra-group detour non-shortest path, so as to clearly indicate to the other network devices in the group that forward the message which forwarding path to use. Those network devices can then determine, based on the indication in the message, which global interface should be used to forward the message, thereby improving the efficiency of message forwarding.
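The marking scheme can be illustrated as follows. The text does not say how the indication is encoded in the packet, so the string constants, the `Packet` type, and both function names are hypothetical; the sketch only shows the source device tagging a packet and the relaying device in the same group choosing a global interface from that tag.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical encodings of the first and second indications.
FIRST_INDICATION = "intra_group_detour_shortest"
SECOND_INDICATION = "intra_group_detour_non_shortest"


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    indication: Optional[str] = None


def add_indication(packet: Packet, detour_is_shortest: bool) -> Packet:
    """Source device: derive the second packet from the first packet."""
    indication = FIRST_INDICATION if detour_is_shortest else SECOND_INDICATION
    return replace(packet, indication=indication)


def relay_out_interface(
    packet: Packet,
    iface_to_destination_group: str,
    iface_to_intermediate_group: str,
) -> str:
    """Relaying device in the source group: pick the global interface to use."""
    if packet.indication == FIRST_INDICATION:
        return iface_to_destination_group
    if packet.indication == SECOND_INDICATION:
        return iface_to_intermediate_group
    raise ValueError("packet carries no detour indication")
```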
  • the first network device obtains a routing and forwarding table, where the routing and forwarding table includes a routing prefix table and multiple equal-cost multi-path (ECMP) group tables.
  • Each entry of the routing prefix table includes a correspondence relationship between the destination IP address and the group index of the destination device group.
  • the group index of the destination device group is associated with an ECMP group table.
  • the ECMP group table includes the outgoing interfaces corresponding to each path from the first network device to the destination device group.
  • the destination device group is the device group to which the terminal device with the corresponding destination IP address is connected.
  • the network device uses the routing prefix table combined with the ECMP group tables to store the route to each terminal device. One ECMP group table can store the forwarding information for the IP addresses of all terminal devices connected to the same device group, which saves entry resources.
  • the ECMP group table includes a first routing sub-table and a second routing sub-table.
  • the first routing sub-table includes the outbound interfaces corresponding to each path from the first network device to the destination device group, and the first routing sub-table is used by the first network device to forward messages from terminal devices that access the first device group.
  • the second routing sub-table includes the outbound interface corresponding to the inter-group shortest path from the first network device to the destination device group, and the second routing sub-table is used by the first network device to forward messages from terminal devices that access device groups other than the first device group.
  • when the first network device is located in the intermediate device group or the destination device group, the first network device forwards the message based on the second routing sub-table, that is, the first network device forwards the message along the shortest path, which avoids routing loops.
  • each device group corresponds to an autonomous system (autonomous system, AS) number.
  • the process for the first network device to obtain the routing and forwarding table includes: when the first network device receives a first routing message whose AS-path attribute contains only one AS number, the first network device adds the forwarding entry obtained based on the first routing message to both the first routing sub-table and the second routing sub-table.
  • when the first network device receives a second routing message whose AS-path attribute contains two AS numbers, the first network device adds the forwarding entry obtained based on the second routing message only to the first routing sub-table.
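The rule for populating the two routing sub-tables from the AS-path length, and for choosing a sub-table at forwarding time, can be sketched as below. The class and function names and the AS numbers in the usage are hypothetical, and ignoring routes with other AS-path lengths is an assumption, since the text covers only the one-AS and two-AS cases.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class EcmpGroupTable:
    # Outbound interfaces for all paths (used for locally attached terminals).
    first_sub_table: Set[str] = field(default_factory=set)
    # Outbound interfaces for inter-group shortest paths only (transit traffic).
    second_sub_table: Set[str] = field(default_factory=set)


def install_route(table: EcmpGroupTable, as_path: List[int], out_interface: str) -> None:
    """Add a forwarding entry according to the number of AS numbers in AS-path."""
    if len(as_path) == 1:
        table.first_sub_table.add(out_interface)
        table.second_sub_table.add(out_interface)
    elif len(as_path) == 2:
        table.first_sub_table.add(out_interface)
    # Assumption: other AS-path lengths are not described, so they are ignored.


def candidate_out_interfaces(table: EcmpGroupTable, from_local_terminal: bool) -> Set[str]:
    """Pick the sub-table based on where the packet entered the network."""
    return table.first_sub_table if from_local_terminal else table.second_sub_table
```

Restricting transit traffic to the second sub-table is what keeps a packet that has already left its source group on a shortest path for the remainder of its route.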
  • the process for the first network device to obtain the routing and forwarding table includes: the first network device generates the routing prefix table and the multiple ECMP group tables according to the networking topology of the Dragonfly network, the IP addresses of the terminal devices connected to the Dragonfly network, and the access devices of those terminal devices.
  • the process for the first network device to obtain the routing and forwarding table includes: the first network device receiving the routing and forwarding table sent by the control device.
  • a message forwarding device is provided.
  • the device is applied to a first network device in a dragonfly network, and the dragonfly network includes multiple device groups, and there are multiple inter-group interconnection links between different device groups.
  • the device includes a plurality of functional modules, and the plurality of functional modules interact to implement the methods in the above first aspect and various implementation manners thereof.
  • the multiple functional modules can be implemented based on software, hardware or a combination of software and hardware, and the multiple functional modules can be combined or divided arbitrarily based on specific implementations.
  • a dragonfly network is provided, including multiple device groups, where there are multiple inter-group interconnection links between different device groups, and the network devices in the device groups are used to implement the methods in the above first aspect and its various implementation manners.
  • a network device including: a processor and a memory;
  • the memory is used to store a computer program, and the computer program includes program instructions
  • the processor is configured to invoke the computer program to implement the methods in the above first aspect and various implementation manners thereof.
  • a computer-readable storage medium is provided. Instructions are stored on the computer-readable storage medium. When the instructions are executed by a processor, the above-mentioned first aspect and the methods in each implementation manner thereof are realized.
  • a computer program product is provided, including a computer program.
  • when the computer program is executed by a processor, the methods in the above first aspect and its various implementation manners are realized.
  • in a seventh aspect, a chip is provided. The chip includes a programmable logic circuit and/or program instructions, and when the chip is running, it implements the methods in the above first aspect and its various implementation manners.
  • Fig. 1 is a schematic structural diagram of a dragonfly network provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a message forwarding method provided in an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a message forwarding device provided in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another message forwarding device provided by an embodiment of the present application.
  • Fig. 5 is a block diagram of a network device provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a dragonfly network provided by an embodiment of the present application.
  • the dragonfly network includes device group A, device group B and device group C.
  • Device group A includes 5 network devices A1-A5
  • device group B includes 5 network devices B1-B5
  • device group C includes 5 network devices C1-C5.
  • the network device may be a router or a switch.
  • the number of device groups in FIG. 1 and the number of network devices in each device group are only for illustration, and are not intended to limit the Dragonfly network provided by the embodiment of the present application.
  • each network device has 8 interfaces GE1-GE8 (not shown in the figure). Two interfaces (GE1, GE2) are access interfaces, four interfaces (GE3, GE4, GE5, GE6) are local interfaces, and two interfaces (GE7, GE8) are global interfaces.
  • the interface of the network device can be used as an incoming interface to receive packets, and can also be used as an outgoing interface to send packets.
  • the congestion state of an interface of a network device described in this application refers to the congestion state when the interface is used as an outbound interface.
  • the congestion state of the interface may be determined based on the queue depth of the outbound interface queues of the interface. If the queue depth of at least one outbound interface queue of an interface exceeds the first threshold, it is determined that the interface is congested; if the queue depth of none of the outbound interface queues of an interface exceeds the first threshold, it is determined that the interface is not congested.
  • the congestion state of the interface may also be determined based on the bandwidth utilization of the interface.
  • if the bandwidth utilization rate of an interface exceeds the second threshold, it is determined that the interface is congested; if the bandwidth utilization rate of an interface does not exceed the second threshold, it is determined that the interface is not congested.
  • the bandwidth utilization rate of an interface is equal to the ratio of the packet sending rate of the interface to the bandwidth allocated to the interface.
  • the congestion state of the interface can be divided into two types: congestion and non-congestion, or the congestion state of the interface can also be divided into three types: non-congestion (ie light load), moderate congestion and severe congestion. This embodiment of the present application does not limit this.
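The two ways of determining interface congestion described above can be written as short predicates. This is a sketch of the two-state (congested or not congested) case only; the function names and the threshold values used in any call are assumptions.

```python
from typing import Iterable


def congested_by_queue_depth(queue_depths: Iterable[int], first_threshold: int) -> bool:
    """Congested if at least one outbound interface queue exceeds the first threshold."""
    return any(depth > first_threshold for depth in queue_depths)


def bandwidth_utilization(send_rate_bps: float, allocated_bandwidth_bps: float) -> float:
    """Ratio of the packet sending rate of the interface to its allocated bandwidth."""
    return send_rate_bps / allocated_bandwidth_bps


def congested_by_utilization(
    send_rate_bps: float, allocated_bandwidth_bps: float, second_threshold: float
) -> bool:
    """Congested if bandwidth utilization exceeds the second threshold."""
    return bandwidth_utilization(send_rate_bps, allocated_bandwidth_bps) > second_threshold
```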
  • the five inter-group interconnection links between device group A and device group B are: L_AB1 between network devices A1 and B1, L_AB2 between network devices A2 and B2, L_AB3 between network devices A3 and B3, L_AB4 between network devices A4 and B4, and L_AB5 between network devices A5 and B5.
  • the five inter-group interconnection links between device group A and device group C are: L_AC1 between network devices A1 and C5, L_AC2 between network devices A2 and C4, L_AC3 between network devices A3 and C3, L_AC4 between network devices A4 and C2, and L_AC5 between network devices A5 and C1.
  • the five inter-group interconnection links between device group B and device group C are: L_BC1 between network devices B1 and C1, L_BC2 between network devices B2 and C2, L_BC3 between network devices B3 and C3, L_BC4 between network devices B4 and C4, and L_BC5 between network devices B5 and C5.
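The fifteen inter-group links of FIG. 1 listed above follow a regular pattern, which the sketch below uses to build a lookup table. Link names are written here as `L_AB1` and so on; the function names and the dictionary representation are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def build_inter_group_links() -> Dict[str, Tuple[str, str]]:
    """Map each inter-group link name to its two endpoint devices."""
    links: Dict[str, Tuple[str, str]] = {}
    for i in range(1, 6):
        links[f"L_AB{i}"] = (f"A{i}", f"B{i}")
        links[f"L_AC{i}"] = (f"A{i}", f"C{6 - i}")  # A1-C5, A2-C4, ..., A5-C1
        links[f"L_BC{i}"] = (f"B{i}", f"C{i}")
    return links


def global_link_to_group(
    links: Dict[str, Tuple[str, str]], device: str, group: str
) -> Optional[Tuple[str, str]]:
    """Return (link name, peer device) for the device's link into the given group."""
    for name, (u, v) in links.items():
        if u == device and v.startswith(group):
            return name, v
        if v == device and u.startswith(group):
            return name, u
    return None
```

For example, looking up network device C1 against device group B yields the link to B1, and against device group A yields the link to A5.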
  • the inter-group shortest path between two device groups is divided into the local direct shortest path and the intra-group detour shortest path, and/or the inter-group non-shortest path between two device groups is divided into the local direct non-shortest path and the intra-group detour non-shortest path.
  • the difference between the local direct shortest path and the intra-group detour shortest path is that the local direct shortest path does not include an intra-group interconnection link in the source device group, whereas the intra-group detour shortest path includes one intra-group interconnection link in the source device group.
  • the difference between the local direct non-shortest path and the intra-group detour non-shortest path is that the local direct non-shortest path does not include an intra-group interconnection link in the source device group, whereas the intra-group detour non-shortest path includes one intra-group interconnection link in the source device group.
  • the source device group refers to the device group where the access device (hereinafter referred to as the source network device) of the source terminal device that sends the message is located.
  • the destination device group refers to the device group where the access device (hereinafter referred to as the destination network device) of the destination terminal device receiving the message belongs.
  • the terminal device S1 is connected to the network device C1
  • the terminal device S2 is connected to the network device B5
  • the terminal device S3 is connected to the network device B5 .
  • terminal device S1 is a source terminal device
  • network device C1 is a source network device
  • device group C is a source device group.
  • the terminal device S2 is the destination terminal device
  • the network device B5 is the destination network device
  • the device group B is the destination device group.
  • the source network device can select a non-congested path among the local direct shortest path, the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path to forward the message from the terminal device connected to the source network device.
  • when the local direct shortest path is unavailable, the source network device can select, in descending order of priority, a non-congested path among the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path to forward the message from the terminal device connected to the source network device.
  • when there is an inter-group interconnection link between the source network device and the destination device group, and that inter-group interconnection link is not congested, the source network device sends the message from the source terminal device through that inter-group interconnection link.
  • terminal device S1 is a source terminal device
  • terminal device S2 is a destination terminal device. If the inter-group interconnection link L_BC1 between network device C1 and network device B1 is not congested, network device C1 sends the message from terminal device S1, whose destination address is the address of terminal device S2, to network device B1 through inter-group interconnection link L_BC1.
  • the local direct shortest path includes the intergroup interconnection link between the source network device and the destination device group.
  • the source network device sends the message from the source terminal device through the intergroup interconnection link between the source network device and the destination device group. That is, the source network device sends the packet from the source terminal device through the local directly connected shortest path.
  • the local direct shortest path between network device C1 and network device B5 includes the inter-group interconnection link L BC1 between network device C1 and network device B1 and the intra-group interconnection link between network device B1 and network device B5.
  • the detour shortest path within the group includes an intragroup interconnection link between the source network device and another network device in the source device group, and an intergroup interconnection link between the other network device and the destination device group.
  • an intra-group detour shortest path between network device C1 and network device B5 includes the intra-group interconnection link between network device C1 and network device C5 and the inter-group interconnection link L BC5 between network device C5 and network device B5.
  • the local direct non-shortest path includes an intergroup interconnection link between the source network device and another device group other than the destination device group, and an intergroup interconnection link between the other device group and the destination device group.
  • the local direct non-shortest path between network device C1 and network device B5 includes the intergroup interconnection link L AC5 between network device C1 and network device A5 and the intergroup interconnection link L AB5 between network device A5 and network device B5.
  • the intra-group detour non-shortest path includes the intra-group interconnection link between the source network device and another network device in the source device group, the inter-group interconnection link between the other network device and another device group other than the destination device group, and the inter-group interconnection link between the other device group and the destination device group.
  • an intra-group detour non-shortest path between network device C1 and network device B5 includes the intra-group interconnection link between network device C1 and network device C2, the inter-group interconnection link between network device C2 and network device A4, and the inter-group interconnection link between network device A4 and network device B4.
  • saying that no congestion occurs on a forwarding path means that no congestion occurs on the links in the forwarding path that are associated with the source device group.
  • the links associated with the source device group may include intra-group interconnection links within the source device group and inter-group interconnection links between the source device group and other device groups. For example, if the inter-group interconnection link between the source network device and the destination device group is not congested, it is determined that the local direct shortest path is not congested. If the intra-group interconnection link between the source network device and another network device in the source device group is not congested, and the inter-group interconnection link between the other network device and the destination device group is not congested, it is determined that no congestion occurs on the intra-group detour shortest path passing through the other network device.
  • if the inter-group interconnection link between the source network device and another device group other than the destination device group is not congested, it is determined that no congestion occurs on the local direct non-shortest path passing through the other device group. If the intra-group interconnection link between the source network device and another network device in the source device group is not congested, and the inter-group interconnection link between the other network device and another device group other than the destination device group is not congested, it is determined that no congestion occurs on the intra-group detour non-shortest path passing through the other network device and the other device group.
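  • The four determinations above inspect only the links associated with the source device group. The following Python sketch illustrates that rule under stated assumptions: the names `PathType` and `is_uncongested` and the boolean inputs are hypothetical and do not appear in the application, which does not prescribe any particular data representation.

```python
from enum import Enum


class PathType(Enum):
    LOCAL_DIRECT_SHORTEST = "local direct shortest"
    INTRA_GROUP_DETOUR_SHORTEST = "intra-group detour shortest"
    LOCAL_DIRECT_NON_SHORTEST = "local direct non-shortest"
    INTRA_GROUP_DETOUR_NON_SHORTEST = "intra-group detour non-shortest"


# Detour paths first cross an intra-group link inside the source device group.
DETOUR_TYPES = {
    PathType.INTRA_GROUP_DETOUR_SHORTEST,
    PathType.INTRA_GROUP_DETOUR_NON_SHORTEST,
}


def is_uncongested(path_type, inter_group_link_congested,
                   intra_group_link_congested=False):
    """Return True if the path counts as not congested.

    Only links associated with the source device group are inspected:
    the inter-group link leaving the source group and, for detour paths,
    the intra-group link to the other network device in the source group.
    """
    if inter_group_link_congested:
        return False
    if path_type in DETOUR_TYPES and intra_group_link_congested:
        return False
    return True
```

  • In this sketch the intra-group flag is ignored for the two local direct path types, because those paths do not traverse an intra-group link of the source device group.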
  • dividing the inter-group paths into the local direct shortest path, the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path enables the source network device to perform path selection at a finer granularity and to exercise fine control over the packet forwarding path, so that the low latency and high throughput of packet forwarding can be accurately controlled.
  • when the source network device selects a path, it follows the principle of giving priority to the inter-group shortest paths and selects an inter-group non-shortest path only when the inter-group shortest paths are congested. When selecting a forwarding path among the inter-group shortest paths, the local direct shortest path is preferred; when selecting a forwarding path among the inter-group non-shortest paths, the local direct non-shortest path is preferred.
  • on the one hand, this reduces the packet forwarding delay as much as possible. On the other hand, because the local direct paths are preferred, the source network device can better perceive the local congestion situation in real time, subsequently judge whether to switch paths according to the local congestion situation, and determine the timing of path switching more accurately.
  • FIG. 2 is a schematic flowchart of a packet forwarding method provided in an embodiment of the present application. This method can be applied to the dragonfly network shown in Figure 1. As shown in Figure 2, the method includes:
  • Step 201 The first network device receives a first packet sent by a first terminal device connected to the first network device.
  • the destination address of the first message is the IP address of the second terminal device connected to the second network device, the first network device belongs to the first device group, and the second network device belongs to the second device group. There are multiple inter-group interconnection links between the first device group and the second device group.
  • the first terminal device is terminal device S1
  • the second terminal device is terminal device S2
  • the first network device is network device C1
  • the first device group is device group C
  • the second network device is network device B5, and the second device group is device group B.
  • Step 202 the first network device judges whether there is an inter-group interconnection link between the first network device and the second device group; if there is a first inter-group interconnection link between the first network device and the second device group, execute step 203; if there is no inter-group interconnection link between the first network device and the second device group, execute step 205.
  • although there are multiple inter-group interconnection links between the first device group and the second device group, there may or may not be an inter-group interconnection link between the first network device and the second device group.
  • the first network device judges whether there is an intergroup interconnection link between the first network device and the second device group, that is, judges whether there is a local direct shortest path between the first network device and the second device group.
  • Step 203 the first network device judges whether the first inter-group interconnection link is congested; if the first inter-group interconnection link is not congested, execute step 204; if the first inter-group interconnection link is congested, execute step 205.
  • the first network device judges whether the first inter-group interconnection link is congested, that is, judges whether the local direct shortest path between the first network device and the second device group is congested.
  • the first network device determines whether the first inter-group interconnection link is congested according to the congestion state of the global interface connected to the second device group on the first network device. If the global interface connected to the second device group on the first network device is congested, it is determined that the first inter-group interconnection link is congested; if the global interface connected to the second device group on the first network device is not congested, it is determined that the first inter-group interconnection link is not congested.
  • referring to the example in step 201, it is assumed that the network device C1 is connected to the network device B1 through the global interface GE8. If the global interface GE8 of the network device C1 is congested, it is determined that the intergroup interconnection link L BC1 between the network device C1 and the network device B1 is congested; if the global interface GE8 of the network device C1 is not congested, it is determined that the intergroup interconnection link L BC1 between the network device C1 and the network device B1 is not congested.
  • Step 204 the first network device sends the first packet to the second network device through the first inter-group interconnection link.
  • the first network device sends the first packet to the second network device through the first inter-group interconnection link, that is, sends the first packet to the second network device through the local direct connection shortest path.
  • the source network device preferentially selects the local directly connected shortest path to send the message to the destination network device, so that the number of forwarding hops of the message is as small as possible, so as to achieve low delay in inter-group communication.
  • Step 205 the first network device performs a congestion judgment on the intra-group detour shortest paths between the first network device and the second network device; if there is an intra-group detour shortest path without congestion between the first network device and the second network device, execute step 206; if there is no intra-group detour shortest path without congestion between the first network device and the second network device, execute step 207.
  • the first network device determines, based on the queue depth of the first outgoing interface queue of the local interface on the first network device that is connected to the third network device and the congestion state of the global interface on the third network device that is connected to the second device group, whether congestion occurs on the intra-group detour shortest path between the first network device and the second network device that passes through the third network device. If the queue depth of the first outgoing interface queue of the local interface connected to the third network device on the first network device exceeds the first threshold, and/or the global interface connected to the second device group on the third network device is congested, it is determined that the intra-group detour shortest path passing through the third network device between the first network device and the second network device is congested.
  • if the queue depth of the first outgoing interface queue of the local interface connected to the third network device on the first network device does not exceed the first threshold, and the global interface connected to the second device group on the third network device is not congested, it is determined that no congestion occurs on the intra-group detour shortest path passing through the third network device between the first network device and the second network device.
  • the first outbound interface queue is used to forward the packets that the first network device forwards through an intra-group detour shortest path. Whether the first network device selects an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward the message through another network device in the first device group, and if both kinds of path pass through the same network device in the first device group, the local interface used to send the message is the same.
  • it is therefore necessary to distinguish, among the packets sent from the same local interface, the packets that take the intra-group detour shortest path from the packets that take the intra-group detour non-shortest path, in order to determine the congestion of the intra-group detour shortest path and of the intra-group detour non-shortest path separately.
  • the outgoing interface queue of the local interface is divided into a first outgoing interface queue and a second outgoing interface queue.
  • the first outbound interface queue is used to forward the packets that the first network device forwards through an intra-group detour shortest path, and the second outbound interface queue is used to forward the packets that the first network device forwards through an intra-group detour non-shortest path, so that the congestion of the intra-group detour shortest path is determined based on the queue depth of the first outbound interface queue, and the congestion of the intra-group detour non-shortest path is determined based on the queue depth of the second outbound interface queue.
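  • The split of one local interface into two outgoing interface queues, together with the threshold test described for step 205, can be sketched as follows. This is only an illustration: the class `LocalInterface`, its method names, and the threshold value are hypothetical, and a single threshold is used for both queues as an assumption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LocalInterface:
    name: str
    threshold: int  # queue depth threshold; the value used below is made up
    # Packets taking an intra-group detour shortest path.
    first_queue: deque = field(default_factory=deque)
    # Packets taking an intra-group detour non-shortest path.
    second_queue: deque = field(default_factory=deque)

    def enqueue(self, packet, shortest):
        """Place the packet in the queue that matches its path type."""
        (self.first_queue if shortest else self.second_queue).append(packet)

    def detour_congested(self, shortest, remote_global_congested):
        """A detour path is congested if its own queue is deeper than the
        threshold and/or the global interface on the next device is congested."""
        queue = self.first_queue if shortest else self.second_queue
        return len(queue) > self.threshold or remote_global_congested


# Example: three shortest-path packets queued on GE3 with a threshold of 2.
ge3 = LocalInterface("GE3", threshold=2)
for i in range(3):
    ge3.enqueue(f"pkt{i}", shortest=True)
```

  • With this state, the intra-group detour shortest path through GE3 is reported as congested while the intra-group detour non-shortest path through the same interface is not, which is the distinction the two queues are meant to provide.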
  • the third network device is any network device in the first device group, other than the first network device, that has an inter-group interconnection link to the second device group.
  • the third network device may be network device C2, network device C3, network device C4 or network device C5.
  • multiple network devices belonging to the same device group notify each other of the congestion status of their own global interfaces.
  • the first network device may periodically or in real time send the congestion status of its own global interface to other network devices in the first device group.
  • the latest congestion status of each interface of the first network device and the latest congestion status of global interfaces of other network devices in the first device group may be stored in the first network device.
  • the first network device is the network device C1 in FIG. 1 , and the congestion status of each interface at a certain moment stored in the network device C1 may be as shown in Table 1. Wherein, "1" indicates that no congestion occurs, and "2" indicates that congestion occurs.
  • in the congestion state (a, b) of each of the local interfaces GE3 to GE6 of the network device C1, a represents the congestion state of the first outbound interface queue, which forwards the packets in the network device C1 that take an intra-group detour shortest path, and b represents the congestion state of the second outbound interface queue, which forwards the packets in the network device C1 that take an intra-group detour non-shortest path.
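  • One possible in-memory form of the per-interface congestion states described above is sketched below. The encoding (1 for no congestion, 2 for congestion) and the (a, b) pairs for local interfaces follow the text; the concrete values, the dictionary layout, and the function names are hypothetical and are not the contents of Table 1.

```python
NOT_CONGESTED, CONGESTED = 1, 2  # encoding stated in the text

# Hypothetical snapshot stored in network device C1 (values are made up).
congestion_table = {
    "C1": {
        # Local interfaces: (a, b) = (first queue state, second queue state).
        "GE3": (1, 1), "GE4": (1, 2), "GE5": (2, 1), "GE6": (1, 1),
        # Global interfaces: a single state.
        "GE7": 1, "GE8": 2,
    },
    # Global interfaces of the other network devices in device group C.
    "C2": {"GE7": 1, "GE8": 1},
    "C3": {"GE7": 1, "GE8": 1},
    "C4": {"GE7": 1, "GE8": 1},
    "C5": {"GE7": 1, "GE8": 1},
}


def on_peer_notification(table, device, interface, state):
    """Record the latest global-interface state announced by a group member."""
    table.setdefault(device, {})[interface] = state


def local_queue_congested(table, device, interface, shortest):
    """Read a (shortest=True) or b (shortest=False) for a local interface."""
    a, b = table[device][interface]
    return (a if shortest else b) == CONGESTED
```

  • For example, calling `on_peer_notification(congestion_table, "C2", "GE8", CONGESTED)` would record that network device C2 has announced congestion on its global interface toward device group B.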
  • Step 206 the first network device takes any intra-group detour shortest path without congestion between the first network device and the second network device as the target forwarding path.
  • the first network device may perform congestion judgment on all intra-group detour shortest paths between the first network device and the second network device, and then randomly select one of the intra-group detour shortest paths without congestion as the target forwarding path.
  • the first network device may perform congestion judgment on the intra-group detour shortest paths between the first network device and the second network device one by one, stop judging once an intra-group detour shortest path without congestion is obtained, and use that path as the target forwarding path.
  • there are four intra-group detour shortest paths between network device C1 and network device B5: network device C1 → network device C2 → network device B2 → network device B5; network device C1 → network device C3 → network device B3 → network device B5; network device C1 → network device C4 → network device B4 → network device B5; network device C1 → network device C5 → network device B5.
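  • The two selection strategies described for the intra-group detour shortest paths (judge every path and pick one at random, or judge paths in turn and stop at the first one without congestion) can be sketched as follows. The function names and the `is_congested` callback are hypothetical; the path list is the one given above for network device C1 and network device B5.

```python
import random

DETOUR_SHORTEST_PATHS = [
    ("C1", "C2", "B2", "B5"),
    ("C1", "C3", "B3", "B5"),
    ("C1", "C4", "B4", "B5"),
    ("C1", "C5", "B5"),
]


def pick_random_uncongested(paths, is_congested, rng=random):
    """Judge every path, then choose randomly among the uncongested ones."""
    candidates = [path for path in paths if not is_congested(path)]
    return rng.choice(candidates) if candidates else None


def pick_first_uncongested(paths, is_congested):
    """Judge paths in order and stop at the first uncongested one."""
    for path in paths:
        if not is_congested(path):
            return path
    return None


# Example callback: paths through C2 and C3 are assumed to be congested.
congested_via = {"C2", "C3"}


def example_is_congested(path):
    return path[1] in congested_via
```

  • The sequential variant avoids judging the remaining paths once a usable one is found, while the random variant spreads traffic over all usable paths; both return `None` when every path is congested, which corresponds to moving on to step 207.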
  • Step 207 the first network device performs a congestion judgment on the local direct non-shortest paths between the first network device and the second network device; if there is a local direct non-shortest path without congestion between the first network device and the second network device, execute step 208; if there is no local direct non-shortest path without congestion between the first network device and the second network device, execute step 209.
  • the first network device determines, according to the congestion state of the global interface on the first network device that is connected to the third device group, whether the local direct non-shortest path between the first network device and the second network device that passes through the third device group is congested. If the global interface connected to the third device group on the first network device is congested, it is determined that the local direct non-shortest path passing through the third device group between the first network device and the second network device is congested. If no congestion occurs on the global interface connected to the third device group on the first network device, it is determined that no congestion occurs on the local direct non-shortest path passing through the third device group between the first network device and the second network device.
  • the third device group is any device group in the Dragonfly network except the first device group and the second device group.
  • the third device group is device group A. Assume that network device C1 is connected to network device A5 through a global interface GE7.
  • if the global interface GE7 of the network device C1 is congested, it is determined that the intergroup interconnection link L AC5 between the network device C1 and the network device A5 is congested, and then it is determined that the local direct non-shortest path between the network device C1 and the network device B5 is congested; if the global interface GE7 of the network device C1 is not congested, it is determined that the intergroup interconnection link L AC5 between the network device C1 and the network device A5 is not congested, and then it is determined that the local direct non-shortest path between the network device C1 and the network device B5 is not congested.
  • Step 208 the first network device takes any local direct non-shortest path without congestion between the first network device and the second network device as the target forwarding path.
  • the first network device may perform congestion judgment on all local direct non-shortest paths between the first network device and the second network device, and then randomly select one of the local direct non-shortest paths without congestion as the target forwarding path.
  • the first network device may perform congestion judgment on the local direct non-shortest paths between the first network device and the second network device one by one, stop judging once a local direct non-shortest path without congestion is obtained, and use that path as the target forwarding path.
  • in the dragonfly network shown in Figure 1, there is only one local direct non-shortest path between network device C1 and network device B5: network device C1 → network device A5 → network device B5.
  • Step 209 the first network device performs congestion judgment on the intra-group detour non-shortest paths between the first network device and the second network device; if there is an intra-group detour non-shortest path without congestion between the first network device and the second network device, execute step 210; if there is no intra-group detour non-shortest path without congestion between the first network device and the second network device, end the message forwarding process.
  • the first network device determines, based on the queue depth of the second outgoing interface queue of the local interface on the first network device that is connected to the fourth network device and the congestion state of the global interface on the fourth network device that is connected to the fourth device group, whether congestion occurs on the intra-group detour non-shortest path between the first network device and the second network device that passes through the fourth network device and the fourth device group.
  • if the queue depth of the second outbound interface queue of the local interface connected to the fourth network device on the first network device exceeds the first threshold, and/or the global interface connected to the fourth device group on the fourth network device is congested, it is determined that congestion occurs on the intra-group detour non-shortest path passing through the fourth network device and the fourth device group between the first network device and the second network device.
  • if the queue depth of the second outgoing interface queue of the local interface connected to the fourth network device on the first network device does not exceed the first threshold, and the global interface connected to the fourth device group on the fourth network device is not congested, it is determined that no congestion occurs on the intra-group detour non-shortest path passing through the fourth network device and the fourth device group between the first network device and the second network device.
  • the second outbound interface queue is used for forwarding the packets forwarded through the non-shortest path within the group in the first network device.
  • the fourth network device is any network device except the first network device that has an intergroup interconnection link between the first device group and the second device group.
  • the fourth network device and the third network device may be the same network device, or may be different network devices.
  • the fourth device group is any device group in the Dragonfly network except the first device group and the second device group.
  • the fourth device group and the third device group may be the same device group, or may be different device groups.
  • the fourth network device may be network device C2, network device C3, network device C4 or network device C5.
  • the fourth device group is device group A.
  • Step 210 the first network device takes any intra-group detour non-shortest path without congestion between the first network device and the second network device as the target forwarding path.
  • the first network device may perform congestion judgment on all intra-group detour non-shortest paths between the first network device and the second network device, and then randomly select one of the intra-group detour non-shortest paths without congestion as the target forwarding path.
  • the first network device may perform congestion judgment on the intra-group detour non-shortest paths between the first network device and the second network device one by one, stop judging once an intra-group detour non-shortest path without congestion is obtained, and use that path as the target forwarding path.
  • the intra-group detour non-shortest paths between network device C1 and network device B5 include: network device C1 → network device C2 → network device A4 → network device B4 → network device B5; network device C1 → network device C3 → network device A3 → network device B3 → network device B5; network device C1 → network device C4 → network device A2 → network device B2 → network device B5; network device C1 → network device C5 → network device A1 → network device B1 → network device B5.
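  • Steps 202 to 210 amount to walking the four path types in priority order and taking the first usable path. The sketch below illustrates that flow with the sequential (first found) strategy; the function name, the argument layout, and the string labels are hypothetical, and the congestion flags are assumed to have been computed beforehand as described in the individual steps.

```python
def select_target_path(direct, detour_shortest, direct_non_shortest,
                       detour_non_shortest):
    """Return (path_type, path), or None if the forwarding process ends.

    direct: None when the first network device has no inter-group link to
            the second device group, otherwise a (path, congested) pair.
    Other arguments: lists of (path, congested) pairs, in judging order.
    """
    # Steps 202 to 204: use the local direct shortest path if it exists
    # and its inter-group interconnection link is not congested.
    if direct is not None:
        path, congested = direct
        if not congested:
            return "local direct shortest", path

    tiers = (
        ("intra-group detour shortest", detour_shortest),          # steps 205-206
        ("local direct non-shortest", direct_non_shortest),        # steps 207-208
        ("intra-group detour non-shortest", detour_non_shortest),  # steps 209-210
    )
    for path_type, candidates in tiers:
        for path, congested in candidates:
            if not congested:
                return path_type, path
    return None  # no uncongested path of any type


# Example inputs for network device C1 sending toward network device B5.
example_detours = [("C1-C2-B2-B5", True), ("C1-C5-B5", False)]
example_non_shortest = [("C1-A5-B5", False)]
```

  • With a congested direct link, `select_target_path(("C1-B1-B5", True), example_detours, example_non_shortest, [])` falls through to the intra-group detour shortest tier and returns the path through network device C5.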
  • Step 211 the first network device sends the second packet obtained based on the first packet to the second terminal device through the target forwarding path.
  • the first network device when the target forwarding path is the shortest detour path within the group, the first network device adds the first indication to the first packet to obtain the second packet, and sends the second packet to the second network device through the target forwarding path.
  • the first indication is used to indicate that the forwarding path type is the shortest path within the group.
  • when the target forwarding path is an intra-group detour non-shortest path, the first network device adds the second indication to the first packet to obtain the second packet, and sends the second packet to the second network device through the target forwarding path.
  • the second indication is used to indicate that the forwarding path type is an intra-group bypass non-shortest path.
  • whether the first network device selects an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward the message through another network device in the first device group.
  • if both kinds of path pass through the same network device in the first device group, that network device receives the messages on the same local interface, so it needs to distinguish, among the messages received on that local interface, which messages follow the intra-group detour shortest path and which follow the intra-group detour non-shortest path, in order to forward each message over the corresponding forwarding path.
  • by carrying the indication in the message, the source network device clearly indicates to the other network device in the device group that forwards the message which forwarding path to use, so that the other network device knows from the indication in the message which global interface to use to forward it, which improves the efficiency of message forwarding.
  • the above-mentioned first indication and second indication may be different differentiated services code point (DSCP) values, different virtual local area network (VLAN) priorities, different 802.1p tags, or the like.
  • the first indication and the second indication may be assigned in advance by a network manager, and the embodiment of the present application does not limit the type of indication.
  • when VLAN priority is used to indicate the forwarding path type, VLAN 10 can be used to indicate that the forwarding path type is the intra-group detour shortest path, and VLAN 20 can be used to indicate that the forwarding path type is the intra-group detour non-shortest path.
  • the first network device directly sends the first message to the second terminal device through the target forwarding path, that is, the second message in step 211 is the first message in step 201.
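  • How the indication might be attached by the source network device and read by the transit network device is sketched below, using the VLAN 10 and VLAN 20 values from the example above. Packets are modeled as plain dictionaries, the function names are hypothetical, and the sketch assumes that messages on local direct paths are the ones sent without any added indication.

```python
# Indication values taken from the VLAN example in the text.
PATH_TYPE_TO_VLAN = {
    "intra-group detour shortest": 10,      # first indication
    "intra-group detour non-shortest": 20,  # second indication
}
VLAN_TO_PATH_TYPE = {vlan: name for name, vlan in PATH_TYPE_TO_VLAN.items()}


def build_second_packet(first_packet, path_type):
    """Source device: add the indication for detour paths; other path types
    are assumed to be sent unchanged (the second packet is the first packet)."""
    vlan = PATH_TYPE_TO_VLAN.get(path_type)
    if vlan is None:
        return first_packet
    return {**first_packet, "vlan": vlan}


def transit_global_interface(packet, to_destination_group, to_other_group):
    """Transit device in the source group: choose the global interface
    according to the indication carried in the packet."""
    path_type = VLAN_TO_PATH_TYPE.get(packet.get("vlan"))
    if path_type == "intra-group detour shortest":
        return to_destination_group
    if path_type == "intra-group detour non-shortest":
        return to_other_group
    return None  # no indication: not a detour packet in this sketch


first_packet = {"dst": "S2", "payload": b"data"}
```

  • On network device C2, for instance, a packet carrying VLAN 10 would leave through the global interface toward device group B, and a packet carrying VLAN 20 through the global interface toward device group A.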
  • dividing the inter-group paths into the local direct shortest path, the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path enables the source network device to perform path selection at a finer granularity and to exercise fine control over the packet forwarding path, so that the low latency and high throughput of packet forwarding can be accurately controlled.
  • when the source network device selects a path, it follows the principle of giving priority to the inter-group shortest paths and selects an inter-group non-shortest path only when the inter-group shortest paths are congested. When selecting a forwarding path among the inter-group shortest paths, the local direct shortest path is preferred; when selecting a forwarding path among the inter-group non-shortest paths, the local direct non-shortest path is preferred.
  • on the one hand, this reduces the packet forwarding delay as much as possible. On the other hand, because the local direct paths are preferred, the source network device can better perceive the local congestion situation in real time, subsequently judge whether to switch paths according to the local congestion situation, and determine the timing of path switching more accurately.
  • the above steps 201 to 211 may be performed by the forwarding chip in the first network device, that is, the forwarding chip judges the congestion of the forwarding path, and selects an appropriate forwarding path to forward the message based on the judgment result.
  • the above steps 201 to 211 may also be performed by the coprocessor of the forwarding chip in the first network device. After the forwarding chip receives the message, the coprocessor will judge the congestion of the forwarding path and select the appropriate path based on the judgment result. forwarding path, and the forwarding chip forwards the message based on the forwarding path selected by the coprocessor.
  • the communication between network devices in the Dragonfly network is realized based on Layer 3 forwarding technology.
  • Each network device needs to maintain a routing and forwarding table to implement packet forwarding.
  • the first network device obtains a routing and forwarding table, where the routing and forwarding table includes a routing prefix table and multiple ECMP group tables.
  • Each entry of the routing prefix table includes a correspondence relationship between the destination IP address and the group index of the destination device group.
  • the group index of the destination device group is associated with an ECMP group table, and the ECMP group table includes the outgoing interfaces corresponding to each path from the first network device to the destination device group. The destination device group is the device group to which the access device of the terminal device having the corresponding destination IP address belongs.
  • each entry in the routing prefix table stores an IP address and the corresponding group index.
  • the IP address in one entry is the IP address of the destination terminal device (ie, the destination IP address)
  • the group index is the group index of the device group (ie, the destination device group) where the access device of the destination terminal device is located.
  • the IP address of terminal device S2 is 20.1.1.1/2
  • the IP address of terminal device S3 is 20.1.2.1/2
  • the group index of device group B is 100
  • the routing prefix table stored in the network device C1 may be as shown in Table 2.
  • in the routing prefix table, both packets whose destination address is the IP address of the terminal device S2 and packets whose destination address is the IP address of the terminal device S3 index to the ECMP group table associated with device group B.
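  • The two-level lookup (destination IP address to group index, group index to ECMP group table) can be sketched as below. The sketch is a simplification under stated assumptions: it matches on the exact destination address and leaves out prefix-length handling, the dictionary and function names are hypothetical, and the interface list stands in for the real ECMP group table contents.

```python
# Routing prefix table of network device C1: destination IP -> group index.
ROUTING_PREFIX_TABLE = {
    "20.1.1.1": 100,  # terminal device S2, attached to device group B
    "20.1.2.1": 100,  # terminal device S3, attached to device group B
}

# One ECMP group table per group index. Here each table is reduced to a list
# of outgoing interfaces of C1 (illustrative stand-in for the full table).
ECMP_GROUP_TABLES = {
    100: ["GE8", "GE3", "GE4", "GE5", "GE6", "GE7"],
}


def lookup_out_interfaces(dst_ip):
    """Resolve a destination IP to the outgoing interfaces of its group."""
    group_index = ROUTING_PREFIX_TABLE.get(dst_ip)
    if group_index is None:
        return None  # no route in this sketch
    return ECMP_GROUP_TABLES[group_index]
```

  • Both destinations resolve to the same table object, which illustrates why one ECMP group table can serve every terminal device attached to the same device group.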
  • the ECMP group table may include not only the outbound interface corresponding to each path from the first network device to the destination device group, but also the local congestion level and remote congestion level corresponding to each outbound interface.
  • the local congestion level refers to the congestion level of the local outbound interface.
  • the remote congestion level refers to the congestion level of the outbound interface, on another network device in the device group, through which packets sent via the local outbound interface are forwarded onward.
  • the shortest detour path within the group and the non-shortest detour path within the group correspond to local congestion degree and remote congestion degree.
  • the local direct shortest path and the local direct non-shortest path only correspond to the local congestion level.
  • network device C1 is connected to network device A5 through global interface GE7, connected to network device B1 through global interface GE8, connected to network device C2 through local interface GE3, connected to network device C3 through local interface GE4, connected to network device C4 through local interface GE5, and connected to network device C5 through local interface GE6.
  • the network device C2 is connected to the network device A4 through the global interface GE7, and is connected to the network device B2 through the global interface GE8.
  • the network device C3 is connected to the network device A3 through the global interface GE7, and is connected to the network device B3 through the global interface GE8.
  • the network device C4 is connected to the network device A2 through the global interface GE7, and is connected to the network device B4 through the global interface GE8.
  • the network device C5 is connected to the network device A1 through the global interface GE7, and is connected to the network device B5 through the global interface GE8.
  • if the latest congestion status of each interface of network device C1 and the latest congestion status of the global interfaces of other network devices in the device group B are as shown in Table 1, then the ECMP group table associated with device group B that is stored in network device C1 may be as shown in Table 3.
  • the outgoing interface role "min1" corresponds to the local direct shortest path
  • the outgoing interface role "min2" corresponds to the intra-group detour shortest path
  • the outgoing interface role "non-min1" corresponds to the local direct non-shortest path
  • the outgoing interface role "non-min2" corresponds to the intra-group detour non-shortest path.
  • the network device stores the routes to each terminal device using a routing prefix table combined with ECMP group tables, and one ECMP group table can store all the forwarding information corresponding to the IP addresses of the terminal devices connected to the same device group, which saves table entry resources.
  • the ECMP group table includes a first routing sub-table and a second routing sub-table.
  • the first routing sub-table and the second routing sub-table may be associated with the same routing prefix table, or may be respectively associated with a routing prefix table.
  • the contents of the entries in the routing prefix table associated with the first routing sub-table and the second routing sub-table are generally the same.
  • the first routing sub-table includes outgoing interfaces corresponding to each path from the first network device to the destination device group, and the first routing sub-table is used for the first network device to forward messages from terminal devices accessing the first device group . That is, when the first network device is in the source device group, the first network device forwards the packet based on the first routing subtable.
  • the first network device may sequentially determine whether the local direct shortest path, the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path are congested until a non-congested path is obtained, and then forward the message through the corresponding outgoing interface.
  • when the first network device receives, through a local interface, a message that is sent by another network device in the first device group and that comes from a terminal device connected to that other network device, the first network device determines, according to the indication in the message, whether the message takes the intra-group detour shortest path or the intra-group detour non-shortest path, and then forwards the message through the corresponding outgoing interface.
  • the first routing sub-table may be as shown in Table 3.
  • after network device C1 receives a message from terminal device S1 with a destination address of 20.1.1.1/24 (the IP address of terminal device S2), it first obtains Table 3 based on the index in Table 2. Since the interface GE8 corresponding to the outbound interface role "min1" is not congested, that is, the local direct shortest path is not congested, network device C1 directly forwards the packet through interface GE8.
  • the second routing sub-table includes the outbound interface corresponding to the inter-group shortest path from the first network device to the destination device group, and the second routing sub-table is used for the first network device to forward messages from terminal devices accessing device groups other than the first device group. That is, when the first network device is located in an intermediate device group or the destination device group, the first network device forwards the message based on the second routing sub-table, in other words, using a shortest path algorithm, which can avoid routing loops.
  • the first network device determines a corresponding outbound interface based on the second routing sub-table to forward the message.
  • the second routing sub-table may be as shown in Table 4.
  • the routing and forwarding table may be generated through a distributed routing protocol or a centralized control solution.
  • distributed routing protocols are used to synchronize routes among network devices, and local routing and forwarding tables are generated respectively according to unified policy configuration.
  • the network devices in the same device group can use the internal Border Gateway Protocol (iBGP) to publish routing information, that is, the intra-group interconnection links run iBGP.
  • network devices in different device groups can use the external Border Gateway Protocol (eBGP) to publish routing information, that is, the inter-group interconnection links run eBGP.
  • Each device group corresponds to an AS number.
  • the process for the first network device to obtain the routing and forwarding table includes: when the first network device receives the first routing message, and the AS-path attribute of the first routing message contains only one AS number, the first network device adds the forwarding entries obtained based on the first routing message to the first routing sub-table and the second routing sub-table respectively.
  • when the first network device receives the second routing message, and the AS-path attribute of the second routing message contains two AS numbers, the first network device adds the forwarding entry obtained based on the second routing message only to the first routing sub-table.
  • network devices A1-A5 in device group A are all configured with AS number 100
  • network devices B1-B5 in device group B are all configured with AS number 101
  • network devices C1-C5 in device group C are all configured with AS number 102.
  • Network device B5 in device group B publishes a routing message containing the IP address (20.1.1.1/24) of terminal device S2 through eBGP.
  • the AS-path attribute in the routing message carries AS number 101, and the routing message is published to each network device in device group A and device group C.
  • after network device C1 receives the routing message through eBGP, since the AS-path attribute of the routing message contains only one AS number, 101, network device C1 treats the route as a local direct shortest path, adds the correspondence between the IP address of terminal device S2 and the group index of device group B to the local routing prefix table, and adds corresponding forwarding entries to both the first routing sub-table and the second routing sub-table contained in the ECMP group table associated with device group B.
  • the network device C1 publishes a routing message including the IP address of the terminal device S2 through iBGP in the device group C.
  • after network devices C2-C5 receive the routing message through iBGP, since the AS-path attribute of the routing message contains only one AS number, 101, network devices C2-C5 treat the route as an intra-group detour shortest path, add the correspondence between the IP address of terminal device S2 and the group index of device group B to their local routing prefix tables, and add corresponding forwarding entries.
  • network device C1 adds its own AS number 102 to the AS-path attribute of the received routing message, and then continues to issue a routing message containing the IP address of terminal device S2 to device group A through eBGP.
  • the AS-path attribute in the routing message carries AS numbers 101 and 102, and the routing message is published to each network device in device group A.
  • after network device A5 receives the routing message through eBGP, because the AS-path attribute of the routing message contains two AS numbers, 101 and 102, network device A5 treats the route as a local direct non-shortest path, adds the correspondence between the IP address of terminal device S2 and the group index of device group B to the local routing prefix table, and adds the corresponding forwarding entry to the first routing sub-table included in the ECMP group table associated with device group B.
  • the network device A5 publishes a routing message including the IP address of the terminal device S2 in the device group A through iBGP.
  • after network devices A1-A4 receive the routing message through iBGP, since the AS-path attribute of the routing message contains two AS numbers, 101 and 102, network devices A1-A4 treat the route as an intra-group detour non-shortest path, add the correspondence between the IP address of terminal device S2 and the group index of device group B to their local routing prefix tables, and add corresponding forwarding entries to the first routing sub-table included in the ECMP group table associated with device group B.
  • each network device in device group A and device group C can obtain a routing prefix table similar to Table 2, a first routing sub-table similar to Table 3, and a second routing sub-table similar to Table 4.
  • the second implementation mode is to generate a routing and forwarding table through a centralized control scheme.
  • Each network device in the Dragonfly network is uniformly controlled and managed by the control device, which stores the network topology of the Dragonfly network and the IP addresses of the terminal devices connected to each network device.
  • a network device can report changed information to the control device, so that the control device can update its stored information based on the report.
  • the control device may be, for example, a software-defined network (software-defined networking, SDN) controller.
  • the control device can send the network topology of the Dragonfly network and the IP addresses of the terminal devices connected to each network device to each network device, and each network device can itself calculate, according to a unified routing algorithm, the forwarding paths to the terminal devices connected to other network devices, so as to generate the corresponding routing and forwarding table.
  • the first network device may generate a routing prefix table and multiple ECMP group tables according to the networking topology of the Dragonfly network, the IP address of the terminal device accessing the Dragonfly network, and the access device of the terminal device.
  • the control device can generate the routing and forwarding table corresponding to each network device based on the networking topology of the Dragonfly network, the IP addresses of the terminal devices connected to the Dragonfly network, and the access devices of the terminal devices, and send the corresponding routing and forwarding table to each network device.
  • the first network device may receive the routing and forwarding table sent by the control device.
  • the first network device used to execute the method shown in FIG. 2 may be the packet forwarding apparatus 300 shown in FIG. 3 .
  • the message forwarding device 300 is applied to network devices in the Dragonfly network.
  • the dragonfly network includes multiple device groups, and there are multiple inter-group interconnection links between different device groups.
  • the device 300 includes:
  • the receiving module 301 is configured to receive a first message sent by a first terminal device connected to the first network device, the destination address of the first message is the IP address of the second terminal device connected to the second network device, and the first The network device belongs to the first device group, and the second network device belongs to the second device group.
  • the processing module 302 is configured to determine whether the first inter-group interconnection link is congested when a first inter-group interconnection link exists between the first network device and the second device group.
  • the sending module 303 is configured to send the first packet to the second network device through the first inter-group interconnection link when the first inter-group interconnection link is not congested.
  • the processing module 302 is configured to determine whether the first inter-group interconnection link is congested according to the congestion state of the global interface connected to the second device group on the first network device.
  • the processing module 302 is further configured to, when the first inter-group interconnection link is congested, or when there is no inter-group interconnection link between the first network device and the second device group, sequentially perform congestion judgment on the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path between the first network device and the second network device until the target forwarding path is obtained.
  • the sending module 303 is further configured to send the second packet obtained based on the first packet to the second network device through the target forwarding path.
  • the intra-group detour shortest path includes the intra-group interconnection link between the first network device and the third network device in the first device group and the inter-group interconnection link between the third network device and the second device group
  • the local direct non-shortest path includes an inter-group interconnection link between the first network device and the third device group and an inter-group interconnection link between the third device group and the second device group
  • the intra-group detour non-shortest path includes the intra-group interconnection link between the first network device and the fourth network device in the first device group, the inter-group interconnection link between the fourth network device and the fourth device group, and an inter-group interconnection link between the fourth device group and the second device group.
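  • the processing module 302 is configured to: perform congestion judgment on the intra-group detour shortest path between the first network device and the second network device.
  • when there is an uncongested intra-group detour shortest path between the first network device and the second network device, take any uncongested intra-group detour shortest path between the first network device and the second network device as the target forwarding path.
  • when there is no uncongested intra-group detour shortest path between the first network device and the second network device, perform congestion judgment on the local direct non-shortest path between the first network device and the second network device.
  • when there is an uncongested local direct non-shortest path between the first network device and the second network device, take any uncongested local direct non-shortest path between the first network device and the second network device as the target forwarding path.
  • when there is no uncongested local direct non-shortest path between the first network device and the second network device, perform congestion judgment on the intra-group detour non-shortest path between the first network device and the second network device.
  • when there is an uncongested intra-group detour non-shortest path between the first network device and the second network device, take any uncongested intra-group detour non-shortest path between the first network device and the second network device as the target forwarding path.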
  • the processing module 302 is configured to: determine, according to the queue depth of the first outgoing interface queue of the local interface connected to the third network device on the first network device and the congestion state of the global interface connected to the second device group on the third network device, whether the intra-group detour shortest path between the first network device and the second network device passing through the third network device is congested.
  • the first outgoing interface queue is used to forward the packets of the first network device that are forwarded through the intra-group detour shortest path.
  • the processing module 302 is configured to: determine, according to the congestion state of the global interface connected to the third device group on the first network device, whether the local direct non-shortest path between the first network device and the second network device passing through the third device group is congested.
  • the processing module 302 is configured to: determine, according to the queue depth of the second outgoing interface queue of the local interface connected to the fourth network device on the first network device and the congestion state of the global interface connected to the fourth device group on the fourth network device, whether the intra-group detour non-shortest path between the first network device and the second network device passing through the fourth network device and the fourth device group is congested.
  • the second outgoing interface queue is used to forward the packets of the first network device that are forwarded through the intra-group detour non-shortest path.
  • the sending module 303 is configured to: when the target forwarding path is an intra-group detour shortest path, add the first indication to the first message to obtain the second message, and send the second message to the second network device through the target forwarding path, where the first indication is used to indicate that the forwarding path type is the intra-group detour shortest path. Or, when the target forwarding path is an intra-group detour non-shortest path, add the second indication to the first message to obtain the second message, and send the second message to the second network device through the target forwarding path, where the second indication is used to indicate that the forwarding path type is the intra-group detour non-shortest path.
  • the apparatus 300 further includes: an obtaining module 304, configured to obtain a routing and forwarding table, where the routing and forwarding table includes a routing prefix table and multiple ECMP group tables.
  • each entry in the routing prefix table includes the correspondence between a destination IP address and the group index of the destination device group, the group index is associated with an ECMP group table, and the ECMP group table includes the outgoing interface corresponding to each path from the first network device to the destination device group. The destination device group is the device group in which the access device of the terminal device that owns the destination IP address is located.
  • the ECMP group table includes a first routing sub-table and a second routing sub-table.
  • the first routing sub-table includes outgoing interfaces corresponding to each path from the first network device to the destination device group, and the first routing sub-table is used for the first network device to forward messages from terminal devices accessing the first device group .
  • the second routing sub-table includes the outbound interface corresponding to the inter-group shortest path from the first network device to the destination device group, and the second routing sub-table is used for the first network device to forward messages from terminal devices accessing device groups other than the first device group.
  • each device group corresponds to an AS number
  • the obtaining module 304 is configured to: when the first network device receives the first routing message, and the AS-path attribute of the first routing message contains only one AS number, add the forwarding entries obtained based on the first routing message to the first routing sub-table and the second routing sub-table respectively.
  • when the first network device receives the second routing message, and the AS-path attribute of the second routing message contains two AS numbers, add the forwarding entry obtained based on the second routing message only to the first routing sub-table.
  • the obtaining module 304 is configured to: generate a routing prefix table and multiple ECMP group tables according to the networking topology of the Dragonfly network, the IP address of the terminal device connected to the Dragonfly network, and the access device of the terminal device.
  • the obtaining module 304 is configured to: receive the routing and forwarding table sent by the control device.
  • the first network device used to execute the method shown in FIG. 2 may be the network device 500 shown in FIG. 5 .
  • the network device 500 is a network device in the Dragonfly network.
  • the dragonfly network includes multiple device groups, and there are multiple inter-group interconnection links between different device groups.
  • the network device 500 may be a router or a switch.
  • a network device 500 includes: a processor 501 and a memory 502 .
  • the memory 502 is used to store computer programs, the computer programs including program instructions;
  • the processor 501 is configured to call the computer program to implement actions performed by the first network device in the foregoing method embodiments.
  • the network device 500 further includes a communication bus 503 and a communication interface 504 .
  • the processor 501 includes one or more processing cores, and the processor 501 executes various functional applications and data processing by running computer programs.
  • Memory 502 may be used to store computer programs.
  • the memory may store an operating system and application program units required for at least one function.
  • the operating system can be an operating system such as a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS or OS X.
  • the communication interfaces 504 are used to communicate with, for example, terminal devices or other network devices.
  • the communication interface 504 is used to send and receive messages.
  • the memory 502 and the communication interface 504 are respectively connected to the processor 501 through the communication bus 503 .
  • An embodiment of the present application also provides a computer-readable storage medium, where instructions are stored on the computer-readable storage medium, and when the instructions are executed by a processor, the actions performed by the first network device in the above method embodiments are implemented.
  • An embodiment of the present application further provides a computer program product, including a computer program, and when the computer program is executed by a processor, the actions performed by the first network device in the foregoing method embodiments are implemented.
  • the program can be stored in a computer-readable storage medium.
  • the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

Abstract

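This application discloses a packet forwarding method and apparatus, and a dragonfly network, and belongs to the field of communication technologies. A first network device in a first device group receives a first packet that is sent by a first terminal device connected to the first network device and whose destination address is the IP address of a second terminal device connected to a second network device in a second device group. When a first inter-group interconnection link exists between the first network device and the second device group and the first inter-group interconnection link is not congested, the first network device sends the first packet to the second network device through the first inter-group interconnection link. In this application, the first network device preferentially uses the inter-group interconnection link between the first network device and the second device group to send packets to the second network device, so that packets are forwarded over as few hops as possible. This achieves low latency for inter-group communication and improves the precision of control over packet forwarding paths.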

Description

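Packet Forwarding Method and Apparatus, and Dragonfly Network
This application claims priority to Chinese Patent Application No. 202111142613.5, filed on September 28, 2021 and entitled "Packet Forwarding Method and Apparatus, and Dragonfly Network", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular, to a packet forwarding method and apparatus, and a dragonfly network.
Background
A dragonfly network is an efficient communication system commonly used in the field of high performance computing (HPC). Compared with networking topologies such as a fat tree network, a dragonfly network features short communication paths, low latency, and large networking scale, and can provide low-latency, high-throughput communication.
A dragonfly network includes multiple device groups, and each device group includes multiple network devices. Each network device includes global interfaces, local interfaces, and access interfaces. Global interfaces are used for inter-group interconnection, local interfaces are used for intra-group interconnection, and access interfaces are used to connect terminal devices such as servers, virtual machines (VMs), and storage devices. The network devices in each device group are directly connected in a mesh (that is, fully connected) through local interfaces. A device group can be regarded as one logical device, and different device groups are directly connected in a mesh through global interfaces.
Currently, a dragonfly network usually forwards packets by adaptive routing, to balance the two goals of low latency and high throughput. When network devices in different device groups perform inter-group communication, if the inter-group shortest path is not congested, the inter-group shortest path is preferentially selected for packet forwarding to achieve low latency; if the inter-group shortest path is congested, an inter-group non-shortest path is selected for forwarding to achieve high throughput. An inter-group shortest path is a packet forwarding path that passes through only one inter-group interconnection link. An inter-group non-shortest path is a packet forwarding path that passes through exactly one intermediate device group and two inter-group interconnection links.
However, when there are multiple inter-group interconnection links between different device groups, an inter-group shortest path may take several forms: it may include only one inter-group interconnection link; it may include one inter-group interconnection link and one intra-group interconnection link in the source device group or the destination device group; or it may include one intra-group interconnection link in the source device group, one inter-group interconnection link, and one intra-group interconnection link in the destination device group. The current packet forwarding approach can only preferentially select an inter-group shortest path, so the precision of control over packet forwarding paths is low.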
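Summary
This application provides a packet forwarding method and apparatus, and a dragonfly network, which can solve the current problem of low precision of control over packet forwarding paths.
According to a first aspect, a packet forwarding method is provided. The method is applied to a dragonfly network. The dragonfly network includes multiple device groups, and there are multiple inter-group interconnection links between different device groups. A first network device receives a first packet sent by a first terminal device connected to the first network device. The destination address of the first packet is the Internet Protocol (IP) address of a second terminal device connected to a second network device, the first network device belongs to a first device group, and the second network device belongs to a second device group. When a first inter-group interconnection link exists between the first network device and the second device group and the first inter-group interconnection link is not congested, the first network device sends the first packet to the second network device through the first inter-group interconnection link.
In this application, the first network device preferentially uses the inter-group interconnection link between the first network device and the second device group to send packets to the second network device, so that packets are forwarded over as few hops as possible. This achieves low latency for inter-group communication and improves the precision of control over packet forwarding paths.
Optionally, the first network device determines whether the first inter-group interconnection link is congested according to the congestion state of the global interface on the first network device that is connected to the second device group.
Optionally, when the first inter-group interconnection link is congested, or when no inter-group interconnection link exists between the first network device and the second device group, the first network device sequentially performs congestion judgment on the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path between the first network device and the second network device until a target forwarding path is obtained. The first network device sends, to the second network device through the target forwarding path, a second packet obtained based on the first packet. The intra-group detour shortest path includes the intra-group interconnection link between the first network device and a third network device in the first device group and the inter-group interconnection link between the third network device and the second device group. The local direct non-shortest path includes the inter-group interconnection link between the first network device and a third device group and one inter-group interconnection link between the third device group and the second device group. The intra-group detour non-shortest path includes the intra-group interconnection link between the first network device and a fourth network device in the first device group, the inter-group interconnection link between the fourth network device and a fourth device group, and one inter-group interconnection link between the fourth device group and the second device group.
In this application, when selecting a path, the first network device follows the principle of preferring inter-group shortest paths and using inter-group non-shortest paths only after all inter-group shortest paths are congested. Among inter-group shortest paths it prefers the local direct shortest path, and among inter-group non-shortest paths it prefers the local direct non-shortest path. On the one hand, this keeps the packet forwarding latency as low as possible. On the other hand, because local direct paths are preferred, the first network device can sense local congestion in real time, and by later deciding whether to switch paths according to the local congestion, it can determine the path switching time fairly accurately.
Optionally, the first network device sequentially performing congestion judgment on the intra-group detour shortest path, the local direct non-shortest path, and the intra-group detour non-shortest path between the first network device and the second network device until a target forwarding path is obtained includes the following. The first network device performs congestion judgment on the intra-group detour shortest paths between the first network device and the second network device. When an uncongested intra-group detour shortest path exists between the first network device and the second network device, the first network device takes any such uncongested intra-group detour shortest path as the target forwarding path. When no uncongested intra-group detour shortest path exists, the first network device performs congestion judgment on the local direct non-shortest paths between the first network device and the second network device. When an uncongested local direct non-shortest path exists, the first network device takes any such uncongested local direct non-shortest path as the target forwarding path. When no uncongested local direct non-shortest path exists, the first network device performs congestion judgment on the intra-group detour non-shortest paths between the first network device and the second network device. When an uncongested intra-group detour non-shortest path exists, the first network device takes any such uncongested intra-group detour non-shortest path as the target forwarding path.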
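The sequential judgment described above can be sketched as a priority walk over the four path types. This is a minimal illustration, not the claimed implementation: the role names `min1`, `min2`, `non-min1` and `non-min2` follow the outgoing interface roles used in the description, while `select_path`, the `candidates` mapping and the `is_congested` callback are hypothetical names. The text allows any uncongested path of a type to be chosen; the sketch simply takes the first one, and returning `None` when every path is congested is an assumption, since that case is not covered here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Priority order: local direct shortest, intra-group detour shortest,
# local direct non-shortest, intra-group detour non-shortest.
PRIORITY = ["min1", "min2", "non-min1", "non-min2"]


def select_path(
    candidates: Dict[str, List[str]],
    is_congested: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return (role, path) for the first uncongested path in priority order."""
    for role in PRIORITY:
        # A missing role (e.g. no inter-group link to the destination group,
        # hence no "min1" path) is skipped automatically.
        for path in candidates.get(role, []):
            if not is_congested(path):
                return role, path
    return None  # assumption: the all-congested case is left to the caller


# Hypothetical candidate paths of network device C1 towards device group B.
candidates = {
    "min1": ["C1-B1"],
    "min2": ["C1-C5-B5", "C1-C2-B2"],
    "non-min1": ["C1-A5"],
    "non-min2": ["C1-C2-A4"],
}
congested = {"C1-B1", "C1-C5-B5"}
choice = select_path(candidates, lambda p: p in congested)
```

With the local direct shortest path and one detour shortest path congested, the walk settles on the remaining intra-group detour shortest path before considering any non-shortest path.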
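Optionally, an implementation in which the first network device performs congestion judgment on the intra-group detour shortest path between the first network device and the second network device includes: the first network device determines, according to the queue depth of a first outgoing interface queue of the local interface on the first network device that is connected to the third network device and the congestion state of the global interface on the third network device that is connected to the second device group, whether the intra-group detour shortest path between the first network device and the second network device that passes through the third network device is congested. The first outgoing interface queue is used to forward the packets of the first network device that are forwarded through intra-group detour shortest paths.
Optionally, an implementation in which the first network device performs congestion judgment on the local direct non-shortest path between the first network device and the second network device includes: the first network device determines, according to the congestion state of the global interface on the first network device that is connected to the third device group, whether the local direct non-shortest path between the first network device and the second network device that passes through the third device group is congested.
Optionally, an implementation in which the first network device performs congestion judgment on the intra-group detour non-shortest path between the first network device and the second network device includes: the first network device determines, according to the queue depth of a second outgoing interface queue of the local interface on the first network device that is connected to the fourth network device and the congestion state of the global interface on the fourth network device that is connected to the fourth device group, whether the intra-group detour non-shortest path between the first network device and the second network device that passes through the fourth network device and the fourth device group is congested. The second outgoing interface queue is used to forward the packets of the first network device that are forwarded through intra-group detour non-shortest paths.
Whether the first network device uses an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward packets through another network device in the first device group, and if both kinds of path go through the same network device in the first device group, the same local interface is used to send the packets. It is therefore necessary to distinguish, among the packets sent from the same local interface, the number of packets taking the intra-group detour shortest path from the number taking the intra-group detour non-shortest path, so that the congestion of the two kinds of path can be determined separately. In this application, the outgoing interface queues of a local interface are divided into a first outgoing interface queue and a second outgoing interface queue. The first outgoing interface queue is used to forward the packets of the first network device that are forwarded through intra-group detour shortest paths, and the second outgoing interface queue is used to forward the packets of the first network device that are forwarded through intra-group detour non-shortest paths. In this way, the congestion of the intra-group detour shortest path is determined based on the queue depth of the first outgoing interface queue, and the congestion of the intra-group detour non-shortest path is determined based on the queue depth of the second outgoing interface queue.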
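The two-queue arrangement can be illustrated with a small sketch. It assumes that a detour path counts as congested when either the relevant local queue exceeds a depth threshold or the peer's global interface is reported as congested; the threshold value, the class `LocalInterface` and the function `detour_path_congested` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class LocalInterface:
    """One local interface with its two outgoing queues (hypothetical model)."""
    name: str
    shortest_queue_depth: int = 0      # first outgoing interface queue
    non_shortest_queue_depth: int = 0  # second outgoing interface queue


def detour_path_congested(
    local_itf: LocalInterface,
    use_shortest: bool,
    remote_global_congested: bool,
    depth_threshold: int,
) -> bool:
    """Judge an intra-group detour path through the peer behind local_itf.

    Only the queue that carries this path type is inspected, so traffic on
    the other path type through the same local interface does not count.
    """
    depth = (
        local_itf.shortest_queue_depth
        if use_shortest
        else local_itf.non_shortest_queue_depth
    )
    return depth > depth_threshold or remote_global_congested


# GE6 of C1 leads to C5; the non-shortest queue is deep, the shortest is not.
ge6 = LocalInterface("GE6", shortest_queue_depth=5, non_shortest_queue_depth=500)
```

Because the depths are tracked per queue, heavy non-shortest traffic through a local interface does not by itself mark the detour shortest path through the same interface as congested.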
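Optionally, an implementation in which the first network device sends, to the second network device through the target forwarding path, the second packet obtained based on the first packet includes: when the target forwarding path is an intra-group detour shortest path, the first network device adds a first indication to the first packet to obtain the second packet, and sends the second packet to the second network device through the target forwarding path, where the first indication indicates that the forwarding path type is the intra-group detour shortest path. Alternatively, when the target forwarding path is an intra-group detour non-shortest path, the first network device adds a second indication to the first packet to obtain the second packet, and sends the second packet to the second network device through the target forwarding path, where the second indication indicates that the forwarding path type is the intra-group detour non-shortest path.
In this application, whether the first network device uses an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward packets through another network device in the first device group, and if both kinds of path go through the same network device in the first device group, that network device receives the packets on the same local interface. That network device therefore needs to distinguish which of the packets received on the local interface take the intra-group detour shortest path and which take the intra-group detour non-shortest path, so that it can forward each packet over the corresponding forwarding path. In this application, the source network device adds the first indication to packets taking the intra-group detour shortest path and adds the second indication to packets taking the intra-group detour non-shortest path, to explicitly tell the other network devices that forward packets within the device group which kind of forwarding path to use. The other network devices can then learn from the indication in a packet which global interface to use for forwarding, which improves packet forwarding efficiency.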
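The marking scheme might be modeled as follows. How the indication is actually encoded in a packet is not stated here, so the enum `PathIndication`, the `Packet` class and both helper functions are purely illustrative assumptions. The interface names in the example follow the Figure 1 description, where network device C5 reaches device group B through GE8 and device group A through GE7.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class PathIndication(Enum):
    DETOUR_SHORTEST = 1      # "first indication"
    DETOUR_NON_SHORTEST = 2  # "second indication"


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    payload: bytes
    indication: Optional[PathIndication] = None  # hypothetical header field


def mark(packet: Packet, indication: PathIndication) -> Packet:
    """Source device: derive the second packet from the first packet."""
    return replace(packet, indication=indication)


def pick_global_interface(packet: Packet, shortest_itf: str, non_shortest_itf: str) -> str:
    """Intra-group peer: choose the global interface from the indication."""
    if packet.indication is PathIndication.DETOUR_SHORTEST:
        return shortest_itf
    if packet.indication is PathIndication.DETOUR_NON_SHORTEST:
        return non_shortest_itf
    raise ValueError("packet received on a local interface carries no indication")


first = Packet(dst_ip="20.1.1.1", payload=b"data")
second = mark(first, PathIndication.DETOUR_SHORTEST)
# On C5, towards device group B: GE8 is the shortest exit, GE7 goes via group A.
out_itf = pick_global_interface(second, shortest_itf="GE8", non_shortest_itf="GE7")
```

The peer device needs no congestion logic of its own in this sketch: the source has already made the decision, and the indication only selects between the two global interfaces.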
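Optionally, the first network device obtains a routing and forwarding table, where the routing and forwarding table includes a routing prefix table and multiple equal-cost multi-path (ECMP) group tables. Each entry in the routing prefix table includes the correspondence between a destination IP address and the group index of a destination device group. The group index of the destination device group is associated with one ECMP group table, and the ECMP group table includes the outgoing interface corresponding to each path from the first network device to the destination device group. The destination device group is the device group in which the access device of the terminal device that owns the corresponding destination IP address is located.
In this application, a network device stores the routes to each terminal device using a routing prefix table combined with ECMP group tables. One ECMP group table is enough to store all the forwarding information corresponding to the IP addresses of the terminal devices that access the same device group, which saves table entry resources.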
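The two-level lookup can be sketched with the Python standard library `ipaddress` module. The prefixes and the group index 100 come from the example for terminal devices S2 and S3 and device group B; the interface lists in `ECMP_GROUP_TABLES` are inferred from the Figure 1 description of network device C1 rather than copied from Table 3, and the dictionary layout and the `lookup` function are assumptions made for illustration.

```python
import ipaddress
from typing import Dict, List, Optional

# Routing prefix table of C1: destination prefix -> group index.
ROUTE_PREFIX_TABLE = {
    ipaddress.ip_network("20.1.1.1/24", strict=False): 100,  # terminal S2
    ipaddress.ip_network("20.1.2.1/24", strict=False): 100,  # terminal S3
}

# One ECMP group table per destination device group, keyed by group index.
# Interface lists are illustrative, derived from the Figure 1 description.
ECMP_GROUP_TABLES: Dict[int, Dict[str, List[str]]] = {
    100: {
        "min1": ["GE8"],                          # C1 -> B1
        "min2": ["GE3", "GE4", "GE5", "GE6"],     # via C2..C5
        "non-min1": ["GE7"],                      # C1 -> A5
        "non-min2": ["GE3", "GE4", "GE5", "GE6"],
    },
}


def lookup(dst_ip: str) -> Optional[Dict[str, List[str]]]:
    """Longest-prefix match in the prefix table, then fetch the ECMP table."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTE_PREFIX_TABLE if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ECMP_GROUP_TABLES[ROUTE_PREFIX_TABLE[best]]


table_s2 = lookup("20.1.1.1")
table_s3 = lookup("20.1.2.1")
```

Both destinations resolve to the same ECMP group table object, which is the table entry saving described above: adding more terminal devices behind device group B adds prefix entries only.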
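Optionally, the ECMP group table includes a first routing sub-table and a second routing sub-table. The first routing sub-table includes the outgoing interface corresponding to each path from the first network device to the destination device group, and is used by the first network device to forward packets from terminal devices that access the first device group. The second routing sub-table includes the outgoing interface corresponding to the inter-group shortest path from the first network device to the destination device group, and is used by the first network device to forward packets from terminal devices that access device groups other than the first device group.
In this application, when the first network device is located in an intermediate device group or the destination device group, the first network device forwards packets based on the second routing sub-table, that is, the first network device forwards packets using a shortest path algorithm, which avoids routing loops.
In one implementation, each device group corresponds to one autonomous system (AS) number. The process in which the first network device obtains the routing and forwarding table includes: when the first network device receives a first routing message and the AS-path attribute of the first routing message contains only one AS number, the first network device adds the forwarding entry obtained based on the first routing message to both the first routing sub-table and the second routing sub-table. When the first network device receives a second routing message and the AS-path attribute of the second routing message contains two AS numbers, the first network device adds the forwarding entry obtained based on the second routing message only to the first routing sub-table.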
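The AS-path rule can be sketched as follows. The mapping from "learned through eBGP or iBGP" to the four interface roles follows the worked example in the description (eBGP for local direct paths, iBGP for intra-group detour paths); the class `EcmpGroupTable`, the function `install_route` and its parameters are hypothetical, and AS paths longer than two are ignored here as an assumption because they are not discussed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EcmpGroupTable:
    # Each forwarding entry is (outgoing interface role, outgoing interface).
    first_sub_table: List[Tuple[str, str]] = field(default_factory=list)
    second_sub_table: List[Tuple[str, str]] = field(default_factory=list)


def install_route(
    table: EcmpGroupTable,
    as_path: List[int],
    out_interface: str,
    learned_via_ibgp: bool,
) -> Optional[str]:
    """Install one received route and return the role it was given."""
    if len(as_path) == 1:
        # One AS number: inter-group shortest path, goes into both sub-tables.
        role = "min2" if learned_via_ibgp else "min1"
        table.first_sub_table.append((role, out_interface))
        table.second_sub_table.append((role, out_interface))
        return role
    if len(as_path) == 2:
        # Two AS numbers: non-shortest path, first sub-table only.
        role = "non-min2" if learned_via_ibgp else "non-min1"
        table.first_sub_table.append((role, out_interface))
        return role
    return None  # assumption: longer AS paths are not installed


# Hypothetical device with a direct link to group B (AS 101) on GE8 and a
# link to another group on GE7 that re-advertises the route with AS 102 added.
table = EcmpGroupTable()
r1 = install_route(table, [101], "GE8", learned_via_ibgp=False)
r2 = install_route(table, [101, 102], "GE7", learned_via_ibgp=False)
```

Keeping two-AS routes out of the second sub-table is what restricts transit traffic to shortest paths and so prevents a packet from detouring more than once.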
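In another implementation, the process in which the first network device obtains the routing and forwarding table includes: the first network device generates the routing prefix table and the multiple ECMP group tables according to the networking topology of the dragonfly network, the IP addresses of the terminal devices that access the dragonfly network, and the access devices of the terminal devices.
In yet another implementation, the process in which the first network device obtains the routing and forwarding table includes: the first network device receives the routing and forwarding table sent by a control device.
According to a second aspect, a packet forwarding apparatus is provided. The apparatus is applied to a first network device in a dragonfly network. The dragonfly network includes multiple device groups, and there are multiple inter-group interconnection links between different device groups. The apparatus includes multiple functional modules that interact to implement the method in the first aspect and its implementations. The functional modules may be implemented in software, hardware, or a combination of software and hardware, and may be combined or divided in any manner depending on the specific implementation.
According to a third aspect, a dragonfly network is provided, including multiple device groups, where there are multiple inter-group interconnection links between different device groups, and the network devices in the device groups are configured to perform the method in the first aspect and its implementations.
According to a fourth aspect, a network device is provided, including a processor and a memory;
the memory is configured to store a computer program, and the computer program includes program instructions;
the processor is configured to invoke the computer program to implement the method in the first aspect and its implementations.
According to a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions, and when the instructions are executed by a processor, the method in the first aspect and its implementations is implemented.
According to a sixth aspect, a computer program product is provided, including a computer program. When the computer program is executed by a processor, the method in the first aspect and its implementations is implemented.
According to a seventh aspect, a chip is provided. The chip includes a programmable logic circuit and/or program instructions, and when the chip runs, the method in the first aspect and its implementations is implemented.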
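Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a dragonfly network according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a packet forwarding method according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a packet forwarding apparatus according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of another packet forwarding apparatus according to an embodiment of this application;
FIG. 5 is a block diagram of a network device according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
The packet forwarding method provided in the embodiments of this application is applied to a dragonfly network. The dragonfly network includes multiple device groups, and there are multiple inter-group interconnection links between different device groups. The multiple inter-group interconnection links between two device groups are usually distributed across different network devices. For example, FIG. 1 is a schematic structural diagram of a dragonfly network according to an embodiment of this application. As shown in FIG. 1, the dragonfly network includes device group A, device group B, and device group C. Device group A includes five network devices A1-A5, device group B includes five network devices B1-B5, and device group C includes five network devices C1-C5. Optionally, a network device may be a router, a switch, or the like. The number of device groups and the number of network devices in each device group in FIG. 1 are merely illustrative and do not limit the dragonfly network provided in the embodiments of this application.
After the networking plan is determined, the interface roles of each network device in the dragonfly network can be configured. Specifically, the interfaces on a network device can be divided into global interfaces, local interfaces, and access interfaces. Global interfaces are used for inter-group interconnection. Local interfaces are used for intra-group interconnection. Access interfaces are used to connect terminal devices. A terminal device may be, for example, a server, a VM, or a storage device. The ratio of global interfaces to local interfaces to access interfaces on a network device is usually 1:2:1. For example, in the dragonfly network shown in FIG. 1, each network device has eight interfaces GE1-GE8 (not shown in the figure). Two interfaces (GE1 and GE2) are access interfaces, four interfaces (GE3, GE4, GE5, and GE6) are local interfaces, and two interfaces (GE7 and GE8) are global interfaces.
Optionally, an interface of a network device can act both as an incoming interface to receive packets and as an outgoing interface to send packets. In this application, the congestion state of an interface of a network device always refers to the congestion state of the interface when it acts as an outgoing interface. In one implementation, the congestion state of an interface can be determined based on the queue depths of its outgoing interface queues. If the queue depth of at least one outgoing interface queue of an interface exceeds a first threshold, the interface is determined to be congested; if the queue depths of all outgoing interface queues of an interface do not exceed the first threshold, the interface is determined not to be congested. In another implementation, the congestion state of an interface can be determined based on the bandwidth utilization of the interface. If the bandwidth utilization of an interface exceeds a second threshold, the interface is determined to be congested; if the bandwidth utilization of an interface does not exceed the second threshold, the interface is determined not to be congested. The bandwidth utilization of an interface equals the ratio of the packet sending rate of the interface to the bandwidth allocated to the interface. In the embodiments of this application, the congestion state of an interface may be divided into two states, congested and not congested, or into three states, not congested (that is, lightly loaded), moderately congested, and severely congested. This is not limited in the embodiments of this application.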
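The two ways of judging interface congestion described above can be written down directly. The sketch below is illustrative only: the class `OutInterface`, its field names, the units and the threshold values are hypothetical, and only the two-state (congested or not congested) variant is shown.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OutInterface:
    """Snapshot of one outgoing interface (all field names are hypothetical)."""
    name: str
    queue_depths: Sequence[int]  # depth of each outgoing interface queue
    send_rate_bps: float         # current packet sending rate
    allocated_bw_bps: float      # bandwidth allocated to the interface


def congested_by_queue_depth(itf: OutInterface, first_threshold: int) -> bool:
    # Congested if at least one outgoing interface queue exceeds the threshold.
    return any(depth > first_threshold for depth in itf.queue_depths)


def congested_by_utilization(itf: OutInterface, second_threshold: float) -> bool:
    # Bandwidth utilization = sending rate / allocated bandwidth.
    return itf.send_rate_bps / itf.allocated_bw_bps > second_threshold


ge8 = OutInterface("GE8", queue_depths=[10, 120], send_rate_bps=8e9, allocated_bw_bps=10e9)
```

A three-state variant would compare the same measurements against two thresholds instead of one.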
在如图1所示的蜻蜓网络中,每个设备组内的多个网络设备之间通过各自的4个本地接口mesh直连。不同设备组之间分别具有5条组间互联链路。例如,设备组A与设备组B之间的5条组间互联链路分别包括:网络设备A1与网络设备B1之间的组间互联链路L AB1,网络设备A2与网络设备B2之间的组间互联链路L AB2,网络设备A3与网络设备B3之间的组 间互联链路L AB3,网络设备A4与网络设备B4之间的组间互联链路L AB4,网络设备A5与网络设备B5之间的组间互联链路L AB5。设备组A与设备组C之间的5条组间互联链路分别包括:网络设备A1与网络设备C5之间的组间互联链路L AC1,网络设备A2与网络设备C4之间的组间互联链路L AC2,网络设备A3与网络设备C3之间的组间互联链路L AC3,网络设备A4与网络设备C2之间的组间互联链路L AC4,网络设备A5与网络设备C1之间的组间互联链路L AC5。设备组B与设备组C之间的5条组间互联链路分别包括:网络设备B1与网络设备C1之间的组间互联链路L BC1,网络设备B2与网络设备C2之间的组间互联链路L BC2,网络设备B3与网络设备C3之间的组间互联链路L BC3,网络设备B4与网络设备C4之间的组间互联链路L BC4,网络设备B5与网络设备C5之间的组间互联链路L BC5
本申请实施例中,将两个设备组之间的组间最短路径划分为本地直连最短路径和组内绕行最短路径,和/或,将两个设备组之间的组间非最短路径划分为本地直连非最短路径和组内绕行非最短路径。本地直连最短路径与组内绕行最短路径的一处区别在于:本地直连最短路径不包括源设备组内的组内互联链路,组内绕行最短路径包括源设备组内的一条组内互联链路。本地直连非最短路径与组内绕行非最短路径的一处区别在于:本地直连非最短路径不包括源设备组内的组内互联链路,组内绕行非最短路径包括源设备组内的一条组内互联链路。
其中,源设备组指的是发送报文的源终端设备的接入设备(以下称为源网络设备)所在的设备组。目的设备组指的是接收报文的目的终端设备的接入设备(以下称为目的网络设备)所在的设备组。例如请继续参见图1,终端设备S1与网络设备C1连接,终端设备S2与网络设备B5连接,终端设备S3与网络设备B5连接。假设终端设备S1为源终端设备,则网络设备C1为源网络设备,设备组C为源设备组。假设终端设备S2为目的终端设备,则网络设备B5为目的网络设备,设备组B为目的设备组。
不同设备组的网络设备进行组间通信时,当源网络设备与目的设备组之间具有组间互联链路时,源网络设备可以按照优先级由高至低的顺序选用本地直连最短路径、组内绕行最短路径、本地直连非最短路径以及组内绕行非最短路径中的非拥塞路径来转发来自与源网络设备连接的终端设备的报文。当源网络设备与目的设备组之间不具有组间互联链路时,源网络设备可以按照优先级由高至低的顺序选用组内绕行最短路径、本地直连非最短路径以及组内绕行非最短路径中的非拥塞路径来转发来自与源网络设备连接的终端设备的报文。
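In one implementation, when an inter-group link exists between the source network device and the destination device group and that link is not congested, the source network device sends packets from the source terminal device over that link. For example, in the dragonfly network shown in Figure 1, terminal device S1 is the source terminal device and terminal device S2 is the destination terminal device. If the inter-group link L_BC1 between network device C1 and network device B1 is not congested, network device C1 sends packets that come from terminal device S1 and are destined for terminal device S2 to network device B1 over L_BC1.
A local direct shortest path includes the inter-group link between the source network device and the destination device group. When the source network device sends a packet from the source terminal device over this link, it is sending the packet over the local direct shortest path. For example, the local direct shortest path between network device C1 and network device B5 includes the inter-group link L_BC1 between C1 and B1 and the intra-group link between B1 and B5.
An intra-group detour shortest path includes the intra-group link between the source network device and another network device in the source device group, and the inter-group link between that other network device and the destination device group. For example, one intra-group detour shortest path between network device C1 and network device B5 includes the intra-group link between C1 and C5 and the inter-group link L_BC5 between C5 and B5.
A local direct non-shortest path includes the inter-group link between the source network device and a device group other than the destination device group, and one inter-group link between that other device group and the destination device group. For example, the local direct non-shortest path between network device C1 and network device B5 includes the inter-group link L_AC5 between C1 and A5 and the inter-group link L_AB5 between A5 and B5.
An intra-group detour non-shortest path includes the intra-group link between the source network device and another network device in the source device group, the inter-group link between that other network device and a device group other than the destination device group, and one inter-group link between that other device group and the destination device group. For example, one intra-group detour non-shortest path between network device C1 and network device B5 includes the intra-group link between C1 and C2, the inter-group link L_AC4 between C2 and A4, the inter-group link L_AB4 between A4 and B4, and the intra-group link between B4 and B5.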
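To make the four path types concrete, the sketch below enumerates them for the Figure 1 wiring. It is an illustration only: the function names are hypothetical, device names are assumed to be a group letter followed by an index, and an intermediate-group device is assumed to forward over its own inter-group link to the destination group, as in the examples in the text.

```python
def build_peers():
    """Figure 1 inter-group wiring as device -> {peer group letter: peer device}."""
    peers = {}

    def add(a, b):
        peers.setdefault(a, {})[b[0]] = b
        peers.setdefault(b, {})[a[0]] = a

    for i in range(1, 6):
        add(f"A{i}", f"B{i}")
        add(f"A{i}", f"C{6 - i}")
        add(f"B{i}", f"C{i}")
    return peers


def finish(path, dst):
    """Append the last intra-group hop in the destination group if needed."""
    return path if path[-1] == dst else path + [dst]


def classify_paths(src, dst, peers, groups=("A", "B", "C"), size=5):
    sg, dg = src[0], dst[0]
    mates = [f"{sg}{i}" for i in range(1, size + 1) if f"{sg}{i}" != src]
    others = [g for g in groups if g not in (sg, dg)]
    out = {"min1": [], "min2": [], "non-min1": [], "non-min2": []}

    def via_group(dev, g):
        mid = peers[dev][g]            # entry device in the intermediate group
        return [mid, peers[mid][dg]]   # assumed: mid uses its own link to dg

    if dg in peers[src]:
        out["min1"].append(finish([src, peers[src][dg]], dst))
    for m in mates:
        if dg in peers[m]:
            out["min2"].append(finish([src, m, peers[m][dg]], dst))
    for g in others:
        if g in peers[src]:
            out["non-min1"].append(finish([src] + via_group(src, g), dst))
        for m in mates:
            if g in peers[m]:
                out["non-min2"].append(finish([src, m] + via_group(m, g), dst))
    return out
```

Calling `classify_paths("C1", "B5", build_peers())` reproduces the examples in the text: one local direct shortest path via B1, four detour shortest paths via C2 to C5, one local direct non-shortest path via A5, and four detour non-shortest paths.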
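In the embodiments of this application, a forwarding path being uncongested means that the links of that path that are associated with the source device group are uncongested. The links associated with the source device group can include intra-group links within the source device group and inter-group links between the source device group and other device groups. For example, if the inter-group link between the source network device and the destination device group is not congested, the local direct shortest path is determined to be uncongested. If the intra-group link between the source network device and another network device in the source device group is not congested, and the inter-group link between that other network device and the destination device group is also not congested, the intra-group detour shortest path via that other network device is determined to be uncongested. If the inter-group link between the source network device and a device group other than the destination device group is not congested, the local direct non-shortest path via that other device group is determined to be uncongested. If the intra-group link between the source network device and another network device in the source device group is not congested, and the inter-group link between that other network device and a device group other than the destination device group is also not congested, the intra-group detour non-shortest path via that other network device and that other device group is determined to be uncongested.
In the embodiments of this application, dividing the inter-group shortest paths into local direct shortest paths and intra-group detour shortest paths, and/or dividing the inter-group non-shortest paths into local direct non-shortest paths and intra-group detour non-shortest paths, allows the source network device to select paths at a finer granularity and to control packet forwarding paths more precisely, and therefore to better control the latency and throughput of packet forwarding. In addition, when selecting a path, the source network device prefers inter-group shortest paths and uses inter-group non-shortest paths only after all inter-group shortest paths are congested; among inter-group shortest paths it prefers the local direct shortest path, and among inter-group non-shortest paths it prefers the local direct non-shortest path. On the one hand, this keeps packet forwarding latency as low as possible. On the other hand, because local direct paths are preferred, the source network device can sense local congestion in real time and decide whether to switch paths based on that local congestion, so the moment of path switching can be determined fairly accurately.
The method flow provided in the embodiments of this application is described below by way of example.
For example, Figure 2 is a schematic flowchart of a packet forwarding method according to an embodiment of this application. The method can be applied to the dragonfly network shown in Figure 1. As shown in Figure 2, the method includes the following steps.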
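Step 201: A first network device receives a first packet sent by a first terminal device connected to the first network device.
The destination address of the first packet is the IP address of a second terminal device connected to a second network device. The first network device belongs to a first device group, and the second network device belongs to a second device group. There are multiple inter-group links between the first device group and the second device group.
For example, in the dragonfly network shown in Figure 1, if the first terminal device is terminal device S1 and the second terminal device is terminal device S2, then the first network device is network device C1, the first device group is device group C, the second network device is network device B5, and the second device group is device group B.
Step 202: The first network device determines whether an inter-group link exists between the first network device and the second device group. If a first inter-group link exists between the first network device and the second device group, step 203 is performed; if no inter-group link exists between them, step 205 is performed.
Although there are multiple inter-group links between the first device group and the second device group, an inter-group link may or may not exist between the first network device itself and the second device group.
Determining whether an inter-group link exists between the first network device and the second device group is equivalent to determining whether a local direct shortest path exists between the first network device and the second device group.
Step 203: The first network device determines whether the first inter-group link is congested. If the first inter-group link is not congested, step 204 is performed; if it is congested, step 205 is performed.
Determining whether the first inter-group link is congested is equivalent to performing a congestion check on the local direct shortest path between the first network device and the second device group. Optionally, the first network device determines whether the first inter-group link is congested based on the congestion state of the global interface on the first network device that connects to the second device group. If that global interface is congested, the first inter-group link is determined to be congested; if that global interface is not congested, the first inter-group link is determined to be uncongested.
Continuing the example in step 201, assume that network device C1 is connected to network device B1 through global interface GE8. If global interface GE8 of C1 is congested, the inter-group link L_BC1 between C1 and B1 is determined to be congested; if GE8 of C1 is not congested, L_BC1 is determined to be uncongested.
Step 204: The first network device sends the first packet to the second network device over the first inter-group link.
Sending the first packet to the second network device over the first inter-group link is equivalent to sending it over the local direct shortest path.
In the embodiments of this application, the source network device prefers the local direct shortest path when sending packets to the destination network device, so that packets take as few hops as possible and inter-group communication achieves low latency.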
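Step 205: The first network device performs a congestion check on the intra-group detour shortest paths between the first network device and the second network device. If an uncongested intra-group detour shortest path exists between them, step 206 is performed; otherwise, step 207 is performed.
Optionally, the first network device determines whether the intra-group detour shortest path between the first network device and the second network device via a third network device is congested based on the queue depth of the first egress queue of the local interface on the first network device that connects to the third network device, and on the congestion state of the global interface on the third network device that connects to the second device group. If the queue depth of that first egress queue exceeds the first threshold, and/or that global interface on the third network device is congested, the intra-group detour shortest path via the third network device is determined to be congested. If the queue depth of that first egress queue does not exceed the first threshold and that global interface on the third network device is not congested, the intra-group detour shortest path via the third network device is determined to be uncongested.
The first egress queue is used to forward packets that the first network device forwards over intra-group detour shortest paths. Whether the first network device selects an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward the packet through another network device in the first device group, and if both kinds of path pass through the same network device in the first device group, the packets leave through the same local interface. It is therefore necessary to distinguish, among the packets sent from one local interface, those taking an intra-group detour shortest path from those taking an intra-group detour non-shortest path, so that the congestion of the two kinds of path can be determined separately. In the embodiments of this application, the egress queues of a local interface are divided into a first egress queue and a second egress queue. The first egress queue forwards packets that the first network device sends over intra-group detour shortest paths, and the second egress queue forwards packets that it sends over intra-group detour non-shortest paths. The congestion of intra-group detour shortest paths is then determined from the queue depth of the first egress queue, and the congestion of intra-group detour non-shortest paths from the queue depth of the second egress queue.
Optionally, the third network device is any network device in the first device group, other than the first network device, that has an inter-group link to the second device group. Continuing the example in step 201, the third network device can be network device C2, C3, C4, or C5.
Optionally, the network devices in the same device group advertise the congestion states of their global interfaces to one another. For example, the first network device can send the congestion states of its global interfaces to the other network devices in the first device group periodically or in real time. The first network device can store the latest congestion states of its own interfaces and the latest congestion states of the global interfaces of the other network devices in the first device group. For example, if the first network device is network device C1 in Figure 1, the congestion states stored in C1 at some moment may be as shown in Table 1, where "1" indicates no congestion and "2" indicates congestion.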
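Table 1
Device             GE1  GE2  GE3    GE4    GE5    GE6    GE7  GE8
Network device C1  1    1    (1,2)  (1,1)  (2,2)  (1,1)  1    1
Network device C2  /    /    /      /      /      /      1    2
Network device C3  /    /    /      /      /      /      2    1
Network device C4  /    /    /      /      /      /      2    2
Network device C5  /    /    /      /      /      /      1    1
In Table 1, the congestion state of local interfaces GE3 to GE6 of network device C1 is written as (a,b), where a is the congestion state of the first egress queue, which forwards packets that C1 sends over intra-group detour shortest paths, and b is the congestion state of the second egress queue, which forwards packets that C1 sends over intra-group detour non-shortest paths.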
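As a worked example, the sketch below evaluates Table 1 from the point of view of network device C1 for traffic destined for device group B. The interface wiring it assumes (C1 reaches C2 to C5 through GE3 to GE6, and every device in group C reaches group A through GE7 and group B through GE8) is the wiring given later in the description for Figure 1; the variable and function names are hypothetical.

```python
UNCONGESTED, CONGESTED = 1, 2

# C1's local interfaces, keyed by the neighbour they lead to:
# (state of first egress queue, state of second egress queue)
C1_LOCAL = {"C2": (1, 2), "C3": (1, 1), "C4": (2, 2), "C5": (1, 1)}

# Global-interface states advertised inside device group C (Table 1).
GLOBAL = {
    "C1": {"GE7": 1, "GE8": 1},
    "C2": {"GE7": 1, "GE8": 2},
    "C3": {"GE7": 2, "GE8": 1},
    "C4": {"GE7": 2, "GE8": 2},
    "C5": {"GE7": 1, "GE8": 1},
}

# Assumed wiring: GE7 leads to device group A, GE8 to device group B.
TOWARD = {"A": "GE7", "B": "GE8"}


def detour_uncongested(mate, next_group, shortest):
    """Check a detour path from C1 via `mate`, whose global link leads to `next_group`.

    shortest=True  -> use the first egress queue (detour shortest path)
    shortest=False -> use the second egress queue (detour non-shortest path)
    """
    queue_state = C1_LOCAL[mate][0 if shortest else 1]
    remote_state = GLOBAL[mate][TOWARD[next_group]]
    return queue_state == UNCONGESTED and remote_state == UNCONGESTED
```

Under these assumptions, the detour shortest paths to group B via C3 and C5 are uncongested, while those via C2 (remote GE8 congested) and C4 (local queue congested) are not. For detour non-shortest paths through group A, only the path via C5 is uncongested.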
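Step 206: The first network device uses any uncongested intra-group detour shortest path between the first network device and the second network device as the target forwarding path.
Optionally, the first network device can perform a congestion check on every intra-group detour shortest path between the first network device and the second network device, and then randomly select one of the uncongested paths as the target forwarding path. Alternatively, the first network device can check the intra-group detour shortest paths one by one, stop as soon as it finds an uncongested path, and use that path as the target forwarding path.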
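The two selection strategies can be sketched as follows. The function names and the `is_congested` callback are assumptions for illustration; the text does not prescribe how the random choice is made.

```python
import random


def pick_random_uncongested(paths, is_congested, rng=random):
    """Check every path, then choose randomly among the uncongested ones."""
    uncongested = [p for p in paths if not is_congested(p)]
    return rng.choice(uncongested) if uncongested else None


def pick_first_uncongested(paths, is_congested):
    """Check paths in order and stop at the first uncongested one."""
    for p in paths:
        if not is_congested(p):
            return p
    return None
```

The first strategy spreads traffic across all uncongested candidates at the cost of checking every path; the second does less work per packet but always favours candidates that appear early in the list.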
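In the dragonfly network shown in Figure 1, there are four intra-group detour shortest paths between network device C1 and network device B5: C1→C2→B2→B5; C1→C3→B3→B5; C1→C4→B4→B5; and C1→C5→B5.
Step 207: The first network device performs a congestion check on the local direct non-shortest paths between the first network device and the second network device. If an uncongested local direct non-shortest path exists between them, step 208 is performed; otherwise, step 209 is performed.
Optionally, the first network device determines whether the local direct non-shortest path between the first network device and the second network device via a third device group is congested based on the congestion state of the global interface on the first network device that connects to the third device group. If that global interface is congested, the local direct non-shortest path via the third device group is determined to be congested. If that global interface is not congested, the local direct non-shortest path via the third device group is determined to be uncongested.
Optionally, the third device group is any device group in the dragonfly network other than the first device group and the second device group. Continuing the example in step 201, the third device group is device group A. Assume that network device C1 is connected to network device A5 through global interface GE7. If GE7 of C1 is congested, the inter-group link L_AC5 between C1 and A5 is determined to be congested, and therefore the local direct non-shortest path between C1 and B5 is determined to be congested. If GE7 of C1 is not congested, L_AC5 is determined to be uncongested, and therefore the local direct non-shortest path between C1 and B5 is determined to be uncongested.
Step 208: The first network device uses any uncongested local direct non-shortest path between the first network device and the second network device as the target forwarding path.
Optionally, the first network device can perform a congestion check on every local direct non-shortest path between the first network device and the second network device, and then randomly select one of the uncongested paths as the target forwarding path. Alternatively, the first network device can check the local direct non-shortest paths one by one, stop as soon as it finds an uncongested path, and use that path as the target forwarding path.
In the dragonfly network shown in Figure 1, there is only one local direct non-shortest path between network device C1 and network device B5: C1→A5→B5.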
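Step 209: The first network device performs a congestion check on the intra-group detour non-shortest paths between the first network device and the second network device. If an uncongested intra-group detour non-shortest path exists between them, step 210 is performed; otherwise, the packet forwarding flow ends.
Optionally, the first network device determines whether the intra-group detour non-shortest path between the first network device and the second network device via a fourth network device and a fourth device group is congested based on the queue depth of the second egress queue of the local interface on the first network device that connects to the fourth network device, and on the congestion state of the global interface on the fourth network device that connects to the fourth device group. If the queue depth of that second egress queue exceeds the first threshold, and/or that global interface on the fourth network device is congested, the intra-group detour non-shortest path via the fourth network device and the fourth device group is determined to be congested. If the queue depth of that second egress queue does not exceed the first threshold and that global interface on the fourth network device is not congested, the path is determined to be uncongested. The second egress queue is used to forward packets that the first network device forwards over intra-group detour non-shortest paths.
Optionally, the fourth network device is any network device in the first device group, other than the first network device, that has an inter-group link to the second device group. The fourth network device and the third network device described above can be the same network device or different network devices. The fourth device group is any device group in the dragonfly network other than the first device group and the second device group. The fourth device group and the third device group described above can be the same device group or different device groups. Continuing the example in step 201, the fourth network device can be network device C2, C3, C4, or C5, and the fourth device group is device group A.
Step 210: The first network device uses any uncongested intra-group detour non-shortest path between the first network device and the second network device as the target forwarding path.
Optionally, the first network device can perform a congestion check on every intra-group detour non-shortest path between the first network device and the second network device, and then randomly select one of the uncongested paths as the target forwarding path. Alternatively, the first network device can check the intra-group detour non-shortest paths one by one, stop as soon as it finds an uncongested path, and use that path as the target forwarding path.
In the dragonfly network shown in Figure 1, there are four intra-group detour non-shortest paths between network device C1 and network device B5: C1→C2→A4→B4→B5; C1→C3→A3→B3→B5; C1→C4→A2→B2→B5; and C1→C5→A1→B1→B5.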
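Step 211: The first network device sends a second packet, obtained from the first packet, to the second network device over the target forwarding path.
Optionally, when the target forwarding path is an intra-group detour shortest path, the first network device adds a first indication to the first packet to obtain the second packet and sends the second packet to the second network device over the target forwarding path. The first indication indicates that the forwarding path type is intra-group detour shortest path. Alternatively, when the target forwarding path is an intra-group detour non-shortest path, the first network device adds a second indication to the first packet to obtain the second packet and sends the second packet to the second network device over the target forwarding path. The second indication indicates that the forwarding path type is intra-group detour non-shortest path.
Whether the first network device selects an intra-group detour shortest path or an intra-group detour non-shortest path, it must forward the packet through another network device in the first device group, and if both kinds of path pass through the same network device, that network device receives the packets on the same local interface. That network device therefore needs to tell which packets received on the local interface are taking an intra-group detour shortest path and which are taking an intra-group detour non-shortest path, so that it can forward each packet along the corresponding path. In the embodiments of this application, the source network device adds the first indication to packets taking an intra-group detour shortest path and the second indication to packets taking an intra-group detour non-shortest path. This tells the other network devices in the device group which kind of forwarding path to use, so they know from the indication which global interface to forward the packet through, which improves forwarding efficiency.
Optionally, the first indication and the second indication can be different differentiated services code point (DSCP) values, different virtual local area network (VLAN) priorities, different 802.1p tags, and so on. The first and second indications can be pre-assigned by a network administrator, and the embodiments of this application do not limit the type of indication. For example, when VLAN priority is used to indicate the forwarding path type, VLAN 10 can indicate that the forwarding path type is intra-group detour shortest path, and VLAN 20 can indicate that it is intra-group detour non-shortest path.
Optionally, when the target forwarding path is a local direct non-shortest path, the first network device sends the first packet directly over the target forwarding path; that is, the second packet in step 211 is the first packet of step 201.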
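The marking in step 211 can be sketched as below. This is an illustration under assumptions: packets are modelled as plain dictionaries, the indication is carried in a DSCP field, and the two DSCP values are hypothetical placeholders for values that an administrator would assign.

```python
# Hypothetical DSCP values; the text leaves the concrete values to the administrator.
FIRST_INDICATION = 10    # intra-group detour shortest path
SECOND_INDICATION = 20   # intra-group detour non-shortest path


def build_second_packet(first_packet, path_type):
    """Return the packet to send for the chosen path type.

    path_type uses the role labels "min1", "min2", "non-min1", "non-min2".
    """
    if path_type == "min2":
        return {**first_packet, "dscp": FIRST_INDICATION}
    if path_type == "non-min2":
        return {**first_packet, "dscp": SECOND_INDICATION}
    # Local direct paths: the packet is sent unchanged.
    return first_packet


def path_type_from_packet(packet):
    """What a group-mate reads from a packet received on a local interface."""
    mapping = {FIRST_INDICATION: "min2", SECOND_INDICATION: "non-min2"}
    return mapping.get(packet.get("dscp"))
```

Copying the packet rather than mutating it keeps the first packet intact, which mirrors the text's distinction between the first packet and the second packet derived from it.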
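In the packet forwarding method provided in the embodiments of this application, dividing the inter-group shortest paths into local direct shortest paths and intra-group detour shortest paths, and/or dividing the inter-group non-shortest paths into local direct non-shortest paths and intra-group detour non-shortest paths, allows the source network device to select paths at a finer granularity and to control packet forwarding paths more precisely, and therefore to better control the latency and throughput of packet forwarding. In addition, when selecting a path, the source network device prefers inter-group shortest paths and uses inter-group non-shortest paths only after all inter-group shortest paths are congested; among inter-group shortest paths it prefers the local direct shortest path, and among inter-group non-shortest paths it prefers the local direct non-shortest path. On the one hand, this keeps packet forwarding latency as low as possible. On the other hand, because local direct paths are preferred, the source network device can sense local congestion in real time and decide whether to switch paths based on that local congestion, so the moment of path switching can be determined fairly accurately.
The order of the steps of the packet forwarding method provided in the embodiments of this application can be adjusted as appropriate, and steps can be added or removed as the situation requires. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application.
Optionally, steps 201 to 211 can be performed by a forwarding chip in the first network device; that is, the forwarding chip performs the congestion checks on the forwarding paths and selects a suitable forwarding path based on the results. Alternatively, steps 201 to 211 can be performed by the forwarding chip in cooperation with a coprocessor in the first network device: after the forwarding chip receives a packet, the coprocessor performs the congestion checks and selects a suitable forwarding path, and the forwarding chip then forwards the packet over the path selected by the coprocessor.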
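In the embodiments of this application, the network devices in the dragonfly network communicate using layer-3 forwarding. Each network device maintains a routing and forwarding table in order to forward packets.
Optionally, the first network device obtains a routing and forwarding table that includes a route prefix table and multiple ECMP group tables. Each entry of the route prefix table contains a mapping between a destination IP address and the group index of a destination device group. The group index of a destination device group is associated with one ECMP group table, which contains the egress interface corresponding to each path from the first network device to that destination device group. The destination device group is the device group containing the access device of the terminal device that owns the destination IP address. In practice, each entry of the route prefix table stores one IP address and one group index. The IP address in an entry is the IP address of a destination terminal device (the destination IP address), and the group index is that of the device group containing the access device of the destination terminal device (the destination device group).
For example, in the dragonfly network shown in Figure 1, if the IP address of terminal device S2 is 20.1.1.1/24, the IP address of terminal device S3 is 20.1.2.1/24, and the group index of device group B is 100, the route prefix table stored in network device C1 may be as shown in Table 2.
Table 2
IP address    Group index
20.1.1.1/24   100
20.1.2.1/24   100
As Table 2 shows, because terminal device S2 and terminal device S3 both attach to device group B, the route prefix table maps packets destined for the IP address of S2 and packets destined for the IP address of S3 to the same ECMP group table, the one associated with device group B.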
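The lookup from a destination address to a group index can be sketched with Python's standard `ipaddress` module. This is a minimal sketch: the text does not state how the prefix table is searched, so the longest-prefix match used here and the names `PREFIX_TABLE` and `lookup_group_index` are assumptions.

```python
import ipaddress

# Table 2 as stored in network device C1: prefix -> group index.
PREFIX_TABLE = {"20.1.1.1/24": 100, "20.1.2.1/24": 100}


def lookup_group_index(dst_ip, table=PREFIX_TABLE):
    """Longest-prefix match of dst_ip against the route prefix table."""
    addr = ipaddress.ip_address(dst_ip)
    best = None  # (prefix length, group index)
    for prefix, group_index in table.items():
        # strict=False accepts entries written as host address plus mask.
        net = ipaddress.ip_network(prefix, strict=False)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, group_index)
    return None if best is None else best[1]
```

Both entries resolve to group index 100, so one ECMP group table serves every destination attached to device group B.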
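Optionally, in addition to the egress interface corresponding to each path from the first network device to the destination device group, the ECMP group table can include a local congestion level and a remote congestion level for each egress interface. The local congestion level is the congestion level of the local egress interface. The remote congestion level is the congestion level of the corresponding egress interface on the other network device in the same device group that handles packets sent out of the local egress interface. Intra-group detour shortest paths and intra-group detour non-shortest paths have both a local and a remote congestion level. Local direct shortest paths and local direct non-shortest paths have only a local congestion level.
For example, in the dragonfly network shown in Figure 1, network device C1 is connected to network device A5 through global interface GE7, to network device B1 through global interface GE8, to network device C2 through local interface GE3, to C3 through local interface GE4, to C4 through local interface GE5, and to C5 through local interface GE6. Network device C2 is connected to A4 through global interface GE7 and to B2 through global interface GE8. Network device C3 is connected to A3 through GE7 and to B3 through GE8. Network device C4 is connected to A2 through GE7 and to B4 through GE8. Network device C5 is connected to A1 through GE7 and to B5 through GE8. If the latest congestion states of C1's own interfaces and of the global interfaces of the other network devices in device group C, as stored in C1, are as shown in Table 1, the ECMP group table associated with device group B that is stored in C1 may be as shown in Table 3.
Table 3
[Table 3 appears as an image in the original publication (Figure PCTCN2022098019-appb-000001) and is not reproduced here.]
Here, the egress interface role "min1" corresponds to the local direct shortest path, "min2" to intra-group detour shortest paths, "non-min1" to local direct non-shortest paths, and "non-min2" to intra-group detour non-shortest paths.
In the embodiments of this application, a network device stores the routes to terminal devices using a route prefix table combined with ECMP group tables. A single ECMP group table holds all forwarding information for the IP addresses of the terminal devices that attach to the same device group, which saves table entries.
Optionally, the ECMP group table includes a first routing sub-table and a second routing sub-table. The two sub-tables can be associated with the same route prefix table, or each can be associated with its own route prefix table. The entries of the route prefix tables associated with the two sub-tables are usually identical.
The first routing sub-table contains the egress interface corresponding to each path from the first network device to the destination device group. The first network device uses the first routing sub-table to forward packets from terminal devices that attach to the first device group. In other words, when the first network device is in the source device group, it forwards packets based on the first routing sub-table.
Optionally, after the first network device receives, on an access interface, a packet sent by a terminal device connected to it, the first network device can use the first routing sub-table to check in turn whether the local direct shortest path, the intra-group detour shortest paths, the local direct non-shortest paths, and the intra-group detour non-shortest paths are congested, until it finds an uncongested path, and then forward the packet through the corresponding egress interface. After the first network device receives, on a local interface, a packet that was sent by another network device in the first device group and that originated from a terminal device connected to that other network device, the first network device determines from the indication in the packet whether the packet takes an intra-group detour shortest path or an intra-group detour non-shortest path, and then forwards the packet through the corresponding egress interface.
The first routing sub-table may be as shown in Table 3. After network device C1 receives a packet from terminal device S1 destined for 20.1.1.1/24 (the IP address of terminal device S2), it first uses Table 2 to index into Table 3. Because egress interface GE8, which has the role "min1", is not congested, the local direct shortest path is not congested, and C1 forwards the packet directly through interface GE8.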
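Because Table 3 is only available as an image, the sketch below uses a hypothetical reconstruction of it, derived from the interface wiring described for Figure 1 and the congestion states of Table 1. The `EcmpEntry` class, the boolean congestion fields, and the function names are all assumptions; the real table may use different columns or congestion levels.

```python
from dataclasses import dataclass


@dataclass
class EcmpEntry:
    out_if: str
    role: str                       # "min1", "min2", "non-min1", "non-min2"
    local_congested: bool
    remote_congested: bool = False  # only meaningful for detour roles


ROLE_ORDER = ("min1", "min2", "non-min1", "non-min2")


def table_3_sketch():
    """Hypothetical first routing sub-table of C1 for device group B."""
    return [
        EcmpEntry("GE8", "min1", False),
        EcmpEntry("GE3", "min2", False, True),    # via C2, whose GE8 is congested
        EcmpEntry("GE4", "min2", False, False),   # via C3
        EcmpEntry("GE5", "min2", True, True),     # via C4
        EcmpEntry("GE6", "min2", False, False),   # via C5
        EcmpEntry("GE7", "non-min1", False),
        EcmpEntry("GE3", "non-min2", True, False),
        EcmpEntry("GE4", "non-min2", False, True),
        EcmpEntry("GE5", "non-min2", True, True),
        EcmpEntry("GE6", "non-min2", False, False),
    ]


def choose_out_if(entries):
    """Return (egress interface, role) of the first uncongested entry by role priority."""
    for role in ROLE_ORDER:
        for e in entries:
            if e.role == role and not (e.local_congested or e.remote_congested):
                return e.out_if, e.role
    return None
```

With the Table 1 states, `choose_out_if(table_3_sketch())` selects GE8 with role `min1`, matching the example in the text. If GE8 were congested, the search would fall through to GE4, the first uncongested `min2` entry.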
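The second routing sub-table contains the egress interfaces corresponding to the inter-group shortest paths from the first network device to the destination device group. The first network device uses the second routing sub-table to forward packets from terminal devices that attach to device groups other than the first device group. In other words, when the first network device is in an intermediate device group or in the destination device group, it forwards packets based on the second routing sub-table, that is, using the shortest-path algorithm, which avoids routing loops.
Optionally, after the first network device receives a packet from another device group on a global interface, it determines the egress interface for the packet based on the second routing sub-table. For example, the second routing sub-table may be as shown in Table 4.
Table 4
[Table 4 appears as an image in the original publication (Figure PCTCN2022098019-appb-000002) and is not reproduced here.]
For the meaning of the entries in Table 4, refer to the description of Table 3; details are not repeated here.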
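The choice between the two sub-tables depends on the role of the interface on which a packet arrives. The sketch below is one possible reading of that rule, not a confirmed implementation: it assumes that a group-mate receiving a packet marked with the first indication forwards it over its own global interface toward the destination group (its `min1` entry), and that a packet marked with the second indication leaves over a global interface toward another group (its `non-min1` entry). All names are hypothetical.

```python
FIRST_INDICATION, SECOND_INDICATION = "first", "second"


def lookup_plan(ingress_role, indication=None):
    """Return (sub-table, roles to consider), or None if this sketch does not cover the case."""
    if ingress_role == "access":
        # Source network device: full priority search in the first sub-table.
        return "first sub-table", ("min1", "min2", "non-min1", "non-min2")
    if ingress_role == "local":
        # Packet detoured by a group-mate: follow the indication it carries.
        if indication == FIRST_INDICATION:
            return "first sub-table", ("min1",)
        if indication == SECOND_INDICATION:
            return "first sub-table", ("non-min1",)
        return None
    if ingress_role == "global":
        # Intermediate or destination group: shortest paths only, to avoid loops.
        return "second sub-table", ("min1", "min2")
    raise ValueError(f"unknown interface role: {ingress_role}")
```

Restricting packets that arrive on a global interface to shortest-path entries is what prevents a packet from being bounced to yet another intermediate group.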
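In the embodiments of this application, the routing and forwarding table can be generated through a distributed routing protocol or through a centralized control scheme.
In the first implementation, the network devices synchronize routes using a distributed routing protocol and each generates its local routing and forwarding table according to a uniform policy configuration. Network devices in the same device group can advertise routing information using the internal Border Gateway Protocol (iBGP); that is, intra-group links run iBGP. Network devices in different device groups can advertise routing information using the external Border Gateway Protocol (eBGP); that is, inter-group links run eBGP. Each device group corresponds to one AS number.
In this implementation, the first network device obtains the routing and forwarding table as follows. When the first network device receives a first routing message whose AS-path attribute contains only one AS number, it adds a forwarding entry derived from the first routing message to both the first routing sub-table and the second routing sub-table. When the first network device receives a second routing message whose AS-path attribute contains two AS numbers, it adds a forwarding entry derived from the second routing message to the first routing sub-table only.
For example, in the dragonfly network shown in Figure 1, network devices A1 to A5 in device group A are all configured with AS number 100, network devices B1 to B5 in device group B with AS number 101, and network devices C1 to C5 in device group C with AS number 102.
Network device B5 in device group B advertises, through eBGP, a routing message containing the IP address of terminal device S2 (20.1.1.1/24). The AS-path attribute of this message carries AS number 101, and the message is advertised to the network devices in device group A and device group C. For example, after network device C1 receives this message through eBGP, because its AS-path attribute contains only one AS number, 101, C1 treats the route as a local direct shortest path, adds the mapping between the IP address of S2 and the group index of device group B to its local route prefix table, and adds the corresponding forwarding entry to both the first routing sub-table and the second routing sub-table of the ECMP group table associated with device group B.
Further, network device C1 advertises a routing message containing the IP address of S2 within device group C through iBGP. After network devices C2 to C5 receive this message through iBGP, because its AS-path attribute contains only one AS number, 101, they treat the route as an intra-group detour shortest path, add the mapping between the IP address of S2 and the group index of device group B to their local route prefix tables, and add the corresponding forwarding entry to both the first routing sub-table and the second routing sub-table of the ECMP group table associated with device group B.
At the same time, network device C1 adds its own AS number, 102, to the AS-path attribute of the received message and advertises the routing message containing the IP address of S2 to device group A through eBGP. The AS-path attribute of this message carries AS numbers 101 and 102, and the message is advertised to the network devices in device group A. For example, after network device A5 receives this message through eBGP, because its AS-path attribute contains two AS numbers, 101 and 102, A5 treats the route as a local direct non-shortest path, adds the mapping between the IP address of S2 and the group index of device group B to its local route prefix table, and adds the corresponding forwarding entry to the first routing sub-table of the ECMP group table associated with device group B.
Further still, network device A5 advertises a routing message containing the IP address of S2 within device group A through iBGP. After network devices A1 to A4 receive this message through iBGP, because its AS-path attribute contains two AS numbers, 101 and 102, they treat the route as an intra-group detour non-shortest path, add the mapping between the IP address of S2 and the group index of device group B to their local route prefix tables, and add the corresponding forwarding entry to the first routing sub-table of the ECMP group table associated with device group B.
In the end, each network device in device group A and device group C obtains a route prefix table similar to Table 2, a first routing sub-table similar to Table 3, and a second routing sub-table similar to Table 4.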
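The rule that maps a received route to a path role and to the sub-tables it populates can be sketched as below. The function name, the string labels, and the list representation of the AS-path are assumptions for illustration; no real BGP library API is used or implied.

```python
def classify_route(as_path, session):
    """Map a received route to (egress role, sub-tables to populate).

    as_path: list of AS numbers carried in the AS-path attribute
    session: "eBGP" if learned over an inter-group link,
             "iBGP" if learned over an intra-group link
    Returns None for AS-path lengths this sketch does not cover.
    """
    if session not in ("eBGP", "iBGP"):
        raise ValueError(f"unknown session type: {session}")
    if len(as_path) == 1:
        # One AS number: an inter-group shortest path, installed in both sub-tables.
        role = "min1" if session == "eBGP" else "min2"
        return role, ("first", "second")
    if len(as_path) == 2:
        # Two AS numbers: a non-shortest path, installed in the first sub-table only.
        role = "non-min1" if session == "eBGP" else "non-min2"
        return role, ("first",)
    return None
```

Applied to the walkthrough above: C1 learning the S2 route over eBGP with AS-path [101] gets `min1`; C2 to C5 learning it over iBGP get `min2`; A5 learning it over eBGP with two AS numbers gets `non-min1`; and A1 to A4 learning it over iBGP get `non-min2`.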
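In the second implementation, the routing and forwarding table is generated through a centralized control scheme. The network devices in the dragonfly network are controlled and managed by a control device, which stores the networking topology of the dragonfly network, the IP addresses of the terminal devices connected to each network device, and so on. When the IP address of a terminal device connected to a network device changes, or the link state information of a network device changes, the network device can report the changed information to the control device so that the control device can update its stored information. The control device can be, for example, a software-defined networking (SDN) controller.
Optionally, the control device can send the networking topology of the dragonfly network and the IP addresses of the terminal devices connected to each network device to every network device. Each network device then uses a uniform routing algorithm to compute the forwarding paths to the terminal devices connected to other network devices and generates the corresponding routing and forwarding table. For example, the first network device can generate the route prefix table and the multiple ECMP group tables based on the networking topology of the dragonfly network, the IP addresses of the terminal devices attached to the dragonfly network, and the access devices of those terminal devices.
Alternatively, the control device can generate the routing and forwarding table for each network device based on the networking topology of the dragonfly network, the IP addresses of the terminal devices attached to the dragonfly network, and the access devices of those terminal devices, and send each network device its table. For example, the first network device can receive the routing and forwarding table sent by the control device.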
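The first network device that performs the method shown in Figure 2 can be the packet forwarding apparatus 300 shown in Figure 3. The packet forwarding apparatus 300 is applied to a network device in a dragonfly network. The dragonfly network includes multiple device groups, and there are multiple inter-group links between different device groups. As shown in Figure 3, the apparatus 300 includes:
A receiving module 301, configured to receive a first packet sent by a first terminal device connected to the first network device, where the destination address of the first packet is the IP address of a second terminal device connected to a second network device, the first network device belongs to a first device group, and the second network device belongs to a second device group.
A processing module 302, configured to determine, when a first inter-group link exists between the first network device and the second device group, whether the first inter-group link is congested.
A sending module 303, configured to send the first packet to the second network device over the first inter-group link when the first inter-group link is not congested.
Optionally, the processing module 302 is configured to determine whether the first inter-group link is congested based on the congestion state of the global interface on the first network device that connects to the second device group.
Optionally, the processing module 302 is further configured to, when the first inter-group link is congested or no inter-group link exists between the first network device and the second device group, perform congestion checks in turn on the intra-group detour shortest paths, the local direct non-shortest paths, and the intra-group detour non-shortest paths between the first network device and the second network device until a target forwarding path is obtained. The sending module 303 is further configured to send a second packet, obtained from the first packet, to the second network device over the target forwarding path. An intra-group detour shortest path includes the intra-group link between the first network device and a third network device in the first device group and the inter-group link between the third network device and the second device group. A local direct non-shortest path includes the inter-group link between the first network device and a third device group and one inter-group link between the third device group and the second device group. An intra-group detour non-shortest path includes the intra-group link between the first network device and a fourth network device in the first device group, the inter-group link between the fourth network device and a fourth device group, and one inter-group link between the fourth device group and the second device group.
Optionally, the processing module 302 is configured to: perform a congestion check on the intra-group detour shortest paths between the first network device and the second network device. When an uncongested intra-group detour shortest path exists between them, use any such path as the target forwarding path. When no uncongested intra-group detour shortest path exists between them, perform a congestion check on the local direct non-shortest paths between them. When an uncongested local direct non-shortest path exists between them, use any such path as the target forwarding path. When no uncongested local direct non-shortest path exists between them, perform a congestion check on the intra-group detour non-shortest paths between them. When an uncongested intra-group detour non-shortest path exists between them, use any such path as the target forwarding path.
Optionally, the processing module 302 is configured to: determine whether the intra-group detour shortest path between the first network device and the second network device via the third network device is congested based on the queue depth of the first egress queue of the local interface on the first network device that connects to the third network device and the congestion state of the global interface on the third network device that connects to the second device group, where the first egress queue is used to forward packets that the first network device forwards over intra-group detour shortest paths.
Optionally, the processing module 302 is configured to: determine whether the local direct non-shortest path between the first network device and the second network device via the third device group is congested based on the congestion state of the global interface on the first network device that connects to the third device group.
Optionally, the processing module 302 is configured to: determine whether the intra-group detour non-shortest path between the first network device and the second network device via the fourth network device and the fourth device group is congested based on the queue depth of the second egress queue of the local interface on the first network device that connects to the fourth network device and the congestion state of the global interface on the fourth network device that connects to the fourth device group, where the second egress queue is used to forward packets that the first network device forwards over intra-group detour non-shortest paths.
Optionally, the sending module 303 is configured to: when the target forwarding path is an intra-group detour shortest path, add a first indication to the first packet to obtain the second packet and send the second packet to the second network device over the target forwarding path, where the first indication indicates that the forwarding path type is intra-group detour shortest path. Alternatively, when the target forwarding path is an intra-group detour non-shortest path, add a second indication to the first packet to obtain the second packet and send the second packet to the second network device over the target forwarding path, where the second indication indicates that the forwarding path type is intra-group detour non-shortest path.
Optionally, as shown in Figure 4, the apparatus 300 further includes an obtaining module 304, configured to obtain a routing and forwarding table that includes a route prefix table and multiple ECMP group tables. Each entry of the route prefix table contains a mapping between a destination IP address and the group index of a destination device group. The group index is associated with one ECMP group table, which contains the egress interface corresponding to each path from the first network device to the destination device group. The destination device group is the device group containing the access device of the terminal device that owns the destination IP address.
Optionally, the ECMP group table includes a first routing sub-table and a second routing sub-table. The first routing sub-table contains the egress interface corresponding to each path from the first network device to the destination device group and is used by the first network device to forward packets from terminal devices that attach to the first device group. The second routing sub-table contains the egress interfaces corresponding to the inter-group shortest paths from the first network device to the destination device group and is used by the first network device to forward packets from terminal devices that attach to device groups other than the first device group.
Optionally, each device group corresponds to one AS number, and the obtaining module 304 is configured to: when the first network device receives a first routing message whose AS-path attribute contains only one AS number, add a forwarding entry derived from the first routing message to both the first routing sub-table and the second routing sub-table; and when the first network device receives a second routing message whose AS-path attribute contains two AS numbers, add a forwarding entry derived from the second routing message to the first routing sub-table only.
Alternatively, the obtaining module 304 is configured to: generate the route prefix table and the multiple ECMP group tables based on the networking topology of the dragonfly network, the IP addresses of the terminal devices attached to the dragonfly network, and the access devices of those terminal devices.
Alternatively, the obtaining module 304 is configured to: receive the routing and forwarding table sent by a control device.
For the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.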
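The first network device that performs the method shown in Figure 2 can be the network device 500 shown in Figure 5. The network device 500 is a network device in a dragonfly network. The dragonfly network includes multiple device groups, and there are multiple inter-group links between different device groups. The network device 500 can be a router, a switch, or the like. As shown in Figure 5, the network device 500 includes a processor 501 and a memory 502.
The memory 502 is configured to store a computer program, and the computer program includes program instructions.
The processor 501 is configured to invoke the computer program to implement the actions performed by the first network device in the above method embodiments.
Optionally, the network device 500 further includes a communication bus 503 and a communication interface 504.
The processor 501 includes one or more processing cores and performs various functional applications and data processing by running the computer program.
The memory 502 can be used to store the computer program. Optionally, the memory can store an operating system and the application program units required by at least one function. The operating system can be a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS, OS X, or a similar operating system.
There can be multiple communication interfaces 504. The communication interface 504 is used to communicate with, for example, terminal devices or other network devices. For example, in the embodiments of this application, the communication interface 504 is used to send and receive packets.
The memory 502 and the communication interface 504 are each connected to the processor 501 through the communication bus 503.
An embodiment of this application further provides a computer-readable storage medium storing instructions that, when executed by a processor, implement the actions performed by the first network device in the above method embodiments.
An embodiment of this application further provides a computer program product including a computer program that, when executed by a processor, implements the actions performed by the first network device in the above method embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which can be a read-only memory, a magnetic disk, an optical disc, or the like.
In the embodiments of this application, the terms "first", "second", and "third" are used for description only and shall not be understood as indicating or implying relative importance.
In this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships can exist. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it.
The above are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the concept and principles of this application shall fall within the protection scope of this application.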

Claims (30)

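  1. A packet forwarding method, characterized in that it is applied to a dragonfly network, the dragonfly network comprises a plurality of device groups, and there are a plurality of inter-group links between different device groups, the method comprising:
    receiving, by a first network device, a first packet sent by a first terminal device connected to the first network device, wherein a destination address of the first packet is an Internet Protocol (IP) address of a second terminal device connected to a second network device, the first network device belongs to a first device group, and the second network device belongs to a second device group; and
    when a first inter-group link exists between the first network device and the second device group and the first inter-group link is not congested, sending, by the first network device, the first packet to the second network device over the first inter-group link.
  2. The method according to claim 1, characterized in that the method further comprises:
    determining, by the first network device, whether the first inter-group link is congested based on a congestion state of a global interface on the first network device that connects to the second device group.
  3. The method according to claim 1 or 2, characterized in that the method further comprises:
    when the first inter-group link is congested, or no inter-group link exists between the first network device and the second device group, performing, by the first network device, congestion checks in turn on intra-group detour shortest paths, local direct non-shortest paths, and intra-group detour non-shortest paths between the first network device and the second network device until a target forwarding path is obtained; and
    sending, by the first network device, a second packet obtained from the first packet to the second network device over the target forwarding path;
    wherein the intra-group detour shortest path comprises an intra-group link between the first network device and a third network device in the first device group and an inter-group link between the third network device and the second device group; the local direct non-shortest path comprises an inter-group link between the first network device and a third device group and one inter-group link between the third device group and the second device group; and the intra-group detour non-shortest path comprises an intra-group link between the first network device and a fourth network device in the first device group, an inter-group link between the fourth network device and a fourth device group, and one inter-group link between the fourth device group and the second device group.
  4. The method according to claim 3, characterized in that the performing, by the first network device, congestion checks in turn on the intra-group detour shortest paths, the local direct non-shortest paths, and the intra-group detour non-shortest paths between the first network device and the second network device until a target forwarding path is obtained comprises:
    performing, by the first network device, a congestion check on the intra-group detour shortest paths between the first network device and the second network device;
    when an uncongested intra-group detour shortest path exists between the first network device and the second network device, using, by the first network device, any uncongested intra-group detour shortest path between the first network device and the second network device as the target forwarding path;
    when no uncongested intra-group detour shortest path exists between the first network device and the second network device, performing, by the first network device, a congestion check on the local direct non-shortest paths between the first network device and the second network device;
    when an uncongested local direct non-shortest path exists between the first network device and the second network device, using, by the first network device, any uncongested local direct non-shortest path between the first network device and the second network device as the target forwarding path;
    when no uncongested local direct non-shortest path exists between the first network device and the second network device, performing, by the first network device, a congestion check on the intra-group detour non-shortest paths between the first network device and the second network device; and
    when an uncongested intra-group detour non-shortest path exists between the first network device and the second network device, using, by the first network device, any uncongested intra-group detour non-shortest path between the first network device and the second network device as the target forwarding path.
  5. The method according to claim 4, characterized in that the performing, by the first network device, a congestion check on the intra-group detour shortest paths between the first network device and the second network device comprises:
    determining, by the first network device, whether the intra-group detour shortest path between the first network device and the second network device via the third network device is congested based on a queue depth of a first egress queue of a local interface on the first network device that connects to the third network device and a congestion state of a global interface on the third network device that connects to the second device group, wherein the first egress queue is used to forward packets that the first network device forwards over intra-group detour shortest paths.
  6. The method according to claim 4, characterized in that the performing, by the first network device, a congestion check on the local direct non-shortest paths between the first network device and the second network device comprises:
    determining, by the first network device, whether the local direct non-shortest path between the first network device and the second network device via the third device group is congested based on a congestion state of a global interface on the first network device that connects to the third device group.
  7. The method according to claim 4, characterized in that the performing, by the first network device, a congestion check on the intra-group detour non-shortest paths between the first network device and the second network device comprises:
    determining, by the first network device, whether the intra-group detour non-shortest path between the first network device and the second network device via the fourth network device and the fourth device group is congested based on a queue depth of a second egress queue of a local interface on the first network device that connects to the fourth network device and a congestion state of a global interface on the fourth network device that connects to the fourth device group, wherein the second egress queue is used to forward packets that the first network device forwards over intra-group detour non-shortest paths.
  8. The method according to any one of claims 3 to 7, characterized in that the sending, by the first network device, a second packet obtained from the first packet to the second network device over the target forwarding path comprises:
    when the target forwarding path is an intra-group detour shortest path, adding, by the first network device, a first indication to the first packet to obtain the second packet, and sending the second packet to the second network device over the target forwarding path, wherein the first indication indicates that the forwarding path type is intra-group detour shortest path; or
    when the target forwarding path is an intra-group detour non-shortest path, adding, by the first network device, a second indication to the first packet to obtain the second packet, and sending the second packet to the second network device over the target forwarding path, wherein the second indication indicates that the forwarding path type is intra-group detour non-shortest path.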
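  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    obtaining, by the first network device, a routing and forwarding table, wherein the routing and forwarding table comprises a route prefix table and a plurality of equal-cost multi-path (ECMP) group tables;
    wherein each entry of the route prefix table comprises a mapping between a destination IP address and a group index of a destination device group, the group index is associated with one ECMP group table, the ECMP group table comprises an egress interface corresponding to each path from the first network device to the destination device group, and the destination device group is the device group containing the access device of the terminal device that owns the destination IP address.
  10. The method according to claim 9, characterized in that the ECMP group table comprises a first routing sub-table and a second routing sub-table;
    the first routing sub-table comprises the egress interface corresponding to each path from the first network device to the destination device group, and the first routing sub-table is used by the first network device to forward packets from terminal devices that attach to the first device group; and
    the second routing sub-table comprises the egress interfaces corresponding to the inter-group shortest paths from the first network device to the destination device group, and the second routing sub-table is used by the first network device to forward packets from terminal devices that attach to device groups other than the first device group.
  11. The method according to claim 10, characterized in that each device group corresponds to one autonomous system (AS) number, and the obtaining, by the first network device, a routing and forwarding table comprises:
    when the first network device receives a first routing message and the AS-path attribute of the first routing message contains only one AS number, adding, by the first network device, a forwarding entry derived from the first routing message to both the first routing sub-table and the second routing sub-table; and
    when the first network device receives a second routing message and the AS-path attribute of the second routing message contains two AS numbers, adding, by the first network device, a forwarding entry derived from the second routing message to the first routing sub-table only.
  12. The method according to claim 9 or 10, characterized in that the obtaining, by the first network device, a routing and forwarding table comprises:
    generating, by the first network device, the route prefix table and the plurality of ECMP group tables based on the networking topology of the dragonfly network, the IP addresses of the terminal devices attached to the dragonfly network, and the access devices of the terminal devices.
  13. The method according to claim 9 or 10, characterized in that the obtaining, by the first network device, a routing and forwarding table comprises:
    receiving, by the first network device, the routing and forwarding table sent by a control device.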
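  14. A packet forwarding apparatus, characterized in that it is applied to a first network device in a dragonfly network, the dragonfly network comprises a plurality of device groups, and there are a plurality of inter-group links between different device groups, the apparatus comprising:
    a receiving module, configured to receive a first packet sent by a first terminal device connected to the first network device, wherein a destination address of the first packet is an Internet Protocol (IP) address of a second terminal device connected to a second network device, the first network device belongs to a first device group, and the second network device belongs to a second device group;
    a processing module, configured to determine, when a first inter-group link exists between the first network device and the second device group, whether the first inter-group link is congested; and
    a sending module, configured to send the first packet to the second network device over the first inter-group link when the first inter-group link is not congested.
  15. The apparatus according to claim 14, characterized in that
    the processing module is configured to determine whether the first inter-group link is congested based on a congestion state of a global interface on the first network device that connects to the second device group.
  16. The apparatus according to claim 14 or 15, characterized in that
    the processing module is further configured to, when the first inter-group link is congested, or no inter-group link exists between the first network device and the second device group, perform congestion checks in turn on intra-group detour shortest paths, local direct non-shortest paths, and intra-group detour non-shortest paths between the first network device and the second network device until a target forwarding path is obtained;
    the sending module is further configured to send a second packet obtained from the first packet to the second network device over the target forwarding path;
    wherein the intra-group detour shortest path comprises an intra-group link between the first network device and a third network device in the first device group and an inter-group link between the third network device and the second device group; the local direct non-shortest path comprises an inter-group link between the first network device and a third device group and one inter-group link between the third device group and the second device group; and the intra-group detour non-shortest path comprises an intra-group link between the first network device and a fourth network device in the first device group, an inter-group link between the fourth network device and a fourth device group, and one inter-group link between the fourth device group and the second device group.
  17. The apparatus according to claim 16, characterized in that the processing module is configured to:
    perform a congestion check on the intra-group detour shortest paths between the first network device and the second network device;
    when an uncongested intra-group detour shortest path exists between the first network device and the second network device, use any uncongested intra-group detour shortest path between the first network device and the second network device as the target forwarding path;
    when no uncongested intra-group detour shortest path exists between the first network device and the second network device, perform a congestion check on the local direct non-shortest paths between the first network device and the second network device;
    when an uncongested local direct non-shortest path exists between the first network device and the second network device, use any uncongested local direct non-shortest path between the first network device and the second network device as the target forwarding path;
    when no uncongested local direct non-shortest path exists between the first network device and the second network device, perform a congestion check on the intra-group detour non-shortest paths between the first network device and the second network device; and
    when an uncongested intra-group detour non-shortest path exists between the first network device and the second network device, use any uncongested intra-group detour non-shortest path between the first network device and the second network device as the target forwarding path.
  18. The apparatus according to claim 17, characterized in that the processing module is configured to:
    determine whether the intra-group detour shortest path between the first network device and the second network device via the third network device is congested based on a queue depth of a first egress queue of a local interface on the first network device that connects to the third network device and a congestion state of a global interface on the third network device that connects to the second device group, wherein the first egress queue is used to forward packets that the first network device forwards over intra-group detour shortest paths.
  19. The apparatus according to claim 17, characterized in that the processing module is configured to:
    determine whether the local direct non-shortest path between the first network device and the second network device via the third device group is congested based on a congestion state of a global interface on the first network device that connects to the third device group.
  20. The apparatus according to claim 17, characterized in that the processing module is configured to:
    determine whether the intra-group detour non-shortest path between the first network device and the second network device via the fourth network device and the fourth device group is congested based on a queue depth of a second egress queue of a local interface on the first network device that connects to the fourth network device and a congestion state of a global interface on the fourth network device that connects to the fourth device group, wherein the second egress queue is used to forward packets that the first network device forwards over intra-group detour non-shortest paths.
  21. The apparatus according to any one of claims 16 to 20, characterized in that the sending module is configured to:
    when the target forwarding path is an intra-group detour shortest path, add a first indication to the first packet to obtain the second packet, and send the second packet to the second network device over the target forwarding path, wherein the first indication indicates that the forwarding path type is intra-group detour shortest path; or
    when the target forwarding path is an intra-group detour non-shortest path, add a second indication to the first packet to obtain the second packet, and send the second packet to the second network device over the target forwarding path, wherein the second indication indicates that the forwarding path type is intra-group detour non-shortest path.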
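  22. The apparatus according to any one of claims 14 to 21, characterized in that the apparatus further comprises:
    an obtaining module, configured to obtain a routing and forwarding table, wherein the routing and forwarding table comprises a route prefix table and a plurality of equal-cost multi-path (ECMP) group tables;
    wherein each entry of the route prefix table comprises a mapping between a destination IP address and a group index of a destination device group, the group index is associated with one ECMP group table, the ECMP group table comprises an egress interface corresponding to each path from the first network device to the destination device group, and the destination device group is the device group containing the access device of the terminal device that owns the destination IP address.
  23. The apparatus according to claim 22, characterized in that the ECMP group table comprises a first routing sub-table and a second routing sub-table;
    the first routing sub-table comprises the egress interface corresponding to each path from the first network device to the destination device group, and the first routing sub-table is used by the first network device to forward packets from terminal devices that attach to the first device group; and
    the second routing sub-table comprises the egress interfaces corresponding to the inter-group shortest paths from the first network device to the destination device group, and the second routing sub-table is used by the first network device to forward packets from terminal devices that attach to device groups other than the first device group.
  24. The apparatus according to claim 23, characterized in that each device group corresponds to one autonomous system (AS) number, and the obtaining module is configured to:
    when the first network device receives a first routing message and the AS-path attribute of the first routing message contains only one AS number, add a forwarding entry derived from the first routing message to both the first routing sub-table and the second routing sub-table; and
    when the first network device receives a second routing message and the AS-path attribute of the second routing message contains two AS numbers, add a forwarding entry derived from the second routing message to the first routing sub-table only.
  25. The apparatus according to claim 22 or 23, characterized in that the obtaining module is configured to:
    generate the route prefix table and the plurality of ECMP group tables based on the networking topology of the dragonfly network, the IP addresses of the terminal devices attached to the dragonfly network, and the access devices of the terminal devices.
  26. The apparatus according to claim 22 or 23, characterized in that the obtaining module is configured to:
    receive the routing and forwarding table sent by a control device.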
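  27. A dragonfly network, characterized in that it comprises a plurality of device groups, there are a plurality of inter-group links between different device groups, and the network devices in the device groups are configured to perform the packet forwarding method according to any one of claims 1 to 13.
  28. A network device, characterized in that it comprises a processor and a memory;
    the memory is configured to store a computer program, and the computer program comprises program instructions; and
    the processor is configured to invoke the computer program to implement the packet forwarding method according to any one of claims 1 to 13.
  29. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when executed by a processor, implement the packet forwarding method according to any one of claims 1 to 13.
  30. A computer program product, characterized in that it comprises a computer program that, when executed by a processor, implements the packet forwarding method according to any one of claims 1 to 13.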
PCT/CN2022/098019 2021-09-28 2022-06-10 Packet forwarding method and apparatus, and dragonfly network WO2023050874A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22874272.2A EP4333380A1 (en) 2021-09-28 2022-06-10 Packet forwarding method and apparatus, and dragonfly network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111142613.5 2021-09-28
CN202111142613.5A CN115914078A (zh) 2021-09-28 2021-09-28 Packet forwarding method and apparatus, and dragonfly network

Publications (1)

Publication Number Publication Date
WO2023050874A1 true WO2023050874A1 (zh) 2023-04-06

Family

ID=85741142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098019 WO2023050874A1 (zh) 2021-09-28 2022-06-10 Packet forwarding method and apparatus, and dragonfly network

Country Status (3)

Country Link
EP (1) EP4333380A1 (zh)
CN (1) CN115914078A (zh)
WO (1) WO2023050874A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
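CN117081984B (zh) * 2023-09-27 2024-03-26 新华三技术有限公司 Route adjustment method and apparatus, and electronic device
CN117155846B (zh) * 2023-10-31 2024-02-06 苏州元脑智能科技有限公司 Routing method and apparatus for an interconnection network, computer device, and storage medium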

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
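CN107896192A (zh) * 2017-11-20 2018-04-10 电子科技大学 QoS control method that differentiates service priorities in an SDN network
CN107959633A (zh) * 2017-11-18 2018-04-24 浙江工商大学 Multipath load balancing method based on a pricing mechanism in an industrial real-time network
CN108156090A (zh) * 2018-03-15 2018-06-12 北京邮电大学 Optimal-arrival-rate routing method based on congestion control for satellite DTN networks
US20200236052A1 (en) * 2020-03-04 2020-07-23 Arvind Srinivasan Improving end-to-end congestion reaction using adaptive routing and congestion-hint based throttling for ip-routed datacenter networks
CN111711565A (zh) * 2020-07-01 2020-09-25 西安电子科技大学 Multipath routing method for high-speed interconnected Dragonfly+ networks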

Also Published As

Publication number Publication date
EP4333380A1 (en) 2024-03-06
CN115914078A (zh) 2023-04-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874272

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022874272

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022874272

Country of ref document: EP

Effective date: 20231129