WO2022253087A1 - Data transmission method, node, network manager, and system - Google Patents

Data transmission method, node, network manager, and system

Info

Publication number
WO2022253087A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
transmission path
data message
data
intermediate node
Prior art date
Application number
PCT/CN2022/095142
Other languages
French (fr)
Chinese (zh)
Inventor
周超
徐世萍
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022253087A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/22: Alternate routing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/28: Routing or path finding of packets in data switching networks using route fault recovery

Definitions

  • the present application relates to the technical field of communications, and in particular to a data transmission method, node, network manager and system.
  • the network architecture used by computing clusters for high-performance computing (HPC) services is usually an InfiniBand (IB) network or an RDMA over converged Ethernet (RoCE) network. A RoCE network allows remote direct memory access (RDMA) technology to be used over traditional Ethernet. Because it builds on the mature ecosystem of traditional Ethernet, it has a clear cost advantage over an IB network, and its versions evolve much faster than those of other types of networks, so it is expected to become the mainstream network choice for HPC services in the future.
  • the source routing forwarding mechanism is usually used in place of the traditional Internet Protocol (IP) routing forwarding mechanism.
  • the network controller first issues the source routing path of the transmission path to the source node. The source routing path includes the identifier of the outgoing port of each node on the transmission path over which the source node transmits the data message to the destination node, where the outgoing port of each node is directly connected to another node in the network.
  • according to the source routing path, each node on the transmission path can forward the data message, so that the data message is transmitted from the source node to the destination node.
  • the source node and the destination node rely on a periodic heartbeat connection to detect whether a link between them has failed, and after a failure is detected the source node must again request the network controller to configure a transmission path between the source node and the destination node.
  • as a result, fault repair takes a long time, and the user experience is poor.
  • the present application provides a data transmission method, a node, a network manager and a system, which are used to realize rapid recovery of transmission path failures.
  • the present application provides a data transmission system that can perform data transmission based on a RoCE network and includes a network manager, a source node and a destination node. The source node is used to send a transmission path request to the network manager, where the transmission path request includes the identifier of the destination node. The network manager is configured to determine, according to the transmission path request, a main transmission path and a backup transmission path between the source node and the destination node, and, before the source node transmits a data message to the destination node, to send a transmission path response including the main transmission path and the backup transmission path to the source node. The source node is also used to record the transmission path response including the main transmission path and the backup transmission path, and, when a failure occurs while the main transmission path is transmitting the data message, to use the backup transmission path to transmit the data message.
  • after receiving the transmission path request from the source node, the network manager determines not only the main transmission path for the data message transmission of the source node but also one or more backup transmission paths, and returns the main transmission path and the backup transmission paths to the source node.
  • when the main transmission path fails, the source node can use a backup transmission path to transmit data messages without requesting the network manager to reconfigure the transmission path, which helps repair transmission path faults quickly and improves the user experience.
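  • as a minimal sketch of the behaviour described above, the source node can be modelled as recording the paths from one transmission path response and switching locally on failure. The class name, method name and path values here are hypothetical illustrations and are not defined in the application.

```python
class SourceNodePaths:
    """Paths recorded by a source node from one transmission path response."""

    def __init__(self, primary, backups):
        self.active = primary          # path currently used for data messages
        self.backups = list(backups)   # recorded, not yet used

    def on_primary_failure(self):
        """Switch to the next recorded backup path; no new request is sent."""
        if not self.backups:
            raise RuntimeError("no backup transmission path recorded")
        self.active = self.backups.pop(0)
        return self.active


# Hypothetical node names, used only for illustration.
paths = SourceNodePaths(
    primary=["E11", "A11", "C11", "A21", "E21"],
    backups=[["E11", "A12", "C22", "A22", "E21"]],
)
paths.on_primary_failure()
print(paths.active)
```

  • in this sketch the failover involves only local state, which is the property the text relies on for quick repair.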
  • the main transmission path and the backup transmission path share the least number of nodes between the source node and the destination node.
  • the data transmission system further includes a first intermediate node, wherein the first intermediate node is any node located between the source node and the destination node in the main transmission path; the first intermediate node is configured to determine a fallback path according to the data message; the first intermediate node is also configured to send a fault notification message to the source node according to the fallback path when it detects that the main transmission path has failed.
  • when the first intermediate node detects a failure of the main transmission path over which the data message is transmitted, it sends a fault notification message to the source node, which helps the source node quickly perceive the failure of the main transmission path and enables quick repair of the transmission path failure.
  • the data transmission system further includes one or more second intermediate nodes, wherein the one or more second intermediate nodes are nodes located between the source node and the first intermediate node in the main transmission path; when the first intermediate node determines a fallback path according to the data message, it is specifically used to determine the fallback path according to the identifiers, carried in the data message, of the ingress ports of the data message at the one or more second intermediate nodes.
  • the first intermediate node is further configured to add, before forwarding the data message to the next-hop node adjacent to the first intermediate node in the main transmission path, an identifier of the ingress port of the data message at the first intermediate node to the data message.
  • when the first intermediate node forwards the data message, it adds to the data message the identifier of the ingress port of the data message at the first intermediate node, which helps the nodes after the first intermediate node trace the fault notification message back toward the source when they find that the transmission path is faulty.
  • when the first intermediate node adds the identifier of the ingress port of the data message at the first intermediate node to the data message, it is specifically used to replace the identifier of the egress port of the first intermediate node, recorded in the source routing label carried in the data message, with the identifier of the ingress port.
  • reusing the egress port field in the source routing label carried by the data message to carry the ingress port identifier avoids modifying the transmission format of the data message and improves the forwarding efficiency of the data message.
  • the present application provides a data transmission method, which includes: a source node sends a transmission path request to a network manager, where the transmission path request includes an identifier of a destination node; the source node receives, from the network manager, a transmission path response including the main transmission path and the backup transmission path, and records the transmission path response; when a failure occurs while the main transmission path is transmitting data messages, the source node transmits the data messages by using the backup transmission path.
  • the method further includes: when the source node receives a fault notification message from the first intermediate node, determining that a failure has occurred in the transmission of the data message over the main transmission path, wherein the first intermediate node is any node located between the source node and the destination node in the main transmission path.
  • the present application provides a data transmission method, the method comprising: a network manager receives a transmission path request from a source node, where the transmission path request includes an identifier of a destination node; the network manager determines, according to the transmission path request, a main transmission path and a backup transmission path between the source node and the destination node; before the source node transmits a data message to the destination node, the network manager sends to the source node a transmission path response including the main transmission path and the backup transmission path.
  • the present application provides a data transmission method, the method comprising: a first intermediate node receives a data message from a source node, wherein the first intermediate node is a node located between the source node and the destination node in the main transmission path over which the data message is transmitted; the first intermediate node determines a fallback path according to the data message; when detecting that the main transmission path fails, the first intermediate node sends a fault notification message to the source node according to the fallback path.
  • the first intermediate node determining a fallback path according to the data message includes: the first intermediate node determines the fallback path according to the identifiers, carried in the data message, of the ingress ports of the data message at one or more second intermediate nodes, wherein the one or more second intermediate nodes are nodes located between the source node and the first intermediate node in the main transmission path.
  • the method further includes: before forwarding the data message to the next-hop node adjacent to the first intermediate node in the main transmission path, the first intermediate node adds the identifier of the ingress port of the data message at the first intermediate node to the data message.
  • the first intermediate node adding the identifier of the ingress port of the data message at the first intermediate node to the data message includes: the first intermediate node replaces the identifier of the egress port of the first intermediate node, recorded in the source routing label carried in the data message, with the identifier of the ingress port.
  • the embodiment of the present application provides a data transmission device, which has the function of implementing each step in the above-mentioned second aspect and any possible design of the second aspect.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units (modules) corresponding to the above functions, such as a communication unit and a processing unit.
  • the device may be a chip or an integrated circuit.
  • the device includes a processor and an interface circuit, the processor is coupled to the interface circuit, and is used to realize the functions of each step in the second aspect and any possible design of the second aspect.
  • the interface circuit may be a transceiver or an input/output interface.
  • the device may further include a memory, where the memory stores a program executable by the processor for realizing the functions of each step in the above-mentioned second aspect and any possible design of the second aspect.
  • the device may be a source node.
  • the embodiment of the present application provides a data transmission device, which has the function of realizing each step in the third aspect above, and the function can be realized by hardware, or by executing corresponding software by hardware.
  • the hardware or software includes one or more units (modules) corresponding to the above functions, such as a communication unit and a processing unit.
  • the device may be a chip or an integrated circuit.
  • the device includes a processor and an interface circuit, the processor is coupled to the interface circuit, and is configured to implement the functions of the steps in the above third aspect.
  • the interface circuit may be a transceiver or an input-output interface.
  • the device may further include a memory storing a program executable by the processor for realizing the functions of each step in the above third aspect.
  • the device may be a network manager.
  • the embodiment of the present application provides a data transmission device, which has the function of implementing each step in the fourth aspect and any possible design of the fourth aspect.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more units (modules) corresponding to the above functions, such as a communication unit and a processing unit.
  • the device may be a chip or an integrated circuit.
  • the device includes a processor and an interface circuit, the processor is coupled to the interface circuit, and is used to realize the functions of each step in the fourth aspect and any possible design of the fourth aspect.
  • the interface circuit may be a transceiver or an input/output interface.
  • the device may further include a memory, the memory stores a program executable by the processor for realizing the function of each step in the fourth aspect and any possible design of the fourth aspect.
  • the device may be a first intermediate node.
  • the embodiment of the present application also provides a computer program which, when run on a computer, causes the computer to execute the method provided in any one of the above-mentioned second to fourth aspects or in any possible design of the second to fourth aspects.
  • the embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a computer, the computer executes the method provided in any one of the above-mentioned second to fourth aspects or in any possible design of the second to fourth aspects.
  • the embodiment of the present application also provides a chip, which is used to read the computer program stored in a memory and execute the method provided in any one of the above-mentioned second to fourth aspects or in any possible design of the second to fourth aspects.
  • FIG. 1 is a schematic diagram of a source routing forwarding mechanism
  • FIG. 2 is a schematic diagram of the principle of an ECMP mechanism provided in the embodiment of the present application.
  • FIG. 3 is a schematic diagram of the principle of an AR mechanism provided by an embodiment of the present application.
  • FIG. 4 is one of the schematic diagrams of the network architecture provided by the embodiment of the present application.
  • FIG. 5 is one of the schematic diagrams of the architecture of the data transmission system provided by the embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a data transmission method provided by an embodiment of the present application.
  • FIG. 7 is the second schematic diagram of the network architecture provided by the embodiment of the present application.
  • FIG. 8 is one of schematic diagrams of a source routing label in an embodiment of the present application.
  • FIG. 9 is a second schematic diagram of a source routing label according to an embodiment of the present application.
  • FIG. 10 is one of the schematic diagrams of the data transmission device provided by the embodiment of the present application.
  • FIG. 11 is the second schematic diagram of the data transmission device provided by the embodiment of the present application.
  • FIG. 12 is the second schematic diagram of the architecture of the data transmission system provided by the embodiment of the present application.
  • Host: also known as a service node, communication node, etc. A host can be used to process service data, has sending and receiving functions, and can send data to other hosts and/or receive data from other hosts.
  • the host can be a server, a big data platform, the cloud, a cloud server, a server cluster, a terminal device or a computer device, etc., or it can be a component of these devices, such as a chip or a chip system.
  • computer equipment which may be referred to as a computer for short, refers to equipment with processing functions such as network data storage, data transmission, and data reception.
  • Transmission node: also known as an intermediate node, switching node, etc. A transmission node is a device with a data exchange (forwarding) function, which can be a switch, router, gateway or similar device, or another device with a data exchange function.
  • the IP routing and forwarding mechanism usually refers to a mechanism that implements routing and forwarding based on message information (such as the destination IP address) by using a routing selection algorithm (such as a hash algorithm).
  • the equal-cost multi-path (ECMP) mechanism hashes the five-tuple (source IP address, source port, destination IP address, destination port, and transport layer protocol) to calculate the outgoing port on which each data stream is forwarded, completing a one-to-one mapping between each data stream and an end-to-end transmission path, so that different data streams are evenly distributed across the end-to-end transmission paths.
  • for a given flow, the egress port selected by each ECMP hash is uniquely determined, so the end-to-end transmission path of the flow is ultimately uniquely determined.
  • the biggest problem with the ECMP load sharing scheme is that when traffic in the network is unevenly distributed (a mix of elephant and mouse flows), treating large flows and small flows as equivalent and assigning them to different transmission paths causes a severe load imbalance among the transmission paths.
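  • the hash-based port selection described above can be sketched as follows. This is only an illustration: the choice of SHA-256, the function name and the example five-tuple and port numbers are assumptions, since real switches use their own hardware hash functions.

```python
import hashlib


def ecmp_egress_port(five_tuple, egress_ports):
    """Pick one egress port for a flow by hashing its five-tuple.

    The same five-tuple always maps to the same port, so a flow stays
    on a single end-to-end path regardless of its size.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(egress_ports)
    return egress_ports[index]


# Hypothetical flow: (source IP, source port, destination IP,
# destination port, transport layer protocol).
flow = ("10.0.0.1", 40000, "10.0.1.1", 4791, "UDP")
ports = [16, 17, 18, 19]
chosen = ecmp_egress_port(flow, ports)
```

  • because the hash ignores flow size, an elephant flow and a mouse flow are treated alike, which is the source of the load imbalance noted above.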
  • the AR mechanism builds on the ECMP mechanism and adds a judgment of the congestion status of the outgoing port queue: if the congestion exceeds a threshold, the flow is adjusted to another port, which mitigates the severe load imbalance.
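  • a minimal sketch of such a congestion check is shown below. The function name, the queue-depth dictionary and the rule of moving to the least-loaded port are assumptions made for illustration; the text only states that a congested flow is adjusted to another port.

```python
def adaptive_egress_port(hashed_port, queue_depth, threshold):
    """Keep the hashed port unless its queue depth exceeds the threshold."""
    if queue_depth[hashed_port] <= threshold:
        return hashed_port
    # Congested: move the flow to the port with the shortest queue
    # (an assumed policy, chosen here for simplicity).
    return min(queue_depth, key=queue_depth.get)


# Hypothetical queue depths per egress port.
depths = {16: 90, 17: 10, 18: 40}
```

  • for example, with a threshold of 50, a flow hashed to port 16 would be moved to port 17, while a flow hashed to port 18 would stay where it is.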
  • both the ECMP mechanism and the AR mechanism require each intermediate node on the transmission path to maintain the global routing table of the entire network and execute the routing algorithm, which is difficult to maintain and implement; and because each intermediate node needs to execute the routing algorithm, the transmission delay is relatively large. Neither is as conceptually simple and easy to implement as the source routing mechanism.
  • Source routing: also called the source routing mechanism, means that the source node of a data transmission (i.e., the sending node) can specify some or all of the transmission nodes (or intermediate nodes) that a sent message passes through along the way.
  • for example, the outgoing port A of the source node is connected to the intermediate node 1, the outgoing port B of the intermediate node 1 is connected to the intermediate node 2, the outgoing port C of the intermediate node 2 is connected to the intermediate node 3, the outgoing port D of the intermediate node 3 is connected to the intermediate node 4, the outgoing port E of the intermediate node 4 is connected to the intermediate node 5, and the outgoing port F of the intermediate node 5 is connected to the destination node.
  • the identifier of the transmission path (that is, the source routing path), consisting of the outgoing port A of the source node, the outgoing port B of the intermediate node 1, the outgoing port C of the intermediate node 2, the outgoing port D of the intermediate node 3, the outgoing port E of the intermediate node 4, and the outgoing port F of the intermediate node 5, indicates that the hop nodes the data message passes through after leaving the source node are, in order, intermediate node 1, intermediate node 2, intermediate node 3, intermediate node 4, intermediate node 5 and the destination node.
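  • the example above can be traced with a short sketch in which each node simply looks up the next listed outgoing port. The dictionary layout and function name are hypothetical; only the port letters and node order come from the example.

```python
# (node, outgoing port) -> directly connected neighbour, as in the example.
LINKS = {
    ("source", "A"): "intermediate node 1",
    ("intermediate node 1", "B"): "intermediate node 2",
    ("intermediate node 2", "C"): "intermediate node 3",
    ("intermediate node 3", "D"): "intermediate node 4",
    ("intermediate node 4", "E"): "intermediate node 5",
    ("intermediate node 5", "F"): "destination",
}


def forward(start, source_routing_path):
    """Follow the listed outgoing ports hop by hop; no routing table is used."""
    node = start
    visited = [node]
    for port in source_routing_path:
        node = LINKS[(node, port)]
        visited.append(node)
    return visited


hops = forward("source", ["A", "B", "C", "D", "E", "F"])
```

  • each node performs a single lookup on its own outgoing port, which is why the mechanism needs neither a global routing table nor a per-node routing algorithm.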
  • the source node can only rely on the periodic heartbeat connection with the destination node to detect whether a link between the source node and the destination node has failed, and after a failure is detected it must again request the network controller to deliver a transmission path to the source node.
  • fault detection is therefore inefficient, and after a fault is found the network controller needs to rescan the network architecture to configure a new transmission path, which takes a long time. The problem is especially serious when the network architecture is complex, making it difficult to quickly repair transmission path failures.
  • the embodiment of the present application provides a data transmission solution in which the network manager determines both the main transmission path and the backup transmission path between the source node and the destination node at the same time, so as to realize rapid recovery after a transmission path failure.
  • as shown in FIG. 4, an example of a data transmission system under a possible fat-tree (Fat-Tree) network architecture applicable to the embodiment of the present application includes multiple hosts (such as host H11, host H12, etc.), multiple intermediate nodes (such as intermediate node E11, intermediate node A11, intermediate node C11, etc.) and a network manager (fabric manager, FM).
  • the intermediate nodes can also be called transmission nodes. The intermediate nodes of the leaf (leaf) layer and the spine (spine) layer can be divided into different delivery units (point of delivery, POD) or clusters, and in each POD each intermediate node of the leaf layer is connected to each intermediate node of the spine layer; at the same time, each intermediate node of the spine layer can be connected to one or more intermediate nodes of the super-spine (super-spine) layer, so that different PODs can be connected through the intermediate nodes of the super-spine layer.
  • the network manager has a network management function, is directly or indirectly connected to each intermediate node and host computer in the network, and can provide network transmission path configuration.
  • any host in FIG. 4 can be used as a source node of data transmission, that is, a data sending end, or as a destination node of data transmission, that is, a data receiving end.
  • among the hosts (such as host H11, host H12, etc.) and the intermediate nodes (such as intermediate node C11, etc.) in the figure, an intermediate node may include components such as a switch (Switch) component and a switch agent (Switch-agent) component, and the network manager may include components such as a global network manager controller (FM-controller) component and a global network manager management (FM-manager) component.
  • the components of the network manager (FM), the hosts (Host) and the intermediate nodes (such as switches (Switch)) in the data transmission system can be divided into three levels: the forwarding plane, the control plane and the management plane. The forwarding plane can include the RNIC network card on the host side and the Switch component on the intermediate node side; the control plane includes the Host-agent component on the host side, the Switch-agent component on the intermediate node side, and the FM-controller component on the network manager side; the management plane includes the FM-manager component on the network manager side.
  • the FM-manager component can deliver port configurations (port config) to intermediate nodes, can learn information such as the port link status (port link status) and the port input/output utilization (port input/output utility rate) between intermediate nodes, and can store information such as the network topology (network topology) in a database (DB).
  • the data transmission system can perform data transmission based on the RoCE network.
  • the RNIC network card on the host side can request a transmission path from the FM-controller component, and the FM-controller component returns the main transmission path and the backup transmission path to the RNIC network card on the host side; the RNIC network card on the host side can then exchange data messages with the RNIC network cards of other hosts through the main transmission path and the backup transmission path.
  • FIG. 6 is a schematic diagram of a data transmission method provided in an embodiment of the present application, the method including:
  • the source node sends a transmission path request to a network manager, and the network manager receives the transmission path request.
  • the transmission path request includes the identifier of the destination node.
  • when an application (such as an HPC application) in the source node (such as the host H11) is started, it triggers the source node to initiate a network connection (such as an RDMA connection) to the destination node (such as the host H21), which is used to transmit the data messages of a certain service between the source node and the destination node.
  • when an application in the source node is started and initiates a network connection to the destination node, the source node may send a transmission path request including the identifier of the destination node to the network manager, requesting the network manager to configure, for the source node, the transmission path over which the source node sends data messages to the destination node.
  • the RDMA network interface card (RNIC) of the source node may send the transmission path request including the identifier of the destination node to the network manager, wherein the identifier of the destination node may be an identifier (ID), an IP address, etc. of the destination node, which is not limited in this application.
  • the network manager determines a primary transmission path and a standby transmission path between the source node and the destination node, and, before the source node transmits a data message to the destination node, sends to the source node a transmission path response including the primary transmission path and the standby transmission path.
  • the network manager has a network management function, is directly or indirectly connected to each intermediate node (also called a transmission node) and host in the network, and can provide functions such as network transmission path configuration and performance control of each node. The nodes in the network (including hosts and intermediate nodes, and intermediate nodes and other intermediate nodes) can be connected through ports; as shown in FIG. 7, the intermediate node E11 can be connected through its port 16 to the port 101 of the intermediate node A11.
  • the port connections between nodes in the network can be configured by network administrators when the nodes (including hosts and intermediate nodes) access the network and then reported by the nodes to the network manager, or can be configured by network administrators through the network manager and sent to the corresponding nodes by the network manager, which is not limited in this application.
  • based on the topology of the managed network, the network manager can determine multiple transmission paths between the source node and the destination node that overlap as little as possible, that is, the multiple transmission paths share as few intermediate nodes as possible.
  • for example, the transmission paths determined by the network manager include transmission path 1: source node H11-intermediate node E11-intermediate node A11-intermediate node C11-intermediate node A21-intermediate node E21-destination node H21, and transmission path 2: source node H11-intermediate node E11-intermediate node A12-intermediate node C22-intermediate node A22-intermediate node E21-destination node H21, wherein transmission path 1 and transmission path 2 do not intersect at the spine (spine) layer or the super-spine (super-spine) layer, that is, they share no intermediate nodes at those layers.
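  • one way to picture the selection of minimally overlapping paths is the brute-force sketch below. The adjacency table contains only the links named in transmission path 1 and transmission path 2, and both the exhaustive search and the function names are assumptions for illustration, not the algorithm used by the network manager.

```python
from itertools import combinations

# Undirected links taken from transmission paths 1 and 2 only.
TOPOLOGY = {
    "H11": ["E11"],
    "E11": ["H11", "A11", "A12"],
    "A11": ["E11", "C11"],
    "A12": ["E11", "C22"],
    "C11": ["A11", "A21"],
    "C22": ["A12", "A22"],
    "A21": ["C11", "E21"],
    "A22": ["C22", "E21"],
    "E21": ["A21", "A22", "H21"],
    "H21": ["E21"],
}


def simple_paths(graph, src, dst, path=None):
    """Enumerate loop-free paths from src to dst by depth-first search."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in graph[src]:
        if nxt not in path:
            found.extend(simple_paths(graph, nxt, dst, path))
    return found


def shared_intermediates(a, b):
    """Intermediate nodes (endpoints excluded) that appear on both paths."""
    return set(a[1:-1]) & set(b[1:-1])


def least_overlapping_pair(paths):
    """Return the pair of paths sharing the fewest intermediate nodes."""
    return min(combinations(paths, 2),
               key=lambda pair: len(shared_intermediates(*pair)))


candidates = simple_paths(TOPOLOGY, "H11", "H21")
path_1, path_2 = least_overlapping_pair(candidates)
```

  • in this reduced topology the two paths still share E11 and E21, because H11 and H21 each attach to a single leaf node; the overlap is avoided only at the spine and super-spine layers, as the example states.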
  • after determining multiple transmission paths, the network manager can determine one of them as the main transmission path and the others as backup transmission paths, and send to the source node a transmission path response including the main transmission path and the backup transmission paths.
  • the network manager may, following the principle of the fewest intermediate nodes, use the transmission path with the fewest intermediate nodes as the main transmission path (if several transmission paths tie for the fewest intermediate nodes, one of them can be randomly selected as the main transmission path) and use the other transmission paths as backup transmission paths; of course, the main transmission path may also be selected from the multiple transmission paths according to the principle of the smallest average load or the principle of the lowest transmission delay, with the other transmission paths used as backup transmission paths.
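  • the fewest-intermediate-nodes rule can be sketched as follows. The function name and the example paths are hypothetical, and ties are broken by taking the first candidate rather than randomly, which is a simplification of the text.

```python
def split_primary_and_backups(paths):
    """Choose the path with the fewest nodes as primary; the rest are backups.

    Each path is a list of node names from source to destination, so the
    shortest list also has the fewest intermediate nodes. On a tie, min()
    keeps the first candidate (the text allows a random choice instead).
    """
    primary = min(paths, key=len)
    backups = [p for p in paths if p is not primary]
    return primary, backups


# Hypothetical candidate paths between the same source and destination.
short_path = ["H11", "E11", "A11", "E12", "H12"]
long_path = ["H11", "E11", "A11", "C11", "A12", "E12", "H12"]
primary, backups = split_primary_and_backups([long_path, short_path])
```

  • selecting by average load or transmission delay would only change the key function passed to `min`.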
  • the network manager may also directly send a transmission path response including the multiple transmission paths to the source node, in which case the source node selects one of the multiple transmission paths as the primary transmission path and uses the other transmission paths as backup transmission paths.
  • the IP addresses of source node H11, intermediate node E11, intermediate node A11, intermediate node C11, intermediate node A21, intermediate node E21, and destination node H21 are IP H11, IP E11, IP A11, IP C11, IP A21, IP E21, IP H21.
  • after the source node obtains the primary transmission path and the standby transmission path from the network manager, it records the source routing path of the primary transmission path in the source routing label carried in the data message, and transmits the data message to the destination node through the primary transmission path.
  • a heartbeat connection may also be maintained between the source node and the destination node to detect whether a failure occurs on the main transmission path.
  • the source node can send a heartbeat request message to the destination node through the main transmission path at a set period (such as 1 s or 2 s). If the heartbeat response message from the destination node is received within a specified time (such as within 1 ms), the main transmission path can transmit messages normally and has not failed; otherwise, the main transmission path is determined to be faulty.
  • when forwarding a data message, an intermediate node can add to the data message the identifier of the ingress port through which the data message entered that node, to facilitate tracing back and returning the fault notification message when the transmission path fails.
  • when an intermediate node forwards a data message, it may replace the identifier of its own egress port recorded in the source routing label with the identifier of the ingress port of the data message at that intermediate node.
  • the data message carries the source routing label (that is, the message header) shown in Figure 8. The intermediate node E11 receives the data message sent by the source node H11 through port 102 (the ingress port). When the intermediate node E11 forwards the data message through its own port 18 (the egress port) according to HOP1 (IP E11+18), it changes HOP1 from IP E11+18 to IP E11+102. Similarly, when the intermediate node A11 forwards the data message, it changes HOP2 from IP A11+19 to IP A11+103, and so on.
  • when any intermediate node on the main transmission path detects a link failure on its egress port and cannot forward the data message, the intermediate node constructs a failure notification (FN) message and sends the fault notification message back through the ingress port of the data message.
  • the failure notification message includes the source routing path of the fallback path along which the failure notification message is sent to the source node (that is, it specifies the egress port of each hop on the fallback path), wherein the source routing path of the fallback path can be obtained from the data message.
  • the intermediate node on the main transmission path that detects the failure of the outgoing port link is referred to as the first intermediate node.
  • after the data message carrying the source routing label shown in FIG. 7 is forwarded by the intermediate node A11 and the intermediate node C11 (multiple second intermediate nodes located between the source node and the first intermediate node), the intermediate node A21 (the first intermediate node) on the main transmission path receives the data message.
  • when the intermediate node E11, the intermediate node A11 and the intermediate node C11 forward the data message, each replaces the identifier of its own egress port recorded in the source routing label of the data message with the identifier of the ingress port of the data message at that intermediate node; the source routing label carried by the data message received by the intermediate node A21 is shown in FIG. 9.
  • the intermediate node A21 determines that the source routing path of the fallback path is Hop0 (IP A21+105), Hop1 (IP C11+104), Hop2 (IP A11+103), Hop3 (IP E11+102).
  • the intermediate node A21 constructs a fault notification message whose source routing label contains Hop0 (IP A21+105), Hop1 (IP C11+104), Hop2 (IP A11+103) and Hop3 (IP E11+102). The intermediate node A21 sends the fault notification message to the intermediate node C11 through the ingress port of the data message, namely port 105. According to Hop1 (IP C11+104), Hop2 (IP A11+103) and Hop3 (IP E11+102) recorded in the source routing label carried by the fault notification message, the intermediate node C11, the intermediate node A11 and the intermediate node E11 forward the fault notification message hop by hop until it is received by the source node H11, which then determines that the main transmission path used to transmit the data message is faulty.
  • intermediate nodes can reserve resources (such as bandwidth resources and forwarding queue resources) for tracing back the fault notification message.
  • the source node can transmit data packets of different services with multiple destination nodes at the same time.
  • the fault notification message can also carry the service flow identifier of the service to which the corresponding data message belongs. The service flow identifier can be obtained from, for example, the flow label field in the message header of the data message, and can be, for example, the corresponding source IP, destination IP + queue pair (QP) identifier (ID), etc.
  • after the source node detects the failure of the main transmission path, it switches the transmission of data messages from the main transmission path to the standby transmission path issued by the network manager; that is, the source node uses the standby transmission path for data transmission, which enables fast failover of the transmission path and fast convergence of service flows.
  • the source routing path in the source routing label carried by the data message is switched from the source routing path of the main transmission path to the source routing path of the standby transmission path, which enables fast failover of the transmission path and fast convergence of service flows.
  • the data transmission scheme provided by the embodiment of the present application is not only applicable to the Fat-Tree network architecture as shown in Figure 4, but also applicable to 3D/6D ring (torus) topology network architecture, dragonfly topological network architecture, etc.
  • each network element includes a corresponding hardware structure and/or software module (or unit) for performing each function.
  • in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 10 and FIG. 11 are schematic structural diagrams of a possible data transmission device provided by an embodiment of the present application. These data transmission devices can be used to implement the functions of the source node, the network manager, or the first intermediate node in the above method embodiments, and thus can also realize the beneficial effects of the above method embodiments.
  • the data transmission device may be the source node or the network manager or the first intermediate node, and may also be a module (such as a chip) applied to the source node or the network manager or the first intermediate node.
  • the data transmission device 1000 may include a processing unit 1002 and a communication unit 1003, and may also include a storage unit 1001.
  • the data transmission apparatus 1000 is configured to realize the functions of the source node or the network manager or the first intermediate node in the above method embodiments.
  • the processing unit 1002 is configured to implement corresponding processing functions.
  • the communication unit 1003 is used to support communication between the data transmission device 1000 and other network entities.
  • the storage unit 1001 is configured to store program codes and/or data of the data transmission device 1000.
  • the communication unit 1003 may include a receiving unit and/or a sending unit, configured to perform receiving and sending operations respectively.
  • the communication unit 1003 is used to send a transmission path request to the network manager, and the transmission path request includes the identity of the destination node;
  • the communication unit 1003 is further configured to receive, from the network manager, a transmission path response including the main transmission path and the backup transmission path;
  • the processing unit 1002 is configured to record a transmission path response including the primary transmission path and the backup transmission path;
  • the communication unit 1003 is further configured to use the standby transmission path to transmit the data packet when the primary transmission path fails to transmit the data packet.
  • the processing unit 1002 is further configured to, when the communication unit 1003 receives a failure notification message from the first intermediate node, determine that a failure has occurred in transmitting the data message over the main transmission path, wherein the first intermediate node is any node located between the source node and the destination node in the main transmission path.
  • the communication unit 1003 is configured to receive a transmission path request from the source node, and the transmission path request includes the identifier of the destination node;
  • the processing unit 1002 is configured to determine a primary transmission path and a standby transmission path between the source node and the destination node;
  • the communication unit 1003 is further configured to send a transmission path response including the primary transmission path and the standby transmission path to the source node before the source node transmits the data packet to the destination node.
  • the communication unit 1003 is configured to receive a data message from the source node, wherein the first intermediate node is a node on the main transmission path over which the data message is transmitted between the source node and the destination node of the data message;
  • the processing unit 1002 is configured to determine a fallback path according to the data message;
  • the communication unit 1003 is further configured to send a failure notification message to the source node according to the fallback path when a failure of the main transmission path is detected.
  • when the processing unit 1002 determines the fallback path according to the data message, it is specifically configured to determine the fallback path according to the identifiers, carried in the data message, of the ingress ports of the data message at one or more second intermediate nodes, wherein the one or more second intermediate nodes are nodes located between the source node and the first intermediate node in the main transmission path.
  • the processing unit 1002 is further configured to, before the communication unit 1003 forwards the data message to a next-hop node adjacent to the first intermediate node in the main transmission path, add the identifier of the ingress port of the data message at the first intermediate node to the data message.
  • when adding the identifier of the ingress port of the data message at the first intermediate node to the data message, the processing unit 1002 is specifically configured to replace the identifier of the egress port of the first intermediate node recorded in the source routing label carried by the data message with the identifier of the ingress port.
  • more detailed descriptions of the processing unit 1002 and the communication unit 1003 can be obtained directly from the related descriptions in the method embodiments, and details are not repeated here.
  • the data transmission device 1100 includes a processor 1110 and an interface circuit 1120.
  • the processor 1110 and the interface circuit 1120 are coupled to each other. It can be understood that the interface circuit 1120 may be an input and output interface.
  • the data transmission device 1100 may further include a memory 1130 for storing instructions executed by the processor 1110 or storing input data required by the processor 1110 to execute the instructions or storing data generated by the processor 1110 after executing the instructions.
  • the processor 1110 is used to implement the functions of the above-mentioned processing unit 1002, and the interface circuit 1120 is used to implement the functions of the above-mentioned communication unit 1003.
  • a computer-readable storage medium is provided, on which instructions are stored; when the instructions are executed, the data transmission method in the above method embodiments can be executed.
  • a computer program product including instructions is provided, and when the instructions are executed, the data transmission method applicable to the source node or the network manager or the first intermediate node in the above method embodiments can be executed.
  • a chip is provided, and when the chip is running, it can execute the data transmission method applicable to the source node or the network manager or the first intermediate node in the above method embodiments.
  • Fig. 12 is a schematic diagram of the architecture of a data transmission system provided by an embodiment of the present application.
  • the data transmission system includes a network manager, a source node, a destination node, and a first intermediate node, wherein the network manager has a data transmission device for realizing the functions of the network manager in the above method embodiments, the source node has a data transmission device for realizing the functions of the source node in the above method embodiments, and the first intermediate node has a data transmission device for realizing the functions of the first intermediate node in the above method embodiments.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on the computer, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Abstract

A data transmission method, a node, a network manager, and a system, for use in realizing quick repair of a transmission path fault. The method comprises: the network manager determines a main transmission path and a standby transmission path between a source node and a destination node according to a transmission path request from the source node which comprises an identifier of the destination node, and delivers the main transmission path and the standby transmission path to the source node; the source node transmits a data message by using the standby transmission path when the main transmission path fails to transmit the data message.

Description

A data transmission method, node, network manager and system
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202110604511.4, entitled "A data transmission method, node, network manager and system", filed with the China Patent Office on May 31, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of communications, and in particular to a data transmission method, node, network manager and system.
Background
In enterprise data centers or supercomputing centers, the network architecture used by computing clusters for high performance computing (HPC) services is usually an InfiniBand (IB) network or an RDMA over Converged Ethernet (RoCE) network. A RoCE network allows remote direct memory access (RDMA) technology to be used over traditional Ethernet. Because it builds on the mature ecosystem of traditional Ethernet, it has a clear cost advantage over an IB network, and its versions evolve much faster than those of other network types, so it is expected to become the mainstream network choice for HPC services.
At present, to optimize the forwarding performance of a RoCE network, a source routing forwarding mechanism is usually used in place of the traditional Internet Protocol (IP) routing forwarding mechanism. As shown in FIG. 1, before a data message is transmitted, the network controller first delivers the source routing path of the transmission path to the source node. The source routing path includes the identifier of the egress port of each node on the transmission path along which the source node transmits the data message to the destination node, and the egress port of each node in the network is directly connected to one other node in the network. Because the data message carries the information of the source routing path, each node on the transmission path can forward the data message, so that the data message is transmitted from the source node to the destination node. However, in the existing source routing mechanism, the source node and the destination node rely on a periodic heartbeat connection to detect whether a link failure exists between them, and after a failure is detected they request the network controller again to configure the source route between the source node and the destination node. As a result, failure recovery takes a long time and the user experience is poor.
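As a minimal sketch of the source routing forwarding described above, the following Python models a source routing path as a list of (node, egress port) hops and the network as a map from (node, egress port) to the directly connected neighbor. The node names and the E11 and A11 port numbers follow the worked example in this document; the `LINKS` table, the ports for H11 and C11, the shortened topology, and the `forward` helper are assumptions made only for illustration.

```python
# Hypothetical wiring: (node, egress port) -> directly connected neighbor.
# Only E11 port 18 and A11 port 19 come from the document's example;
# the other entries are illustrative assumptions.
LINKS = {
    ("H11", 1): "E11",
    ("E11", 18): "A11",
    ("A11", 19): "C11",
    ("C11", 20): "H21",
}


def forward(source, source_route, links=LINKS):
    """Follow a source route hop by hop and return the nodes visited."""
    visited = [source]
    current = source
    for node, egress_port in source_route:
        if node != current:
            raise ValueError(f"hop names {node}, but the message is at {current}")
        # Each egress port is wired to exactly one neighbor, so the hop
        # entry alone decides where the message goes next.
        current = links[(node, egress_port)]
        visited.append(current)
    return visited


route = [("H11", 1), ("E11", 18), ("A11", 19), ("C11", 20)]
trace = forward("H11", route)
```

In this sketch no node consults a routing table: the path is fixed entirely by the hop list that the source node places in the message.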
Summary
The present application provides a data transmission method, a node, a network manager and a system, which are used to realize rapid recovery from transmission path failures.
In a first aspect, the present application provides a data transmission system. The data transmission system can perform data transmission based on a RoCE network and includes a network manager, a source node and a destination node. The source node is configured to send a transmission path request to the network manager, the transmission path request including the identifier of the destination node. The network manager is configured to determine, according to the transmission path request, a main transmission path and a standby transmission path between the source node and the destination node, and, before the source node transmits a data message to the destination node, to send a transmission path response including the main transmission path and the standby transmission path to the source node. The source node is further configured to record the transmission path response including the main transmission path and the standby transmission path, and, when a failure occurs in transmitting the data message over the main transmission path, to transmit the data message over the standby transmission path.
With the above method, after receiving the transmission path request from the source node, the network manager determines not only a main transmission path but also one or more standby transmission paths for the data message transmission of the source node, and returns the main transmission path and the standby transmission paths to the source node. When the main transmission path fails, the source node can transmit data messages over a standby transmission path without asking the network manager to reconfigure the transmission path, which helps repair transmission path failures quickly and improves the user experience.
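The failover behavior described above can be sketched as follows, assuming the source node simply keeps the paths from the transmission path response in a list. The `SourceNode` class and its method names are hypothetical; the two paths are transmission path 1 and transmission path 2 from the example in this document.

```python
class SourceNode:
    """Holds the paths from one transmission path response (illustrative)."""

    def __init__(self, main_path, standby_paths):
        # The main path comes first; standby paths follow in the order received.
        self.paths = [main_path, *standby_paths]
        self.active_index = 0

    @property
    def active_path(self):
        return self.paths[self.active_index]

    def on_path_failure(self):
        """Switch to the next recorded path without contacting the manager."""
        if self.active_index + 1 >= len(self.paths):
            raise RuntimeError("no standby transmission path left")
        self.active_index += 1
        return self.active_path


PATH_1 = ["H11", "E11", "A11", "C11", "A21", "E21", "H21"]
PATH_2 = ["H11", "E11", "A12", "C22", "A22", "E21", "H21"]

node = SourceNode(PATH_1, [PATH_2])
before = node.active_path
after = node.on_path_failure()
```

The point of the sketch is that the switch is a local operation on already recorded data, so no round trip to the network manager is needed at failure time.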
In a possible design, the main transmission path and the standby transmission path overlap as little as possible; that is, the main transmission path and the standby transmission path share the fewest nodes between the source node and the destination node.
This design helps avoid the situation in which the failure of a single node leaves both the main transmission path and the standby transmission path unable to transmit data messages.
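One way to picture this selection is the sketch below, which is an assumption about how a network manager might combine the two principles mentioned in this document: it takes the candidate with the fewest intermediate nodes as the main path (the first candidate wins a tie here, rather than a random one) and then takes as standby the candidate sharing the fewest intermediate nodes with it. The helper names and `PATH_3` (with node C12) are hypothetical.

```python
def intermediates(path):
    """Nodes strictly between the source node and the destination node."""
    return path[1:-1]


def shared_count(path_a, path_b):
    """Number of intermediate nodes that two paths have in common."""
    return len(set(intermediates(path_a)) & set(intermediates(path_b)))


def pick_main_and_standby(candidates):
    # Main path: fewest intermediate nodes (first candidate wins a tie here).
    main = min(candidates, key=lambda p: len(intermediates(p)))
    others = [p for p in candidates if p is not main]
    # Standby path: the candidate sharing the fewest intermediate nodes with main.
    standby = min(others, key=lambda p: shared_count(main, p))
    return main, standby


PATH_1 = ["H11", "E11", "A11", "C11", "A21", "E21", "H21"]
PATH_2 = ["H11", "E11", "A12", "C22", "A22", "E21", "H21"]
PATH_3 = ["H11", "E11", "A11", "C12", "A21", "E21", "H21"]  # hypothetical

main, standby = pick_main_and_standby([PATH_1, PATH_3, PATH_2])
```

Here `PATH_2` is preferred as standby because it shares only the two leaf nodes E11 and E21 with `PATH_1`, whereas `PATH_3` also shares A11 and A21.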
In a possible design, the data transmission system further includes a first intermediate node, where the first intermediate node is any node located between the source node and the destination node on the main transmission path. The first intermediate node is configured to determine a fallback path according to the data message, and is further configured to send a fault notification message to the source node along the fallback path when a failure of the main transmission path is detected.
In this design, when the first intermediate node detects a failure of the main transmission path that carries the data message, it sends a fault notification message to the source node, which helps the source node become aware of the failure quickly and helps repair the transmission path failure quickly.
In another possible design, the data transmission system further includes one or more second intermediate nodes, where the one or more second intermediate nodes are nodes located between the source node and the first intermediate node on the main transmission path. When determining the fallback path according to the data message, the first intermediate node is specifically configured to determine the fallback path according to the identifiers, carried in the data message, of the ingress ports of the data message at the one or more second intermediate nodes.
In this design, carrying in the data message the identifiers of the ingress ports of the data message at the second intermediate nodes helps the first intermediate node quickly trace the fault notification message back toward the source.
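The derivation of the fallback path from the recorded ingress ports can be sketched as below. The node names and port numbers are those of the worked example in this document (E11 port 102, A11 port 103, C11 port 104, and A21 port 105); the function name and the tuple representation of a hop are assumptions.

```python
def build_fallback_path(upstream_hops, own_node, own_ingress_port):
    """Return the hop list for a fault notification message.

    upstream_hops: (node, ingress port) entries recorded in the data message
    by the second intermediate nodes, in forwarding order.
    """
    # Hop0 is the detecting node itself; the remaining hops retrace the
    # upstream nodes in reverse, each leaving through its recorded ingress port.
    return [(own_node, own_ingress_port), *reversed(upstream_hops)]


recorded = [("E11", 102), ("A11", 103), ("C11", 104)]
fallback = build_fallback_path(recorded, "A21", 105)
```

The resulting list corresponds to Hop0 (IP A21+105), Hop1 (IP C11+104), Hop2 (IP A11+103), Hop3 (IP E11+102) in the example.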
In another possible design, the first intermediate node is further configured to add the identifier of the ingress port of the data message at the first intermediate node to the data message before forwarding the data message to the next-hop node adjacent to the first intermediate node on the main transmission path.
In this design, when forwarding the data message, the first intermediate node adds to the data message the identifier of the ingress port of the data message at the first intermediate node, which helps nodes downstream of the first intermediate node trace the fault notification message back when they detect a transmission path failure.
In another possible design, when adding the identifier of the ingress port of the data message at the first intermediate node to the data message, the first intermediate node is specifically configured to replace the identifier of the egress port of the first intermediate node recorded in the source routing label carried by the data message with the identifier of the ingress port.
In this design, the egress port field of the source routing label carried by the data message is reused to carry the identifier of the ingress port, which avoids changing the transmission format of the data message and improves the forwarding efficiency of the data message.
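A minimal sketch of this field reuse, assuming the source routing label is held as a dictionary keyed by hop number: a node reads its egress port from its own hop entry and then overwrites that same entry with the ingress port. The values for HOP1 and HOP2 follow the example in this document; the dictionary layout and the function name are hypothetical.

```python
def forward_and_rewrite(label, hop_number, ingress_port):
    """Read this node's egress port, then overwrite it with the ingress port."""
    node, egress_port = label[hop_number]
    label[hop_number] = (node, ingress_port)  # same field, new meaning
    return egress_port


# Source routing label keyed by hop number: (node, egress port).
label = {1: ("E11", 18), 2: ("A11", 19)}

out_e11 = forward_and_rewrite(label, 1, ingress_port=102)
out_a11 = forward_and_rewrite(label, 2, ingress_port=103)
```

Because the entry is overwritten in place, the label keeps the same number of fields and the same length throughout forwarding.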
In a second aspect, the present application provides a data transmission method, including: a source node sends a transmission path request to a network manager, the transmission path request including the identifier of a destination node; the source node receives from the network manager a transmission path response including a main transmission path and a standby transmission path, and records the transmission path response including the main transmission path and the standby transmission path; and when a failure occurs in transmitting a data message over the main transmission path, the source node transmits the data message over the standby transmission path.
In a possible design, the method further includes: when the source node receives a fault notification message from a first intermediate node, determining that a failure has occurred in transmitting the data message over the main transmission path, where the first intermediate node is any node located between the source node and the destination node on the main transmission path.
In a third aspect, the present application provides a data transmission method, including: a network manager receives a transmission path request from a source node, the transmission path request including the identifier of a destination node; the network manager determines a main transmission path and a standby transmission path between the source node and the destination node; and before the source node transmits a data message to the destination node, the network manager sends a transmission path response including the main transmission path and the standby transmission path to the source node.
In a fourth aspect, the present application provides a data transmission method, including: a first intermediate node receives a data message from a source node, where the first intermediate node is a node on the main transmission path over which the data message is transmitted between the source node and the destination node of the data message; the first intermediate node determines a fallback path according to the data message; and when a failure of the main transmission path is detected, the first intermediate node sends a fault notification message to the source node along the fallback path.
In a possible design, the first intermediate node determining the fallback path according to the data message includes: the first intermediate node determines the fallback path according to the identifiers, carried in the data message, of the ingress ports of the data message at one or more second intermediate nodes, where the one or more second intermediate nodes are nodes located between the source node and the first intermediate node on the main transmission path.
In another possible design, the method further includes: before forwarding the data message to the next-hop node adjacent to the first intermediate node on the main transmission path, the first intermediate node adds the identifier of the ingress port of the data message at the first intermediate node to the data message.
In another possible design, the first intermediate node adding the identifier of the ingress port of the data message at the first intermediate node to the data message includes: the first intermediate node replaces the identifier of the egress port of the first intermediate node recorded in the source routing label carried by the data message with the identifier of the ingress port.
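To illustrate how a fault notification message could travel back along the fallback path, the sketch below walks the hop list from the worked example in this document. The port wiring in `REVERSE_LINKS` follows that example (each port is the one on which the data message arrived at that node); the `FaultNotification` class and the `deliver` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FaultNotification:
    fallback_path: list  # (node, port to send the notification out of)


# (node, port) -> neighbor reached through that port, per the worked example.
REVERSE_LINKS = {
    ("A21", 105): "C11",
    ("C11", 104): "A11",
    ("A11", 103): "E11",
    ("E11", 102): "H11",
}


def deliver(notification, links=REVERSE_LINKS):
    """Send the notification hop by hop and return the nodes it passes."""
    first_node, _ = notification.fallback_path[0]
    visited = [first_node]
    current = first_node
    for node, port in notification.fallback_path:
        if node != current:
            raise ValueError(f"hop names {node}, but the message is at {current}")
        current = links[(node, port)]
        visited.append(current)
    return visited


fn = FaultNotification([("A21", 105), ("C11", 104), ("A11", 103), ("E11", 102)])
hops = deliver(fn)
```

The notification ends at source node H11, which can then treat the main transmission path as failed.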
In a fifth aspect, an embodiment of the present application provides a data transmission device that has the function of implementing each step in the second aspect and any possible design of the second aspect. The function can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units (modules) corresponding to the above function, for example a communication unit and a processing unit.
In a possible design, the device may be a chip or an integrated circuit.
In a possible design, the device includes a processor and an interface circuit, and the processor is coupled to the interface circuit to implement the functions of each step in the second aspect and any possible design of the second aspect. It can be understood that the interface circuit may be a transceiver or an input/output interface. The device may further include a memory that stores a program executable by the processor for implementing the functions of each step in the second aspect and any possible design of the second aspect.
In a possible design, the device may be a source node.
In a sixth aspect, an embodiment of this application provides a data transmission apparatus. The apparatus has functions for implementing the steps in the third aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units (modules) corresponding to the foregoing functions, for example a communication unit and a processing unit.
In a possible design, the apparatus may be a chip or an integrated circuit.
In a possible design, the apparatus includes a processor and an interface circuit. The processor is coupled to the interface circuit and is configured to implement the functions of the steps in the third aspect. It can be understood that the interface circuit may be a transceiver or an input/output interface. The apparatus may further include a memory that stores a program executable by the processor for implementing those functions.
In a possible design, the apparatus may be a network manager.
In a seventh aspect, an embodiment of this application provides a data transmission apparatus. The apparatus has functions for implementing the steps in the fourth aspect and in any possible design of the fourth aspect. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more units (modules) corresponding to the foregoing functions, for example a communication unit and a processing unit.
In a possible design, the apparatus may be a chip or an integrated circuit.
In a possible design, the apparatus includes a processor and an interface circuit. The processor is coupled to the interface circuit and is configured to implement the functions of the steps in the fourth aspect and in any possible design of the fourth aspect. It can be understood that the interface circuit may be a transceiver or an input/output interface. The apparatus may further include a memory that stores a program executable by the processor for implementing those functions.
In a possible design, the apparatus may be a first intermediate node.
In an eighth aspect, an embodiment of this application further provides a computer program. When the computer program runs on a computer, the computer is caused to perform the method provided in any of the second to fourth aspects or in any possible design of the second to fourth aspects.
In a ninth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a computer, the computer is caused to perform the method provided in any of the second to fourth aspects or in any possible design of the second to fourth aspects.
In a tenth aspect, an embodiment of this application further provides a chip. The chip is configured to read a computer program stored in a memory and to perform the method provided in any of the second to fourth aspects or in any possible design of the second to fourth aspects.
For the technical effects achievable by the second to tenth aspects, refer to the technical effects achievable by the first aspect; they are not repeated here.
Description of drawings
FIG. 1 is a schematic diagram of a source routing forwarding mechanism;
FIG. 2 is a schematic diagram of the principle of an ECMP mechanism according to an embodiment of this application;
FIG. 3 is a schematic diagram of the principle of an AR mechanism according to an embodiment of this application;
FIG. 4 is a first schematic diagram of a network architecture according to an embodiment of this application;
FIG. 5 is a first schematic architecture diagram of a data transmission system according to an embodiment of this application;
FIG. 6 is a schematic flowchart of a data transmission method according to an embodiment of this application;
FIG. 7 is a second schematic diagram of a network architecture according to an embodiment of this application;
FIG. 8 is a first schematic diagram of a source routing label according to an embodiment of this application;
FIG. 9 is a second schematic diagram of a source routing label according to an embodiment of this application;
FIG. 10 is a first schematic diagram of a data transmission apparatus according to an embodiment of this application;
FIG. 11 is a second schematic diagram of a data transmission apparatus according to an embodiment of this application;
FIG. 12 is a second schematic architecture diagram of a data transmission system according to an embodiment of this application.
Detailed description
Before the embodiments of this application are described, some terms used in this application are explained to help a person skilled in the art understand them.
1) Host: also called a service node or a communication node. A host can process service data and has sending and receiving functions; it can send data to other hosts and/or receive data from other hosts. For example, a host may be a server, a big data platform, a cloud, a cloud server, a server cluster, a terminal device, or a computer device, or may be a component of such a device, such as a chip or a chip system. A computer device, or computer for short, is a device with processing functions such as network data storage, data sending, and data receiving.
2) Transmission node: also called an intermediate node or a switching node. A transmission node is a device with a data switching (forwarding) function. It may be a switch, a router, a gateway, or another apparatus or device with a data switching function, or a component of such a device, such as a chip or a chip system. This is not limited in the embodiments of this application.
3) IP routing and forwarding mechanism: usually a mechanism that forwards packets by applying a routing selection algorithm (such as a hash algorithm) to information in the packet (such as the destination IP address). Take the equal-cost multipath (ECMP) mechanism and adaptive routing (AR) as examples. As shown in FIG. 2, the ECMP mechanism hashes the five-tuple (source IP address, source port, destination IP address, destination port, and transport layer protocol) to compute the egress port for each data flow. This maps each data flow to a single end-to-end transmission path, so that different data flows are spread evenly across the end-to-end transmission paths. Because the five-tuple of each flow is fixed, the egress port produced by each ECMP hash is uniquely determined, and so is the flow's end-to-end transmission path. However, the biggest problem with ECMP load sharing is that when flow sizes in the network are unevenly distributed (a mix of elephant flows and mice flows), treating large and small flows as equivalent when assigning them to transmission paths leaves the load across the paths severely unbalanced.
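The ECMP behavior described above can be illustrated with a minimal sketch. The hash function (SHA-256), the function name `ecmp_egress_port`, and the sample addresses and port numbers are all assumptions chosen for illustration; real switches typically use a hardware hash, and none of these names come from this application.

```python
import hashlib

def ecmp_egress_port(five_tuple, ports):
    """Pick an egress port by hashing the five-tuple (illustrative only)."""
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the hash to an index into the list of candidate egress ports.
    return ports[int.from_bytes(digest[:4], "big") % len(ports)]

# Hypothetical flow: (source IP, source port, destination IP, destination port, protocol).
flow = ("10.0.0.1", 4791, "10.0.1.1", 4791, "UDP")
ports = [1, 2, 3, 4]

# The same five-tuple always yields the same port, whatever the flow's size.
chosen = ecmp_egress_port(flow, ports)
```

Because the choice depends only on the five-tuple, an elephant flow and a mouse flow are each pinned to one port with no regard to how much traffic they carry, which is the imbalance the paragraph describes.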
As shown in FIG. 3, the AR mechanism builds on the ECMP mechanism by additionally checking the congestion state of the egress port queue. If congestion exceeds a threshold, the flow is moved to another port, which to some extent avoids severe load imbalance across transmission paths. However, both ECMP and AR require every intermediate node on the transmission path to maintain a global routing table for the entire network and to run the routing selection algorithm. This is difficult to implement and maintain, and because every intermediate node must run the routing selection algorithm, transmission delay is relatively high. Neither mechanism is as conceptually simple or as easy to implement as the source routing mechanism.
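A minimal sketch of the AR adjustment follows. The text only says that the flow is moved to "another port" when congestion exceeds the threshold; picking the least congested port is an assumption made here, as are the function name and the queue-depth figures.

```python
def ar_egress_port(hashed_port, queue_depth, threshold):
    """Keep the ECMP-hashed port unless its queue exceeds the threshold."""
    if queue_depth[hashed_port] <= threshold:
        return hashed_port
    # Assumption: fall back to the port with the shortest queue.
    return min(queue_depth, key=queue_depth.get)

# Hypothetical queue depths per egress port.
depths = {1: 90, 2: 10, 3: 40}
congested_choice = ar_egress_port(1, depths, threshold=50)
normal_choice = ar_egress_port(3, depths, threshold=50)
```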
4) Source routing: also called a source routing mechanism. The source node of a data transmission (that is, the sending node) can specify some or all of the transmission nodes (or intermediate nodes) that a sent packet passes through along the way. As shown in FIG. 1, suppose that egress port A of the source node is connected to intermediate node 1, egress port B of intermediate node 1 is connected to intermediate node 2, egress port C of intermediate node 2 is connected to intermediate node 3, egress port D of intermediate node 3 is connected to intermediate node 4, egress port E of intermediate node 4 is connected to intermediate node 5, and egress port F of intermediate node 5 is connected to the destination node. The source node can carry, in the data packet it sends, the identifiers of egress port A of the source node, egress port B of intermediate node 1, egress port C of intermediate node 2, egress port D of intermediate node 3, egress port E of intermediate node 4, and egress port F of intermediate node 5 (that is, the source routing path of the transmission path). These identifiers indicate that the hops the data packet passes through after leaving the source node are, in order, intermediate node 1, intermediate node 2, intermediate node 3, intermediate node 4, intermediate node 5, and the destination node.
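The FIG. 1 example can be traced with a short sketch. The node names, the `LINKS` dictionary, and the `walk` function are illustrative assumptions; the point is only that each node needs to know its own port-to-neighbor links, not a global routing table.

```python
# Each (node, egress port) pair leads to exactly one neighbor, as in FIG. 1.
LINKS = {
    ("source", "A"): "node1",
    ("node1", "B"): "node2",
    ("node2", "C"): "node3",
    ("node3", "D"): "node4",
    ("node4", "E"): "node5",
    ("node5", "F"): "destination",
}

# Source routing path carried in the packet: one egress port per hop.
ROUTE = ["A", "B", "C", "D", "E", "F"]

def walk(start, route, links):
    """Follow the egress ports listed in the packet and return the nodes visited."""
    node = start
    visited = [node]
    for port in route:
        node = links[(node, port)]
        visited.append(node)
    return visited

visited = walk("source", ROUTE, LINKS)
```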
However, in existing source routing mechanisms, the source node can rely only on a periodic heartbeat connection with the destination node to detect whether a link between them has failed. After detecting a failure, it must again ask the network controller to deliver the source routing path of a new transmission path between the source node and the destination node. Fault detection is therefore inefficient. In addition, after a fault is found, the network controller must rescan the network architecture to configure a new transmission path, which also takes a long time. The problem is especially serious when the network architecture is complex, making it difficult to repair transmission path failures quickly.
In view of this, the embodiments of this application provide a data transmission solution in which the network manager determines both a main transmission path and a backup transmission path between the source node and the destination node, so that a transmission path can be repaired quickly after it fails. The embodiments of this application are described in detail below with reference to the accompanying drawings.
FIG. 4 shows an example of a data transmission system under a possible fat-tree network architecture to which the embodiments of this application are applicable. The system includes multiple hosts (such as host H11 and host H12), multiple intermediate nodes (such as intermediate node E11, intermediate node A11, and intermediate node C11), and a network manager (fabric manager, FM). Intermediate nodes may also be called transmission nodes. Intermediate nodes at the leaf layer and the spine layer can be divided into different points of delivery (PODs) or clusters. Within each POD, each leaf-layer intermediate node is connected to each spine-layer intermediate node. Each spine-layer intermediate node can also be connected to one or more super-spine-layer intermediate nodes, so that different PODs can be connected through the super-spine-layer intermediate nodes. The network manager has a network management function, is directly or indirectly connected to the intermediate nodes and hosts in the network, and can provide transmission path configuration for the network. During data transmission, any host in FIG. 4 can act as the source node of the transmission (the data sender) or as the destination node of the transmission (the data receiver).
To implement the data transmission solution of this application, a host in FIG. 4 (such as host H11 or host H12) may include components such as an RNIC and a host agent (Host-agent) component. An intermediate node (such as intermediate node E11, intermediate node A11, or intermediate node C11) may include components such as a switch (Switch) component and a switch agent (Switch-agent) component. The network manager may include components such as a global network manager controller (FM-controller) component and a global network manager management (FM-manager) component. Specifically, as shown in FIG. 5, the components of the network manager (FM), the host (Host), and the intermediate node (for example, a switch) in the data transmission system can be divided by function into three levels: a forwarding plane, a control plane, and a management plane. The forwarding plane may include the RNIC on the host side and the Switch component on the intermediate node side. The control plane includes the Host-agent component on the host side, the Switch-agent component on the intermediate node side, and the FM-controller component on the network manager side. The management plane includes the FM-manager component on the network manager side. The FM-manager component can deliver port configurations (port config) to intermediate nodes, can learn information such as the port link status and the port input/output utilization rate (port input/output utility rate) between intermediate nodes, and can store information such as the network topology in a database (DB). The data transmission system can transmit data over a RoCE network. When a data packet is to be transmitted, the RNIC on the host side can request a transmission path from the FM-controller component, and the FM-controller component returns the main transmission path and the backup transmission path to that RNIC. The RNIC on the host side can then exchange data packets with the RNICs of other hosts over the main transmission path and the backup transmission path.
FIG. 6 is a schematic diagram of a data transmission method according to an embodiment of this application. The method includes:
S601: The source node sends a transmission path request to the network manager, and the network manager receives the transmission path request.
The transmission path request includes an identifier of the destination node.
In some implementations, when an application (such as an HPC application) on the source node (such as host H11) is started, the source node is triggered to initiate a network connection (such as an RDMA connection) to the destination node (such as host H21). The connection is used to transmit data packets of a service between the source node and the destination node. Specifically, when an application on the source node is started and a network connection to the destination node is initiated, the source node can send the network manager a transmission path request that includes the identifier of the destination node, asking the network manager to configure for the source node a transmission path over which to send the service's data packets to the destination node.
As an example, when the HPC application on the source node is started and an RDMA connection to the destination node is initiated, the RDMA network interface card (RNIC) of the source node can send the network manager a transmission path request that includes the identifier of the destination node. The identifier of the destination node may be the destination node's identity (ID), IP address, or the like, which is not limited in this application.
S602: The network manager determines a main transmission path and a backup transmission path between the source node and the destination node and, before the source node transmits a data packet to the destination node, sends the source node a transmission path response that includes the main transmission path and the backup transmission path.
In the embodiments of this application, the network manager has a network management function, is directly or indirectly connected to the intermediate nodes (also called transmission nodes) and hosts in the network, and can provide functions such as transmission path configuration for the network and performance control of each node. Nodes in the network (an intermediate node and a host, or two intermediate nodes) can be connected through ports. As shown in FIG. 7, intermediate node E11 can be connected through port 16 to port 101 of intermediate node A11. Specifically, the port connections between nodes in the network can be configured by a network administrator or similar personnel when a node (a host or an intermediate node) joins the network and then reported by the node to the network manager. Alternatively, they can be configured by a network administrator or similar personnel through the network manager and then delivered by the network manager to the corresponding nodes. This is not limited in this application.
As an example, after receiving from the source node a transmission path request that includes the identifier of the destination node, the network manager can, based on the topology of the network it manages, determine multiple transmission paths between the source node and the destination node that share as few intermediate nodes as possible. In other words, the transmission paths should, as far as possible, contain no common intermediate nodes. As shown in FIG. 7, the transmission paths determined by the network manager include transmission path 1 (source node H11, intermediate node E11, intermediate node A11, intermediate node C11, intermediate node A21, intermediate node E21, destination node H21) and transmission path 2 (source node H11, intermediate node E11, intermediate node A12, intermediate node C22, intermediate node A22, intermediate node E21, destination node H21). Transmission path 1 and transmission path 2 do not intersect at the spine layer or the super-spine layer; that is, they share no intermediate nodes at those layers.
After determining, between the source node and the destination node, multiple transmission paths that share the fewest intermediate nodes, the network manager can designate one of them as the main transmission path and the others as backup transmission paths, and send the source node a transmission path response that includes the main transmission path and the backup transmission paths. Specifically, when choosing the main transmission path and the backup transmission paths from the multiple transmission paths, the network manager can follow a fewest-intermediate-nodes principle: the transmission path with the fewest intermediate nodes becomes the main transmission path (if several transmission paths tie for the fewest intermediate nodes, one of them can be selected at random), and the other transmission paths become backup transmission paths. The network manager can also select the main transmission path according to another principle, such as lowest average load or lowest transmission delay, with the other transmission paths serving as backup transmission paths.
Alternatively, the network manager can send the source node a transmission path response that includes all of the multiple transmission paths, and the source node selects one of them as the main transmission path and uses the others as backup transmission paths.
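The selection rules above can be sketched as follows, using the two paths from FIG. 7. The function names are hypothetical, and taking the first path on a tie (rather than a random one, as the text allows) is an assumption made to keep the example deterministic.

```python
PATH_1 = ["H11", "E11", "A11", "C11", "A21", "E21", "H21"]
PATH_2 = ["H11", "E11", "A12", "C22", "A22", "E21", "H21"]

def shared_intermediate_nodes(path_a, path_b):
    """Intermediate nodes (endpoints excluded) that appear on both paths."""
    return set(path_a[1:-1]) & set(path_b[1:-1])

def split_main_and_backup(paths):
    """Fewest-intermediate-nodes rule; on a tie the first candidate wins."""
    main = min(paths, key=len)
    backups = [p for p in paths if p is not main]
    return main, backups

shared = shared_intermediate_nodes(PATH_1, PATH_2)
main_path, backup_paths = split_main_and_backup([PATH_1, PATH_2])
```

Here the two paths share only the leaf-layer nodes E11 and E21, which matches the statement that they do not intersect at the spine or super-spine layer.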
Take as an example the main transmission path consisting of source node H11, intermediate node E11, intermediate node A11, intermediate node C11, intermediate node A21, intermediate node E21, and destination node H21. Source node H11 is connected through port 17 to port 102 of intermediate node E11; intermediate node E11 is connected through port 18 to port 103 of intermediate node A11; intermediate node A11 is connected through port 19 to port 104 of intermediate node C11; intermediate node C11 is connected through port 20 to port 105 of intermediate node A21; intermediate node A21 is connected through port 21 to port 106 of intermediate node E21; and intermediate node E21 is connected through port 22 to port 107 of destination node H21. The network manager can deliver the main transmission path to the source node by sending a transmission path response that includes the source routing path of the main transmission path. The source routing path of the main transmission path may include the identifiers of the egress ports of the nodes on the main transmission path through which a data packet travels from the source node to the destination node. The identifier of a node's egress port may consist of the node's identifier (such as its IP address) and the port number of the egress port. As an example, the source routing path of the main transmission path is shown in Table 1 below.
Table 1
Figure PCTCN2022095142-appb-000001
Here, the IP addresses of source node H11, intermediate node E11, intermediate node A11, intermediate node C11, intermediate node A21, intermediate node E21, and destination node H21 are IP H11, IP E11, IP A11, IP C11, IP A21, IP E21, and IP H21, respectively.
S603: When a failure occurs while the source node is transmitting a data packet over the main transmission path, the source node transmits the data packet over the backup transmission path.
After obtaining the main transmission path and the backup transmission path from the network manager, the source node records (or carries) the source routing path of the main transmission path in the source routing label carried in the data packet, and thereby transmits the data packet to the destination node over the main transmission path.
In a possible implementation, a heartbeat connection can also be maintained between the source node and the destination node to detect whether the main transmission path has failed. For example, the source node can send a heartbeat request packet to the destination node over the main transmission path at a set period (such as 1 s or 2 s). If a heartbeat response packet from the destination node is received within a specified time (such as 1 ms), the main transmission path can transmit packets normally and has no fault. Otherwise, the main transmission path is determined to have failed.
To further improve fault detection efficiency, in another possible implementation, after the data packet leaves the source node, each intermediate node that forwards it can add to the data packet the identifier of the ingress port at which the data packet entered that node. This allows a failure notification packet to be traced back along the path when the transmission path fails.
As an example, when forwarding the data packet, an intermediate node can replace the identifier of its own egress port recorded in the source routing label with the identifier of the ingress port at which the data packet entered that node. Continuing with the example in which the data packet is transmitted over the main transmission path described above, the source routing label (that is, the packet header) carried by the data packet is shown in FIG. 8. Intermediate node E11 receives the data packet sent by source node H11 through port 102 (the ingress port). When intermediate node E11 forwards the data packet through its own port 18 (the egress port) according to HOP1 (IP E11+18), it changes HOP1 from IP E11+18 to IP E11+102. Similarly, when intermediate node A11 forwards the data packet, it changes HOP2 from IP A11+19 to IP A11+103, and so on.
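The egress-to-ingress replacement can be sketched as below. The label is modeled as a list of (node IP, port) pairs built from the port connections listed for the main transmission path. Placing the source node's own egress port at index 0 and passing the hop index explicitly are assumptions; the actual label layout is the one shown in FIG. 8, and the function name is hypothetical.

```python
# Source routing label as sent by H11: one (node IP, egress port) entry per hop.
label = [
    ("IP H11", 17), ("IP E11", 18), ("IP A11", 19),
    ("IP C11", 20), ("IP A21", 21), ("IP E21", 22),
]

def forward(label, hop_index, node_ip, ingress_port):
    """Return the egress port for this hop and record the ingress port in its place."""
    ip, egress_port = label[hop_index]
    if ip != node_ip:
        raise ValueError("hop entry does not belong to this node")
    label[hop_index] = (node_ip, ingress_port)
    return egress_port

# E11 received the packet on port 102, A11 on port 103.
e11_out = forward(label, 1, "IP E11", 102)
a11_out = forward(label, 2, "IP A11", 103)
```

Overwriting the entry in place keeps the label the same size at every hop, while still leaving each upstream node's ingress port available for tracing the path backwards.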
When any intermediate node on the main transmission path detects a link failure on its egress port and cannot forward the data packet, that intermediate node constructs a failure notification (FN) packet and sends it back through the ingress port at which the data packet arrived. The failure notification packet includes the source routing path of the fallback path along which it is sent to the source node (that is, it specifies the egress port of each hop on the fallback path). The source routing path of the fallback path can be obtained from the data packet. For ease of description, in the rest of the embodiments of this application, the intermediate node on the main transmission path that detects the egress port link failure is called the first intermediate node.
As an example, as shown in Figure 7, host H11 (the source node) sends a data packet carrying the source routing label shown in Figure 8, which indicates the primary transmission path. After the data packet is forwarded by intermediate nodes E11, A11, and C11 on the primary transmission path (the second intermediate nodes located between the source node and the first intermediate node), intermediate node A21 (the first intermediate node) on the primary transmission path receives it. When forwarding the data packet, each of E11, A11, and C11 replaces the identifier of its own egress port recorded in the source routing label with the identifier of the port on which the data packet arrived, so the source route carried by the data packet that A21 receives is as shown in Figure 9. Suppose A21 detects that port 21, through which it forwards data packets to the next-hop intermediate node E21, has failed (for example, the heartbeat connection or the electrical connection between port 21 and intermediate node E21 is interrupted). A21 then determines the source routing path of the fallback path from the source routing label in the data packet. As shown in Figure 9, based on Hop1, Hop2, and Hop3 in the source routing label and on port 105, on which it received the data packet, A21 determines the source routing path of the fallback path to be Hop0 (IP A21+105), Hop1 (IP C11+104), Hop2 (IP A11+103), Hop3 (IP E11+102).
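The derivation of the fallback path in this example can be sketched as follows. The sketch assumes that the label is a list of (node, port) pairs and that the failing node knows its own index in that list; `fallback_route` is a hypothetical helper, and hops beyond A21 are omitted.

```python
from typing import List, Tuple

Hop = Tuple[str, int]  # (node identifier, port)


def fallback_route(label: List[Hop], my_index: int, my_node: str,
                   my_ingress_port: int) -> List[Hop]:
    """Build the source route for a failure notification packet.

    Entries before `my_index` already hold upstream ingress ports, so the
    return route is this node's own ingress port followed by those entries
    in reverse order.
    """
    upstream = label[:my_index]
    return [(my_node, my_ingress_port)] + upstream[::-1]


# Label as received by A21: the first three hops hold ingress ports,
# while A21's own entry still holds its (failed) egress port 21.
received = [("E11", 102), ("A11", 103), ("C11", 104), ("A21", 21)]
route = fallback_route(received, 3, "A21", 105)
```

Here `route` lists A21 port 105, C11 port 104, A11 port 103, and E11 port 102, matching Hop0 to Hop3 in the text.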
Intermediate node A21 constructs the failure notification packet, whose source routing label records Hop0 (IP A21+105), Hop1 (IP C11+104), Hop2 (IP A11+103), and Hop3 (IP E11+102). A21 sends the failure notification packet to intermediate node C11 through the ingress port of the data packet, namely port 105. Following Hop1 (IP C11+104), Hop2 (IP A11+103), and Hop3 (IP E11+102) recorded in that label, intermediate nodes C11, A11, and E11 forward the failure notification packet hop by hop until host H11 receives it and determines that the primary transmission path used for the data packet has failed.
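The hop-by-hop return of the failure notification packet can be traced with a small simulation. The link table below is inferred from the ports named in the example (each ingress port leads back to the upstream neighbor); the `deliver` function and the table layout are illustrative assumptions, not part of the described method.

```python
from typing import Dict, List, Tuple

Hop = Tuple[str, int]  # (node identifier, egress port on the way back)

# (node, port) -> neighbor reached through that port, per the example.
LINKS: Dict[Hop, str] = {
    ("A21", 105): "C11",
    ("C11", 104): "A11",
    ("A11", 103): "E11",
    ("E11", 102): "H11",
}


def deliver(route: List[Hop], links: Dict[Hop, str]) -> List[str]:
    """Follow `route` hop by hop and return the nodes visited in order."""
    visited = [route[0][0]]
    for node, port in route:
        if visited[-1] != node:
            raise ValueError(f"packet is at {visited[-1]}, but hop names {node}")
        visited.append(links[(node, port)])
    return visited


trace = deliver([("A21", 105), ("C11", 104), ("A11", 103), ("E11", 102)], LINKS)
```

The resulting trace runs from A21 through C11, A11, and E11 and ends at host H11.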
In addition, to make the transmission of failure notification packets more reliable, an intermediate node may reserve resources for them (for example, bandwidth or forwarding-queue resources), so that a failure notification packet can travel back to the source quickly and without blocking.
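One way to reserve forwarding-queue resources is a dedicated queue that is always served first. The text does not specify a scheduling discipline, so the strict-priority scheme below is purely an assumption; `EgressScheduler` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional


class EgressScheduler:
    """Two queues per egress port; failure notifications are served first."""

    def __init__(self) -> None:
        self.fn_queue: Deque[bytes] = deque()    # reserved for FN packets
        self.data_queue: Deque[bytes] = deque()  # ordinary data packets

    def enqueue(self, packet: bytes, is_failure_notification: bool) -> None:
        queue = self.fn_queue if is_failure_notification else self.data_queue
        queue.append(packet)

    def dequeue(self) -> Optional[bytes]:
        """Return the next packet to send, or None if both queues are empty."""
        if self.fn_queue:
            return self.fn_queue.popleft()
        if self.data_queue:
            return self.data_queue.popleft()
        return None
```

With this arrangement a failure notification never waits behind queued data packets on the same port; a real device would also need to bound the reserved queue.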
In some implementations, the source node may exchange data packets of different services with multiple destination nodes at the same time. To help the source node identify which transmission path has failed, the failure notification packet may also carry the service flow identifier of the service to which the corresponding data packet belongs. This identifier can be obtained from, for example, the flow label field in the header of the data packet, and may be the source IP and destination IP of the service flow plus a queue pair (QP) identifier (ID), or the like.
After detecting that the primary transmission path has failed, the source node switches the transmission of data packets from the primary transmission path to the backup transmission path delivered by the network manager. Because the source node then transmits data over the backup transmission path, the transmission path fails over quickly and the service flow converges quickly.
Specifically, after detecting that the primary transmission path has failed, the source node changes the source routing path in the source routing label carried by the data packets from that of the primary transmission path to that of the backup transmission path. This is sufficient to achieve fast failover of the transmission path and fast convergence of the service flow.
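The source-side behavior can be sketched as a per-flow table keyed by the service flow identifier (source IP, destination IP, QP ID). All class and method names, the IP addresses, the QP number, and the backup route in the usage lines are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlowId = Tuple[str, str, int]  # (source IP, destination IP, QP ID)
Route = List[Tuple[str, int]]  # [(node identifier, egress port), ...]


@dataclass
class FlowPaths:
    primary: Route
    backup: Route
    use_backup: bool = False

    def active(self) -> Route:
        return self.backup if self.use_backup else self.primary


class SourceNode:
    def __init__(self) -> None:
        self.flows: Dict[FlowId, FlowPaths] = {}

    def record_paths(self, flow: FlowId, primary: Route, backup: Route) -> None:
        """Store both paths from the network manager's response."""
        self.flows[flow] = FlowPaths(primary, backup)

    def on_failure_notification(self, flow: FlowId) -> None:
        """Switch the flow named in the FN packet to its backup path."""
        entry = self.flows.get(flow)
        if entry is not None:
            entry.use_backup = True

    def source_route_for(self, flow: FlowId) -> Route:
        """Source routing label to place in the next data packet."""
        return list(self.flows[flow].active())


src = SourceNode()
flow = ("10.0.0.1", "10.0.1.1", 7)                # hypothetical identifiers
primary = [("E11", 18), ("A11", 19)]              # first hops from the example
backup = [("E11", 17), ("A12", 19)]               # hypothetical backup hops
src.record_paths(flow, primary, backup)
src.on_failure_notification(flow)
```

Because only the label written into subsequent packets changes, the switch requires no signaling to intermediate nodes, and flows that were not named in the notification keep their primary path.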
It should also be understood that the data transmission scheme provided in the embodiments of this application applies not only to the Fat-Tree network architecture shown in Figure 4 but also to other architectures, such as 3D/6D torus and dragonfly topologies.
The foregoing describes the solution provided in this application mainly from the perspective of interaction among the source node, the first intermediate node, and the network manager. It can be understood that, to implement the foregoing functions, each network element (device) includes hardware structures and/or software modules (or units) corresponding to each function. A person skilled in the art will readily appreciate that, with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A skilled person may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of this application.
Figure 10 and Figure 11 are schematic structural diagrams of possible data transmission apparatuses provided in the embodiments of this application. These apparatuses can implement the functions of the source node, the network manager, or the first intermediate node in the foregoing method embodiments, and can therefore also achieve the beneficial effects of those embodiments. In the embodiments of this application, the data transmission apparatus may be the source node, the network manager, or the first intermediate node, or may be a module (such as a chip) applied in any of them.
As shown in Figure 10, the data transmission apparatus 1000 may include a processing unit 1002 and a communication unit 1003, and may further include a storage unit 1001. The data transmission apparatus 1000 is configured to implement the functions of the source node, the network manager, or the first intermediate node in the foregoing method embodiments.
In a possible design, the processing unit 1002 implements the corresponding processing functions, the communication unit 1003 supports communication between the data transmission apparatus 1000 and other network entities, and the storage unit 1001 stores program code and/or data of the data transmission apparatus 1000. Optionally, the communication unit 1003 may include a receiving unit and/or a sending unit, which perform the receiving and sending operations respectively.
When the data transmission apparatus 1000 is used to implement the functions of the source node in the method embodiments: the communication unit 1003 is configured to send a transmission path request to the network manager, where the transmission path request includes an identifier of the destination node, and to receive from the network manager a transmission path response that includes a primary transmission path and a backup transmission path;
the processing unit 1002 is configured to record the transmission path response that includes the primary transmission path and the backup transmission path; and
the communication unit 1003 is further configured to transmit the data packet over the backup transmission path when transmission of the data packet over the primary transmission path fails.
In a possible design, the processing unit 1002 is further configured to determine, when the communication unit 1003 receives a failure notification packet from a first intermediate node, that transmission of the data packet over the primary transmission path has failed, where the first intermediate node is any node on the primary transmission path between the source node and the destination node.
When the data transmission apparatus 1000 is used to implement the functions of the network manager in the method embodiments: the communication unit 1003 is configured to receive a transmission path request from the source node, where the transmission path request includes an identifier of the destination node;
the processing unit 1002 is configured to determine a primary transmission path and a backup transmission path between the source node and the destination node; and
the communication unit 1003 is further configured to send, before the source node transmits a data packet to the destination node, a transmission path response that includes the primary transmission path and the backup transmission path to the source node.
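This passage does not say how the network manager chooses the two paths. As one possible illustration only, the sketch below takes a breadth-first shortest path as the primary path and then searches again while excluding the primary path's intermediate nodes, which yields a node-disjoint backup when the topology offers one. The function names and the four-node topology are hypothetical; in a topology where hosts attach to a single edge switch, the exclusion would have to be relaxed for those shared switches.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

Graph = Dict[str, List[str]]


def shortest_path(graph: Graph, src: str, dst: str,
                  banned: FrozenSet[str] = frozenset()) -> Optional[List[str]]:
    """Breadth-first search from src to dst that never enters a banned node."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cursor: Optional[str] = node
            while cursor is not None:
                path.append(cursor)
                cursor = parents[cursor]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents and nxt not in banned:
                parents[nxt] = node
                queue.append(nxt)
    return None


def primary_and_backup(graph: Graph, src: str, dst: str
                       ) -> Tuple[Optional[List[str]], Optional[List[str]]]:
    """Return (primary, backup); backup avoids the primary's interior nodes."""
    primary = shortest_path(graph, src, dst)
    if primary is None:
        return None, None
    backup = shortest_path(graph, src, dst, banned=frozenset(primary[1:-1]))
    return primary, backup


# Hypothetical topology: S reaches D through either X or Y.
topology = {"S": ["X", "Y"], "X": ["S", "D"], "Y": ["S", "D"], "D": ["X", "Y"]}
primary, backup = primary_and_backup(topology, "S", "D")
```

Both results would be returned to the source node in a single transmission path response, so no further exchange with the network manager is needed at failover time.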
When the data transmission apparatus 1000 is used to implement the functions of the first intermediate node in the method embodiments: the communication unit 1003 is configured to receive a data packet from the source node, where the first intermediate node is a node on the primary transmission path over which the data packet is transmitted between the source node and the destination node of the data packet;
the processing unit 1002 is configured to determine a fallback path according to the data packet; and
the communication unit 1003 is further configured to send a failure notification packet to the source node along the fallback path when a failure of the primary transmission path is detected.
In a possible design, when determining the fallback path according to the data packet, the processing unit 1002 is specifically configured to determine the fallback path according to identifiers, carried in the data packet, of the ingress ports of the data packet at one or more second intermediate nodes, where the one or more second intermediate nodes are nodes on the primary transmission path between the source node and the first intermediate node.
In a possible design, the processing unit 1002 is further configured to add, to the data packet, an identifier of the ingress port of the data packet at the first intermediate node before the communication unit 1003 forwards the data packet to the next-hop node adjacent to the first intermediate node on the primary transmission path.
In a possible design, when adding the identifier of the ingress port of the data packet at the first intermediate node to the data packet, the processing unit 1002 is specifically configured to replace the identifier of the egress port of the first intermediate node recorded in the source routing label carried by the data packet with the identifier of the ingress port.
For more detailed descriptions of the processing unit 1002 and the communication unit 1003, refer directly to the related descriptions in the method embodiments; details are not repeated here.
As shown in Figure 11, the data transmission apparatus 1100 includes a processor 1110 and an interface circuit 1120, which are coupled to each other. It can be understood that the interface circuit 1120 may be an input/output interface. Optionally, the data transmission apparatus 1100 may further include a memory 1130 for storing instructions executed by the processor 1110, input data that the processor 1110 needs to run the instructions, or data generated after the processor 1110 runs the instructions.
When the data transmission apparatus 1100 implements the data transmission method applicable to the source node, the network manager, or the first intermediate node, the processor 1110 implements the functions of the processing unit 1002 described above, and the interface circuit 1120 implements the functions of the communication unit 1003 described above.
As another form of this embodiment, a computer-readable storage medium is provided that stores instructions. When the instructions are executed, the data transmission method applicable to the source node, the network manager, or the first intermediate node in the foregoing method embodiments can be performed.
As another form of this embodiment, a computer program product containing instructions is provided. When the instructions are executed, the data transmission method applicable to the source node, the network manager, or the first intermediate node in the foregoing method embodiments can be performed.
As another form of this embodiment, a chip is provided. When running, the chip can perform the data transmission method applicable to the source node, the network manager, or the first intermediate node in the foregoing method embodiments.
Figure 12 is a schematic architecture diagram of a data transmission system provided in an embodiment of this application. The data transmission system includes a network manager, a source node, a destination node, and a first intermediate node. The network manager has the data transmission apparatus described above that implements the functions of the network manager in the method embodiments; the source node has the data transmission apparatus described above that implements the functions of the source node in the method embodiments; and the first intermediate node has the data transmission apparatus described above that implements the functions of the first intermediate node in the method embodiments.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that contains one or more sets of available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
The foregoing describes only specific implementations of this application. Any variation or replacement that a person skilled in the art can conceive of based on the specific implementations provided in this application shall fall within the protection scope of this application.

Claims (20)

  1. A data transmission system, comprising a network manager, a source node, and a destination node, wherein:
    the source node is configured to send a transmission path request to the network manager, the transmission path request including an identifier of the destination node;
    the network manager is configured to determine, according to the transmission path request, a primary transmission path and a backup transmission path between the source node and the destination node, and to send a transmission path response including the primary transmission path and the backup transmission path to the source node before the source node transmits a data packet to the destination node; and
    the source node is further configured to record the transmission path response including the primary transmission path and the backup transmission path, and to transmit the data packet over the backup transmission path when transmission of the data packet over the primary transmission path fails.
  2. The data transmission system according to claim 1, further comprising a first intermediate node, wherein the first intermediate node is any node on the primary transmission path between the source node and the destination node;
    the first intermediate node is configured to determine a fallback path according to the data packet; and
    the first intermediate node is further configured to send a failure notification packet to the source node along the fallback path when a failure of the primary transmission path is detected.
  3. The data transmission system according to claim 2, further comprising one or more second intermediate nodes, wherein the one or more second intermediate nodes are nodes on the primary transmission path between the source node and the first intermediate node; and
    when determining the fallback path according to the data packet, the first intermediate node is specifically configured to determine the fallback path according to identifiers, carried in the data packet, of the ingress ports of the data packet at the one or more second intermediate nodes.
  4. The data transmission system according to claim 2 or 3, wherein the first intermediate node is further configured to add, to the data packet, an identifier of the ingress port of the data packet at the first intermediate node before forwarding the data packet to the next-hop node adjacent to the first intermediate node on the primary transmission path.
  5. The data transmission system according to claim 4, wherein, when adding the identifier of the ingress port of the data packet at the first intermediate node to the data packet, the first intermediate node is specifically configured to replace the identifier of the egress port of the first intermediate node recorded in the source routing label carried by the data packet with the identifier of the ingress port.
  6. A data transmission method, comprising:
    sending, by a source node, a transmission path request to a network manager, the transmission path request including an identifier of a destination node;
    receiving, by the source node from the network manager, a transmission path response including a primary transmission path and a backup transmission path, and recording the transmission path response including the primary transmission path and the backup transmission path; and
    transmitting, by the source node, a data packet over the backup transmission path when transmission of the data packet over the primary transmission path fails.
  7. The method according to claim 6, further comprising:
    determining, when the source node receives a failure notification packet from a first intermediate node, that transmission of the data packet over the primary transmission path has failed, wherein the first intermediate node is any node on the primary transmission path between the source node and the destination node.
  8. A data transmission method, comprising:
    receiving, by a network manager, a transmission path request from a source node, the transmission path request including an identifier of a destination node;
    determining, by the network manager, a primary transmission path and a backup transmission path between the source node and the destination node; and
    sending, by the network manager to the source node before the source node transmits a data packet to the destination node, a transmission path response including the primary transmission path and the backup transmission path.
  9. A data transmission method, comprising:
    receiving, by a first intermediate node, a data packet from a source node, wherein the first intermediate node is a node on a primary transmission path over which the data packet is transmitted between the source node and a destination node of the data packet;
    determining, by the first intermediate node, a fallback path according to the data packet; and
    sending, by the first intermediate node, a failure notification packet to the source node along the fallback path when a failure of the primary transmission path is detected.
  10. The method according to claim 9, wherein the determining, by the first intermediate node, a fallback path according to the data packet comprises:
    determining, by the first intermediate node, the fallback path according to identifiers, carried in the data packet, of the ingress ports of the data packet at one or more second intermediate nodes, wherein the one or more second intermediate nodes are nodes on the primary transmission path between the source node and the first intermediate node.
  11. The method according to claim 9 or 10, further comprising:
    adding, by the first intermediate node to the data packet, an identifier of the ingress port of the data packet at the first intermediate node before forwarding the data packet to the next-hop node adjacent to the first intermediate node on the primary transmission path.
  12. A source node, comprising a processing unit and a communication unit, wherein:
    the communication unit is configured to send a transmission path request to a network manager, the transmission path request including an identifier of a destination node, and to receive from the network manager a transmission path response including a primary transmission path and a backup transmission path;
    the processing unit is configured to record the transmission path response including the primary transmission path and the backup transmission path; and
    the communication unit is further configured to transmit a data packet over the backup transmission path when transmission of the data packet over the primary transmission path fails.
  13. The source node according to claim 12, wherein the processing unit is further configured to determine, when the communication unit receives a failure notification packet from a first intermediate node, that transmission of the data packet over the primary transmission path has failed, wherein the first intermediate node is any node on the primary transmission path between the source node and the destination node.
  14. A network manager, comprising a processing unit and a communication unit, wherein:
    the communication unit is configured to receive a transmission path request from a source node, the transmission path request including an identifier of a destination node;
    the processing unit is configured to determine a primary transmission path and a backup transmission path between the source node and the destination node; and
    the communication unit is further configured to send, to the source node before the source node transmits a data packet to the destination node, a transmission path response including the primary transmission path and the backup transmission path.
  15. A first intermediate node, comprising a processing unit and a communication unit, wherein:
    the communication unit is configured to receive a data packet from a source node, wherein the first intermediate node is a node on a primary transmission path over which the data packet is transmitted between the source node and a destination node of the data packet;
    the processing unit is configured to determine a fallback path according to the data packet; and
    the communication unit is further configured to send a failure notification packet to the source node along the fallback path when a failure of the primary transmission path is detected.
  16. The first intermediate node according to claim 15, wherein, when determining the fallback path according to the data packet, the processing unit is specifically configured to determine the fallback path according to identifiers, carried in the data packet, of the ingress ports of the data packet at one or more second intermediate nodes, wherein the one or more second intermediate nodes are nodes on the primary transmission path between the source node and the first intermediate node.
  17. The first intermediate node according to claim 15 or 16, wherein the processing unit is further configured to add, to the data packet, an identifier of the ingress port of the data packet at the first intermediate node before the communication unit forwards the data packet to the next-hop node adjacent to the first intermediate node on the primary transmission path.
  18. The first intermediate node according to claim 17, wherein, when adding the identifier of the ingress port of the data packet at the first intermediate node to the data packet, the processing unit is specifically configured to replace the identifier of the egress port of the first intermediate node recorded in the source routing label carried by the data packet with the identifier of the ingress port.
  19. 一种数据传输装置,其特征在于,所述装置包括处理器和存储器,所述存储器中用于存储计算机执行指令,所述数据传输装置运行时,所述处理器执行所述存储器中的计算机执行指令以利用所述数据传输装置中的硬件资源执行权利要求6-11中任一所述方法的操作步骤。A data transmission device, characterized in that the device includes a processor and a memory, the memory is used to store computer-executable instructions, and when the data transmission device is running, the processor executes the computer-executable instructions in the memory Instructions are used to execute the operation steps of the method in any one of claims 6-11 by using the hardware resources in the data transmission device.
  20. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行如权利要求6-11中任一项所述的方法的操作步骤。A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a computer, the computer executes any one of claims 6-11. Operational steps of the method.
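
By way of a non-limiting illustration, the label handling recited in claims 16 to 18 might be sketched as follows. The sketch is not part of the claims. The names `Packet`, `forward`, and `fallback_path`, the list layout of the source routing label, and the `hop` index are hypothetical and chosen only for this example. It also assumes that the fallback path is expressed as the sequence of ports to send on, starting with the detecting node's own ingress port and followed by the recorded ingress ports of the upstream nodes, nearest first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    # Hypothetical source routing label: one port identifier per hop on the
    # main transmission path. Entries before `hop` have already been
    # rewritten from egress-port to ingress-port identifiers.
    label: List[int]
    hop: int = 0  # index of the label entry belonging to the current node


def forward(packet: Packet, ingress_port: int) -> int:
    """Normal forwarding at an intermediate node (claims 17 and 18).

    Reads this node's egress port from the label, then replaces that entry
    with the port the packet arrived on, so the information needed for a
    later fallback travels with the packet. Returns the egress port.
    """
    egress_port = packet.label[packet.hop]
    packet.label[packet.hop] = ingress_port
    packet.hop += 1
    return egress_port


def fallback_path(packet: Packet, ingress_port: int) -> List[int]:
    """Fallback path at the node that cannot forward further (claim 16).

    Assumption of this sketch: the path back toward the source is this
    node's own ingress port followed by the ingress ports recorded by the
    upstream intermediate nodes, in reverse order of traversal.
    """
    upstream_ingress = packet.label[:packet.hop]
    return [ingress_port] + upstream_ingress[::-1]
```

For example, with a label of `[3, 5, 7]`, a packet that enters the first node on port 1 and the second node on port 2 carries the label `[1, 2, 7]` when it reaches the third node. If the third node received it on port 4 and finds its egress port 7 unusable, this sketch yields the fallback port sequence `[4, 2, 1]`.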
PCT/CN2022/095142 2021-05-31 2022-05-26 Data transmission method, node, network manager, and system WO2022253087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110604511.4A CN113472646B (en) 2021-05-31 2021-05-31 Data transmission method, node, network manager and system
CN202110604511.4 2021-05-31

Publications (1)

Publication Number Publication Date
WO2022253087A1 (en) 2022-12-08

Family

ID=77871896

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095142 WO2022253087A1 (en) 2021-05-31 2022-05-26 Data transmission method, node, network manager, and system

Country Status (2)

Country Link
CN (1) CN113472646B (en)
WO (1) WO2022253087A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113472646B (en) * 2021-05-31 2023-02-10 华为技术有限公司 Data transmission method, node, network manager and system
CN115442293B (en) * 2022-08-27 2023-06-06 武汉烽火技术服务有限公司 Path finding method, device, equipment and readable storage medium
WO2024065481A1 (en) * 2022-09-29 2024-04-04 新华三技术有限公司 Data processing method and apparatus, and network device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101047618A (en) * 2006-03-29 2007-10-03 华为技术有限公司 Method and system for acquiring network route information
CN101512968A (en) * 2006-09-19 2009-08-19 华为技术有限公司 Faults propagation and protection for connection oriented data paths in packet networks
US9712381B1 (en) * 2014-07-31 2017-07-18 Google Inc. Systems and methods for targeted probing to pinpoint failures in large scale networks
CN110178410A (en) * 2017-12-21 2019-08-27 华为技术有限公司 Communication path determining method and network device
CN113472646A (en) * 2021-05-31 2021-10-01 华为技术有限公司 Data transmission method, node, network manager and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101192883A (en) * 2006-11-21 2008-06-04 华为技术有限公司 Multicast protection method in WDM optical network
CN103856400B (en) * 2012-11-29 2017-06-27 华为技术有限公司 FCoE message forwarding methods, equipment and system
CN116232986A (en) * 2019-11-01 2023-06-06 华为技术有限公司 Path protection method and network node

Also Published As

Publication number Publication date
CN113472646A (en) 2021-10-01
CN113472646B (en) 2023-02-10

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22815133

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE