WO2021203985A1 - Congestion information synchronizing method and related apparatus - Google Patents

Congestion information synchronizing method and related apparatus

Publication number
WO2021203985A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
message
congestion
bit
network computing
Prior art date
Application number
PCT/CN2021/083150
Other languages
French (fr)
Chinese (zh)
Inventor
林钦亮
王巧灵
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021203985A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/11 Identifying congestion
    • H04L47/115 Identifying congestion using a dedicated packet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]

Definitions

  • The embodiments of the present application relate to the field of communication technologies, and in particular to a method and related apparatus for synchronizing congestion information.
  • More and more network applications rely on large-scale computing, such as artificial intelligence, the Internet of Things, and cloud computing.
  • For large-scale computing, relying on a single training node is not feasible. Only collaborative computing across multiple training nodes in a distributed system can provide high-performance computing for network applications.
  • An in-network computing network makes full use of network computing resources: it shares some of the key calculations of the distributed computing nodes and provides aggregation calculation, so that multiple pieces of data are aggregated into one, thereby reducing network bandwidth usage, speeding up network transmission, and speeding up distributed computing.
  • Congestion control is an important means of improving network resource utilization and optimizing transmission quality.
  • The quality of congestion handling directly affects the performance of the system.
  • Standard remote direct memory access (RDMA) protocols use congestion control algorithms based on the explicit congestion notification (ECN) flag, and more and more protocols and applications based on the transmission control protocol (TCP) have also begun to enable the ECN flag in standard TCP together with the corresponding congestion control method.
  • The congestion control method based on the ECN flag bit can be understood as follows: if a data flow is sent from node A to node B, then when a switch in the network detects that a port is congested, it sets the forward ECN bit to 1 in all packets passing through that port.
  • When node B receives a message with the forward ECN bit set to 1 (indicating that the route from node A to node B is congested), it sets the backward ECN bit to 1 in a message returned to node A, or replies with a congestion notification packet (CNP), so that node A performs congestion control when it receives a packet with the ECN bit set or a CNP.
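  • As a minimal sketch of the point-to-point mechanism described above (not taken from the application), the following Python models one round trip. The names `Packet`, `switch_forward`, `receiver_reply`, and `Sender` are hypothetical, and halving the sending rate is an assumed reaction; the text does not specify how node A adjusts its rate.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    fecn: bool = False  # forward ECN bit, set by a congested switch
    becn: bool = False  # backward ECN bit, set by the receiver (node B)


def switch_forward(pkt: Packet, port_congested: bool) -> Packet:
    """Switch: set the forward ECN bit when the egress port is congested."""
    if port_congested:
        pkt.fecn = True
    return pkt


def receiver_reply(pkt: Packet) -> Packet:
    """Node B: echo a received forward ECN mark in the backward ECN bit."""
    return Packet(seq=pkt.seq, becn=pkt.fecn)


class Sender:
    """Node A: reduces its rate on a backward ECN mark (halving is assumed)."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def on_reply(self, reply: Packet) -> None:
        if reply.becn:
            self.rate *= 0.5


node_a = Sender(rate=100.0)
marked = switch_forward(Packet(seq=0), port_congested=True)
node_a.on_reply(receiver_reply(marked))
```

  • In this sketch only node A, the sender on the congested path, reacts; no other sender learns of the congestion.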
  • However, the current congestion control method based on the ECN flag is suitable only for point-to-point unicast communication.
  • In an in-network computing network, the current congestion control method based on the ECN flag can easily cause data messages sent by working nodes to be discarded by the in-network computing switch during aggregation processing. In addition, because the in-network computing network provides aggregation calculation, the communication mode becomes many-to-many.
  • The traditional point-to-point congestion control method is therefore likely to leave the sending rates of the working nodes in the many-to-many communication mode unsynchronized, so that the working nodes cannot perform congestion control synchronously.
  • The embodiments of the present application provide a method and related apparatus for synchronizing congestion information, which are used to send congestion information synchronously to N working nodes. This fills the gap left by the lack of a congestion information synchronization method applicable to in-network computing networks, and further makes the sending rate of each of the N working nodes tend to be smooth.
  • In a first aspect, an embodiment of the present application provides a method for synchronizing congestion information, and the method may include:
  • The in-network computing switch acquires congestion information, where the congestion information indicates that a first communication link is congested, the first communication link is the link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2;
  • The in-network computing switch sends N packets with the same sequence number to the N working nodes, where the N packets all carry the congestion information, so that the N working nodes perform congestion control based on the congestion information.
  • In this way, the in-network computing switch sends N packets that have the same sequence number and all carry the congestion information to the N working nodes, so that the congestion information is synchronized to the N working nodes. This fills the gap left by the lack of an applicable congestion information synchronization method in the in-network computing network, and the N working nodes can perform congestion control synchronously according to the congestion information carried in the messages, so that the sending rate of each working node tends to be smooth.
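  • The sending step above can be sketched as a simple fan-out on the switch side. This is an illustrative model only: the `Message` type, the `fan_out` function, and the boolean `congested` field standing in for the congestion information are assumptions, not structures defined by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    dest: int        # index of the destination working node
    seq: int         # sequence number shared by the whole batch
    payload: float   # e.g. an aggregation result
    congested: bool  # stands in for the congestion information


def fan_out(seq: int, payload: float, n_workers: int, congested: bool) -> List[Message]:
    """Build N messages with the same sequence number, one per working node.

    Every message carries the same congestion information, so all N
    working nodes learn about the congested link from the same batch.
    """
    if n_workers <= 2:
        raise ValueError("the text requires N to be an integer greater than 2")
    return [Message(dest=w, seq=seq, payload=payload, congested=congested)
            for w in range(n_workers)]


batch = fan_out(seq=7, payload=4.0, n_workers=3, congested=True)
```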
  • In a possible implementation, the in-network computing switch acquiring congestion information may include:
  • The in-network computing switch obtains a first message sent by the first working node, where the first message carries the congestion information, and the congestion information includes a first value on the explicit congestion notification (ECN) flag bit of the first message, the first value indicating that the first communication link is congested;
  • Correspondingly, the in-network computing switch sending N packets with the same sequence number to the N working nodes, where the N packets all carry the congestion information, includes:
  • Within the congestion indication aging period of the first value, the in-network computing switch sends N packets with the same sequence number to the N working nodes, and the N packets all carry the congestion information.
  • In this way, the in-network computing switch carries the congestion information in N packets with the same sequence number and sends these N packets to the N working nodes. This not only synchronizes the congestion information to the N working nodes and fills the gap left by the lack of an applicable congestion information synchronization method in the in-network computing network, but also enables the N working nodes to perform congestion control synchronously according to the congestion information carried in the messages. Because the first value is used only within its aging period, this further solves the problem of congestion information expiring in the in-network computing network.
  • Before the in-network computing switch sends the N messages with the same sequence number to the N working nodes, the method may further include:
  • The in-network computing switch modifies the value on the ECN flag bit of a second message to the first value to obtain a third message, where the second message is the first message to be aggregated within the validity period, and the sequence number of the second message is the same as the sequence number of the third message;
  • The in-network computing switch sending N packets with the same sequence number to the N working nodes includes:
  • The in-network computing switch sends N third messages to the N working nodes, where the first value in each third message instructs the corresponding working node to perform congestion control, and the sequence numbers of the N third messages are the same.
  • In this way, the in-network computing switch synchronizes the congestion information by sending N third messages with the same sequence number to the N working nodes, and the first value in each third message instructs the corresponding working node to perform congestion control, so that the sending rate of each working node tends to be smooth.
  • In the third possible implementation manner, the method may further include:
  • The in-network computing switch receives a fourth packet sent by the first working node within the congestion indication aging period of the first value, where the value on the ECN flag bit in the fourth packet is the first value;
  • The in-network computing switch ignores the first value in the fourth packet.
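  • The aging-period behaviour above can be sketched as a small state holder on the switch. The class name `AgingCongestionState`, the explicit `now` timestamps, and the 10-unit aging period are assumptions made for illustration; the application does not give a period length or a data structure.

```python
from typing import Optional


class AgingCongestionState:
    """Tracks whether a congestion indication (the first value) is still aging."""

    def __init__(self, aging_period: float) -> None:
        self.aging_period = aging_period
        self.expires_at: Optional[float] = None

    def active(self, now: float) -> bool:
        """True while the current congestion indication has not aged out."""
        return self.expires_at is not None and now < self.expires_at

    def on_worker_mark(self, now: float) -> bool:
        """Handle an ECN-marked packet from the first working node.

        Returns True if the mark starts a new indication, or False if it
        arrives inside the aging period and is therefore ignored.
        """
        if self.active(now):
            return False
        self.expires_at = now + self.aging_period
        return True


state = AgingCongestionState(aging_period=10.0)
decisions = [state.on_worker_mark(t) for t in (0.0, 5.0, 12.0)]
```

  • Here the mark at time 5.0 plays the role of the fourth packet: it falls inside the aging period started at time 0.0 and is ignored, while the mark at time 12.0 starts a new indication.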
  • In a possible implementation, the in-network computing switch modifying the value on the ECN flag bit of the second packet to the first value may include:
  • When the ECN flag bit of the first packet includes a first ECN field and the ECN flag bit of the second packet includes a second ECN field, the in-network computing switch modifies the value in the second ECN field to the first value in the first ECN field; or,
  • When the ECN flag bit of the first message includes a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second message includes a second FECN bit, the in-network computing switch modifies the value in the second FECN bit to the first value in the first FECN bit; or,
  • When the ECN flag bit of the first message includes a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second message includes a second BECN bit, the in-network computing switch modifies the value in the second BECN bit to the first value in the first BECN bit.
  • In this way, the in-network computing switch can modify the value on the ECN flag bit of the second packet to the first value in multiple ways, which provides multiple application possibilities for the subsequent synchronization of congestion information.
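  • The first option can be pictured on the two-bit ECN field that RFC 3168 places in the two least significant bits of the IPv4 TOS / IPv6 Traffic Class byte, where the value 0b11 means Congestion Experienced. Whether the "ECN field" of the application has exactly this layout is an assumption, and the function name `copy_ecn_field` is hypothetical.

```python
ECN_MASK = 0b11  # two least significant bits of the TOS / Traffic Class byte
ECN_CE = 0b11    # "Congestion Experienced" codepoint (RFC 3168)


def copy_ecn_field(first_tos: int, second_tos: int) -> int:
    """Return second_tos with its ECN field replaced by that of first_tos.

    The upper six bits (the DSCP value) of the second byte are preserved.
    """
    cleared = second_tos & ~ECN_MASK & 0xFF
    return cleared | (first_tos & ECN_MASK)


# First packet carries CE (0b11); second packet has DSCP 0b101110 and ECN 0b10.
third_tos = copy_ecn_field(first_tos=0b00000011, second_tos=0b10111010)
```

  • The FECN and BECN options are the same idea applied to a single bit instead of a two-bit field.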
  • In another possible implementation, the in-network computing switch acquiring congestion information may include:
  • When the port status between the first working node and the in-network computing switch indicates congestion, the in-network computing switch modifies the value on the ECN flag bit in N data messages to be broadcast, to obtain the congestion information, where the N data messages to be broadcast are messages with the same sequence number for the N working nodes;
  • The in-network computing switch sending N packets with the same sequence number to the N working nodes, where the N packets all carry the congestion information, may include:
  • The in-network computing switch sends the modified N data messages to be broadcast to the N working nodes, and the values on the ECN flag bits in the modified N data messages to be broadcast are respectively used to instruct the N working nodes to perform congestion control.
  • In this way, the in-network computing switch sends the N modified data messages to be broadcast to the N working nodes, and each modified data message carries the congestion information, so that the congestion information is synchronized to the N working nodes. This fills the gap left by the lack of an applicable congestion information synchronization method in the in-network computing network, enables the N working nodes to perform congestion control synchronously according to the congestion information carried in the modified data messages, and further smooths the sending rate of each of the N working nodes.
  • The in-network computing switch modifying the value on the ECN flag bit in the N data packets to be broadcast includes:
  • When the ECN flag bit in the data message to be broadcast includes a third ECN field, the in-network computing switch sets the value in the third ECN field; or,
  • When the ECN flag bit in the data message to be broadcast includes a third FECN bit, the in-network computing switch sets the value in the third FECN bit.
  • In this way, the in-network computing switch can modify the value on the ECN flag bit in multiple ways, which provides multiple application possibilities for the subsequent synchronization of congestion information.
  • The embodiments of the present application provide an in-network computing switch, and the in-network computing switch may include:
  • An acquiring unit, configured to acquire congestion information, where the congestion information indicates that a first communication link is congested, the first communication link is the link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2;
  • A sending unit, configured to send N messages with the same sequence number to the N working nodes, where the N messages all carry the congestion information, so that the N working nodes perform congestion control based on the congestion information.
  • The acquiring unit may include:
  • A first obtaining module, configured to obtain a first message sent by the first working node, where the first message carries the congestion information, and the congestion information includes a first value on the explicit congestion notification (ECN) flag bit of the first message, the first value indicating that the first communication link is congested;
  • The sending unit includes:
  • A first sending module, configured to send N packets with the same sequence number to the N working nodes within the congestion indication aging period of the first value obtained by the first obtaining module, where the N packets all carry the congestion information.
  • The in-network computing switch may further include:
  • A modifying unit, configured to modify the value on the ECN flag bit of a second message to the first value before the first sending module sends the N messages with the same sequence number to the N working nodes, to obtain a third message, where the second message is the first message to be aggregated within the validity period, and the sequence number of the second message is the same as the sequence number of the third message;
  • The first sending module includes:
  • A first sending submodule, configured to send the N third messages obtained by the modifying unit to the N working nodes, where the first value in each third message instructs the corresponding working node to perform congestion control, and the sequence numbers of the N third messages are the same.
  • The in-network computing switch further includes:
  • The acquiring unit, configured to receive a fourth message sent by the first working node within the congestion indication aging period of the first value, where the value on the ECN flag bit in the fourth message is the first value;
  • An ignoring unit, configured to ignore the first value in the fourth message received by the acquiring unit.
  • The modifying unit is configured to: when the ECN flag bit of the first packet includes a first ECN field and the ECN flag bit of the second packet includes a second ECN field, modify the value in the second ECN field to the first value in the first ECN field; or,
  • The modifying unit is configured to: when the ECN flag bit of the first message includes a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second message includes a second FECN bit, modify the value in the second FECN bit to the first value in the first FECN bit; or,
  • The modifying unit is configured to: when the ECN flag bit of the first message includes a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second message includes a second BECN bit, modify the value in the second BECN bit to the first value in the first BECN bit.
  • The acquiring unit may include:
  • A second acquisition module, configured to modify the value on the ECN flag bit in N data packets to be broadcast when the port status between the first working node and the in-network computing switch indicates congestion, to obtain the congestion information, where the N data packets to be broadcast are messages with the same sequence number for the N working nodes;
  • The sending unit includes:
  • A second sending module, configured to send the modified N data messages to be broadcast to the N working nodes, where the values on the ECN flag bits in the modified N data messages to be broadcast are respectively used to instruct the N working nodes to perform congestion control.
  • The second acquisition module is configured to set the value in a third ECN field when the ECN flag bit in the data message to be broadcast includes the third ECN field; or,
  • The second acquisition module is configured to set the value in a third FECN bit when the ECN flag bit in the data message to be broadcast includes the third FECN bit.
  • An embodiment of the present application provides a computer device, including a processor and a memory. The memory is used to store program instructions, and when the computer device is running, the processor executes the program instructions stored in the memory to enable the computer device to execute the congestion information synchronization method described in the first aspect or any one of the possible implementation manners of the first aspect.
  • The embodiments of the present application provide a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the method of the first aspect or any one of the possible implementation manners of the first aspect.
  • The embodiments of the present application provide a computer program product containing instructions which, when run on a computer, cause the computer to execute the method of the first aspect or any one of the possible implementation manners of the first aspect.
  • An embodiment of the present application provides a chip system, which includes a processor and is used to support the in-network computing switch in implementing the functions involved in the first aspect or any one of the possible implementation manners of the first aspect.
  • The chip system also includes a memory, and the memory is used to store the program instructions and data necessary for the in-network computing switch.
  • The chip system can be composed of chips, or can include chips and other discrete devices.
  • In the embodiments of the present application, because the congestion information can indicate that the first communication link between any working node and the in-network computing switch is congested, the congestion information is carried in all N messages with the same sequence number, and these N messages are sent to the N working nodes.
  • By sending N packets that have the same sequence number and all carry the congestion information to the N working nodes, the in-network computing switch not only synchronizes the congestion information to the N working nodes, so that the N working nodes can perform congestion control synchronously according to the congestion information carried in the messages, but also fills the gap left by the lack of an applicable congestion information synchronization method in the in-network computing network, and further makes the sending rate of each of the N working nodes tend to be smooth.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an embodiment of a method for synchronization of congestion information provided by this embodiment.
  • FIG. 3 is a schematic diagram of aggregation calculation performed by the in-network computing switch provided by this embodiment.
  • FIG. 4 is a schematic diagram of another embodiment of a method for synchronization of congestion information provided by this embodiment.
  • FIG. 5 is a schematic diagram of the state of the ECN flag bit of the RoCE v2 protocol or the TCP protocol proposed in the embodiment of the present application;
  • FIG. 6 is a schematic diagram of another embodiment of the method for synchronization of congestion information provided by this embodiment.
  • FIG. 7 is a schematic diagram of an embodiment of an in-network computing switch provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another embodiment of an in-network computing switch provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another embodiment of an in-network computing switch provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another embodiment of an in-network computing switch provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the hardware structure of a communication device in an embodiment of the present application.
  • The embodiments of the present application provide a method and related apparatus for synchronizing congestion information, which are used to send congestion information synchronously to N working nodes. This fills the gap left by the lack of a congestion information synchronization method applicable to in-network computing networks, and further makes the sending rate of each of the N working nodes tend to be smooth.
  • Congestion control is an important means to improve network resource utilization and optimize transmission quality.
  • The quality of congestion handling directly affects the performance of the system. Because the in-network computing network makes full use of network computing resources, it can share part of the key calculations of the distributed computing nodes and provide aggregation calculation, so that multiple pieces of data are aggregated into one, thereby reducing network bandwidth usage and speeding up network transmission. In-network computing networks that provide aggregation calculation are therefore becoming more and more popular. The in-network computing network has the following flow control features:
  • 1. The in-network computing switch performs aggregation calculation on packets with the same sequence number (index value) sent by N working nodes (N is an integer greater than 2). The result is sent out only after the parameters in the data messages with the same sequence number from all N working nodes have been calculated; otherwise the packet is lost.
  • 2. The N working nodes are required to synchronize their sending rates when sending data.
  • 3. The sending rate of the N working nodes depends on the slowest link in the topology.
  • The traditional congestion control method based on the ECN flag bit is suitable only for point-to-point unicast communication and cannot be applied to the many-to-many synchronous communication mode of the in-network computing network. That is, the congestion control method of point-to-point unicast communication cannot ensure that, when any working node is congested and reduces its sending rate, the sending rates of the other N-1 working nodes are reduced accordingly.
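  • The dependence on the slowest link can be illustrated with a few lines of Python. This is a simplified model under stated assumptions: rates are plain numbers in arbitrary units, and `aggregation_rate` and `excess_traffic` are hypothetical helper names that do not appear in the application.

```python
from typing import List


def aggregation_rate(send_rates: List[float]) -> float:
    """A batch completes only when all N workers have contributed,
    so batches complete at the rate of the slowest sender."""
    return min(send_rates)


def excess_traffic(send_rates: List[float]) -> List[float]:
    """Traffic each worker sends beyond what aggregation can consume."""
    floor = min(send_rates)
    return [rate - floor for rate in send_rates]


# Worker 2 has slowed down after congestion; workers 0 and 1 have not.
unsynchronized = [10.0, 10.0, 4.0]
rate = aggregation_rate(unsynchronized)
excess = excess_traffic(unsynchronized)
```

  • In this model the two workers that did not slow down each send 6.0 units more than the aggregation can use, which is the mismatch that synchronized congestion control is meant to remove.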
  • the method proposed in the embodiment of the present application is mainly applied to the application scenario of the in-network computing network that performs congestion control based on the ECN flag.
  • The foregoing application scenarios of the in-network computing network include, but are not limited to, artificial intelligence (AI) distributed training, the MapReduce model, and distributed databases.
  • As shown in FIG. 1, an embodiment of the present application provides a schematic diagram of a system architecture.
  • The system can include an in-network computing switch and N working nodes. The in-network computing switch is mainly used to obtain congestion information. Because the congestion information can indicate that the first communication link between any one of the N working nodes and the in-network computing switch is congested (the black dot in FIG. 1), when the first communication link is congested, the in-network computing switch can carry the congestion information in all N packets with the same sequence number and send these N packets to the N working nodes.
  • The N working nodes can then perform congestion control synchronously according to the congestion information carried in the message, which avoids the situation in which a working node whose communication link is not congested suffers transmission interruption or even timeout because its sending rate is too fast.
  • The congested first communication link shown in FIG. 1, namely the link between the in-network computing switch and working node 0, is only a schematic description. In actual applications, the congested first communication link may also be the link between another working node, such as working node 1 or working node 2, and the in-network computing switch, which is not specifically limited in the embodiments of the present application.
  • The aforementioned in-network computing switches also have certain programmability and computing capabilities, and can calculate and modify the fields in a message, for example by modifying the ECN field or writing a calculation result into the payload of the message.
  • The aforementioned in-network computing switches include but are not limited to Barefoot Wedge 100B switches, Cisco N3400 switches, and so on, which is not specifically limited in the embodiments of the present application.
  • The aforementioned N working nodes may be servers with graphics processing units (GPUs), training nodes, and so on, which is not specifically limited in the embodiments of the present application.
  • The method for synchronizing congestion information in this embodiment is applicable not only to the system architecture shown in FIG. 1 above but also to other system architectures, which is not specifically limited here.
  • FIG. 2 is a schematic diagram of an embodiment of the method for synchronization of congestion information provided by this embodiment.
  • The method can include:
  • The in-network computing switch obtains congestion information, where the congestion information indicates that a first communication link is congested, the first communication link is the link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2.
  • The in-network computing switch obtains the congestion information and then determines from it that congestion has occurred.
  • The congestion information can be obtained in either of two ways: the in-network computing switch detects that a communication port leading to one of the N working nodes is in a congested state and derives the congestion information from that port; or the first working node, corresponding to the first communication link on which the congestion occurs, notifies the in-network computing switch of the congestion information. It should be understood that the manner of obtaining congestion information is not specifically limited in the embodiments of the present application.
  • The in-network computing switch sends N packets with the same sequence number to the N working nodes, where the N packets all carry the congestion information, so that the N working nodes each perform congestion control based on the congestion information.
  • The sequence number identifies a message. Messages with the same sequence number are data messages that the N working nodes sent to the in-network computing switch in the same batch.
  • The in-network computing switch can therefore distinguish the data packets sent by the N working nodes by sequence number, and aggregate, as one batch, the parameters of the data packets that come from the N different working nodes and carry the same sequence number.
  • The aggregation calculation can be understood as follows: the N working nodes synchronously send data packets carrying the data to be calculated to the in-network computing switch, and each working node numbers its successive data packets with sequence numbers. After receiving the data packets, the in-network computing switch aggregates the parameters in the data packets that have the same sequence number. When the data packets with a given sequence number from all N working nodes have been calculated, the in-network computing switch sends the aggregation result to the N working nodes in the form of packets.
  • Figure 3 a schematic diagram of an in-network computing switch performing aggregation calculation. It can be seen from Figure 3 that the parameters in the data packets sent by the working node 0 to the computing switch in the network are “1", “2", and “3” respectively; The parameters in the data message are “4", “5", and “6”; the parameters in the data message sent by the working node 2 to the computing switch in the network are “7", “8", and “9” respectively. .
  • serial numbers of the data messages corresponding to the parameters "1", “4", and “7” are the same, while the data corresponding to the parameters "2", “5", and “8”
  • the serial numbers of the messages (assuming index1) are the same, and the serial numbers of the data messages corresponding to the parameters "3", “6", and “9” (assuming index2) are the same.
  • the computing switch in the network will sum and average the parameters in the data packets with the same sequence number sent from working node 0, working node 1, and working node 2, for example, the parameters in the data packet corresponding to index 0 "1", "4", and "7" are averaged, and the polymerization result is 4.
  • the calculation switch in the network will only send the aggregation result corresponding to the sequence number to the corresponding working node after calculating the parameters in the data message with the same sequence number sent by the three working nodes.
  • the working node 0, working node 1 and working node 2 in Fig. 3 are only a schematic description for the aggregation calculation. In practical applications, the number of working nodes involved in the aggregation calculation The number is not limited, as long as N is an integer greater than 2.
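  • As a minimal sketch of the aggregation described for Figure 3 (not the implementation of the embodiment), the following Python model buffers one parameter per working node for each sequence number and releases the average only when all N working nodes have reported. The class name `Aggregator` and its method names are hypothetical.

```python
from collections import defaultdict


class Aggregator:
    """Toy model of per-sequence-number averaging across N working nodes."""

    def __init__(self, n_workers):
        self.n_workers = n_workers
        # sequence number -> {worker id: parameter}
        self.pending = defaultdict(dict)

    def receive(self, worker_id, seq, value):
        """Buffer one parameter; return the average once all N workers reported."""
        slot = self.pending[seq]
        slot[worker_id] = value
        if len(slot) < self.n_workers:
            return None  # aggregation for this sequence number is not complete
        del self.pending[seq]
        return sum(slot.values()) / self.n_workers


agg = Aggregator(n_workers=3)
agg.receive(0, 0, 1)            # working node 0, index 0
agg.receive(1, 0, 4)            # working node 1, index 0
result = agg.receive(2, 0, 7)   # working node 2 completes index 0
```

  • With the Figure 3 parameters, `result` is the average of 1, 4, and 7; the first two calls return `None` because the aggregation for index 0 is not yet complete.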
  • The in-network computing switch needs to carry the congestion information in N packets with the same sequence number and send these N packets to the N working nodes. That is, the in-network computing switch carries the congestion information in the message obtained after aggregation is completed and sends that message to the N working nodes. Only when the messages sent to all N working nodes have the same sequence number do the N working nodes obtain the congestion information from the same batch of messages, which means that the congestion information sent to the N working nodes is synchronized. The N working nodes can then perform synchronous congestion control based on the congestion information in the received messages, which avoids the situation in which a working node whose communication link is not congested sends too fast and causes transmission interruption or even a timeout.
  • The N working nodes each perform congestion control based on the congestion information, which can be understood as the N working nodes synchronously reducing their respective message sending rates, and so on. This is not specifically limited in this embodiment of the application.
  • the in-network computing switch can obtain congestion information in a variety of ways, and realize the synchronization of congestion information in different ways, which will be described in detail in the following embodiments:
  • Case 1: The working node corresponding to the congested communication link informs the in-network computing switch.
  • Case 2: The in-network computing switch actively detects the congestion.
  • FIG. 4 is a schematic diagram of another embodiment of the method for synchronizing congestion information provided by the embodiment of this application.
  • another embodiment of the method for synchronizing congestion information provided by the embodiment of the present application may include:
  • The in-network computing switch obtains the first message sent by the first working node. The first message carries congestion information, the congestion information includes the first value on the explicit congestion notification (ECN) flag bit of the first message, and the first value is used to indicate that the first communication link is congested.
  • In this case, the communication ports of the in-network computing switch are not congested, but a communication port of another switch connected to the in-network computing switch is congested.
  • The first working node here can be understood as a working node that is indirectly connected to the in-network computing switch through the first communication link.
  • The first working node corresponding to the congested first communication link then carries the congestion information in the first message and sends it to the in-network computing switch.
  • the congestion information may include the first value on the ECN flag bit of the first packet, and the ECN flag bit in the first packet is a field with a length of 2 bits located in the header of the packet.
  • The aforementioned ECN flag is located in a 2-bit field of the Internet Protocol version 4 (IPv4) or Internet Protocol version 6 (IPv6) packet header. In the InfiniBand (IB) protocol, the ECN flag is located in a 2-bit field of the base transport header (BTH), which consists of a forward explicit congestion notification (FECN) bit and a backward explicit congestion notification (BECN) bit; that is, the first bit is the FECN bit and the second bit is the BECN bit. This is not specifically limited in the embodiment of the present application.
  • the first value is the value on the ECN flag bit of the aforementioned first message, which can be used to indicate that the first communication link is congested.
  • FIG. 5 is a schematic diagram of the states of the ECN flag bit in the RoCE v2 protocol or the TCP protocol according to the embodiment of this application. It can be seen from FIG. 5 that when the value of the ECN field in the IPv4 or IPv6 packet header is 11, the corresponding state is the forward congestion flag, indicating that congestion has occurred. Therefore, the first value on the ECN flag bit in the RoCE v2 protocol or the TCP protocol may be 11, which indicates that the first communication link between the first working node and the in-network computing switch is congested.
  • In the InfiniBand protocol, a value of 1 in the FECN bit indicates that congestion has occurred, and a value of 1 in the BECN bit also indicates congestion.
  • For example, if a data stream flows from working node A to working node B and the value of the FECN bit is 1, A encountered congestion while sending the data stream to B; if the data stream flows from working node B to working node A and the value of the BECN bit is 1, B encountered congestion while sending the data stream to A. Therefore, the first value on the ECN flag bit in the InfiniBand protocol may include: the value of the first bit is 1, or the value of the second bit is 1.
  • the value of the ECN field, the FECN bit, or the BECN bit may also use other values to indicate the occurrence of congestion in practical applications, which will not be specifically limited in the embodiment of the present application.
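  • The two encodings described above can be illustrated with a short Python sketch. It assumes the standard placement of the IP ECN field in the two low-order bits of the IPv4 Type of Service octet or the IPv6 Traffic Class octet. For InfiniBand it models only an abstract 2-bit value whose first bit is FECN and whose second bit is BECN, as the text describes, and does not parse a real BTH. All function names are hypothetical.

```python
ECN_CONGESTED = 0b11  # value "11" of the 2-bit IP ECN field: congestion occurred


def ip_ecn(tos_or_traffic_class):
    """Return the 2-bit ECN field (two low-order bits of the ToS / Traffic Class octet)."""
    return tos_or_traffic_class & 0b11


def ip_congested(tos_or_traffic_class):
    """True when the IP ECN field carries the value 11."""
    return ip_ecn(tos_or_traffic_class) == ECN_CONGESTED


def ib_congested(flags):
    """flags: abstract 2-bit value, first (high) bit FECN, second (low) bit BECN."""
    fecn = (flags >> 1) & 1
    becn = flags & 1
    return fecn == 1 or becn == 1
```

  • As the text notes, other values could be chosen to indicate congestion in practice; the constants above only mirror the example values "11" and "1".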
  • the in-network computing switch sends N packets with the same sequence number to N working nodes, and the N packets with the same sequence number all carry congestion information.
  • Working node 1 and working node 2, which are not congested, would otherwise keep sending data packets to the in-network computing switch, which can easily overflow and exhaust the buffer of the in-network computing switch. In in-network computing, the transmission rate often depends on the flow control characteristics of the slowest link in the topology. Before the aggregation of the data messages with a given sequence number is completed, the data messages sent by the working nodes are either cached or discarded, so the congestion information can become outdated and invalid.
  • Therefore, when the in-network computing switch detects that the first message obtained from the first working node carries congestion information, it monitors the aging period of the congestion information in the first message by means of a timer, a message timer, or the like.
  • Within the aging period, the in-network computing switch carries the congestion information in N packets with the same sequence number and sends these N packets to the N working nodes. In this way, the congestion information is synchronized to the N working nodes, so that after receiving the N messages with the same sequence number, the N working nodes can perform congestion control synchronously according to the congestion information carried in the messages. This fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks and, because the first value is sent within its aging period, also solves the problem of congestion information becoming invalid in in-network computing networks.
  • the method for synchronizing congestion information may further include:
  • The in-network computing switch modifies the value of the ECN flag bit of the second packet to the first value to obtain the third packet.
  • The second packet is the first packet whose aggregation is completed within the aging period, and the sequence number of the second packet is the same as the sequence number of the third packet;
  • the computing switch in the network sends N packets with the same sequence number to N working nodes, including:
  • the computing switch in the network sends N third messages to N working nodes, and the first value in each third message indicates that the corresponding working node performs congestion control, and the sequence numbers in the N third messages are the same.
  • The in-network computing switch first parses the first message. Because of the flow control feature that the in-network computing switch sends a message out only after completing the aggregation calculation, regardless of whether the parsed first packet is a packet of the first type or a packet of the second type, the in-network computing switch needs to wait, within the congestion indication aging period of the first value, for the first message whose aggregation is completed, that is, the second message, and then modify the value of the ECN flag bit of the second message to the first value.
  • The purpose is to carry the congestion information in the second message, so that the modified second message can be taken as the third message carrying the congestion information.
  • The in-network computing switch can then send N third messages to the N working nodes to synchronize the congestion information, so that the first value in each third message instructs the corresponding working node to perform congestion control, which in turn smooths the sending rate of each working node.
  • Whether the first message belongs to the first type or the second type can be determined from the message length of the first message. For example, when the message length of the first message is within the first preset message length, it can be determined that the first message belongs to the first type; when the message length of the first message is within the second preset message length, it can be determined that the first message belongs to the second type.
  • The first preset message length is greater than the second preset message length. A message of the first type can be understood as a data message on which aggregation calculation can be performed or which can be broadcast directly, whereas a message of the second type can be understood as a message that can be neither aggregated nor broadcast as a data message.
  • There can also be multiple ways for the in-network computing switch to modify the value on the ECN flag bit of the second packet to the first value, which can be understood with reference to the following ways:
  • Method 1: When the ECN flag bit of the first packet includes the first ECN field and the ECN flag bit of the second packet includes the second ECN field, the in-network computing switch modifies the value in the second ECN field to the first value in the first ECN field.
  • If the ECN flag bit in the received first message is the first ECN field, the value in the second ECN field of the second message is modified to be the same as the first value, for example "11", so that the congestion information carried in the first message is copied to the second message. This provides multiple application possibilities for subsequent synchronization of the congestion information.
  • Method 3: When the ECN flag bit of the first message includes the first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second message includes the second BECN bit, the in-network computing switch modifies the value in the second BECN bit to the first value on the first BECN bit.
  • The ECN flag in the InfiniBand protocol is located in the 2-bit field of the BTH header. The 2-bit field consists of the FECN bit and the BECN bit, and a value of "1" on the FECN bit or a value of "1" on the BECN bit can indicate congestion.
  • If the ECN flag bit in the received first message is the first FECN bit, the value in the second FECN bit of the second message is modified to be the same as the first value, for example "1". If the ECN flag bit in the received first message is the first BECN bit, the value in the second BECN bit of the second message is modified to be the same as the first value, for example "1". In either case, the congestion information carried in the first message is copied to the second message, which provides multiple application possibilities for subsequent synchronization of the congestion information.
  • Setting the value of the second FECN bit to "1" or the value of the second BECN bit to "1" to indicate congestion is only a schematic description. In practical applications, the value on the second FECN bit or the second BECN bit may also be defined as another value to indicate congestion, which is not specifically limited in the embodiment of this application.
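  • The copying step described above can be sketched as follows, assuming a simplified message model. The `Message` class, its fields, and `mark_and_fan_out` are hypothetical names used only for illustration; the `ecn` field stands for either the 2-bit IP ECN field or the FECN/BECN bits.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    seq: int          # sequence number
    payload: float    # aggregated parameter (illustrative)
    ecn: int = 0      # congestion marking; 0 means "no congestion" in this sketch


def mark_and_fan_out(first, aggregated, n_workers):
    """Copy the first value from `first` into the aggregated (second) message,
    yielding the third message, and return one copy per working node."""
    third = replace(aggregated, ecn=first.ecn)
    return [third for _ in range(n_workers)]


first = Message(seq=5, payload=0.0, ecn=0b11)   # first message carrying congestion info
second = Message(seq=7, payload=4.0)            # first message aggregated in the aging period
thirds = mark_and_fan_out(first, second, n_workers=3)
```

  • Every element of `thirds` keeps the sequence number and payload of the second message and carries the first value, while the second message itself is left unchanged.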
  • the method of congestion information synchronization may further include:
  • the in-network computing switch receives the fourth message sent by the first working node during the congestion indication aging period of the first value, the value on the ECN flag bit in the fourth message is the first value;
  • the calculation switch in the network ignores the first value in the fourth packet.
  • When the in-network computing switch receives the fourth message sent by the first working node, the value on the ECN flag bit in the fourth message is the same as the first value in the congestion information carried in the first message, which means that the fourth message also carries the congestion information.
  • The in-network computing switch already started a timer to monitor the aging period of the first value when it received the first message. To avoid copying and sending the congestion information repeatedly, if the fourth message carrying the congestion information is received while the congestion indication of the first value is still within its aging period, the in-network computing switch does not start another timer but ignores the first value in the fourth message.
  • That is, the in-network computing switch can ignore the congestion information in the fourth packet and forward the fourth packet to its destination according to the original forwarding rules. Only the congestion information carried in the first packet needs to be sent synchronously, which saves network resources.
  • If the congestion indication aging period of the first value has expired, the in-network computing switch ignores the congestion information.
  • Once the congestion indication aging period of the first value has expired, the congestion information corresponding to the first value is invalid. If the in-network computing switch synchronized the invalid congestion information, the N working nodes would not be able to perform congestion control synchronously. The in-network computing switch can therefore ignore the congestion information in the first message, send the first message to its destination according to the original forwarding rules, and obtain fresh congestion information that has not expired from the working node corresponding to the congested communication link.
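  • The aging behavior described above (start a timer on the first message, ignore further marked messages while the timer runs, and stop synchronizing once it expires) can be sketched as a small state holder. The class `CongestionAging`, its method names, and the explicit `now` argument are assumptions made for illustration; the embodiment does not prescribe a particular timer mechanism.

```python
class CongestionAging:
    """Tracks whether a received congestion value is still within its aging period."""

    def __init__(self, aging_period):
        self.aging_period = aging_period
        self.value = None
        self.expires_at = None

    def is_valid(self, now):
        return self.expires_at is not None and now < self.expires_at

    def on_marked_message(self, ecn_value, now):
        """Return True if a new aging period starts, False if the mark is ignored."""
        if self.is_valid(now):
            return False  # e.g. the fourth message: do not restart the timer
        self.value = ecn_value
        self.expires_at = now + self.aging_period
        return True

    def value_to_sync(self, now):
        """Congestion value to copy into outgoing messages, or None once expired."""
        return self.value if self.is_valid(now) else None


state = CongestionAging(aging_period=10.0)
started = state.on_marked_message(0b11, now=0.0)    # first message
ignored = state.on_marked_message(0b11, now=5.0)    # fourth message, same period
```

  • Passing `now` explicitly keeps the sketch deterministic; a real device would more likely read a hardware or system clock.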
  • FIG. 6 is a schematic diagram of another embodiment of a method for synchronization of congestion information provided in an embodiment of this application.
  • another embodiment of the method for synchronizing congestion information provided by the embodiment of the present application may include:
  • the computing switch in the network modifies the value of the ECN flag in the N data packets to be broadcast to obtain congestion information.
  • the N data messages to be broadcast are messages with the same sequence number among the N working nodes.
  • The in-network computing switch can monitor the cache queue information of the queues that hold the messages sent by the N working nodes. When the cache queue information exceeds the cache threshold, the in-network computing switch can determine that the port status between one of the N working nodes and the in-network computing switch shows congestion, that is, the port status between the first working node and the in-network computing switch shows congestion.
  • the first working node here can be understood as a working node that is directly connected to the computing switch in the network through the first communication link.
  • The first working node is any one of the N working nodes, which is not limited in the embodiment of this application. In addition to judging whether a port is congested from the cache queue information, in practical applications the in-network computing switch can also determine that a port is congested by other means, such as port utilization, which is not specifically limited in the embodiment of the present application.
  • The N data messages to be broadcast are the messages with the same sequence number obtained after the in-network computing switch completes the aggregation calculation of the data messages with the same sequence number sent by the N working nodes.
  • the description of the serial number can be understood with reference to step 202 in FIG. 2, and details will not be repeated here.
  • The in-network computing switch sends the modified N data messages to be broadcast to the N working nodes, and the values on the ECN flag bits in the modified N data messages to be broadcast are used to instruct the N working nodes to perform congestion control.
  • Because the N modified data messages to be broadcast all carry the congestion information, the in-network computing switch can send them to the N working nodes. This synchronizes the congestion information to the N working nodes and fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks.
  • The N working nodes can then perform congestion control synchronously according to the congestion information carried in the modified data messages to be broadcast, which smooths the sending rate of each of the N working nodes and avoids the situation in which a working node whose communication link is not congested sends too fast and causes transmission interruption or even a timeout.
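  • Case 2 can be sketched as follows, assuming that the cache queue information is a per-port queue depth compared against a single cache threshold. The function names, the dictionary-based message model, and the default congestion value "11" are hypothetical and only illustrate the idea.

```python
def port_congested(queue_depth, cache_threshold):
    """A port is treated as congested when its queue depth exceeds the threshold."""
    return queue_depth > cache_threshold


def mark_broadcast(messages, queue_depths, cache_threshold, congestion_value=0b11):
    """Return the N messages to be broadcast, marked if any monitored port is congested.

    messages: list of dicts with "seq" and "ecn" keys (same sequence number).
    queue_depths: one queue depth per working-node port.
    """
    if not any(port_congested(d, cache_threshold) for d in queue_depths):
        return messages
    # Build modified copies so that all N messages carry the same congestion value.
    return [{**m, "ecn": congestion_value} for m in messages]


to_broadcast = [{"seq": 0, "ecn": 0} for _ in range(3)]
marked = mark_broadcast(to_broadcast, queue_depths=[10, 200, 5], cache_threshold=100)
```

  • Here the second port exceeds the threshold, so all three messages in `marked` carry the congestion value even though only one port is congested, which is the synchronization effect the text describes.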
  • Just as there can be multiple ways for the in-network computing switch to modify the value on the ECN flag bit of the second packet to the first value, the in-network computing switch can modify the value of the ECN flag bit in the N data messages to be broadcast in multiple ways, which can be understood with reference to the following ways:
  • Method 2: When the ECN flag bit in the data message to be broadcast includes the third FECN bit, the in-network computing switch sets the value in the third FECN bit.
  • The ECN flag bit in the InfiniBand protocol is located in the 2-bit field of the BTH header. The 2-bit field consists of the FECN bit and the BECN bit, and a value of "1" in the FECN bit or a value of "1" in the BECN bit can indicate that congestion has occurred.
  • The in-network computing switch sends N packets with the same sequence number, all carrying the congestion information, to the N working nodes. This synchronizes the congestion information to the N working nodes, so that the N working nodes can perform congestion control synchronously according to the congestion information carried in the messages. It fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks and smooths the sending rate of each of the N working nodes.
  • the foregoing mainly introduces the congestion information synchronization method provided by the embodiment of the present application from the perspective of the method. It can be understood that in order to realize the above-mentioned functions, corresponding hardware structures and/or software modules for performing each function are included. Those skilled in the art should easily realize that in combination with the modules and algorithm steps of the examples described in the embodiments disclosed in the present application, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiments of the present application may divide the device into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • an embodiment of the in-network computing switch in the embodiment of the present application includes:
  • the obtaining unit 701 is configured to obtain congestion information, and the congestion information is used to indicate that a first communication link is congested, and the first communication link is a link between a first working node and the computing switch in the network, and The first working node is any one of N working nodes, and N is an integer greater than 2;
  • The sending unit 702 is configured to send N messages with the same sequence number to the N working nodes. The N messages with the same sequence number all carry the congestion information obtained by the obtaining unit 701, so that the N working nodes perform congestion control based on the congestion information.
  • The sending unit 702 sends to the N working nodes N packets with the same sequence number that all carry the congestion information obtained by the obtaining unit 701, so that the congestion information is synchronized to the N working nodes, filling the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks. The N working nodes can perform congestion control synchronously according to the congestion information carried in the messages, so that the sending rate of each working node tends to be smooth.
  • the obtaining unit 701 may include:
  • The first obtaining module 7011 is configured to obtain the first message sent by the first working node. The first message carries congestion information, the congestion information includes the first value on the explicit congestion notification (ECN) flag bit of the first message, and the first value is used to indicate that the first communication link is congested, where the first working node is any one of the N working nodes;
  • the sending unit 702 may include:
  • The first sending module 7021 is configured to send N packets with the same sequence number to the N working nodes within the congestion indication aging period of the first value obtained by the first obtaining module 7011, where the N packets with the same sequence number all carry the congestion information.
  • The first sending module 7021 carries the congestion information obtained by the first obtaining module 7011 in all N packets with the same sequence number and sends these N packets to the N working nodes. This synchronizes the congestion information to the N working nodes and fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks. It also enables the N working nodes to perform congestion control synchronously according to the congestion information carried in the messages and, because the first value is sent within its aging period, solves the problem of congestion information becoming invalid in in-network computing networks.
  • the in-network computing switch may further include:
  • The modifying unit 703 is configured to modify the value of the ECN flag bit of the second message to the first value, to obtain the third message, before the first sending module 7021 sends the N messages with the same sequence number to the N working nodes. The second message is the first message whose aggregation is completed within the aging period, and the sequence number of the second message is the same as the sequence number of the third message;
  • the first sending module 7021 includes:
  • The first sending sub-module 70211 is configured to send the N third messages obtained by the modifying unit 703 to the N working nodes. The first value in each third message instructs the corresponding working node to perform congestion control, and the sequence numbers in the N third messages are the same.
  • the in-network computing switch further includes:
  • the acquiring unit 701 is configured to receive a fourth message sent by the first working node within the congestion indication aging period of the first value, and the value on the ECN flag bit in the fourth message is the first value;
  • the ignoring unit is used to ignore the first value in the fourth message obtained by the obtaining unit 701.
  • The modifying unit 703 is configured to: when the ECN flag bit of the first packet includes the first ECN field and the ECN flag bit of the second packet includes the second ECN field, modify the value in the second ECN field to the first value in the first ECN field; or,
  • the modifying unit 703 is configured to: when the ECN flag bit of the first message includes the first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second message includes the second FECN bit, modify the value in the second FECN bit to the first value in the first FECN bit; or,
  • the modifying unit 703 is configured to: when the ECN flag bit of the first message includes the first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second message includes the second BECN bit, modify the value in the second BECN bit to the first value on the first BECN bit.
  • the obtaining unit 701 may include:
  • The second obtaining module 7012 is configured to modify the value of the ECN flag bit in the N data packets to be broadcast, to obtain the congestion information, when the port status between the first working node and the in-network computing switch shows congestion. The N data messages to be broadcast are messages with the same sequence number among the N working nodes, and the first working node is any one of the N working nodes;
  • the sending unit 702 includes:
  • The second sending module 7022 is configured to send the modified N data messages to be broadcast to the N working nodes, and the values on the ECN flag bits in the modified N data messages to be broadcast are used to instruct the N working nodes, respectively, to perform congestion control.
  • The second sending module 7022 may send the N modified data messages to be broadcast to the N working nodes, where each modified data message to be broadcast carries the congestion information obtained by the second obtaining module 7012. The congestion information is thereby synchronized to the N working nodes, filling the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks. The N working nodes can perform congestion control synchronously based on the congestion information carried in the modified data messages to be broadcast, which smooths the sending rate of each of the N working nodes.
  • The second obtaining module 7012 is configured to set the value in the third ECN field when the ECN flag bit in the data message to be broadcast includes the third ECN field; or,
  • the second obtaining module 7012 is configured to set the value in the third FECN bit when the ECN flag bit in the data message to be broadcast includes the third FECN bit.
  • FIG. 11 is a schematic diagram of the hardware structure of a communication device in an embodiment of the present application. As shown in FIG. 11, the communication device may include:
  • the communication device includes at least one processor 1101, a communication line 1107, a memory 1103, and at least one communication interface 1104.
  • The processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of this application.
  • the communication line 1107 may include a path to transmit information between the aforementioned components.
  • The communication interface 1104 uses any device such as a transceiver to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • The memory 1103 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. The memory can exist independently and be connected to the processor through the communication line 1107.
  • the memory can also be integrated with the processor.
  • the memory 1103 is used to store computer-executed instructions for executing the solution of the present application, and the processor 1101 controls the execution.
  • the processor 1101 is configured to execute computer-executable instructions stored in the memory 1103, so as to implement the congestion information synchronization method provided in the foregoing embodiment of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the communication device may include multiple processors, such as the processor 1101 and the processor 1102 in FIG. 11.
  • Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • The communication device may further include an output device 1105 and an input device 1106.
  • The output device 1105 communicates with the processor 1101 and can display information in a variety of ways.
  • The input device 1106 communicates with the processor 1101 and can receive user input in a variety of ways.
  • The input device 1106 may be a mouse, a touch screen device, a sensor device, or the like.
  • The aforementioned communication device may be a general-purpose device or a dedicated device.
  • The communication device may be a router, an in-network computing switch, or a device with a structure similar to that in FIG. 11.
  • The embodiments of the present application do not limit the type of the communication device.
  • The above-mentioned acquisition unit 701, first acquisition module 7011, and second acquisition module 7012 can all be implemented by the input device 1106; the sending unit 702, the first sending module 7021, the first sending sub-module 70211, and the second sending module 7022 can all be implemented by the output device 1105; and both the modification unit 703 and the ignoring unit can be implemented by the processor 1101 or the processor 1102.
  • The disclosed system, device, and method can be implemented in other ways.
  • The device embodiment described above is only illustrative.
  • For example, the division into units is only a logical function division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • The displayed or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method in each embodiment of the present application.
  • The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

Disclosed in embodiments of the present application are a congestion information synchronizing method and a related apparatus, applied to in-network computing scenarios. They address the lack of an applicable congestion information synchronization mechanism in in-network computing networks and smooth the transmission rate of each of N working nodes, avoiding the interruption or even timeout of transmission caused by excessively fast transmission rates at working nodes whose communication links are not congested. The congestion information synchronizing method comprises: obtaining congestion information, the congestion information being used for indicating that congestion occurs in a first communication link, the first communication link being a link between a first working node and an in-network computing switch, the first working node being any one of N working nodes, and N being an integer greater than 2; and sending to the N working nodes N packets having the same sequence number, the N packets having the same sequence number all carrying the congestion information, so that the N working nodes separately perform congestion control on the basis of the congestion information.

Description

Method and related apparatus for synchronizing congestion information
This application claims priority to Chinese Patent Application No. 202010273713.0, filed with the Chinese Patent Office on April 9, 2020 and entitled "Method and related apparatus for synchronizing congestion information", which is incorporated herein by reference in its entirety.
Technical field
The embodiments of the present application relate to the field of communication technologies, and in particular to a method and related apparatus for synchronizing congestion information.
Background
More and more network applications rely on large-scale computing, such as artificial intelligence, the Internet of Things, and cloud computing. Large-scale computing cannot be achieved with a single training node; only multiple training nodes cooperating through distributed computing can provide high-performance computing for network applications. An in-network computing (in-net computing) network makes full use of network computing resources: it takes over some key computations from the distributed computing nodes and provides aggregation computing, so that multiple pieces of data are aggregated into one. This reduces network bandwidth usage, speeds up network transmission, and accelerates distributed computing.
Congestion control is an important means of improving network resource utilization and optimizing transmission quality, and the quality of congestion handling directly affects system performance. At present, standard remote direct memory access (RDMA) protocols use congestion control algorithms based on the explicit congestion notification (ECN) flag bit, and more and more protocols and applications based on the transmission control protocol (TCP) have also begun to enable the ECN flag bit in the standard TCP protocol and the corresponding congestion control method. The ECN-based congestion control method can be understood as follows: when a data flow is sent from node A to node B and a switch in the network detects that a port is congested, the switch sets the forward ECN bit to 1 in all packets passing through that port. If node B receives a packet whose forward ECN bit is set to 1 (indicating that the route from node A to node B is congested), it sets the backward ECN bit to 1 in the packets sent to node A, or replies with a congestion notification packet (CNP), so that node A performs congestion control when it receives a packet with the backward ECN bit set or a CNP.
However, the current ECN-based congestion control method is only suitable for point-to-point unicast communication. In the many-to-many synchronous communication mode of in-network computing, applying the current ECN-based method easily causes congestion information to be discarded, because the data packets sent by the working nodes are aggregated by the in-network computing switch. In addition, because the in-network computing network provides aggregation computing, the communication mode becomes many-to-many; with the traditional point-to-point congestion control method, the sending rates of the working nodes in the many-to-many mode easily fall out of sync, so that the working nodes cannot perform congestion control synchronously.
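As a rough illustration of the aggregation described above, the following Python sketch combines the vectors received from several working nodes into a single result. Element-wise summation and the function name `aggregate` are assumptions made for illustration only; the application does not specify the aggregation operator.

```python
from typing import List, Sequence


def aggregate(chunks: Sequence[Sequence[float]]) -> List[float]:
    """Combine one equally sized vector per working node into a single vector."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    length = len(chunks[0])
    if any(len(c) != length for c in chunks):
        raise ValueError("all chunks must have the same length")
    # Element-wise sum (assumed operator): N input vectors become one output vector.
    return [sum(values) for values in zip(*chunks)]


# Three working nodes each contribute one vector for the same sequence number.
result = aggregate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Only one aggregated vector leaves the switch instead of three, which is where the bandwidth saving comes from.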
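The point-to-point ECN feedback loop described above can be sketched as follows. This is a minimal model, not a protocol implementation: the `Packet` fields, the function names, and the multiplicative decrease factor of 0.5 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    fecn: int = 0  # forward ECN bit, set by a switch whose port is congested
    becn: int = 0  # backward ECN bit, echoed by the receiver to the sender


def switch_forward(pkt: Packet, port_congested: bool) -> Packet:
    """The switch sets the forward ECN bit when the port is congested."""
    if port_congested:
        pkt.fecn = 1
    return pkt


def receiver_reply(data: Packet) -> Packet:
    """Node B echoes the congestion indication back to node A."""
    return Packet(seq=data.seq, becn=data.fecn)


def sender_on_reply(rate: float, reply: Packet, decrease: float = 0.5) -> float:
    """Node A lowers its sending rate when the reply signals congestion."""
    return rate * decrease if reply.becn == 1 else rate


reply = receiver_reply(switch_forward(Packet(seq=7), port_congested=True))
new_rate = sender_on_reply(100.0, reply)
```

The loop involves exactly one sender and one receiver, which is why it does not carry over directly to the many-to-many case discussed next.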
Summary of the invention
The embodiments of the present application provide a method and related apparatus for synchronizing congestion information, used to send congestion information synchronously to N working nodes. This fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks and further smooths the sending rate of each of the N working nodes.
In a first aspect, an embodiment of the present application provides a method for synchronizing congestion information, and the method may include:
An in-network computing switch obtains congestion information, where the congestion information is used to indicate that a first communication link is congested, the first communication link is a link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2;
The in-network computing switch sends N packets with the same sequence number to the N working nodes, where the N packets with the same sequence number all carry the congestion information, so that the N working nodes each perform congestion control based on the congestion information.
In the above manner, the in-network computing switch sends N packets with the same sequence number, all carrying the congestion information, to the N working nodes, so that the congestion information is synchronized to these N working nodes. This fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks, and allows the N working nodes to perform congestion control synchronously according to the congestion information carried in the packets, so that the sending rate of each working node tends to be smooth.
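The core step of the first aspect can be sketched as below: once the switch holds congestion information for any one link, it places that information in all N packets that share a sequence number. The `Packet` type and the `synchronize` function are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    seq: int          # sequence number shared by all N packets
    dst: str          # destination working node
    congestion: bool  # congestion information carried in the packet


def synchronize(workers: List[str], seq: int, congestion: bool) -> List[Packet]:
    """Build one packet per working node, all with the same sequence number."""
    if len(workers) <= 2:
        raise ValueError("N must be an integer greater than 2")
    return [Packet(seq=seq, dst=w, congestion=congestion) for w in workers]


# Congestion was observed on the link to one node, yet every node is informed.
packets = synchronize(["w1", "w2", "w3"], seq=42, congestion=True)
```

Because every working node receives the same indication for the same sequence number, all of them can adjust their sending rates at the same point in the data stream.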
Optionally, with reference to the first aspect, in a first possible implementation, the in-network computing switch obtaining congestion information may include:
The in-network computing switch obtains a first packet sent by the first working node, where the first packet carries the congestion information, the congestion information includes a first value on the explicit congestion notification (ECN) flag bit of the first packet, the first value is used to indicate that the first communication link is congested, and the first working node is any one of the N working nodes;
Correspondingly, the in-network computing switch sending N packets with the same sequence number to the N working nodes, where the N packets with the same sequence number all carry the congestion information, includes:
Within the congestion indication validity period of the first value, the in-network computing switch sends N packets with the same sequence number to the N working nodes, where the N packets with the same sequence number all carry the congestion information.
In the above manner, while the congestion indication validity period of the first value has not expired, the in-network computing switch adds the congestion information to N packets with the same sequence number and sends these N packets to the N working nodes respectively. This not only synchronizes the congestion information to the N working nodes, filling the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks, but also allows the N working nodes to perform congestion control synchronously according to the congestion information carried in the packets. Furthermore, keeping the first value within its validity period solves the problem of congestion information becoming invalid in the in-network computing network.
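One way to picture the validity period is as a time window that opens when the first packet carrying the first value arrives. The sketch below is an assumption-laden illustration: the class `ValidityWindow`, the function `outgoing_ecn`, the use of seconds, and the value 1 for the first value are not taken from the application.

```python
from typing import Optional


class ValidityWindow:
    """Tracks whether a received congestion indication is still valid."""

    def __init__(self, duration: float) -> None:
        self.duration = duration  # length of the validity period (assumed unit: seconds)
        self.started_at: Optional[float] = None

    def start(self, now: float) -> None:
        """Called when the first packet carrying the first value arrives."""
        self.started_at = now

    def is_active(self, now: float) -> bool:
        return self.started_at is not None and now - self.started_at < self.duration


def outgoing_ecn(window: ValidityWindow, now: float, first_value: int = 1) -> int:
    """ECN value to place in the N outgoing packets at time `now`."""
    return first_value if window.is_active(now) else 0


window = ValidityWindow(duration=0.5)
window.start(now=10.0)
inside = outgoing_ecn(window, now=10.2)   # still within the validity period
outside = outgoing_ecn(window, now=10.6)  # validity period has expired
```

Holding the indication for a period, rather than tying it to a single incoming packet, is what lets it survive until an aggregated packet is ready to be sent.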
Optionally, with reference to the first possible implementation of the first aspect, in a second possible implementation, before the in-network computing switch sends N packets with the same sequence number to the N working nodes, the method may further include:
The in-network computing switch modifies the value on the ECN flag bit of a second packet to the first value to obtain a third packet, where the second packet is the first packet whose aggregation is completed within the validity period, and the sequence number of the second packet is the same as the sequence number of the third packet;
Correspondingly, the in-network computing switch sending N packets with the same sequence number to the N working nodes includes:
The in-network computing switch sends N third packets to the N working nodes, where the first value in each third packet instructs the corresponding working node to perform congestion control, and the N third packets have the same sequence number.
In the above manner, the in-network computing switch synchronizes the congestion information by sending N third packets with the same sequence number to the N working nodes; the first value in each third packet then instructs the corresponding working node to perform congestion control, so that the sending rate of each working node tends to be smooth.
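The second-to-third packet step can be sketched as follows, assuming a simple immutable packet record. `AggregatedPacket` and `build_third_packets` are hypothetical names, and the per-node dictionary is just one convenient way to represent the N outgoing copies.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AggregatedPacket:
    seq: int
    payload: Tuple[float, ...]
    ecn: int = 0


def build_third_packets(second: AggregatedPacket, first_value: int,
                        workers: List[str]) -> Dict[str, AggregatedPacket]:
    """Copy the first value into the aggregated packet and fan it out to N nodes."""
    # Only the ECN value changes; sequence number and payload are preserved.
    third = replace(second, ecn=first_value)
    return {w: third for w in workers}


second = AggregatedPacket(seq=5, payload=(9.0, 12.0))
third_packets = build_third_packets(second, first_value=1, workers=["w1", "w2", "w3"])
```

The second packet itself is left untouched here, which makes it easy to check that the third packet differs from it only in the ECN value.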
Optionally, with reference to the first or second possible implementation of the first aspect, in a third possible implementation, the method may further include:
If, within the congestion indication validity period of the first value, the in-network computing switch receives a fourth packet sent by the first working node, where the value on the ECN flag bit in the fourth packet is the first value,
the in-network computing switch ignores the first value in the fourth packet.
In the above manner, only the congestion information carried in the first packet needs to be sent synchronously, and the congestion information in the fourth packet is ignored, which not only avoids repeated sending of congestion information but also saves network resources.
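The ignore rule can be sketched as a small state machine. The class `CongestionTracker` and its method are hypothetical, and the behaviour after the validity period expires (a later marked packet is recorded again) is an assumption that the application does not spell out in this passage.

```python
from typing import Optional


class CongestionTracker:
    """Records one congestion indication per validity period."""

    def __init__(self, validity: float) -> None:
        self.validity = validity
        self.recorded_at: Optional[float] = None

    def on_packet(self, ecn_marked: bool, now: float) -> bool:
        """Return True if this packet starts a new congestion indication."""
        if not ecn_marked:
            return False
        if self.recorded_at is not None and now - self.recorded_at < self.validity:
            return False  # duplicate within the validity period: ignored
        self.recorded_at = now
        return True


tracker = CongestionTracker(validity=1.0)
events = [
    tracker.on_packet(True, 0.0),  # first packet: recorded
    tracker.on_packet(True, 0.4),  # fourth packet inside the period: ignored
    tracker.on_packet(True, 1.5),  # after expiry (assumed behaviour): recorded again
]
```

Ignoring the duplicate keeps the switch from triggering a second round of N marked packets for the same congestion event.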
Optionally, with reference to the second possible implementation of the first aspect, in a fourth possible implementation, the in-network computing switch modifying the value on the ECN flag bit of the second packet to the first value may include:
When the ECN flag bit of the first packet includes a first ECN field and the ECN flag bit of the second packet includes a second ECN field, the in-network computing switch modifies the value in the second ECN field to the first value in the first ECN field; or,
When the ECN flag bit of the first packet includes a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second packet includes a second FECN bit, the in-network computing switch modifies the value in the second FECN bit to the first value in the first FECN bit; or,
When the ECN flag bit of the first packet includes a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second packet includes a second BECN bit, the in-network computing switch modifies the value in the second BECN bit to the first value in the first BECN bit.
In this embodiment, because the value on the ECN flag bit takes multiple forms in different protocols, the in-network computing switch can modify the value on the ECN flag bit of the second packet to the first value in multiple ways. These modifications make the subsequent synchronization of congestion information applicable in a variety of settings.
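The three alternatives above share one pattern: find the field type that both packets carry and copy the first value across. The sketch below models packets as plain dictionaries with hypothetical keys `ecn`, `fecn`, and `becn`; real header layouts are not modelled.

```python
from typing import Dict


def copy_first_value(first_pkt: Dict[str, int], second_pkt: Dict[str, int]) -> Dict[str, int]:
    """Return the third packet: the second packet with the first value copied in."""
    for name in ("ecn", "fecn", "becn"):
        if name in first_pkt and name in second_pkt:
            third = dict(second_pkt)  # leave the second packet unchanged
            third[name] = first_pkt[name]
            return third
    raise ValueError("packets do not share an ECN-style field")


# Two-bit ECN field, one-bit FECN flag, and one-bit BECN flag, respectively.
third_a = copy_first_value({"seq": 1, "ecn": 0b11}, {"seq": 9, "ecn": 0b10})
third_b = copy_first_value({"seq": 1, "fecn": 1}, {"seq": 9, "fecn": 0})
third_c = copy_first_value({"seq": 1, "becn": 1}, {"seq": 9, "becn": 0})
```

In each case the sequence number of the second packet is carried over unchanged, matching the requirement that the second and third packets share a sequence number.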
Optionally, with reference to the first aspect, in a fifth possible implementation, the in-network computing switch obtaining congestion information may include:
When the status of the port between the first working node and the in-network computing switch indicates congestion, the in-network computing switch modifies the value on the ECN flag bit in N data packets to be broadcast to obtain the congestion information, where the N data packets to be broadcast are packets with the same sequence number for the N working nodes;
Correspondingly, the in-network computing switch sending N packets with the same sequence number to the N working nodes, where the N packets with the same sequence number all carry the congestion information, may include:
The in-network computing switch sends the N modified data packets to be broadcast to the N working nodes, where the values on the ECN flag bits in the N modified data packets to be broadcast respectively instruct the N working nodes to perform congestion control.
In this embodiment, the in-network computing switch can send the N modified data packets to be broadcast to the N working nodes, so that each modified packet carries the congestion information and the congestion information is synchronized to the N working nodes, filling the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks. The N working nodes can then perform congestion control synchronously according to the congestion information carried in the modified data packets to be broadcast, which further smooths the sending rate of each of the N working nodes.
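In this variant the trigger is the switch's own port status rather than a marked incoming packet. A minimal sketch, assuming a per-node boolean port status and a hypothetical `BroadcastPacket` record with a single `ecn` value:

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class BroadcastPacket:
    seq: int
    dst: str
    ecn: int = 0


def mark_on_port_congestion(pending: List[BroadcastPacket],
                            port_congested: Dict[str, bool]) -> List[BroadcastPacket]:
    """Mark all N pending packets if the port towards any working node is congested."""
    if not any(port_congested.values()):
        return list(pending)
    return [replace(p, ecn=1) for p in pending]


pending = [BroadcastPacket(seq=9, dst=w) for w in ("w1", "w2", "w3")]
marked = mark_on_port_congestion(pending, {"w1": False, "w2": True, "w3": False})
```

Only the port towards `w2` is congested in this example, yet the packets for all three nodes are marked, which is the synchronization effect the embodiment describes.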
Optionally, with reference to the fifth possible implementation of the first aspect, in a sixth possible implementation, the in-network computing switch modifying the value on the ECN flag bit in the N data packets to be broadcast includes:
When the ECN flag bit in the data packet to be broadcast includes a third ECN field, the in-network computing switch sets the value in the third ECN field; or,
When the ECN flag bit in the data packet to be broadcast includes a third FECN bit, the in-network computing switch sets the value in the third FECN bit.
In this embodiment, because the value on the ECN flag bit takes multiple forms in different protocols, the in-network computing switch can modify the value on the ECN flag bit of the second packet to the first value in multiple ways. These modifications make the subsequent synchronization of congestion information applicable in a variety of settings.
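At the bit level, the two alternatives might look like the sketch below. It assumes the RFC 3168 encoding for the ECN field, in which the field occupies the two low-order bits of the IP traffic-class byte and `0b11` means congestion experienced. The FECN mask position is purely hypothetical and stands in for whatever header layout the transport actually uses.

```python
ECN_MASK = 0b11   # two-bit ECN field in the low bits of the traffic-class byte
ECN_CE = 0b11     # "congestion experienced" code point (RFC 3168)
FECN_MASK = 0x80  # hypothetical position of a one-bit FECN flag in a flags byte


def set_ecn_field(traffic_class: int) -> int:
    """Set the ECN field to CE while preserving the upper six bits."""
    return (traffic_class & ~ECN_MASK & 0xFF) | ECN_CE


def set_fecn_bit(flags: int) -> int:
    """Set the FECN flag while preserving the other bits of the byte."""
    return (flags | FECN_MASK) & 0xFF


tc = set_ecn_field(0b10111010)  # upper six bits unchanged, ECN field becomes 0b11
fl = set_fecn_bit(0b00000001)   # FECN flag set, low bit unchanged
```

Both helpers touch only the congestion bits, so the rest of the header byte passes through unmodified.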
In a second aspect, an embodiment of the present application provides an in-network computing switch, which may include:
An acquisition unit, configured to obtain congestion information, where the congestion information is used to indicate that a first communication link is congested, the first communication link is a link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2;
A sending unit, configured to send N packets with the same sequence number to the N working nodes, where the N packets with the same sequence number all carry the congestion information, so that the N working nodes each perform congestion control based on the congestion information.
Optionally, with reference to the second aspect, in a first possible implementation, the acquisition unit may include:
A first acquisition module, configured to obtain a first packet sent by the first working node, where the first packet carries the congestion information, the congestion information includes a first value on the explicit congestion notification (ECN) flag bit of the first packet, and the first value is used to indicate that the first communication link is congested;
Correspondingly, the sending unit includes:
A first sending module, configured to send, within the congestion indication validity period of the first value obtained by the first acquisition module, N packets with the same sequence number to the N working nodes, where the N packets with the same sequence number all carry the congestion information.
Optionally, with reference to the first possible implementation of the second aspect, in a second possible implementation, the in-network computing switch may further include:
A modification unit, configured to modify, before the first sending module sends N packets with the same sequence number to the N working nodes, the value on the ECN flag bit of a second packet to the first value to obtain a third packet, where the second packet is the first packet whose aggregation is completed within the validity period, and the sequence number of the second packet is the same as the sequence number of the third packet;
Correspondingly, the first sending module includes:
A first sending sub-module, configured to send the N third packets obtained by the modification unit to the N working nodes, where the first value in each third packet instructs the corresponding working node to perform congestion control, and the N third packets have the same sequence number.
Optionally, with reference to the first or second possible implementation of the second aspect, in a third possible implementation, the in-network computing switch further includes:
The acquisition unit, configured to receive, within the congestion indication validity period of the first value, a fourth packet sent by the first working node, where the value on the ECN flag bit in the fourth packet is the first value;
An ignoring unit, configured to ignore the first value in the fourth packet obtained by the acquisition unit.
Optionally, with reference to the second possible implementation of the second aspect, in a fourth possible implementation,
The modification unit is configured to: when the ECN flag bit of the first packet includes a first ECN field and the ECN flag bit of the second packet includes a second ECN field, modify the value in the second ECN field to the first value in the first ECN field; or,
The modification unit is configured to: when the ECN flag bit of the first packet includes a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second packet includes a second FECN bit, modify the value in the second FECN bit to the first value in the first FECN bit; or,
The modification unit is configured to: when the ECN flag bit of the first packet includes a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second packet includes a second BECN bit, modify the value in the second BECN bit to the first value in the first BECN bit.
Optionally, with reference to the second aspect, in a fifth possible implementation, the acquisition unit may include:
A second acquisition module, configured to: when the status of the port between the first working node and the in-network computing switch indicates congestion, modify the value on the ECN flag bit in N data packets to be broadcast to obtain the congestion information, where the N data packets to be broadcast are packets with the same sequence number for the N working nodes;
Correspondingly, the sending unit includes:
A second sending module, configured to send the N modified data packets to be broadcast to the N working nodes, where the values on the ECN flag bits in the N modified data packets to be broadcast respectively instruct the N working nodes to perform congestion control.
Optionally, with reference to the fifth possible implementation of the second aspect, in a sixth possible implementation, the second acquisition module is configured to: when the ECN flag bit in the data packet to be broadcast includes a third ECN field, set the value in the third ECN field; or,
The second acquisition module is configured to: when the ECN flag bit in the data packet to be broadcast includes a third FECN bit, set the value in the third FECN bit.
In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory. The memory is used to store program instructions, and when the computer device runs, the processor executes the program instructions stored in the memory, so that the computer device performs the method for synchronizing congestion information according to the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the method according to the first aspect or any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method according to the first aspect or any possible implementation of the first aspect.
In a sixth aspect, an embodiment of the present application provides a chip system, which includes a processor configured to support an in-network computing switch in implementing the functions involved in the first aspect or any possible implementation of the first aspect. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the in-network computing switch. The chip system may consist of a chip, or may include a chip and other discrete devices.
It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:
In the embodiments of the present application, because the congestion information can indicate that the first communication link between any one working node and the in-network computing switch is congested, the in-network computing switch, upon obtaining the congestion information, places it in N packets with the same sequence number and sends these N packets to the N working nodes. In other words, by sending N packets with the same sequence number, all carrying the congestion information, to the N working nodes, the in-network computing switch synchronizes the congestion information to these N working nodes, so that after receiving its packet each working node can perform congestion control synchronously according to the congestion information carried in the packet. This fills the gap left by the lack of an applicable congestion information synchronization mechanism in in-network computing networks and further smooths the sending rate of each of the N working nodes.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application.
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of this application;
FIG. 2 is a schematic diagram of an embodiment of the congestion information synchronization method provided by this embodiment;
FIG. 3 is a schematic diagram of aggregation calculation performed by the in-network computing switch provided by this embodiment;
FIG. 4 is a schematic diagram of another embodiment of the congestion information synchronization method provided by this embodiment;
FIG. 5 is a schematic diagram of the states of the ECN flag bits in the RoCE v2 protocol or the TCP protocol as proposed in an embodiment of this application;
FIG. 6 is a schematic diagram of another embodiment of the congestion information synchronization method provided by this embodiment;
FIG. 7 is a schematic diagram of an embodiment of the in-network computing switch provided by an embodiment of this application;
FIG. 8 is a schematic diagram of another embodiment of the in-network computing switch provided by an embodiment of this application;
FIG. 9 is a schematic diagram of another embodiment of the in-network computing switch provided by an embodiment of this application;
FIG. 10 is a schematic diagram of another embodiment of the in-network computing switch provided by an embodiment of this application;
FIG. 11 is a schematic diagram of the hardware structure of a communication apparatus in an embodiment of this application.
Detailed Description
The embodiments of this application provide a congestion information synchronization method and a related apparatus, which are used to send congestion information synchronously to N working nodes. This fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks, and further smooths the sending rate of each of the N working nodes.
The terms "first", "second", "third", "fourth", and so on (if any) in the specification, the claims, and the foregoing drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than the one illustrated or described. In addition, the terms "include" and "have" and any variations of them are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Congestion control is an important means of improving network resource utilization and optimizing transmission quality, and how well congestion is handled directly affects system performance. An in-network computing network can make full use of network computing resources, take over part of the critical computation from distributed computing nodes, and provide aggregation calculation that merges multiple pieces of data into one, which reduces network bandwidth usage and speeds up transmission. In-network computing networks that provide aggregation calculation are therefore increasingly favored. Such a network has the following flow control characteristics: (1) the in-network computing switch performs aggregation calculation on packets with the same sequence number (index value) sent by N working nodes (N is an integer greater than 2), and sends a result only after it has processed the parameters in the data packets with that sequence number from all N working nodes; otherwise the packets are dropped; (2) to prevent the buffer of the in-network computing switch from overflowing, the sending rates of the N working nodes need to be synchronized; (3) the sending rate of the N working nodes is determined by the slowest link in the topology. The traditional congestion control method based on the ECN flag bits is suitable only for point-to-point unicast communication and cannot be applied to the many-to-many synchronous communication mode of an in-network computing network. That is, with point-to-point congestion control, when the sending rate of any one working node drops because of congestion, there is no mechanism to make the sending rates of the other N-1 working nodes drop correspondingly.
Therefore, to solve the foregoing problem, the method proposed in the embodiments of this application is mainly applied to in-network computing networks that perform congestion control based on the ECN flag bits. Application scenarios of such networks include, but are not limited to, artificial intelligence (AI) distributed training, the MapReduce model, and distributed databases. For these scenarios, refer to FIG. 1, which is a schematic diagram of a system architecture provided by an embodiment of this application. As shown in FIG. 1, the system may include an in-network computing switch and N working nodes. The in-network computing switch is mainly used to obtain congestion information. The congestion information can indicate that the first communication link between any one of the N working nodes and the in-network computing switch is congested (the black dot in FIG. 1). When the first communication link is congested, the in-network computing switch carries the congestion information in N packets with the same sequence number and sends these N packets to the N working nodes. Carrying the same congestion information in N packets with the same sequence number not only synchronizes the congestion information to the N working nodes, but also lets the N working nodes, after receiving these packets, perform congestion control synchronously based on the congestion information they carry. This prevents a working node whose communication link is not congested from sending so fast that its transmission is interrupted or even times out.
It should be understood that the congested first communication link shown in FIG. 1, namely the link between the in-network computing switch and working node 0, is only an illustrative example. In practice, the congested first communication link may also be the link between the in-network computing switch and another working node such as working node 1 or working node 2. This is not specifically limited in the embodiments of this application.
It should be understood that, in addition to conventional forwarding capability, the in-network computing switch has a degree of programmability and computing capability and can compute on and modify fields in a packet, for example by modifying the ECN field or by writing a calculation result into the packet payload. The in-network computing switch includes, but is not limited to, a Barefoot Wedge 100B switch or a Cisco N3400 switch, which is not limited in the embodiments of this application. The N working nodes may be servers equipped with graphics processing units (GPUs), training nodes, and the like, which is likewise not specifically limited in the embodiments of this application.
The congestion information synchronization method in this embodiment is applicable not only to the system architecture shown in FIG. 1 but also to other system architectures, which is not specifically limited here.
To facilitate a better understanding of the solution proposed in the embodiments of this application, the specific procedure in this embodiment is described below. Refer to FIG. 2, which is a schematic diagram of an embodiment of the congestion information synchronization method provided by this embodiment. The method may include:
201. The in-network computing switch obtains congestion information. The congestion information indicates that a first communication link is congested. The first communication link is the link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2.
In this embodiment, a communication link exists between each working node and the in-network computing switch. When the first communication link between any one of the N working nodes and the in-network computing switch becomes congested, the in-network computing switch obtains congestion information and determines from it that congestion has occurred.
It can be understood that the in-network computing switch may obtain the congestion information by detecting that any one of its own communication ports leading to the N working nodes is in a congested state, or the first working node corresponding to the congested first communication link may notify the in-network computing switch of the congestion information. The manner of obtaining the congestion information is not specifically limited in the embodiments of this application.
202. The in-network computing switch sends N packets with the same sequence number to the N working nodes. The N packets all carry the congestion information, so that each of the N working nodes performs congestion control based on the congestion information.
In this embodiment, the sequence number identifies a packet. Packets with the same sequence number are data packets that the N working nodes sent to the in-network computing switch in the same batch. Based on the sequence number, the in-network computing switch can therefore tell apart the data packets sent by the N working nodes and perform the corresponding aggregation calculation on the parameters in data packets that come from the N different working nodes but share a sequence number.
The aggregation calculation can be understood as follows. The N working nodes synchronously send data packets carrying the data to be computed to the in-network computing switch, and the data packets sent by each working node are numbered with sequence numbers. After receiving the data packets, the in-network computing switch aggregates the parameters in the data packets that have the same sequence number. Once it has processed all data packets with a given sequence number from the N working nodes, the in-network computing switch sends the aggregation result to the N working nodes in the form of packets.
For example, refer to FIG. 3, a schematic diagram of aggregation calculation performed by the in-network computing switch. As shown in FIG. 3, the parameters in the data packets that working node 0 sends in turn to the in-network computing switch are "1", "2", and "3"; those sent by working node 1 are "4", "5", and "6"; and those sent by working node 2 are "7", "8", and "9". The data packets carrying the parameters "1", "4", and "7" have the same sequence number (say index0), those carrying "2", "5", and "8" have the same sequence number (say index1), and those carrying "3", "6", and "9" have the same sequence number (say index2).
The in-network computing switch then sums and averages the parameters in the data packets with the same sequence number sent by working node 0, working node 1, and working node 2. For example, averaging the parameters "1", "4", and "7" in the data packets corresponding to index0 gives an aggregation result of 4. Only after it has processed the parameters in the data packets with the same sequence number from all three working nodes does the in-network computing switch send the aggregation result for that sequence number to the corresponding working nodes. It can be understood that working node 0, working node 1, and working node 2 in FIG. 3 are only an illustrative example of aggregation calculation. In practice the number of working nodes involved is not limited, provided that N is an integer greater than 2.
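The aggregation step described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the implementation in this application: the class name `AggregatingSwitch`, the `receive` method, and the use of a dictionary as the buffer are all hypothetical, and averaging is used because it is the operation in the FIG. 3 example.

```python
from collections import defaultdict


class AggregatingSwitch:
    """Toy model of per-sequence-number aggregation (hypothetical names)."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        # sequence number (index) -> {worker_id: parameter}
        self.pending = defaultdict(dict)

    def receive(self, worker_id, index, parameter):
        """Buffer one data packet.

        Returns (index, average) once packets with this index have arrived
        from all N working nodes; returns None while still waiting.
        """
        slot = self.pending[index]
        slot[worker_id] = parameter
        if len(slot) < self.num_workers:
            return None
        del self.pending[index]
        return index, sum(slot.values()) / self.num_workers


# The FIG. 3 example: three working nodes, three sequence numbers each.
switch = AggregatingSwitch(num_workers=3)
results = []
for worker_id, params in enumerate([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    for index, parameter in enumerate(params):
        done = switch.receive(worker_id, index, parameter)
        if done is not None:
            results.append(done)

print(results)  # [(0, 4.0), (1, 5.0), (2, 6.0)]
```

Nothing is emitted for index0 until the packet from the last working node arrives, which is why a single slow link stalls the result for every working node.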
Therefore, to synchronize the congestion information to the N working nodes, the in-network computing switch needs to carry the congestion information in N packets with the same sequence number and send these N packets to the N working nodes. That is, the in-network computing switch needs to carry the congestion information in the packet obtained after aggregation completes and send that packet to the N working nodes. Only when packets with the same sequence number are sent to all N working nodes do the N working nodes obtain the congestion information from the same batch of packets, which means the congestion information sent to the N working nodes is synchronized. The N working nodes can then perform congestion control synchronously based on the congestion information in the received packets, which prevents a working node whose communication link is not congested from sending so fast that its transmission is interrupted or even times out.
In addition, it should be understood that the N working nodes performing congestion control based on the congestion information can mean, for example, that the N working nodes synchronously reduce their respective packet sending rates. This is not limited in the embodiments of this application.
Based on the embodiment corresponding to FIG. 2, the in-network computing switch can obtain the congestion information in several ways and synchronize it in different ways. These are described in detail in the following embodiments:
Case 1: The working node corresponding to the congested communication link notifies the in-network computing switch.
Case 2: The in-network computing switch actively detects the congestion.
I. For Case 1, refer to FIG. 4, a schematic diagram of another embodiment of the congestion information synchronization method provided by an embodiment of this application. As shown in FIG. 4, this embodiment of the method may include:
401. The in-network computing switch obtains a first packet sent by the first working node. The first packet carries congestion information, the congestion information includes a first value in the explicit congestion notification (ECN) flag bits of the first packet, and the first value indicates that the first communication link is congested.
In this embodiment, it is not a communication port of the in-network computing switch itself that is congested, but a communication port of another switch connected to the in-network computing switch. The first working node can be understood as being connected to the in-network computing switch indirectly through the first communication link. The first working node corresponding to the congested first communication link therefore carries the congestion information in the first packet and sends it to the in-network computing switch.
It can be understood that the congestion information may include the first value in the ECN flag bits of the first packet, where the ECN flag bits are a 2-bit field in the packet header. For example, in the remote direct memory access over converged Ethernet (RDMA over Converged Ethernet, RoCE) v2 protocol or the TCP protocol, the ECN flag bits are a 2-bit field in the header of an Internet Protocol version 4 (IPv4) or Internet Protocol version 6 (IPv6) packet. In the InfiniBand (IB) protocol, the ECN flag bits are a 2-bit field in the base transport header (BTH) and consist of a forward explicit congestion notification (FECN) bit and a backward explicit congestion notification (BECN) bit, that is, the first bit is the FECN bit and the second bit is the BECN bit. This is not limited in the embodiments of this application.
In addition, the first value is the value of the ECN flag bits of the first packet and can be used to indicate that the first communication link is congested. For example, refer to FIG. 5, a schematic diagram of the states of the ECN flag bits in the RoCE v2 protocol or the TCP protocol as proposed in an embodiment of this application. As shown in FIG. 5, when the ECN field in the IPv4 or IPv6 packet header takes the value 11, the corresponding state is the forward congestion flag, which indicates that congestion has occurred. In the RoCE v2 protocol or the TCP protocol, the first value in the ECN flag bits can therefore be 11, indicating that the first communication link between the first working node and the in-network computing switch is congested.
For the ECN flag bits in the InfiniBand protocol, an FECN bit of 1 indicates that congestion has occurred, and a BECN bit of 1 also indicates that congestion has occurred. Note the difference, however. Suppose a data flow goes from working node A to working node B; an FECN bit of 1 then means that congestion was encountered while A was sending the data flow to B. If the data flow goes from working node B to working node A, a BECN bit of 1 means that congestion was encountered while B was sending the data flow to A. In the InfiniBand protocol, the first value in the ECN flag bits may therefore be: the first bit is 1, or the second bit is 1.
Therefore, in the RoCE v2 protocol or the TCP protocol, the congestion information can be expressed as IP.ECN=11, and in the InfiniBand protocol it can be expressed as InfiniBand.FECN=1 or InfiniBand.BECN=1. It should be understood, however, that beyond the foregoing definitions, other values of the ECN field, the FECN bit, or the BECN bit may be used in practice to indicate congestion. This is not limited in the embodiments of this application.
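As a minimal sketch of the two conditions IP.ECN=11 and InfiniBand.FECN=1 or InfiniBand.BECN=1, the following Python helpers test whether a header indicates congestion. The function names are hypothetical. The placement of the ECN field in the two low-order bits of the IPv4 Type of Service byte or IPv6 Traffic Class byte follows RFC 3168 and is not stated in the text above; the InfiniBand helper takes the two bits as already-extracted values and makes no assumption about their position in the BTH.

```python
ECN_CE = 0b11  # the value "11" of the 2-bit ECN field in the IP header


def ip_ecn(tos_or_traffic_class):
    """Return the 2-bit ECN field from the IPv4 TOS / IPv6 Traffic Class byte.

    Per RFC 3168 the ECN field occupies the two low-order bits of that byte.
    """
    return tos_or_traffic_class & 0b11


def ip_indicates_congestion(tos_or_traffic_class):
    """RoCE v2 / TCP case: congestion is signalled by IP.ECN == 11."""
    return ip_ecn(tos_or_traffic_class) == ECN_CE


def ib_indicates_congestion(fecn, becn):
    """InfiniBand case: either FECN == 1 or BECN == 1 signals congestion."""
    return fecn == 1 or becn == 1


print(ip_indicates_congestion(0x03))  # True
print(ip_indicates_congestion(0x02))  # False
print(ib_indicates_congestion(0, 1))  # True
```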
402. Within the validity period of the congestion indication given by the first value, the in-network computing switch sends N packets with the same sequence number to the N working nodes, and the N packets all carry the congestion information.
Building on the aggregation calculation described for FIG. 3, suppose the parameters "4" and "7" in the index0 data packets sent by working node 1 and working node 2 have already reached the in-network computing switch. The in-network computing switch then waits for working node 0 to send the parameter "1" in its index0 data packet, and only after receiving "1" does it aggregate all the data packets corresponding to index0. However, if congestion on the first communication link keeps the data packet from working node 0 from reaching the in-network computing switch, the aggregation cannot be performed. Because the buffer of the in-network computing switch is limited, and working node 1 and working node 2, which are not congested, keep sending data packets to the in-network computing switch without interruption, the buffer can easily overflow and be exhausted. Given that the sending rate in an in-network computing network is determined by the slowest link in the topology, the data packets sent by the working nodes for a given sequence number are buffered or dropped until aggregation for that sequence number completes, so the congestion information becomes outdated and invalid.
Therefore, because congestion information is highly time-sensitive, when the in-network computing switch detects that the first packet obtained from the first working node carries congestion information, it monitors the validity period of the congestion information in the first packet by means of a timer, a packet timer, or the like.
While the validity period of the congestion indication given by the first value has not expired, the in-network computing switch carries the congestion information in N packets with the same sequence number and sends these N packets to the N working nodes. This synchronizes the congestion information to the N working nodes, so that after receiving the N packets they can perform congestion control synchronously based on the congestion information carried in the packets, which fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks. Because the first value is used only within its validity period, this also solves the problem of congestion information becoming invalid in an in-network computing network.
Optionally, in some embodiments, before the in-network computing switch sends the N packets with the same sequence number to the N working nodes, the congestion information synchronization method may further include:
The in-network computing switch modifies the value in the ECN flag bits of a second packet to the first value to obtain a third packet, where the second packet is the first packet whose aggregation completes within the validity period, and the sequence number of the second packet is the same as that of the third packet;
Correspondingly, the in-network computing switch sending N packets with the same sequence number to the N working nodes includes:
The in-network computing switch sends N third packets to the N working nodes, where the first value in each third packet instructs the corresponding working node to perform congestion control, and the N third packets have the same sequence number.
That is, after obtaining the first packet, the in-network computing switch first parses it. Because the in-network computing switch sends a packet only after aggregation calculation completes, regardless of whether the parsed first packet is a packet of the first type or a packet of the second type, the in-network computing switch needs to wait, within the validity period of the congestion indication given by the first value, for the first packet whose aggregation completes, namely the second packet. It then modifies the value in the ECN flag bits of the second packet to the first value so that the second packet carries the congestion information. The modified second packet serves as the third packet carrying the congestion information.
The in-network computing switch can then send N third packets to the N working nodes to synchronize the congestion information, so that the first value in each third packet instructs the corresponding working node to perform congestion control, which further smooths the sending rate of each working node.
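The sequence above (record the first value, wait for the first aggregated packet within the validity period, rewrite its ECN flag bits, and send N copies) can be sketched as follows. This is an illustrative sketch only: `Packet`, `CongestionSync`, `validity_s`, and both method names are hypothetical, time is passed in explicitly to keep the example deterministic, and clearing the recorded value after it has been used once is an assumption drawn from the statement that the second packet is the first one aggregated within the validity period.

```python
from dataclasses import dataclass, replace

ECN_CE = 0b11  # first value in the RoCE v2 / TCP case (IP.ECN = 11)


@dataclass(frozen=True)
class Packet:
    index: int      # sequence number
    ecn: int        # 2-bit ECN flag value
    payload: float  # parameter or aggregation result


class CongestionSync:
    """Hypothetical switch-side helper for synchronizing congestion information."""

    def __init__(self, num_workers, validity_s):
        self.num_workers = num_workers
        self.validity_s = validity_s  # assumed length of the validity period
        self.first_value = None
        self.expires_at = None

    def on_first_packet(self, pkt, now):
        """Record the first value if a working node's packet signals congestion."""
        if pkt.ecn == ECN_CE:
            self.first_value = pkt.ecn
            self.expires_at = now + self.validity_s

    def on_aggregated(self, second_pkt, now):
        """Return the N packets to send for one aggregated result."""
        out = second_pkt
        if self.first_value is not None:
            if now <= self.expires_at:
                # Third packet: same sequence number, ECN set to the first value.
                out = replace(second_pkt, ecn=self.first_value)
            self.first_value = None  # used once, or expired (assumption)
        return [out] * self.num_workers


sync = CongestionSync(num_workers=3, validity_s=0.5)
sync.on_first_packet(Packet(index=0, ecn=ECN_CE, payload=1.0), now=10.0)
marked = sync.on_aggregated(Packet(index=0, ecn=0b00, payload=4.0), now=10.2)
later = sync.on_aggregated(Packet(index=1, ecn=0b00, payload=5.0), now=10.3)
print([p.ecn for p in marked], [p.ecn for p in later])  # [3, 3, 3] [0, 0, 0]
```

All N working nodes receive the same marked packet for index 0, so they see the congestion indication in the same batch.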
It should be noted that whether the first packet is of the first type or of the second type can be determined from its packet length. For example, when the length of the first packet falls within a first preset packet length, the first packet is determined to be of the first type; when the length of the first packet falls within a second preset packet length, the first packet is determined to be of the second type. The first preset packet length is greater than the second preset packet length. A packet of the first type can be understood as one that can take part in aggregation computation or be broadcast directly as a data packet, whereas a packet of the second type can neither take part in aggregation computation nor be broadcast as a data packet.
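A length-based classification of this kind might look like the sketch below. The text does not give concrete lengths or say how the two ranges are delimited, so the threshold values and the interpretation used here (lengths up to the second preset length are second-type, longer lengths up to the first preset length are first-type) are assumptions for illustration only.

```python
from enum import Enum
from typing import Optional


class PacketType(Enum):
    FIRST = 1   # can be aggregated or broadcast directly
    SECOND = 2  # can be neither aggregated nor broadcast


# Hypothetical thresholds in bytes; the source gives no concrete values.
FIRST_PRESET_LEN = 1024
SECOND_PRESET_LEN = 64


def classify(length: int) -> Optional[PacketType]:
    """Classify a packet by length; None means the length matches neither range."""
    if length <= 0:
        return None
    if length <= SECOND_PRESET_LEN:
        return PacketType.SECOND
    if length <= FIRST_PRESET_LEN:
        return PacketType.FIRST
    return None
```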
Optionally, in other embodiments, because the value of the ECN flag bit takes different forms in different protocols, the in-network computing switch can change the value of the ECN flag bit of the second packet to the first value in several manners, which can be understood as follows:
Manner 1: When the ECN flag bit of the first packet includes a first ECN field and the ECN flag bit of the second packet includes a second ECN field, the in-network computing switch changes the value in the second ECN field to the first value in the first ECN field.
That is, in the RoCE v2 protocol or the TCP protocol, if the ECN flag bit in the received first packet is the first ECN field, the value in the second ECN field of the second packet is changed to be the same as the first value, for example "11". The congestion information carried in the first packet is thereby copied into the second packet, which allows the subsequent synchronization of congestion information to be applied in a variety of ways.
It should be understood that using the value "11" in the first ECN field to indicate congestion is only an illustrative description. In practical applications, the value of the first ECN field may be defined as another value to indicate congestion, which is not limited in the embodiments of this application.
Manner 2: When the ECN flag bit of the first packet includes a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second packet includes a second FECN bit, the in-network computing switch changes the value in the second FECN bit to the first value in the first FECN bit; or,
Manner 3: When the ECN flag bit of the first packet includes a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second packet includes a second BECN bit, the in-network computing switch changes the value in the second BECN bit to the first value in the first BECN bit.
In this embodiment, for manner 2 and manner 3, the ECN flag bits in the InfiniBand protocol are located in a 2-bit field in the BTH header. This 2-bit field consists of the FECN bit and the BECN bit, and a value of "1" on either the FECN bit or the BECN bit indicates congestion.
Therefore, in the InfiniBand protocol, if the ECN flag bit in the received first packet is the first FECN bit, the value in the second FECN bit of the second packet is changed to be the same as the first value, for example "1". Alternatively, if the ECN flag bit in the received first packet is the first BECN bit, the value in the second BECN bit of the second packet is changed to be the same as the first value, for example "1". The congestion information carried in the first packet is thereby copied into the second packet, which allows the subsequent synchronization of congestion information to be applied in a variety of ways.
It should be understood that using the value "1" on the second FECN bit or on the second BECN bit to indicate congestion is only an illustrative description. In practical applications, the value on the second FECN bit or the second BECN bit may be defined as another value to indicate congestion, which is not limited in the embodiments of this application.
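The three manners share one pattern: copy whichever congestion field the first packet carries into the same field of the second packet. A minimal sketch follows, assuming a hypothetical `EcnFlags` record in which a field is `None` when the protocol does not define it and a nonzero value means "congested"; it models the headers abstractly rather than parsing real RoCE v2, TCP, or InfiniBand packets.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EcnFlags:
    ecn_field: Optional[int] = None  # 2-bit ECN field (manner 1)
    fecn: Optional[int] = None       # FECN bit (manner 2)
    becn: Optional[int] = None       # BECN bit (manner 3)


def copy_congestion(first: EcnFlags, second: EcnFlags) -> EcnFlags:
    """Return a copy of 'second' carrying the congestion marks of 'first'."""
    updates = {}
    for name in ("ecn_field", "fecn", "becn"):
        src = getattr(first, name)
        dst = getattr(second, name)
        # Copy only when the first packet marks congestion in this field
        # and the second packet also has the field.
        if src and dst is not None:
            updates[name] = src
    return replace(second, **updates)
```

With this sketch, a first packet whose ECN field is `0b11` turns a second packet's ECN field from `0` into `0b11`, while a first packet with only FECN set leaves the second packet's BECN bit untouched.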
In addition, to save network resources and avoid sending the same congestion information repeatedly, in other embodiments the congestion information synchronization method may further include:
if, within the validity period of the congestion indication of the first value, the in-network computing switch receives a fourth packet sent by the first working node, where the value of the ECN flag bit in the fourth packet is the first value,
the in-network computing switch ignores the first value in the fourth packet.
In this embodiment, if the in-network computing switch receives a fourth packet sent by the first working node within the validity period of the congestion indication of the first value, and the value of the ECN flag bit in the fourth packet is the same as the first value in the congestion information carried by the first packet, then the fourth packet also carries congestion information. However, the in-network computing switch already started a timer to track the validity of the first value when it received the first packet. To avoid copying and sending the congestion information repeatedly, when a fourth packet that also carries congestion information arrives within that validity period, the switch does not start another timer and instead ignores the first value in the fourth packet. In other words, the switch can ignore the congestion information in the fourth packet and forward the fourth packet to its destination according to the original forwarding rules. Only the congestion information carried in the first packet needs to be synchronized, which saves network resources.
It can also be understood that if the validity period of the congestion indication of the first value has expired, the in-network computing switch ignores the congestion information. Once the validity period has elapsed, the congestion information corresponding to the first value is no longer valid, and synchronizing it would not enable the N working nodes to perform congestion control synchronously. The switch can therefore ignore the congestion information in the first packet, send the first packet to its destination according to the original forwarding rules, and obtain other, still valid congestion information from the working node corresponding to the congested communication link.
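The timer behavior can be sketched as a small state machine. The `CongestionTimer` class, its string return values, and the choice to treat the period as expired once the elapsed time reaches `validity_period` are assumptions made for this example, not details given in the text.

```python
from typing import Optional


class CongestionTimer:
    """Hypothetical validity timer for the first value."""

    def __init__(self, validity_period: float) -> None:
        self.validity_period = validity_period
        self.started_at: Optional[float] = None

    def active(self, now: float) -> bool:
        # True while the congestion indication is still within its validity period.
        return (self.started_at is not None
                and now - self.started_at < self.validity_period)

    def on_marked_packet(self, now: float) -> str:
        """Handle a packet whose ECN flag carries the first value."""
        if self.active(now):
            # Fourth packet: do not restart the timer; forward it as usual.
            return "ignore"
        # First packet, or the previous indication has expired: start timing.
        self.started_at = now
        return "start"
```

A marked packet arriving while the timer is active is reported as `"ignore"` and leaves `started_at` unchanged; after the period has elapsed, the next marked packet starts a fresh period.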
2. For case 2 above, refer to FIG. 6, which is a schematic diagram of another embodiment of the congestion information synchronization method provided in the embodiments of this application. As shown in FIG. 6, another embodiment of the congestion information synchronization method provided in the embodiments of this application may include:
601. When the state of the port between the first working node and the in-network computing switch indicates congestion, the in-network computing switch modifies the value of the ECN flag bit in N data packets to be broadcast to obtain the congestion information, where the N data packets to be broadcast are packets with the same sequence number for the N working nodes.
In this embodiment, the in-network computing switch can monitor the buffer queue information for the packets sent by the N working nodes. When the buffer queue information exceeds a buffer threshold, the switch can determine that the port between one of the N working nodes and the switch is congested, that is, the state of the port between the first working node and the in-network computing switch indicates congestion.
It should be noted that the first working node here can be understood as a working node directly connected to the in-network computing switch through the first communication link. The first working node is simply any one of the N working nodes, and which one it is is not limited in the embodiments of this application. Furthermore, besides using buffer queue information, the in-network computing switch may in practice determine that a port is congested by other means, such as port utilization, which is not limited in the embodiments of this application.
Therefore, when a port is congested, the in-network computing switch can modify the value of the ECN flag bit in the N data packets to be broadcast to obtain the congestion information, for example by setting "ECN=1" or "ECN=11". The corresponding congestion information can then be expressed as "InfiniBand.ECN=1", "IP.ECN=11", and so on.
It should also be noted that the N data packets to be broadcast are the packets with the same sequence number that the in-network computing switch obtains after completing the aggregation computation on the data packets with the same sequence number sent by the N working nodes. For a description of the sequence number, refer to step 202 in FIG. 2; details are not repeated here.
602. The in-network computing switch sends the modified N data packets to be broadcast to the N working nodes, where the values of the ECN flag bits in the modified N data packets to be broadcast instruct the respective N working nodes to perform congestion control.
In this embodiment, after the values of the ECN flag bits in the N data packets to be broadcast are modified, each of the N modified packets carries the congestion information, so the in-network computing switch can send these N modified packets to the N working nodes. This synchronizes the congestion information to the N working nodes and fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks. After receiving the modified packets, the N working nodes can perform congestion control synchronously according to the congestion information the packets carry, which smooths the sending rate of each of the N working nodes and prevents a working node on an uncongested communication link from sending too fast and suffering a transmission interruption or even a timeout.
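Steps 601 and 602 can be sketched as a queue-depth check followed by marking every packet that is about to be broadcast. The `BroadcastPacket` type, the per-node queue-depth dictionary, the threshold, and the default mark value `0b11` are illustrative assumptions; as the text notes, a real switch could use port utilization or another criterion instead.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class BroadcastPacket:
    seq: int      # sequence number shared by all N packets
    dest: str     # working node that will receive this copy
    ecn: int = 0  # ECN flag value; 0 is assumed to mean "no congestion"


def port_congested(queue_depths: Dict[str, int], threshold: int) -> bool:
    # Congested if the buffered queue for any working node exceeds the threshold.
    return any(depth > threshold for depth in queue_depths.values())


def mark_and_broadcast(packets: Sequence[BroadcastPacket],
                       queue_depths: Dict[str, int],
                       threshold: int,
                       mark: int = 0b11) -> List[BroadcastPacket]:
    """Step 601: set the ECN flag on all N packets when a port is congested.
    Step 602: return the packets to send to the N working nodes."""
    if not port_congested(queue_depths, threshold):
        return list(packets)
    return [replace(p, ecn=mark) for p in packets]
```

Because every copy is marked, a working node on an uncongested link receives the same signal as the node behind the congested port.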
Optionally, in other embodiments, because the value of the ECN flag bit takes different forms in different protocols, and the in-network computing switch can accordingly change the value of the ECN flag bit of the second packet to the first value in several manners, the modification of the ECN flag bit in the N data packets to be broadcast can be understood with reference to the following manners:
Manner 1: When the ECN flag bit in a data packet to be broadcast includes a third ECN field, the in-network computing switch sets the value in the third ECN field.
That is, in the RoCE v2 protocol or the TCP protocol, if the ECN flag bit in each data packet to be broadcast is the third ECN field, the value in the third ECN field of each data packet to be broadcast is set, for example to "11". The congestion information "IP.ECN=11" is thereby copied into the N data packets to be broadcast, which allows the subsequent synchronization of congestion information to be applied in a variety of ways.
It should be understood that setting the value of the third ECN field to "11" to indicate congestion is only an illustrative description. In practical applications, the third ECN field may be set to another value to indicate congestion, which is not limited in the embodiments of this application.
Or, manner 2: When the ECN flag bit in a data packet to be broadcast includes a third FECN bit, the in-network computing switch sets the value in the third FECN bit.
In this embodiment, the ECN flag bits in the InfiniBand protocol are located in a 2-bit field in the BTH header. This 2-bit field consists of the FECN bit and the BECN bit, and a value of "1" on either bit indicates congestion. When the in-network computing switch detects congestion on its own, however, modifying only the FECN bit is sufficient to indicate congestion. Therefore, in the InfiniBand protocol, if the ECN flag bit in each data packet to be broadcast is the third FECN bit, the value in the third FECN bit of each data packet to be broadcast is set, for example to "1". The congestion information "InfiniBand.FECN=1" is thereby copied into the N data packets to be broadcast, which allows the subsequent synchronization of congestion information to be applied in a variety of ways.
It should be understood that setting the value of the third FECN bit to "1" to indicate congestion is only an illustrative description. In practical applications, the third FECN bit may be set to another value to indicate congestion, which is not limited in the embodiments of this application.
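At the bit level, the two manners amount to simple mask operations. The sketch below assumes that the 2-bit ECN field occupies the two least significant bits of the IP TOS / Traffic Class octet, and that the FECN and BECN bits are the two most significant bits of the BTH octet that holds them; the function names are hypothetical, and the exact header layout should be checked against the relevant specifications.

```python
IP_ECN_MASK = 0b0000_0011   # low two bits of the TOS / Traffic Class octet
IP_ECN_MARK = 0b11          # the example value "11" used in the text
BTH_FECN_BIT = 0b1000_0000  # assumed position of FECN in its BTH octet
BTH_BECN_BIT = 0b0100_0000  # assumed position of BECN in its BTH octet


def set_ip_ecn(tos: int, mark: int = IP_ECN_MARK) -> int:
    """Manner 1: overwrite the 2-bit ECN field, keeping the other six bits."""
    return (tos & ~IP_ECN_MASK & 0xFF) | (mark & IP_ECN_MASK)


def set_fecn(bth_octet: int) -> int:
    """Manner 2: set FECN to 1 and leave BECN and the remaining bits unchanged."""
    return (bth_octet | BTH_FECN_BIT) & 0xFF
```

Only the FECN bit is touched in the second function, matching the statement that modifying FECN alone is sufficient when the switch detects the congestion itself.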
In the embodiments of this application, the in-network computing switch sends N packets with the same sequence number, each carrying the congestion information, to the N working nodes. This synchronizes the congestion information to the N working nodes, so that after receiving the packets the N working nodes can perform congestion control synchronously according to the congestion information the packets carry. It also fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks and smooths the sending rate of each of the N working nodes.
The foregoing describes the congestion information synchronization method provided in the embodiments of this application mainly from the perspective of the method. It can be understood that, to implement the above functions, the apparatus includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art will readily appreciate that, in combination with the modules and algorithm steps of the examples described in the embodiments disclosed in this application, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of this application.
In the embodiments of this application, the apparatus may be divided into functional modules according to the foregoing method examples. For example, each function may be assigned its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiments of this application is illustrative and is only a logical functional division; other divisions are possible in an actual implementation.
The in-network computing switch in the embodiments of this application is described in detail below. Referring to FIG. 7, an embodiment of the in-network computing switch in the embodiments of this application includes:
an obtaining unit 701, configured to obtain congestion information, where the congestion information indicates that a first communication link is congested, the first communication link is a link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2; and
a sending unit 702, configured to send N packets with the same sequence number to the N working nodes, where each of the N packets with the same sequence number carries the congestion information obtained by the obtaining unit 701, so that the N working nodes each perform congestion control based on the congestion information.
In the above manner, the sending unit 702 sends to the N working nodes N packets with the same sequence number, each carrying the congestion information obtained by the obtaining unit 701. This synchronizes the congestion information to the N working nodes, fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks, and enables the N working nodes to perform congestion control synchronously according to the congestion information carried in the packets, which smooths the sending rate of each working node.
For ease of understanding, on the basis of the embodiment described in FIG. 7, refer to FIG. 8. In another embodiment of the in-network computing switch in the embodiments of this application, the obtaining unit 701 may include:
a first obtaining module 7011, configured to obtain a first packet sent by the first working node, where the first packet carries the congestion information, the congestion information includes a first value on the explicit congestion notification (ECN) flag bit of the first packet, the first value indicates that the first communication link is congested, and the first working node is any one of the N working nodes.
Correspondingly, the sending unit 702 may include:
a first sending module 7021, configured to send N packets with the same sequence number to the N working nodes within the validity period of the congestion indication of the first value obtained by the first obtaining module 7011, where each of the N packets with the same sequence number carries the congestion information.
In the above manner, while the validity period of the congestion indication of the first value has not yet expired, the first sending module 7021 places the congestion information obtained by the first obtaining module 7011 in each of the N packets with the same sequence number and sends these N packets to the N working nodes. This synchronizes the congestion information to the N working nodes, fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks, and enables the N working nodes to perform congestion control synchronously according to the congestion information carried in the packets. Because the first value is used only within its validity period, this also addresses the problem of congestion information becoming stale in an in-network computing network.
Optionally, on the basis of the embodiment described in FIG. 8, refer to FIG. 9. In another embodiment of the in-network computing switch in the embodiments of this application, the in-network computing switch may further include:
a modifying unit 703, configured to change the value of the ECN flag bit of a second packet to the first value before the first sending module 7021 sends the N packets with the same sequence number to the N working nodes, to obtain a third packet, where the second packet is the first packet whose aggregation completes within the validity period, and the sequence number of the second packet is the same as that of the third packet.
Correspondingly, the first sending module 7021 includes:
a first sending submodule 70211, configured to send the N third packets obtained by the modifying unit 703 to the N working nodes, where the first value in each third packet instructs the corresponding working node to perform congestion control, and the N third packets have the same sequence number.
Optionally, on the basis of the embodiments described in FIG. 8 and FIG. 9, in another embodiment of the in-network computing switch in the embodiments of this application, the in-network computing switch further includes:
the obtaining unit 701, configured to receive, within the validity period of the congestion indication of the first value, a fourth packet sent by the first working node, where the value of the ECN flag bit in the fourth packet is the first value; and
an ignoring unit, configured to ignore the first value in the fourth packet obtained by the obtaining unit 701.
Optionally, on the basis of the embodiment described in FIG. 9, in another embodiment of the in-network computing switch in the embodiments of this application, the modifying unit 703 is configured to: when the ECN flag bit of the first packet includes a first ECN field and the ECN flag bit of the second packet includes a second ECN field, change the value in the second ECN field to the first value in the first ECN field; or,
the modifying unit 703 is configured to: when the ECN flag bit of the first packet includes a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second packet includes a second FECN bit, change the value in the second FECN bit to the first value in the first FECN bit; or,
the modifying unit 703 is configured to: when the ECN flag bit of the first packet includes a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second packet includes a second BECN bit, change the value in the second BECN bit to the first value in the first BECN bit.
Optionally, on the basis of the embodiment described in FIG. 7, refer to FIG. 10. In another embodiment of the in-network computing switch in the embodiments of this application, the obtaining unit 701 may include:
a second obtaining module 7012, configured to: when the state of the port between the first working node and the in-network computing switch indicates congestion, modify the value of the ECN flag bit in N data packets to be broadcast to obtain the congestion information, where the N data packets to be broadcast are packets with the same sequence number for the N working nodes, and the second working node is any one of the N working nodes.
Correspondingly, the sending unit 702 includes:
a second sending module 7022, configured to send the modified N data packets to be broadcast to the N working nodes, where the values of the ECN flag bits in the modified N data packets to be broadcast instruct the respective N working nodes to perform congestion control.
In this embodiment, the second sending module 7022 can send the N modified data packets to be broadcast to the N working nodes, each carrying the congestion information obtained by the second obtaining module 7012. This synchronizes the congestion information to the N working nodes and fills the gap left by the lack of a suitable congestion information synchronization mechanism in in-network computing networks. It also enables the N working nodes to perform congestion control synchronously according to the congestion information carried in the modified packets, which smooths the sending rate of each of the N working nodes.
Optionally, on the basis of the embodiment described in FIG. 10, in another embodiment of the in-network computing switch in the embodiments of this application, the second obtaining module 7012 is configured to: when the ECN flag bit in a data packet to be broadcast includes a third ECN field, set the value in the third ECN field; or,
the second obtaining module 7012 is configured to: when the ECN flag bit in a data packet to be broadcast includes a third FECN bit, set the value in the third FECN bit.
The in-network computing switch in the embodiments of this application is described above from the perspective of modular functional entities, and is described below from the perspective of hardware processing. FIG. 11 is a schematic diagram of the hardware structure of a communication apparatus in an embodiment of this application. As shown in FIG. 11, the communication apparatus may be structured as follows.
The communication apparatus includes at least one processor 1101, a communication line 1107, a memory 1103, and at least one communication interface 1104.
The processor 1101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control execution of the programs of the solutions of this application.
The communication line 1107 may include a path for transferring information between the foregoing components.
The communication interface 1104 uses any transceiver-type apparatus to communicate with other apparatuses or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 1103 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. The memory may exist independently and be connected to the processor through the communication line 1107, or it may be integrated with the processor.
The memory 1103 is configured to store the computer-executable instructions for carrying out the solutions of this application, and their execution is controlled by the processor 1101. The processor 1101 is configured to execute the computer-executable instructions stored in the memory 1103, thereby implementing the congestion information synchronization method provided in the foregoing embodiments of this application.
Optionally, the computer-executable instructions in the embodiments of this application may also be referred to as application program code, which is not specifically limited in the embodiments of this application.
In a specific implementation, as an embodiment, the communication apparatus may include multiple processors, for example the processor 1101 and the processor 1102 in FIG. 11. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
In a specific implementation, as an embodiment, the communication apparatus may further include an output device 1105 and an input device 1106. The output device 1105 communicates with the processor 1101 and can display information in a variety of ways. The input device 1106 communicates with the processor 1101 and can receive user input in a variety of ways. For example, the input device 1106 may be a mouse, a touchscreen device, or a sensing device.
The foregoing communication apparatus may be a general-purpose apparatus or a dedicated apparatus. In a specific implementation, the communication apparatus may be a router, an in-network computing switch, or an apparatus with a structure similar to that in FIG. 11. The embodiments of this application do not limit the type of the communication apparatus.
The obtaining unit 701, the first obtaining module 7011, and the second obtaining module 7012 may all be implemented by the input device 1106; the sending unit 702, the first sending module 7021, the first sending submodule 70211, and the second sending module 7022 may all be implemented by the output device 1105; and the modifying unit 703 and the ignoring unit may both be implemented by the processor 1101 or the processor 1102.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (17)

  1. A method for synchronizing congestion information, comprising:
    obtaining, by an in-network computing switch, congestion information, wherein the congestion information indicates that a first communication link is congested, the first communication link is a link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2; and
    sending, by the in-network computing switch, N messages with the same sequence number to the N working nodes, wherein each of the N messages carries the congestion information, so that each of the N working nodes performs congestion control based on the congestion information.
  2. The method according to claim 1, wherein the obtaining, by the in-network computing switch, of congestion information comprises:
    obtaining, by the in-network computing switch, a first message sent by the first working node, wherein the first message carries the congestion information, the congestion information comprises a first value in an explicit congestion notification (ECN) flag bit of the first message, and the first value indicates that the first communication link is congested;
    and correspondingly, the sending, by the in-network computing switch, of N messages with the same sequence number to the N working nodes, each of which carries the congestion information, comprises:
    sending, by the in-network computing switch, within a validity period of the congestion indication of the first value, the N messages with the same sequence number to the N working nodes, wherein each of the N messages carries the congestion information.
  3. The method according to claim 2, wherein before the in-network computing switch sends the N messages with the same sequence number to the N working nodes, the method further comprises:
    modifying, by the in-network computing switch, the value in the ECN flag bit of a second message to the first value to obtain a third message, wherein the second message is the first message whose aggregation is completed within the validity period, and the sequence number of the second message is the same as that of the third message;
    and correspondingly, the sending, by the in-network computing switch, of N messages with the same sequence number to the N working nodes comprises:
    sending, by the in-network computing switch, N third messages to the N working nodes, wherein the first value in each third message instructs the corresponding working node to perform congestion control, and the N third messages have the same sequence number.
  4. The method according to any one of claims 2 to 3, wherein the method further comprises:
    if, within the validity period of the congestion indication of the first value, the in-network computing switch receives a fourth message sent by the first working node, and the value in the ECN flag bit of the fourth message is the first value,
    ignoring, by the in-network computing switch, the first value in the fourth message.
  5. The method according to claim 3, wherein the modifying, by the in-network computing switch, of the value in the ECN flag bit of the second message to the first value comprises:
    when the ECN flag bit of the first message comprises a first ECN field and the ECN flag bit of the second message comprises a second ECN field, modifying, by the in-network computing switch, the value in the second ECN field to the first value in the first ECN field; or,
    when the ECN flag bit of the first message comprises a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second message comprises a second FECN bit, modifying, by the in-network computing switch, the value in the second FECN bit to the first value in the first FECN bit; or,
    when the ECN flag bit of the first message comprises a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second message comprises a second BECN bit, modifying, by the in-network computing switch, the value in the second BECN bit to the first value in the first BECN bit.
  6. The method according to claim 1, wherein the obtaining, by the in-network computing switch, of congestion information comprises:
    when the state of the port between the first working node and the in-network computing switch indicates congestion, modifying, by the in-network computing switch, the values in the ECN flag bits of N to-be-broadcast data messages to obtain the congestion information, wherein the N to-be-broadcast data messages are messages with the same sequence number of the N working nodes;
    and correspondingly, the sending, by the in-network computing switch, of N messages with the same sequence number to the N working nodes, each of which carries the congestion information, comprises:
    sending, by the in-network computing switch, the N modified to-be-broadcast data messages to the N working nodes, wherein the values in the ECN flag bits of the N modified to-be-broadcast data messages respectively instruct the N working nodes to perform congestion control.
  7. The method according to claim 6, wherein the modifying, by the in-network computing switch, of the values in the ECN flag bits of the N to-be-broadcast data messages comprises:
    when the ECN flag bit in the to-be-broadcast data message comprises a third ECN field, setting, by the in-network computing switch, the value in the third ECN field; or,
    when the ECN flag bit in the to-be-broadcast data message comprises a third FECN bit, setting, by the in-network computing switch, the value in the third FECN bit.
  8. An in-network computing switch, comprising:
    an obtaining unit, configured to obtain congestion information, wherein the congestion information indicates that a first communication link is congested, the first communication link is a link between a first working node and the in-network computing switch, the first working node is any one of N working nodes, and N is an integer greater than 2; and
    a sending unit, configured to send N messages with the same sequence number to the N working nodes, wherein each of the N messages carries the congestion information, so that each of the N working nodes performs congestion control based on the congestion information.
  9. The in-network computing switch according to claim 8, wherein the obtaining unit comprises:
    a first obtaining module, configured to obtain a first message sent by the first working node, wherein the first message carries the congestion information, the congestion information comprises a first value in an explicit congestion notification (ECN) flag bit of the first message, and the first value indicates that the first communication link is congested;
    and correspondingly, the sending unit comprises:
    a first sending module, configured to send, within a validity period of the congestion indication of the first value obtained by the first obtaining module, N messages with the same sequence number to the N working nodes, wherein each of the N messages carries the congestion information.
  10. The in-network computing switch according to claim 9, wherein the in-network computing switch further comprises:
    a modifying unit, configured to modify, before the first sending module sends the N messages with the same sequence number to the N working nodes, the value in the ECN flag bit of a second message to the first value to obtain a third message, wherein the second message is the first message whose aggregation is completed within the validity period, and the sequence number of the second message is the same as that of the third message;
    and correspondingly, the first sending module comprises:
    a first sending submodule, configured to send the N third messages obtained by the modifying unit to the N working nodes, wherein the first value in each third message instructs the corresponding working node to perform congestion control, and the N third messages have the same sequence number.
  11. The in-network computing switch according to any one of claims 9 to 10, wherein the in-network computing switch further comprises:
    the obtaining unit, configured to receive, within the validity period of the congestion indication of the first value, a fourth message sent by the first working node, wherein the value in the ECN flag bit of the fourth message is the first value; and
    an ignoring unit, configured to ignore the first value in the fourth message obtained by the obtaining unit.
  12. The in-network computing switch according to claim 10, wherein:
    the modifying unit is configured to modify, when the ECN flag bit of the first message comprises a first ECN field and the ECN flag bit of the second message comprises a second ECN field, the value in the second ECN field to the first value in the first ECN field; or,
    the modifying unit is configured to modify, when the ECN flag bit of the first message comprises a first forward explicit congestion notification (FECN) bit and the ECN flag bit of the second message comprises a second FECN bit, the value in the second FECN bit to the first value in the first FECN bit; or,
    the modifying unit is configured to modify, when the ECN flag bit of the first message comprises a first backward explicit congestion notification (BECN) bit and the ECN flag bit of the second message comprises a second BECN bit, the value in the second BECN bit to the first value in the first BECN bit.
  13. The in-network computing switch according to claim 8, wherein the obtaining unit comprises:
    a second obtaining module, configured to modify, when the state of the port between the first working node and the in-network computing switch indicates congestion, the values in the ECN flag bits of N to-be-broadcast data messages to obtain the congestion information, wherein the N to-be-broadcast data messages are messages with the same sequence number of the N working nodes, and the second working node is any one of the N working nodes;
    and correspondingly, the sending unit comprises:
    a second sending module, configured to send the N modified to-be-broadcast data messages to the N working nodes, wherein the values in the ECN flag bits of the N modified to-be-broadcast data messages respectively instruct the N working nodes to perform congestion control.
  14. The in-network computing switch according to claim 13, wherein:
    the second obtaining module is configured to set, when the ECN flag bit in the to-be-broadcast data message comprises a third ECN field, the value in the third ECN field; or,
    the second obtaining module is configured to set, when the ECN flag bit in the to-be-broadcast data message comprises a third FECN bit, the value in the third FECN bit.
  15. A computer device, comprising a processor, wherein the processor is coupled to a memory, the memory is configured to store a program or instructions, and when the program or instructions are executed by the processor, the computer device is caused to perform the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium storing a computer program or instructions, wherein when the computer program or instructions are executed, a computer is caused to perform the method according to any one of claims 1 to 7.
  17. A chip, comprising a processor, wherein the processor is coupled to a memory, the memory is configured to store a program or instructions, and when the program or instructions are executed by the processor, an in-network computing switch is caused to perform the method according to any one of claims 1 to 7.
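
As a non-limiting illustration, the switch-side behaviour recited in claims 2 to 4 can be sketched as a small state machine. This is a sketch under assumptions rather than the claimed implementation: the class `CongestionSync`, its method names, the use of seconds for the validity period, and the tuple layout `(destination, sequence number, marked)` are all hypothetical, and a boolean stands in for the "first value" in the ECN flag bit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CongestionSync:
    """Hypothetical switch-side state for one group of N working nodes."""
    workers: List[str]
    validity: float                     # length of the validity period in seconds (assumed unit)
    expires_at: Optional[float] = None  # end of the current validity period, if any
    pending: bool = False               # a mark still has to be propagated

    def on_worker_message(self, ecn_marked: bool, now: float) -> bool:
        """Handle the ECN mark on a message from a working node.

        Returns True if the mark starts a new validity period, and False if
        the message is unmarked or the mark is ignored because a validity
        period is already running (the case of claim 4).
        """
        if not ecn_marked:
            return False
        if self.expires_at is not None and now < self.expires_at:
            return False
        self.expires_at = now + self.validity
        self.pending = True
        return True

    def on_aggregation_done(self, seq: int, now: float) -> List[Tuple[str, int, bool]]:
        """Return the N copies to send, all with the same sequence number.

        Only the first result whose aggregation completes inside the
        validity period carries the mark (the case of claim 3).
        """
        in_period = self.expires_at is not None and now < self.expires_at
        mark = self.pending and in_period
        self.pending = False
        return [(w, seq, mark) for w in self.workers]


# Example with N = 3 working nodes (assumed names) and a 1-second validity period.
sync = CongestionSync(workers=["w0", "w1", "w2"], validity=1.0)
first = sync.on_worker_message(ecn_marked=True, now=0.0)   # starts the validity period
repeat = sync.on_worker_message(ecn_marked=True, now=0.5)  # ignored inside the period
out_marked = sync.on_aggregation_done(seq=42, now=0.6)     # first aggregated result: marked
out_plain = sync.on_aggregation_done(seq=43, now=0.7)      # later result: not marked again
```

Keeping a single expiry time per group means that repeated marks from the same congested link produce one synchronized notification per validity period rather than one per marked message.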
PCT/CN2021/083150 2020-04-09 2021-03-26 Congestion information synchronizing method and related apparatus WO2021203985A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010273713.0A CN113518037A (en) 2020-04-09 2020-04-09 Congestion information synchronization method and related device
CN202010273713.0 2020-04-09

Publications (1)

Publication Number Publication Date
WO2021203985A1 true WO2021203985A1 (en) 2021-10-14

Family

ID=78022429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083150 WO2021203985A1 (en) 2020-04-09 2021-03-26 Congestion information synchronizing method and related apparatus

Country Status (2)

Country Link
CN (1) CN113518037A (en)
WO (1) WO2021203985A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117544567A (en) * 2024-01-09 2024-02-09 南京邮电大学 Memory transfer integrated RDMA data center congestion control method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115396372B (en) * 2022-10-26 2023-02-28 阿里云计算有限公司 Data stream rate control method, intelligent network card, cloud device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101754266A (en) * 2008-12-15 2010-06-23 中国移动通信集团公司 Method, system and device for adjusting transmission speed and redirecting routing
CN102196502A (en) * 2011-04-06 2011-09-21 东南大学 Congestion control method for wireless sensor network
CN102474452A (en) * 2009-07-02 2012-05-23 高通股份有限公司 Transmission of control information across multiple packets
CN104581821A (en) * 2015-01-28 2015-04-29 湘潭大学 Congestion control method based on node cache length equitable distribution rate
US20150188820A1 (en) * 2013-12-31 2015-07-02 International Business Machines Corporation Quantized congestion notification (qcn) extension to explicit congestion notification (ecn) for transport-based end-to-end congestion notification

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117544567A (en) * 2024-01-09 2024-02-09 南京邮电大学 Memory transfer integrated RDMA data center congestion control method
CN117544567B (en) * 2024-01-09 2024-03-19 南京邮电大学 Memory transfer integrated RDMA data center congestion control method

Also Published As

Publication number Publication date
CN113518037A (en) 2021-10-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21785032

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21785032

Country of ref document: EP

Kind code of ref document: A1