WO2024061042A1 - Data transmission method and data transmission system - Google Patents

Data transmission method and data transmission system

Info

Publication number
WO2024061042A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
data stream
credit value
information
Prior art date
Application number
PCT/CN2023/118075
Other languages
French (fr)
Chinese (zh)
Inventor
叶秋红
何子键
林云
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024061042A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion

Definitions

  • the embodiments of the present application relate to the field of network communication technology, and in particular, to a data transmission method and a data transmission system.
  • DCN data center network
  • HPC high performance computer cluster
  • NoC network-on-chip
  • In current network systems, the source node usually pushes the data stream directly to the destination node to reduce the transmission delay of the data stream.
  • However, this method usually causes transmission congestion at some nodes in the network system. When the congestion spreads further, it leads to head-of-line blocking or packet loss in the network system.
  • To alleviate congestion, a local flow control method based on a priority flow control mechanism or a tail packet loss mechanism is usually adopted, or a source-end flow control method using an advance congestion notification mechanism is used.
  • However, the flow control methods provided by the existing technology do not distinguish between the data flows that cause congestion and non-congested data flows, so they cannot guarantee low delay for the non-congested data flows. In addition, the local flow control method easily causes packet loss, and the source-side flow control method cannot maintain high throughput when the network system is overloaded. Therefore, how to balance the data flows in the network system to ensure its data transmission efficiency has become a problem that needs to be solved.
  • the data transmission method and system provided by this application can improve the data transmission efficiency of the network system.
  • this application adopts the following technical solutions:
  • In a first aspect, embodiments of the present application provide a data transmission method.
  • The data transmission method includes: receiving a first data stream, where the first data stream includes a plurality of data packets; transmitting the first data stream to a destination node according to a preset first credit value, where the first credit value is used to indicate a preset data traffic size transmitted at one time; receiving first information, where the first information is used to indicate that the cache capacity of the port that receives the first data stream exceeds a preset threshold; and, based on the first information and a second credit value obtained from the destination node, transmitting the first data stream to the destination node, where the second credit value is used to indicate the maximum data traffic obtained from the destination node for one transmission.
  • The source node used to send the first data stream may maintain two credit counters.
  • The value of one credit counter (denoted credit1, that is, the first credit value) is preset by the system and is used to indicate the preset amount of data traffic transmitted at one time.
  • The value of the other credit counter (denoted credit2, that is, the second credit value) is obtained from the destination node.
  • The source node can send data streams based on credit1 and credit2. A data flow sent to the destination node based on the count of credit1 is an uncontrolled data flow, and a data flow sent to the destination node based on the count of credit2 is a controlled data flow.
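  • The two counters described above can be sketched as follows. This is only an illustrative sketch: the class name `CreditCounters`, the `send` method, and the rule of spending credit2 before credit1 are assumptions made for the example and are not stated in the application.

```python
from dataclasses import dataclass


@dataclass
class CreditCounters:
    """Hypothetical per-flow counters kept by a source node."""
    credit1: int      # first credit value, preset by the system
    credit2: int = 0  # second credit value, granted by the destination node

    def send(self, size: int) -> str:
        """Consume credit for `size` units and report how the data was sent."""
        if self.credit2 >= size:
            # Enough granted credit: send as a controlled data flow.
            self.credit2 -= size
            return "controlled"
        if self.credit1 >= size:
            # Otherwise fall back to the preset credit: uncontrolled data flow.
            self.credit1 -= size
            return "uncontrolled"
        return "blocked"  # neither counter covers this transmission
```

  • With `CreditCounters(credit1=4)`, a first `send(3)` is uncontrolled; once credit2 has been granted, later sends that fit within credit2 are controlled.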
  • Transmitting the data stream to the destination node based on the first credit value may include: the source node transmits request information to the destination node to obtain the credit value of the first data stream, but does not receive feedback from the destination node within the preset time period. In that case, the source node directly transmits the first data stream to the destination node based on the first credit value preset in the network system.
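  • The timeout fallback described above might look like the sketch below. The function name `choose_credit` and all of its parameters are hypothetical; the application does not specify how the waiting time is measured.

```python
from typing import Optional, Tuple


def choose_credit(granted: Optional[int], waited_s: float,
                  timeout_s: float, preset_credit: int) -> Optional[Tuple[str, int]]:
    """Decide which credit value the source node uses for the next transmission."""
    if granted is not None:
        # Feedback arrived: send as a controlled flow with the granted credit.
        return ("controlled", granted)
    if waited_s >= timeout_s:
        # No feedback within the preset period: use the preset first credit value.
        return ("uncontrolled", preset_credit)
    return None  # still inside the preset period, keep waiting
```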
  • the first data stream sent by the source node does not carry an identifier used to distinguish whether the data traffic size is obtained from the destination node.
  • Alternatively, the first data stream sent by the source node may carry an identifier used to distinguish whether the data traffic size was obtained from the destination node. For example, a bit can be set in the packet header of the data stream being sent: "0" means sending with the first credit value, and "1" means sending with the second credit value. That is, "0" indicates that the credit value was not obtained from the destination node, and "1" indicates that the credit value was obtained from the destination node.
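  • As a rough illustration of the one-bit identifier, the sketch below sets and reads a flag in a single header byte. The bit position `CREDIT_SOURCE_BIT` and the one-byte header are assumptions for the example; the application only says that a bit in the packet header can be used.

```python
CREDIT_SOURCE_BIT = 0x01  # hypothetical position of the flag within a header byte


def mark_header(header: int, granted_by_destination: bool) -> int:
    """Return the header byte with the flag set to 1 (second credit) or 0 (first credit)."""
    if granted_by_destination:
        return header | CREDIT_SOURCE_BIT
    return header & ~CREDIT_SOURCE_BIT & 0xFF


def is_controlled(header: int) -> bool:
    """True if the packet was sent with a credit value obtained from the destination node."""
    return bool(header & CREDIT_SOURCE_BIT)
```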
  • When a data stream that the source node sends to the destination node without having obtained its data traffic size from the destination node causes congestion at some node (that is, when the cache capacity of the port through which the node receives the data stream from the source node exceeds a preset threshold), the congested node feeds back to the source node, through the data link connected to the source node, information indicating that the first data stream output by the source node has caused congestion in the network system.
  • The source node can then convert the first data flow transmitted based on the first credit value (that is, the uncontrolled data flow) into a first data flow transmitted based on the second credit value (that is, the controlled data flow).
  • A data flow that has been assigned a credit value by the destination node is not affected in any way and can continue to be transmitted based on that credit value. This is because congestion in the network system is usually caused by data flows whose data traffic size was not obtained from the destination node; data flows whose credit value is distributed by the destination node do not cause congestion. This differs from the existing technology, which does not distinguish between data flows that cause congestion and non-congested data flows and therefore affects the normal transmission of non-congested data flows.
  • When the network system is congested, the data stream is transmitted only with the second credit value assigned by the destination node, which ensures normal transmission of non-congested data in the network system.
  • In this way, congestion in the network system can be greatly reduced while a high throughput rate is maintained. In addition, compared with directly discarding newly incoming data packets as in the prior art, the data transmission method provided by the embodiment of the present application can also ensure the accuracy of data stream transmission.
  • the data transmission method further includes: stopping transmitting the first data stream based on the first credit value.
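  • The switch from uncontrolled to controlled transmission can be sketched as a small state holder on the source side. The class `SourceFlow` and its method names are hypothetical and only illustrate the behaviour described above.

```python
class SourceFlow:
    """Hypothetical source-side state for one data flow."""

    def __init__(self, credit1: int) -> None:
        self.credit1 = credit1   # preset first credit value
        self.credit2 = 0         # second credit value granted by the destination
        self.congested = False   # set once first information has been received

    def on_first_information(self) -> None:
        # A port buffer exceeded its threshold: stop sending on credit1.
        self.congested = True

    def on_credit_grant(self, credit2: int) -> None:
        self.credit2 += credit2

    def next_mode(self) -> str:
        if self.credit2 > 0:
            return "controlled"
        if not self.congested and self.credit1 > 0:
            return "uncontrolled"
        return "wait"  # congested and no granted credit yet
```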
  • the first data flow also includes indication information requesting to obtain the second credit value; before transmitting the first data stream to the destination node based on the first information and the second credit value obtained from the destination node, the data transmission method further includes: receiving the second credit value from the destination node.
  • the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data flow exceeds a preset threshold of the destination node.
  • transmitting the first data stream to the destination node according to a preset first credit value includes: transmitting the first data stream to the destination node through a switching node; and The first information is used to indicate that the buffer capacity of the port through which the switching node receives the first data flow exceeds the preset threshold of the switching node.
  • the first information is carried by the switching node in the first data stream and transmitted to the destination node.
  • a part of the first data flow can be stored in the cache of the switching node. Then, the first information is added to the first data stream, and the part of the first data stream added with the first information is transmitted to the destination node.
  • the number of times the data stream is sent can be reduced, which is beneficial to saving bandwidth of the network system.
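  • A minimal sketch of a switching node piggybacking the first information on packets it forwards is shown below. Packets are modelled as plain dictionaries and the field name `first_information` is an assumption made for the example.

```python
from typing import Dict, List


def forward_through_switch(packets: List[Dict], buffered_bytes: int,
                           threshold_bytes: int) -> List[Dict]:
    """Return copies of the packets, tagged when the receiving port is over threshold."""
    congested = buffered_bytes > threshold_bytes
    if congested:
        # Carry the first information inside the data stream itself,
        # so no separate notification message has to be sent.
        return [{**p, "first_information": True} for p in packets]
    return [dict(p) for p in packets]
```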
  • receiving the first information includes: receiving a second data stream, where the second data stream carries the first information, the second credit value, and an identifier used to indicate the first data stream corresponding to the second credit value.
  • The source node receiving the second data stream may include two situations. In the first situation, the destination node adds the first information to the second data stream and sends it to the source node; in the second situation, the switching node adds the first information to the second data stream and sends it to the source node.
  • the number of times the data stream is sent can be reduced, which is beneficial to saving bandwidth of the network system.
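  • The contents of the second data stream listed above could be modelled as in the sketch below. The type `SecondDataStream`, its field names, and the helper `build_second_data_stream` are hypothetical; how the destination node computes the second credit value is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondDataStream:
    flow_id: int             # identifies the first data stream the credit belongs to
    credit2: int             # second credit value granted by the destination node
    first_information: bool  # True if a receiving port exceeded its threshold


def build_second_data_stream(flow_id: int, credit2: int,
                             port_buffer: int, threshold: int) -> SecondDataStream:
    """Bundle the credit grant and the congestion indication into one message."""
    return SecondDataStream(flow_id, credit2, port_buffer > threshold)
```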
  • the data transmission method further includes: receiving second information, the second information being used to indicate that the cache capacity of the port is lower than a preset threshold; and, based on the second information and the first credit value, transmitting a third data flow to the destination node, where the third data flow includes a plurality of data packets.
  • When the cache capacity of the port that receives the first data stream is lower than the preset threshold, the congestion in the network system has been relieved.
  • Transmitting the third data stream to the destination node based on the first credit value can then improve the throughput rate and data transmission efficiency of the network system.
  • the data transmission method further includes: transmitting a third data stream to the destination node based on the first credit value after a preset period, where the third data stream includes a plurality of data packets.
  • The node where congestion occurs can process the buffered data so that the buffer capacity of the port that receives the first data flow falls below the preset threshold. If the source node does not receive information indicating that the buffer capacity of the port receiving the first data stream is lower than the preset threshold after a preset period of time, the indication information may have been lost in transmission, or the congested node may have been delayed in sending it.
  • the source node uses the first credit value to transmit the third data stream after a preset period of time, which can improve the throughput rate and data transmission efficiency of the network system.
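  • The two conditions for resuming transmission on the first credit value can be combined in one check, sketched below. The function name and the use of timestamps in seconds are assumptions for the example.

```python
def may_resume_uncontrolled(second_info_received: bool, stopped_at: float,
                            now: float, preset_period: float) -> bool:
    """True when the source node may again send based on the first credit value."""
    # Either the congested node reported that its buffer dropped below the
    # threshold, or the preset period has elapsed without such a report.
    return second_info_received or (now - stopped_at) >= preset_period
```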
  • the second data stream is an acknowledgement (ACK) message.
  • Embodiments of the present application also provide a data transmission system.
  • The data transmission system includes multiple chips; any one of the multiple chips includes a port, and any one of the chips communicates with other chips through the port;
  • the source chip among the plurality of chips is used to: receive a first data stream, where the first data stream includes a plurality of data packets; transmit the first data stream to the destination chip according to a preset first credit value, where the first credit value is used to indicate a preset data traffic size transmitted at one time; receive first information, where the first information is used to indicate that the cache capacity of the port receiving the first data stream exceeds a preset threshold; and, based on the first information and a second credit value obtained from the destination chip, transmit the first data stream to the destination chip, where the second credit value is used to indicate the maximum data traffic obtained from the destination chip for one transmission.
  • the source chip after receiving the first information, is further configured to: stop transmitting the first data stream based on the first credit value.
  • the first data stream further includes indication information requesting to obtain a second credit value; before transmitting the first data stream to the destination chip based on the first information and the second credit value obtained from the destination chip, the source chip is further configured to: receive the second credit value from the destination chip.
  • the first information is used to indicate that the cache capacity of the port through which the destination chip receives the first data flow exceeds a preset threshold of the destination chip.
  • the plurality of chips further include a switching chip. When transmitting the first data stream to the destination chip according to the preset first credit value, the source chip is specifically configured to: transmit the first data stream to the destination chip through the switching chip; and the first information is used to indicate that the buffer capacity of the port through which the switching chip receives the first data stream exceeds the preset threshold of the switching chip.
  • the switching chip is configured to: receive the first data stream; add the first information to the first data stream and transmit it to the destination chip.
  • the destination chip is specifically configured to: receive the first data stream; generate a second data stream based on the first information and the indication information, where the second data stream carries the first information, the second credit value, and the identifier used to indicate the first data stream corresponding to the second credit value; and send the second data stream to the source chip.
  • the destination chip is specifically configured to: generate a second data stream based on the indication information, where the second data stream carries the second credit value, and transmit the second data stream to the source chip through the switching chip; the switching chip is specifically configured to: add the first information to the second data stream, and transmit the second data stream to the source chip.
  • the source chip is further configured to: receive second information, where the second information is used to indicate that the cache capacity of the port is lower than a preset threshold; and, based on the second information and the first credit value, transmit a third data stream to the destination chip, where the third data stream includes a plurality of data packets.
  • after stopping transmitting the first data stream based on the first credit value, the source chip is further configured to: after a preset period of time, transmit a third data stream to the destination chip based on the first credit value, where the third data stream includes a plurality of data packets.
  • the second data stream is an acknowledgement (ACK) message.
  • embodiments of the present application provide a device, which may include: a first receiving unit, configured to receive a first data stream, where the first data stream includes a plurality of data packets; a first sending unit, configured to transmit the first data stream to the destination node according to a preset first credit value, where the first credit value is used to indicate the preset amount of data traffic transmitted at one time; a second receiving unit, configured to receive first information, where the first information is used to indicate that the buffer capacity of the port receiving the first data stream exceeds a preset threshold; and a second sending unit, configured to transmit the first data stream to the destination node based on the first information and a second credit value obtained from the destination node, where the second credit value is used to indicate the maximum data traffic obtained from the destination node for one transmission.
  • the device further includes: a transmission stopping unit configured to stop transmitting the first data stream based on the first credit value.
  • the first data stream further includes indication information requesting to obtain the second credit value.
  • the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data flow exceeds a preset threshold of the destination node.
  • the first information is carried by the switching node in the first data stream and transmitted to the destination node.
  • the second receiving unit is specifically configured to: receive a second data stream, where the second data stream carries the first information, the second credit value, and an identifier for indicating the first data stream corresponding to the second credit value.
  • after stopping transmitting the first data stream based on the first credit value, the device further includes: a third receiving unit configured to receive second information, the second information being used to indicate that the buffer capacity of the port is lower than a preset threshold; and a third sending unit configured to transmit a third data stream to the destination node based on the second information and the first credit value, where the third data stream includes multiple data packets.
  • after stopping transmitting the first data stream based on the first credit value, the device further includes: a third sending unit configured to transmit, after a preset period of time and based on the first credit value, a third data stream to the destination node, where the third data stream includes a plurality of data packets.
  • embodiments of the present application provide a chip, which includes a processor, a memory, a cache, and a port; the port is controlled by the processor and is used to communicate with other chips outside the chip, to receive data streams from other chips or transmit data streams to other chips, and to store the received data streams into the cache; the processor is used to execute program instructions in the memory to implement the data transmission method described in the first aspect.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • When executed by the controller, the computer program is used to implement the data transmission method described in the first aspect.
  • embodiments of the present application provide a computer program product, which is used to implement the data transmission method described in the first aspect when the computer program product is executed by a controller.
  • Figure 1 is a schematic structural diagram of the prior art used to alleviate node congestion provided by the embodiment of the present application;
  • Figure 2 is a schematic diagram of an architecture of a network system provided in an embodiment of the present application;
  • Figure 3 is a schematic diagram of the node structure of the network system shown in Figure 2 provided by an embodiment of the present application;
  • Figures 4A to 4C are schematic diagrams of scenarios in which the source node sends data streams based on credit1 and credit2 provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of the interaction flow between nodes in the network system shown in Figure 2 provided by an embodiment of the present application;
  • Figures 6A to 6B are schematic diagrams of the interaction process shown in Figure 5 provided by the embodiment of the present application;
  • Figure 7 is another schematic diagram of the interaction process between nodes in the network system shown in Figure 2 provided by the embodiment of the present application;
  • Figure 8 is a schematic diagram of a scenario of the interaction process shown in Figure 7 provided in an embodiment of the present application;
  • Figure 9 is another schematic diagram of the interaction process between nodes in the network system shown in Figure 2 provided by the embodiment of the present application;
  • Figure 10 is a schematic diagram of the interaction process shown in Figure 8 provided by the embodiment of the present application.
  • Figure 11 is another schematic diagram of the interaction process between nodes in the network system shown in Figure 2 provided by the embodiment of the present application;
  • Figure 12 is a flow chart of the data transmission method provided by the embodiment of the present application.
  • Figure 13 is a schematic structural diagram of the device provided by the embodiment of the present application.
  • Figure 1 illustrates local flow control as a way to alleviate congestion.
  • A source node S1, a destination node D1, and switching nodes A1, B1, B2 and C1 are shown.
  • When local flow control is used to alleviate congestion, assuming that switching node C1 is congested and its cache is almost full, newly incoming data packets can be directly discarded; alternatively, priority-based flow control (PFC) can be used, that is, the switching node C1 notifies the upstream node B1 to stop sending data flows.
  • PFC priority-based flow control
  • When source-side flow control is used to alleviate congestion, assuming that switching node C1 is congested, switching node C1 can use early congestion notification (ECN) to send information to source node S1 so that source node S1 reduces its data stream transfer rate.
  • ECN Early Congestion Notification
  • The above two methods of alleviating congestion do not distinguish between data flows that cause congestion and non-congested data flows, so they cannot guarantee the normal transmission of non-congested data flows. In addition, when the local flow control method directly discards newly incoming data packets, packet loss easily results, and the source-side flow control method cannot maintain high throughput when the network system is overloaded.
  • the industry further proposes to use push-pull hybrid scheduling to transmit data traffic.
  • the source node S1 can obtain the credit value (credit) of a certain data flow from the destination node D1.
  • the credit value is used to indicate the maximum data flow obtained from the destination node D1 for one transmission.
  • the source node S1 receives the credit, the source node S1 sends the data stream to the destination node D1 according to the credit, which can reduce network congestion and achieve high throughput of the data network.
  • the above credit-based data flow can also be called controlled data flow.
  • the source node S1 can also send an uncontrolled data stream to reduce the transmission delay of the data stream.
  • The uncontrolled data stream is specifically a data stream that has not been allocated credit by the destination node. It is usually the latest data stream that the source node S1 sends to the destination node based on a credit preset by the system.
  • Source node S1 can send controlled data streams and uncontrolled data streams at the same time. For example, while the source node S1 sends a controlled data flow to the destination node D1 through the credit allocated by the destination node D1, it also sends an uncontrolled data flow to the destination node D1.
  • the credit preset by the system is usually higher, which usually does not consider the data throughput in the network. That is, what causes congestion at various nodes in the network is usually uncontrolled data flow.
  • When congestion occurs at switching node C1, switching node C1 does not detect whether the congestion is caused by a controlled data flow or an uncontrolled data flow. As long as the data flow causing congestion comes from source node S1, switching node C1 transmits to the source node S1 information indicating that the data flow transmitted by the source node S1 is congested, so that the source node S1 stops transmitting the data flow to the network or transmits data to the data network at a reduced bandwidth. Because the controlled data flow does not cause network node congestion, this needlessly prevents the controlled data flow sent by the source node S1 from being transmitted in the network, reducing the throughput of the network system.
  • the data transmission method and data transmission system provided by the embodiments of the present application are based on the above-mentioned push-pull hybrid scheduling method for data traffic transmission.
  • When the source node causes congestion at some node by sending to the destination node a data stream that has not been assigned a credit value by the destination node (that is, the data stream received by the network node from the source node exceeds a preset threshold, or the cache growth rate of a port connected to the source node in the network node exceeds a preset threshold), the congested node feeds back to the source node, through the data link connected to the source node, an indication that the data stream output by the source node has caused congestion in the network system.
  • The source node then transmits to the destination node only data streams that have been assigned a credit value by the destination node. The data streams that cause congestion in the network system are usually those that have not been assigned a credit value by the destination node; data streams that have been assigned a credit value by the destination node do not cause congestion in the network system.
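  • The two congestion conditions mentioned above (buffer occupancy over a threshold, or buffer growth rate over a threshold) can be sketched as a single check. The function name, the units, and the sampling of the buffer at two points in time are assumptions for the example.

```python
def port_is_congested(prev_bytes: int, curr_bytes: int, interval_s: float,
                      occupancy_threshold: int, growth_threshold: float) -> bool:
    """Check one port's buffer against an occupancy and a growth-rate threshold."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Growth rate of the buffer between two samples, in bytes per second.
    growth_rate = (curr_bytes - prev_bytes) / interval_s
    return curr_bytes > occupancy_threshold or growth_rate > growth_threshold
```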
  • When the network system is congested, the data transmission method provided in the embodiment of the present application transmits only the data streams that have been allocated a credit value by the destination node. This ensures the normal transmission of non-congested data in the network system, so congestion can be greatly reduced while the high throughput of the network system is maintained. Compared with the prior art, in which the source node transmits data at a reduced bandwidth when congestion occurs, the embodiment of the present application can continue to transmit data based on the controlled credit value obtained from the destination end, so the bandwidth in the network does not need to be reduced, thereby ensuring the high throughput of the network system. In addition, compared with the prior art in which newly incoming data packets are directly discarded, the data transmission method provided by the embodiment of the present application can also ensure the accuracy of data stream transmission.
  • The data transmission method provided by the embodiments of this application can be applied to various network systems, for example, data center network systems, high-performance computer cluster network systems, cloud network systems, or chip-packaged on-chip network systems.
  • This embodiment of the present application takes a data center network system as an example to describe in detail the network system provided by the embodiment of the present application.
  • Figure 2 is a schematic structural diagram of a network system 100 provided by an embodiment of the present application.
  • the network system 100 may be a leaf-spine network architecture.
  • the network system 100 includes multiple leaf nodes and multiple spine nodes, and the multiple leaf nodes and multiple spine nodes are fully connected.
  • Leaf nodes and spine nodes can also be called switching nodes or routing nodes.
  • Figure 2 schematically shows four leaf nodes, namely node a1, node a2, node a3 and node a4.
  • Figure 2 schematically shows two spine nodes, respectively node b1 and node b2.
  • the downlink port of the leaf node is connected to the server that needs to exchange data traffic, and the uplink port is connected to the spine node.
  • data exchange between servers connected to different leaf nodes can be implemented through the spine nodes commonly connected to the different leaf nodes.
  • For example, if the servers connected to node a1 and node a2 need to exchange data, node a1 can send the data stream of the connected server S11 to node b1, and node b1 can send it to node a2, so that node a2 can transmit the data stream to the connected server S21.
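  • The forwarding example above can be reproduced with a small sketch of the fully connected leaf-spine topology of Figure 2. The node names follow the text; the function `leaf_to_leaf_paths` is hypothetical and simply enumerates one path per spine node.

```python
from typing import List

LEAVES = ["a1", "a2", "a3", "a4"]  # leaf nodes shown in Figure 2
SPINES = ["b1", "b2"]              # spine nodes shown in Figure 2


def leaf_to_leaf_paths(src: str, dst: str) -> List[List[str]]:
    """List the leaf-spine-leaf paths between two leaf nodes."""
    if src not in LEAVES or dst not in LEAVES:
        raise ValueError("src and dst must be leaf nodes")
    if src == dst:
        return [[src]]  # servers under the same leaf need no spine hop
    # Every leaf is connected to every spine, so each spine gives one path.
    return [[src, spine, dst] for spine in SPINES]
```

  • For node a1 and node a2 this yields the path through b1 used in the example, plus the alternative path through b2.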
  • In the network system 100, the leaf node used to send data flows may be called a source node, the leaf node used to receive data flows may be called a destination node, and the spine node used to transmit data flows may be called a switching node.
  • For example, when server S11 transmits data to server S21, the leaf node a1 connected to server S11 can be called the source node, and the leaf node a2 connected to server S21 can be called the destination node.
  • the embodiments of this application are described using a data center network system as an example and are not used to limit the solution.
  • For example, the server can communicate with the terminal device through multiple switches.
  • In this case, the server can be the source node and the terminal device can be the destination node. For another example, two terminal devices can communicate with each other through the server and the switch; the terminal device used to send the data flow is the source node, and the terminal device used to receive the data flow is the destination node.
  • each node can be as shown in Figure 3.
  • Figure 3 is a schematic structural diagram of a node provided by an embodiment of the present application.
  • the node can be a source node, a destination node or a switching node in the network system 100, such as the leaf node or spine node shown in Figure 2.
  • the switching nodes in the network system 100 may be switches, routers, or other network devices.
  • Figure 3 is only an example of each node.
  • the node provided by the embodiment of the present application can also be any type of device, such as a chip or chipset, or a circuit board equipped with a chip or chipset, etc. The embodiments of this application do not limit this.
  • the node includes a processor, memory, and ports.
  • the processor, memory and ports may be integrated into one or more chips, which may be regarded as a chipset.
  • the processor performs various functions of the node by running or executing software programs stored in the memory and calling instructions and data stored in the memory.
  • the processor may include one or more modules, such as a central processing unit (CPU) and a network processor (NP).
  • the network processor may be implemented by an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) chip.
  • the memory can be used to store software programs, instructions, and data, and can be implemented by any type of volatile or non-volatile memory or a combination thereof, including, for example, one or more of static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR), erasable programmable read-only memory (EPROM), and read-only memory (ROM).
  • a node may include multiple ports; n ports are schematically shown in the figure. Among the plurality of ports, some ports are configured as input ports of the node to receive data from other nodes, and other ports are configured as output ports of the node to send data to other nodes or servers. It should be noted that the embodiment of the present application does not specifically limit the number of ports included in each node; the number is set according to the needs of the scenario.
  • the above-mentioned memory may also include a cache, which is used to store data streams transmitted by other nodes.
  • in one possible implementation, the cache on each node in the network system 100 is a single-point cache, that is, the cache in the node is divided into multiple cache spaces, and each cache space is dedicated to one of the ports.
  • node a1 in the network system 100 shown in FIG. 2 includes 10 ports
  • the cache in node a1 can be divided into 10 cache spaces, and the cache spaces correspond to the ports one-to-one.
  • node a1 receives a data stream through port port1 and stores the received data stream in the cache space corresponding to port1.
  • in another possible implementation, the cache on each node in the network system 100 is a dynamic shared cache, that is, the cache in the node is divided into multiple cache spaces, and the data streams received by multiple ports on the node can all be stored in the same cache space.
  • node a1 receives data streams through ports port1 and port2 respectively, and stores the received data in the same cache space.
  • the capacity of each cache space and the mapping relationship between each cache space and ports can be dynamically adjusted based on the needs of the scenario.
  • each node includes three ports.
  • Node a1 includes port Pa11, port Pa12 and port Pa13
  • node a2 includes port Pa21, port Pa22 and port Pa23
  • node a3 includes port Pa31, port Pa32 and port Pa33
  • node a4 includes port Pa41, port Pa42 and port Pa43
  • node b1 includes port Pb11, port Pb12 and port Pb13
  • node b2 includes port Pb21, port Pb22 and port Pb23.
  • the port Pa11 of node a1 is connected to server S11 and server S12, the port Pa12 of node a1 is connected to the port Pb11 of node b1, and the port Pa13 of node a1 is connected to the port Pb21 of node b2; the port Pa21 of node a2 is connected to server S21 and server S22, the port Pa22 of node a2 is connected to the port Pb11 of node b1, and the port Pa23 of node a2 is connected to the port Pb22 of node b2; the port Pa31 of node a3 is connected to server S31 and server S32, the port Pa32 of node a3 is connected to the port Pb12 of node b1, and the port Pa33 of node a3 is connected to the port Pb23 of node b2; the port Pa41 of node a4 is connected to server S41 and server S42, the port Pa42 of node a4 is connected to port Pb13 of no
  • the number of nodes, the number of ports included in each node, and the connection relationship between the ports shown in Figure 2 are schematic and are set based on the needs of the application scenario; the embodiments of this application do not specifically limit them.
  • the port Pa12 of the node a1 can be connected to the port Pb11 of the node b1 and the port Pb21 of the node b2 at the same time.
  • the source node used to send the data stream can maintain two credit counters.
  • the value of one credit counter (denoted as credit1), that is, the preset credit value, is preset by the system and indicates the preset amount of data traffic transmitted at one time.
  • the value of the other credit counter (denoted as credit2), that is, the credit value obtained from the destination node, is set based on the credit obtained from the destination node and indicates the maximum data traffic, granted by the destination node, for one transmission.
  • the source node can send data streams based on credit1 and credit2.
  • the data flow sent to the destination node based on the count of credit1 is an uncontrolled data flow
  • the data flow sent to the destination node based on the count of credit2 is a controlled data flow.
  • taking node a1 as the source node and node a3 as the destination node shown in Figure 2 as an example, the configuration of the value of credit1 and the value of credit2, and how the source node transmits the data stream based on these two credit counters, are described in more detail through the scenarios shown in Figures 4A to 4C.
  • In the scenario shown in Figure 4A, it is assumed that node a1 is currently in the initial state. In this initial state, the credit1 count in node a1 is full, assuming it is 10; the credit2 count in node a1 is 0.
  • Node a1 receives data stream D11 from the server. Data stream D11 is a new data stream. Based on the value in credit1, node a1 transmits data stream D11 to destination node a3. The data flow of the transmitted data stream D11 corresponds to the value of credit1. Data stream D11 is sent based on credit1, that is, it is an uncontrolled data stream. In addition, the data stream D11 also carries indication information indicating the credit value of data stream D11.
  • the count of credit1 becomes 0.
  • the value in credit1 is filled based on the system's preset rate. After a certain period of time, it is assumed that the count of credit1 becomes 3, and node a1 obtains the credit value of data stream D11 from node a3. Assume that the credit value is also 10.
  • If data flow D11 still has data traffic to be transmitted in the scenario of Figure 4A, and the size of the data traffic to be transmitted is exactly equal to the data traffic indicated by the credit value obtained from node a3, then node a1 fills the credit value obtained from node a3 into credit2, at which point the count of credit2 becomes 10, as shown in the scenario in Figure 4B. Then, node a1 transmits data flow D11 to destination node a3 based on the value in credit2. The traffic size of the transmitted data flow D11 corresponds to the traffic size indicated by the value of credit2.
  • the value in credit1 is the same as the value in credit2; that is, the bandwidth at which node a1 transmits data stream D11 based on the value in credit1 is the same as the bandwidth at which node a1 transmits data stream D11 based on the value in credit2.
  • node a1 receives data flow D13 from the server, data flow D13 is a new data flow, and this data flow D13 is an uncontrolled data flow.
  • node a1 transmits data flow D13 to node a3 based on the credit1 count of 3, that is, using the data traffic size corresponding to a credit1 count of 3; in the scenario of Figure 4C, node a1 transmits data flow D13 to node a3 based on the credit1 count of 10, that is, using the data traffic size corresponding to a credit1 count of 10.
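  • The two credit counters and the walk-through of Figures 4A to 4C can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `CreditCounters`, its method names, and the convention that one credit equals one unit of traffic are hypothetical; the counts 10 and 3 are the values assumed in the scenario above.

```python
from dataclasses import dataclass


@dataclass
class CreditCounters:
    """Hypothetical per-flow credit state kept by a source node."""
    credit1_max: int = 10  # preset credit value (10 is the value assumed in Figure 4A)
    credit1: int = 10      # current count of the preset counter
    credit2: int = 0       # count granted by the destination node

    def send_uncontrolled(self, pending: int) -> int:
        """Send up to credit1 units of a new (uncontrolled) data flow."""
        sent = min(pending, self.credit1)
        self.credit1 -= sent
        return sent

    def send_controlled(self, pending: int) -> int:
        """Send up to credit2 units of a controlled data flow."""
        sent = min(pending, self.credit2)
        self.credit2 -= sent
        return sent

    def refill_credit1(self, amount: int) -> None:
        """Refill credit1 at the system's preset rate, capped at its maximum."""
        self.credit1 = min(self.credit1_max, self.credit1 + amount)

    def grant_credit2(self, value: int) -> None:
        """Store the credit value obtained from the destination node."""
        self.credit2 = value


c = CreditCounters()
sent_d11_first = c.send_uncontrolled(pending=20)  # Figure 4A: credit1 drops to 0
c.refill_credit1(3)                               # after some time credit1 is 3
c.grant_credit2(10)                               # credit value obtained from node a3
sent_d11_second = c.send_controlled(pending=10)   # Figure 4B: sent under credit2
sent_d13 = c.send_uncontrolled(pending=5)         # new flow D13 limited by credit1 = 3
```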
  • each node can send or receive multiple data streams, such as eight data streams, through one of the ports.
  • the following takes node a1 and node a2 in Figure 2 as source nodes, node a3 and node a4 as destination nodes, and node b1 and node b2 as switching nodes as an example, and, combined with the interaction process 500 shown in Figure 5 and the application scenarios shown in Figures 6A and 6B, provides a more detailed description of the data transmission method provided by the embodiment of the present application. Please refer to Figure 5.
  • Figure 5 is an interaction process 500 between node a1, node b1 and node a3 provided by the embodiment of the present application.
  • the interaction process 500 includes:
  • Step 501 Node a1 transmits data stream D11 to node b1 based on the received data stream D11 and the preset credit value 1.
  • node a1 can receive the data stream from the server it is connected to, and then prioritize the received data stream.
  • the priority refers to the data stream sent to the network first. Assume that node a1 receives multiple pieces of data. After sorting based on priority, data stream D11 is sent first. Assume that data flow D11 is a new data flow, that is, it has not communicated with the destination node to which data flow D11 needs to be transmitted, so that data flow D11 has not obtained credit value from its corresponding destination node, that is, data flow D11 is currently Uncontrolled data flow.
  • node a1 Based on the principle of reducing the transmission delay of the data flow, node a1 directly transmits data flow D11 to port Pb11 of node b1 through port Pa12 based on credit value 1, that is, the value of credit1 shown in Figure 4A, as shown in Figure 6A. It should be noted that node a1 can generate multiple data packets for transmission based on the frame format agreed by the data transmission protocol in the network system 100.
  • the data packets also include a field indicating the destination node a3, a field indicating the identity document (ID) of node a1, a field indicating whether data flow D11 is a controlled data flow, a field indicating the ID of data flow D11, and a field indicating the port number of node a1.
  • the data frame may also include more or fewer fields, which is not specifically limited in the embodiments of this application.
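  • The packet fields listed above can be illustrated with the following sketch. The field names, their types, and the fixed payload size `mtu` are assumptions made for illustration; the actual frame format is whatever the data transmission protocol in the network system 100 agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketHeader:
    """Hypothetical header carrying the fields named in the description."""
    dst_node: str     # destination node, e.g. "a3"
    src_node_id: str  # ID of the source node, e.g. "a1"
    controlled: bool  # whether the flow is a controlled data flow
    flow_id: str      # ID of the data flow, e.g. "D11"
    src_port: str     # port number of the source node, e.g. "Pa12"


def packetize(payload: bytes, header: PacketHeader, mtu: int) -> list[tuple[PacketHeader, bytes]]:
    """Split one data flow into packets of at most `mtu` payload bytes each."""
    return [(header, payload[i:i + mtu]) for i in range(0, len(payload), mtu)]


header = PacketHeader("a3", "a1", controlled=False, flow_id="D11", src_port="Pa12")
packets = packetize(b"x" * 2500, header, mtu=1000)
```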
  • When node a1 transmits data flow D11 to node a3, it may also transmit data flow D12 at the same time. Data flow D12 is a controlled data flow, that is, a data flow for which a credit value has been obtained from node a3 in advance; this credit value indicates the maximum data traffic of data flow D12 that node a1 transmits at one time.
  • Step 502 Node b1 saves part of the data stream D11 to the cache corresponding to port Pb11, and transmits the other part of the data stream D11 to node a3.
  • Step 503 Node b1 detects that the cache capacity of port Pb11 receiving data stream D11 exceeds the preset threshold, and transmits information I1 to node a1, where the information I1 is used to indicate that the cache capacity of port Pb11 receiving data stream D11 exceeds the preset threshold.
  • node b1 Based on the bandwidth on the transmission path, node b1 transmits part of the data flow D11 to node a3 through port Pb12, and saves another part of the data flow D11 in the cache corresponding to port Pb11.
  • the port Pb11 of the node b1 not only receives the data flow D11 from the node a1, but also receives the data flow D21 from the node a2.
  • the data stream D11 received from node a1 and the data stream D21 received from node a2 are both temporarily stored in the same cache corresponding to port Pb11. Assume that data flow D21 is a data flow assigned a credit value via the destination node.
  • the credit value of data flow D21 is obtained from node a3 by node a2 communicating with node a3 in advance.
  • the data flow D21 is a controlled data flow. Since the traffic size transmitted by data flow D21 at one time is configured based on the credit value assigned by the destination node, it usually does not cause congestion at node b1; data flow D11, by contrast, is an uncontrolled data flow whose traffic size transmitted at one time is usually pre-configured by the network system, and the size of the data flow is usually not fixed. In order to reduce the transmission delay of the data flow, the traffic transmitted at one time is usually large; that is, it is usually the uncontrolled data flow that causes congestion on node b1.
  • node b1 When node b1 detects that the cache capacity corresponding to port Pb11 exceeds the preset threshold, that is, node b1 is congested, node b1 can transmit information I1 to node a1 to indicate that the cache capacity occupied by data flow D11 exceeds the preset threshold.
  • the information I1 may be a Push Congestion Notification (PCN) message, and the format of the PCN message may be set based on the data transmission protocol in the network system 100.
  • the PCN message may include a field indicating the ID of node a1 and a field indicating the ID of the data flow D11 that caused congestion.
  • the PCN message may also include more or fewer fields, which is not specifically limited in this embodiment of the present application.
  • the cache capacity corresponding to the port Pb11 exceeds a preset threshold.
  • the preset threshold is, for example, 60% of the cache capacity, 80% of the cache capacity, etc.
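  • The threshold check at the switching node can be sketched as below. The names `PortCache` and `PCN`, the 80% default, and the choice to notify only when the arriving flow is uncontrolled are assumptions made for illustration; the description only states that information I1 identifies the source node and the data flow that caused congestion. Overflow handling is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PCN:
    """Hypothetical Push Congestion Notification with the two described fields."""
    src_node_id: str
    flow_id: str


@dataclass
class PortCache:
    """Hypothetical cache space attached to one receiving port."""
    capacity: int
    threshold_percent: int = 80  # e.g. 60 or 80, as in the description
    used: int = 0

    def enqueue(self, size: int, src_node_id: str, flow_id: str,
                controlled: bool) -> Optional[PCN]:
        """Buffer `size` units; return a PCN if the threshold is now exceeded."""
        self.used += size
        over = self.used * 100 > self.capacity * self.threshold_percent
        if over and not controlled:
            return PCN(src_node_id, flow_id)
        return None


pb11 = PortCache(capacity=100)
notice_d21 = pb11.enqueue(50, "a2", "D21", controlled=True)   # below threshold
notice_d11 = pb11.enqueue(40, "a1", "D11", controlled=False)  # 90 > 80: congestion
```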
  • the received data stream D11 cannot be sent all at once.
  • Node b1 sends part of the data stream and caches the rest, and then, based on the amount of data that the data transmission bandwidth can accommodate, transmits the cached data stream D11 to node a3 in one or more batches.
  • Step 504 Node a3 sends indication information indicating the credit value 2 of data flow D11 to node a1.
  • node a1 can transmit request information requesting to obtain credit value 2 to node a3 in various ways.
  • in the first possible implementation, node a1 can add the request information requesting credit value 2 to the above-mentioned data flow D11 and transmit it to node a3; in the second possible implementation, node a1 can transmit the request information requesting credit value 2 to node a3 independently of data flow D11, before or after transmitting data flow D11.
  • the embodiment of the present application does not specifically limit how the request information for obtaining credit value 2 is transmitted.
  • the credit value 2 can be obtained by node a1 sending a request (req) message to node a3 and receiving an acknowledgment character (ACK) message from node a3. That is to say, in the first possible implementation mentioned above, node a1 can add the req message, which carries the request information for obtaining credit value 2, to data flow D11; in the second possible implementation mentioned above, node a1 directly transmits a req message to node a3, and the req message carries the request information for obtaining credit value 2.
  • Step 505 Node a1 transmits data flow D11 to node a3 based on credit value 2. In this step, node a1 can also continue to transmit data flow D12 to node a3 based on the credit value of data flow D12 obtained from node a3 in advance.
  • After node a1 receives the information I1 from node b1, it can first detect whether credit value 2 of data stream D11 has been obtained from node a3, that is, whether the count of credit2 corresponding to data stream D11 as shown in Figure 4A is 0. When it detects that the count of credit2 is not 0, that is, credit value 2 of data stream D11 has been obtained from node a3, node a1 stops transmitting the uncontrolled data flow D11 based on the preset credit value (that is, based on the count of credit1 shown in Figure 4A), and transmits the controlled data flow D11 to node a3 based on the credit value 2 obtained from node a3.
  • the node a1 can also transmit the controlled data flow D12, as shown in Figure 6B.
  • the data flow D12 is an old data flow, that is, node a1 has already communicated with the destination node to which data flow D12 is to be transmitted and obtained a credit value from that destination node; in other words, data flow D12 is currently a controlled data flow.
  • the credit value of data flow D12 is used to indicate the cache capacity allocated by node a3 to data flow D12.
  • the credit value of data flow D12 is obtained by node a1 from node a3 in advance. It should be noted that data flow D11 and data flow D12 mentioned above are independent data flows.
  • node a1 needs to apply to node a3 for the credit value corresponding to data flow D11, and also needs to apply to node a3 for the credit value corresponding to data flow D12. Therefore, when node a3 sends the credit value of each data flow to node a1, it also needs to carry a data flow identifier indicating the data flow to which the credit value corresponds. In the scenario shown in Figure 6B, node a1 transmits data flow D11 to port Pb11 of node b1 through port Pa12 based on credit value 2.
  • Node b1 forwards data flow D11 through port Pb12 to port Pa32 of node a3, so that node a1 transmits data flow D11 to node a3.
  • node a1 transmits data stream D12 to port Pb21 of node b2 through port Pa12, and node b2 forwards it to port Pa33 of node a3 through port Pb23, so that node a1 transmits data stream D12 to node a3.
  • the information I1 that the cache capacity of the port Pb11 exceeds the preset threshold is information indicating that congestion occurs in the data flow D11. Therefore, when the network system is congested, node a1 can stop transmitting the data stream D11 based on the credit value credit1 regardless of whether there is a remaining count of the credit value 1 (that is, whether the count in credit1 shown in Figure 4A is zero).
  • the data flow D12 allocated with a credit value and the data flow D11 not allocated with a credit value are both transmitted to the network system 100. Data traffic that is not allocated a credit value usually causes congestion in the network system, whereas data traffic allocated with a credit value does not cause congestion in the network system.
  • with the network system 100 provided by the embodiment of the present application, when the network system is congested, only the data flows assigned credit values are transmitted, which ensures the normal transmission of non-congested data flows in the network system, thus greatly reducing the congestion of the network system while ensuring a high throughput rate of the network system; in addition, compared with the existing technology that directly discards newly incoming data packets, the data transmission method provided by the embodiment of the present application can also ensure the accuracy of data stream transmission.
  • Figure 5 shows the process of converting the first data stream from an uncontrolled data stream to a controlled data stream, so that node a1 can continue to transmit the controlled data stream D11 to node a3.
  • the above step 504 can also occur before the above step 503. That is, node a1 may first receive the indication information indicating the credit value 2 of data stream D11 from node a3, and then receive the information I1 sent by node b1; or it can be said that node a3 first sends the indication information indicating the credit value 2 of data stream D11 to node a1, and node b1 then sends the information I1 to node a1.
  • when node a1 first receives the information I1 sent by node b1 but has not received the indication information indicating the credit value 2 of data stream D11 sent by node a3, the count of credit2 corresponding to data stream D11, as shown in Figure 4A, is 0, and node a1 may temporarily stop transmitting data stream D11 to node a3.
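  • The source node's reaction to information I1 described above can be condensed into the following sketch. The enum `Action` and the function `on_congestion_notice` are hypothetical names; the logic only restates the two cases in the text (credit2 already obtained, or not yet obtained).

```python
from enum import Enum


class Action(Enum):
    SEND_CONTROLLED = "transmit D11 using credit value 2"
    PAUSE = "temporarily stop transmitting D11"


def on_congestion_notice(credit1: int, credit2: int) -> Action:
    """Decide what the source node does with a flow once I1 arrives.

    credit1 is intentionally ignored: per the description, transmission
    based on the preset credit value stops whether or not credit1 has a
    remaining count.
    """
    if credit2 > 0:
        return Action.SEND_CONTROLLED  # credit value 2 already obtained
    return Action.PAUSE                # wait for credit value 2 from the destination


after_grant = on_congestion_notice(credit1=3, credit2=10)
before_grant = on_congestion_notice(credit1=3, credit2=0)
```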
  • the interaction process 500 shown in Figure 5 schematically shows that node b1 directly transmits information I1 to node a1 when detecting congestion on port Pb11.
  • in one possible implementation, the above information I1 can be added by node b1 to the req message in data flow D11 and transmitted to node a3; after node a3 generates an ACK message based on the req message, it can add the information I1 to the ACK message and transmit the information I1 together with the indication information indicating the credit value 2 of data flow D11 to node a1.
  • in another possible implementation, the above information I1 can also be added directly to the ACK message by node b1, so that node b1 transmits the information I1 together with the indication information indicating the credit value 2 of data flow D11 to node a1.
  • the following still takes the node a1 and the node a2 in Figure 2 as the source node, the node a3 and the node a4 as the destination node, and the node b1 and the node b2 as the switching node as an example.
  • Figure 7 is an interaction process 700 between node a1, node b1 and node a3 provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of an application scenario of the interaction process shown in Figure 7.
  • the interaction process 700 includes:
  • Step 701 Node a1 transmits data stream D11 to node b1 based on the received data stream D11 and the preset credit value 1.
  • Data flow D11 includes multiple data packets that need to be sent, and also includes a req message, which carries request information for obtaining credit value 2.
  • the req message may be set based on the data transmission protocol in the network system 100 .
  • the req message may include a field indicating the destination node a3, a field indicating the ID of the node a1, a field indicating the ID of the data flow D11, and a field indicating obtaining a credit value, etc.
  • the req message may also include more or fewer fields, which is not specifically limited in the embodiments of this application. It should be noted that an empty field can also be set in the req message for switching node b1 to add information I1 to it.
  • Step 702 Node b1 saves part of the data flow D11 into the cache corresponding to port Pb11.
  • the specific implementation of steps 701 to 702 is the same as that of steps 501 to 502 shown in Figure 5.
  • for details, refer to the relevant descriptions of steps 501 and 502, which will not be described again.
  • Step 703 Node b1 detects that the cache capacity of port Pb11 that receives data flow D11 exceeds the preset threshold, extracts the req message from data flow D11, and adds information I1 to the req message.
  • Step 704 Node b1 transmits the data stream D11 with the information I1 added to node a3.
  • the way in which node b1 detects that the buffer capacity of port Pb11 that receives data flow D11 exceeds the preset threshold is the same as in step 503 shown in Figure 5.
  • for details, refer to the relevant description of step 503, which will not be described again.
  • information I1 can be added to the req message.
  • the position where the information I1 is added to the req message can be pre-agreed based on the data transmission protocol.
  • the information I1 may include a field indicating the ID of the node a1, a field indicating the ID of the data flow D11, and a field indicating PCN information.
  • the ID of node a1 is represented by sixteen bits
  • the ID of data stream D11 is represented by sixteen bits
  • PCN information is represented by one bit.
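  • The bit widths above (sixteen bits for the node ID, sixteen bits for the data flow ID, one bit for the PCN information) add up to 33 bits. A minimal packing sketch follows; the field order, the padding to five bytes, the big-endian byte order, and the function names are all assumptions, since the description leaves the exact layout to the data transmission protocol.

```python
def pack_i1(node_id: int, flow_id: int, pcn: bool) -> bytes:
    """Pack information I1 as [16-bit node ID][16-bit flow ID][1-bit PCN]."""
    if not (0 <= node_id < 1 << 16 and 0 <= flow_id < 1 << 16):
        raise ValueError("IDs must fit in sixteen bits")
    value = (node_id << 17) | (flow_id << 1) | int(pcn)
    return value.to_bytes(5, "big")  # 33 bits, padded to 5 bytes (assumed)


def unpack_i1(raw: bytes) -> tuple[int, int, bool]:
    """Recover (node ID, flow ID, PCN bit) from the packed form."""
    value = int.from_bytes(raw, "big")
    return (value >> 17) & 0xFFFF, (value >> 1) & 0xFFFF, bool(value & 1)


raw_i1 = pack_i1(node_id=0x00A1, flow_id=0x0D11, pcn=True)
decoded_i1 = unpack_i1(raw_i1)
```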
  • the req message carries information I1.
  • Node b1 transmits data stream D11 carrying information I1 to node a3.
  • Step 705 Based on the req message, node a3 generates an ACK message carrying information I1 and indication information indicating the credit value 2 of data flow D11, and transmits the ACK message to node b1.
  • node a3 parses out each field in the req message and obtains information I1 and information requesting the credit value 2 of data flow D11. Then, node a3 generates an ACK message, which may include a field indicating the ID of node a1, a field indicating the ID of node a3, a field indicating the ID of data flow D11, and a field indicating the credit value of data flow D11.
  • the ACK message may also include more or fewer fields, which is not specifically limited in this embodiment of the present application.
  • the fields included in this information I1 are the same as the fields included in the information I1 described in step 704.
  • Step 706 Node b1 transmits the ACK message to node a1.
  • Step 707 Node a1 transmits data flow D11 to node a3 based on the ACK message and credit value 2.
  • After receiving the ACK message, node a1 obtains from it the information indicating the credit value 2 of data flow D11 and the information indicating that the buffer capacity of port Pb11 receiving data flow D11 exceeds the preset threshold. Therefore, node a1 uses the credit value 2 assigned by node a3 to transmit data flow D11 to node a3 through node b2.
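  • Steps 703 to 707 amount to carrying information I1 in a reserved field of the req message and echoing it in the ACK message. The sketch below illustrates that path; the `Req` and `Ack` classes, the string `"PCN"` standing in for information I1, and the function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Req:
    dst_node: str
    src_node: str
    flow_id: str
    i1: Optional[str] = None  # empty field reserved for the switching node


@dataclass(frozen=True)
class Ack:
    src_node: str
    dst_node: str
    flow_id: str
    credit_value: int
    i1: Optional[str] = None


def switch_forward_req(req: Req, congested: bool) -> Req:
    """Step 703: the switching node fills the empty field when congested."""
    return replace(req, i1="PCN") if congested else req


def destination_ack(req: Req, credit_value: int) -> Ack:
    """Step 705: the destination node echoes I1 and grants a credit value."""
    return Ack(req.src_node, req.dst_node, req.flow_id, credit_value, req.i1)


req = Req(dst_node="a3", src_node="a1", flow_id="D11")
ack = destination_ack(switch_forward_req(req, congested=True), credit_value=10)
```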
  • FIG. 9 is an interaction process 900 between nodes a1, node b1 and node a3 provided in an embodiment of the present application
  • Figure 10 is a schematic diagram of an application scenario of the interaction process shown in Figure 9, and the interaction process 900 includes:
  • Step 901 Node a1 transmits data stream D11 to node b1 based on the received data stream D11 and the preset credit value 1.
  • Data flow D11 includes multiple data packets that need to be sent, and also includes a req message, which carries request information for obtaining credit value 2.
  • Step 902 Node b1 saves part of the data flow D11 into the cache corresponding to port Pb11, and transmits another part of the data flow D11 to node a3.
  • the specific implementation of steps 901 to 902 is the same as that of steps 501 to 502 shown in Figure 5.
  • for details, refer to the relevant descriptions of step 501 and step 502, which will not be repeated.
  • the content of the req message is the same as that of the req message in step 701 shown in Figure 7.
  • for details, refer to the relevant description of step 701, which will not be described again.
  • Step 903 node a3 generates an ACK message based on the req message in the data stream D11, and transmits the ACK message to node b1.
  • the ACK message may include a field indicating the ID of node a1, a field indicating the ID of node a3, a field indicating the ID of data stream D11, and a field indicating the credit value of data stream D11.
  • the generated ACK message does not carry information I1. Among them, some empty fields can also be set in the ACK message for node b1 to add information I1 to the ACK message.
  • Step 904 Node b1 detects that the buffer capacity of port Pb11 that receives data flow D11 exceeds a preset threshold, adds information I1 to the ACK message, and transmits the ACK message to node a1. After this step, in addition to the fields described in step 903, the ACK message also includes a field carrying information I1.
  • Step 905 Node a1 transmits data flow D11 to node a3 based on the ACK message and credit value 2.
  • the specific implementation of this step is the same as the specific implementation of step 707 shown in Figure 7. Refer to the description of step 707, which will not be described again.
  • with the network system 100 provided by the embodiment of the present application, when a switching node in the network system 100 is congested, the source node corresponding to the congested data flow can be notified through multiple methods or at more opportunities, thereby further improving the efficiency of the network system 100.
  • Figure 11 is an interaction process 1100 between node a1, node b1 and node a3 provided by the embodiment of the present application.
  • the interaction process 1100 includes:
  • Step 1101 Node a1 transmits data stream D11 to node b1 based on the received data stream D11 and the preset credit value 1.
  • Step 1102 Node b1 transmits data stream D11 to node a3.
  • Step 1103 Node a3 saves the data stream D11 into the cache.
  • Step 1104 Node a3 detects that the buffer capacity of port Pa32 that receives data stream D11 exceeds a preset threshold, and transmits information I1 to node b1.
  • node a3 detects the cache capacity corresponding to the port used to receive data flow D11 (for example, port Pa32 shown in Figure 2).
  • the cache capacity exceeds the preset threshold, it means that node a3 is congested and causes The congested data flow is data flow D11.
  • node a3 can transmit information I1 to node b1.
  • the information I1 may be a PCN message, and the PCN message may include a field indicating the ID of the node a1 and a field indicating the ID of the data flow D11 that causes congestion.
  • the PCN message may also include more or fewer fields, which is not specifically limited in this embodiment of the present application.
  • Step 1105 node b1 transmits the information I1 to node a1.
  • Step 1106 Node a1 transmits data stream D11 to node a3 based on the information I1 and the credit value 2 obtained in advance from node a3.
  • Figure 11 exemplarily shows that when the buffer capacity of the port Pa32 of the destination node a3 that receives data flow D11 exceeds the preset threshold, node a3 directly transmits information I1 to node a1.
  • node a3 can also add information I1 to an ACK message and transmit it to node a1 through the ACK message when the buffer capacity of port Pa32 that receives data flow D11 exceeds the preset threshold.
  • in this case, node a3 needs to receive a req message from node a1 before detecting that the cache capacity of port Pa32 exceeds the preset threshold.
  • the req message is used to request the credit value of a data flow to be sent from node a1 to node a3; then, node a3 generates an ACK message based on the req message, adds information I1 to the ACK message, and transmits it to node a1.
  • after step 707, step 905 or step 1106, when node a1 has stopped sending data flows without allocated credit values for a preset time, if it has not received information indicating that the congestion is relieved, that is, it has not received information indicating that the buffer capacity of the port receiving data flow D11 is lower than the preset threshold, node a1 can transmit a new data stream, such as data stream D13, to the network system 100 based on the preset credit value 1 (that is, the count of credit1 shown in FIG. 4A).
  • alternatively, after step 707, step 905 or step 1106, when node a1 receives the indication information I2 sent by node b1 or node a3, where the indication information I2 is used to indicate that the buffer capacity of the port receiving data flow D11 is lower than the preset threshold, node a1 can transmit a new data flow, such as data flow D13, to the network system 100 based on the preset credit value 1 (that is, the count of credit1 shown in FIG. 4A).
  • node b1 or node a3 can send the indication information I2 separately; or, node b1 adds the indication information I2 to the req message, and node a3 generates an ACK message with the information I2 added based on the req message and transmits it to node a1; alternatively, node b1 or node a3 directly adds the indication information I2 to the ACK message and transmits it to node a1.
  • the format of each message, the format of the information I2, and the way in which the information I2 is added to each message are similar to the format of each message, the format of the information I1, and the way in which the information I1 is added to each message in the above embodiments. Please refer to the relevant descriptions for details, which will not be repeated.
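  • The two conditions under which the source node may again push a new data flow based on the preset credit value can be written as a single predicate. The function name and the use of seconds as the time unit are assumptions made for illustration.

```python
def may_send_new_flow(paused_seconds: float, hold_seconds: float,
                      i2_received: bool) -> bool:
    """Return True when the source node may again send a new flow under credit1.

    Either the indication information I2 (congestion relieved) has arrived,
    or the source node has already held back uncontrolled flows for the
    preset time.
    """
    return i2_received or paused_seconds >= hold_seconds


resumed_by_i2 = may_send_new_flow(0.1, 1.0, i2_received=True)
resumed_by_timeout = may_send_new_flow(1.5, 1.0, i2_received=False)
still_paused = may_send_new_flow(0.1, 1.0, i2_received=False)
```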
  • node a1 and node a2 shown in Figure 2 are used as source nodes, node b1 and node b2 as switching nodes, and node a3 and node a4 as destination nodes, and the embodiments are described through specific scenarios. It can be understood that in other scenarios, node a3 can also be a source node and node a1 can also be a destination node. The embodiments of this application do not specifically limit which nodes act as the source node, switching node and destination node.
  • a data flow D11 and a data flow D12 with a credit value allocated through the destination node are used as examples for description.
  • the data streams that are not assigned credit values by the destination node can include multiple data streams, and the data streams that are assigned credit values by the destination node can also include multiple data streams.
  • the multiple data streams that are not assigned credit values by the destination node can come from the same source node or from different source nodes; the multiple data flows that are assigned credit values by the destination node can likewise come from the same source node or from different source nodes; some or all of the multiple data flows that are not assigned credit values by the destination node may cause congestion on the switching node or destination node, depending on the specific scenario.
  • the embodiments of this application are described by taking the switching node to include only one layer as an example.
  • multiple layers of switching nodes may be included between the source node and the destination node, and the source node may communicate with the destination node through multiple switching nodes; any one of the multiple switching nodes may become congested due to receiving data flows without assigned credit values.
  • the embodiment of the present application provides a data transmission method 1200.
  • the data transmission method 1200 is applied to the source node in any network system. For example, it can be applied to node a1, node a2, node a3 or node a4 in the data center network shown in Figure 2; in that case, whichever of node a1, node a2, node a3 or node a4 applies the method is the source node.
  • the data transmission method 1200 includes the following steps: Step 1201, receive a first data stream, where the first data stream includes multiple data packets; Step 1202, transmit the first data stream to the destination node according to a preset first credit value, where the first credit value is used to indicate the preset data traffic size transmitted at one time; Step 1203, receive first information, where the first information is used to indicate that the buffer capacity of the port receiving the first data stream exceeds a preset threshold; Step 1204, transmit the first data stream to the destination node based on the first information and a second credit value obtained from the destination node, where the second credit value is used to indicate the maximum data traffic, obtained from the destination node, that is transmitted at one time.
  • the data transmission method 1200 further includes: stopping transmission of the first data stream based on the first credit value.
  • the first data flow further includes indication information requesting to obtain the second credit value; before transmitting the first data stream to the destination node based on the first information and the second credit value obtained from the destination node, the data transmission method 1200 further includes: receiving the second credit value from the destination node.
  • the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data flow exceeds a preset threshold of the destination node.
  • the first information is carried by the switching node in the first data stream and transmitted to the destination node.
  • the receiving the first information includes: receiving a second data stream, wherein the second data stream carries the first information, the second credit value and an identifier indicating the first data stream corresponding to the second credit value.
  • the data transmission method 1200 further includes: receiving second information, where the second information is used to indicate that the cache capacity of the port is lower than a preset threshold; and transmitting a third data stream to the destination node based on the second information and the first credit value, where the third data stream includes a plurality of data packets.
  • the data transmission method 1200 further includes: after a preset period of time, transmitting a third data stream to the destination node based on the first credit value, where the third data stream includes a plurality of data packets.
  • the source node (such as node a1, node a2, node a3 or node a4 shown in Figure 2) includes hardware and/or software modules corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions in conjunction with the embodiments for each specific application, but such implementations should not be considered to be beyond the scope of this application.
  • This embodiment can divide the source node into functional modules according to the above method examples. For example, different functional modules can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. In actual implementation, there may be other division methods.
  • Figure 13 shows a possible schematic diagram of the device 1300 involved in the above embodiment.
  • the device 1300 corresponding to Figure 13 may be a software device running on the source node, or may be a combined software and hardware device embedded in the source node.
  • the device 1300 may include: a first receiving unit 1301, used to receive a first data stream, where the first data stream includes a plurality of data packets; a first sending unit 1302, used to transmit the first data stream to the destination node according to a preset first credit value, where the first credit value is used to indicate the preset data traffic size transmitted at one time; a second receiving unit 1303, used to receive first information, where the first information is used to indicate that the buffer capacity of the port that receives the first data flow exceeds a preset threshold; and a second sending unit 1304, used to transmit the first data flow to the destination node based on the first information and the second credit value obtained from the destination node, where the second credit value is used to indicate the maximum data traffic, obtained from the destination node, that is transmitted at one time.
  • the device 1300 further includes: a transmission stopping unit (not shown in the figure), configured to stop transmitting the first data stream based on the first credit value.
  • the first data stream further includes indication information requesting to obtain the second credit value.
  • the first information is used to indicate that a cache capacity of a port of the destination node that receives the first data flow exceeds a preset threshold of the destination node.
  • the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data flow exceeds a preset threshold of the destination node.
  • the first information is carried by the switching node in the first data stream and transmitted to the destination node.
  • the second receiving unit is specifically configured to: receive a second data stream, wherein the second data stream carries the first information, the second credit value and an identifier indicating the first data stream corresponding to the second credit value.
  • after stopping transmitting the first data stream based on the first credit value, the device 1300 further includes: a third receiving unit (not shown in the figure), configured to receive second information, where the second information is used to indicate that the cache capacity of the port is lower than the preset threshold; and a third sending unit (not shown in the figure), configured to transmit a third data stream to the destination node based on the second information and the first credit value, where the third data stream includes multiple data packets.
  • after stopping transmitting the first data stream based on the first credit value, the device 1300 further includes: a third sending unit (not shown in the figure), configured to transmit, after a preset period of time and based on the first credit value, a third data stream to the destination node, where the third data stream includes a plurality of data packets.
  • the above source node may also include at least one processor, memory and port.
  • at least one processor can call all or part of the computer program stored in the memory to control and manage the actions of the source node.
  • the memory can be used to support node execution and storage of program codes and data, etc.
  • the memory includes but is not limited to at least a part of the storage space, cache (Cache) or registers of the above-mentioned memory.
  • At least one processor may implement or execute the various exemplary logic modules described in conjunction with the present disclosure, and may be a combination of one or more microprocessors that implement computing functions.
  • at least one processor may also include other programmable logic devices, transistor logic devices, or discrete hardware components.
  • This embodiment also provides a computer-readable storage medium.
  • Computer instructions are stored in the computer-readable storage medium. When the computer instructions are run on a computer, they cause the computer to execute the above related method steps to implement the data transmission method in the above embodiment.
  • This embodiment also provides a computer program product.
  • when the computer program product is run on a computer, it causes the computer to perform the above related steps to implement the data transmission method in the above embodiment.
  • the computer-readable storage medium or computer program product provided by this embodiment is used to execute the corresponding method provided above. Therefore, for the beneficial effects it can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
  • each functional unit in each embodiment of the present application can be integrated into one product, or each unit can exist physically alone, or two or more units can be integrated into one product.
  • if the above modules are implemented in the form of software functional units and sold or used as independent products, they can be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application in essence, or the part that contributes to the existing technology, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause a device (which can be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present application.
  • the aforementioned readable storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.

Abstract

Provided in the embodiments of the present application are a data transmission method and system. The data transmission method provided in the present application comprises: receiving a first data stream, wherein the first data stream comprises a plurality of data packets; transmitting the first data stream to a destination node according to a preset first credit value, wherein the first credit value is used for indicating a preset size of data traffic transmitted at one time; receiving first information, wherein the first information is used for indicating that the cache capacity of a port which receives the first data stream exceeds a preset threshold value; and transmitting the first data stream to the destination node on the basis of the first information and a second credit value that is obtained from the destination node, wherein the second credit value is used for indicating the maximum data traffic which is transmitted at one time and is obtained from the destination node. The data transmission method can improve the data transmission efficiency of a network system.

Description

Data transmission method and data transmission system
This application claims priority to the Chinese patent application filed with the China Patent Office on September 20, 2022, with application number 202211145199.8 and entitled "Data transmission method and data transmission system", the entire content of which is incorporated into this application by reference.
Technical field
The embodiments of the present application relate to the field of network communication technology, and in particular, to a data transmission method and a data transmission system.
Background
In recent years, with the development of Internet services, distributed computing and other technologies, network systems such as data center networks (DCN), high performance computing (HPC) cluster networks and networks-on-chip (NoC) have come into wide use to exchange data between servers, between servers and terminal devices, or between the components working on a chip.
In current network systems, the source node usually pushes the data stream directly to the destination node to reduce the transmission delay of the data stream. However, when data traffic in the network system surges, this approach usually causes congestion at some nodes in the network system, and when the congestion spreads further it leads to head-of-line blocking or packet loss. In the existing technology, when congestion occurs, either a local flow control method based on a priority flow control mechanism or a tail packet drop mechanism is adopted, or a source-end flow control method based on an advance congestion notification mechanism is adopted. However, the flow control methods provided by the existing technology do not distinguish between the data flows that cause congestion and the non-congested data flows, so the low latency of the non-congested data flows cannot be guaranteed. In addition, the local flow control method easily causes packet loss, and the source-end flow control method cannot maintain high throughput when the network system is heavily loaded. Therefore, how to balance data traffic in the network system so as to ensure its data transmission efficiency has become a problem that needs to be solved.
Summary of the invention
The data transmission method and system provided by this application can improve the data transmission efficiency of a network system. To achieve this purpose, this application adopts the following technical solutions:
In a first aspect, embodiments of the present application provide a data transmission method. The data transmission method includes: receiving a first data stream, where the first data stream includes a plurality of data packets; transmitting the first data stream to a destination node according to a preset first credit value, where the first credit value is used to indicate a preset data traffic size transmitted at one time; receiving first information, where the first information is used to indicate that the buffer capacity of the port that receives the first data stream exceeds a preset threshold; and transmitting the first data stream to the destination node based on the first information and a second credit value obtained from the destination node, where the second credit value is used to indicate the maximum data traffic, obtained from the destination node, that is transmitted at one time.
In the embodiments of this application, the source node that sends the first data stream may maintain two kinds of credit counter. The value of one counter (denoted credit1), that is, the first credit value, is preset by the system and indicates the preset data traffic size transmitted at one time. The value of the other counter (denoted credit2), that is, the second credit value obtained from the destination node, is set based on the credit obtained from the destination node and indicates the maximum data traffic, obtained from the destination node, that is transmitted at one time. The source node can send data streams based on credit1 and credit2. A data stream sent to the destination node based on the count of credit1 is an uncontrolled data stream, and a data stream sent to the destination node based on the count of credit2 is a controlled data stream.
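The two counters can be pictured with a small sketch. The Python below is illustrative only: the class name `SourceCredits`, the method `limit`, the byte units and the choice to treat each credit value as an upper bound on a single transmission are assumptions made for this example and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceCredits:
    """Hypothetical model of the two credit counters kept by a source node."""
    credit1: int                   # first credit value, preset by the system (assumed bytes)
    credit2: Optional[int] = None  # second credit value; None until the destination grants one

    def limit(self, controlled: bool) -> Optional[int]:
        """Return the per-transmission limit for a controlled or uncontrolled stream."""
        if controlled:
            # Controlled stream: bounded by the credit obtained from the destination.
            return self.credit2
        # Uncontrolled stream: bounded by the preset credit.
        return self.credit1


credits = SourceCredits(credit1=4096)
uncontrolled_limit = credits.limit(controlled=False)
```

In this sketch a controlled stream has no usable limit until the destination has granted `credit2`, which mirrors the requirement that the second credit value be received before it is used.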
In the embodiments of this application, transmitting the data stream to the destination node using the first credit value may include the following: the source node sends the destination node request information asking for a credit value for the first data stream, but receives no feedback from the destination node within a preset time period; the source node then transmits the first data stream to the destination node directly, based on the first credit value preset in the network system.
In the embodiments of this application, the first data stream sent by the source node does not carry an identifier that distinguishes whether a data traffic size has been obtained from the destination node. In a possible implementation, the first data stream sent by the source node may carry such an identifier. For example, one bit can be set in the packet header of the data stream being sent, where "0" means the stream is sent using the first credit value and "1" means it is sent using the second credit value; that is, "0" means no credit value has been obtained from the destination node and "1" means a credit value has been obtained from the destination node.
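The one-bit identifier can be sketched as follows. The application does not say where the bit sits in the header, so the position used here (the lowest bit of a one-byte field) and the names `CREDIT_FLAG_MASK`, `set_credit_flag` and `uses_second_credit` are assumptions chosen purely for illustration.

```python
CREDIT_FLAG_MASK = 0x01  # assumed position: lowest bit of a one-byte header field


def set_credit_flag(header_byte: int, controlled: bool) -> int:
    """Return the header byte with the flag set to 1 (second credit) or 0 (first credit)."""
    if controlled:
        return header_byte | CREDIT_FLAG_MASK
    return header_byte & ~CREDIT_FLAG_MASK & 0xFF


def uses_second_credit(header_byte: int) -> bool:
    """True if the packet was sent using a credit value obtained from the destination."""
    return bool(header_byte & CREDIT_FLAG_MASK)
```

A receiving node could call `uses_second_credit` on each packet to tell controlled traffic from uncontrolled traffic without keeping per-flow state.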
With the data transmission method and network system provided by the embodiments of this application, when the source node sends the destination node a first data stream whose size was not indicated by the destination node and thereby causes congestion at some node (that is, the buffer capacity of the port through which that node receives the first data stream from the source node exceeds the preset threshold), the congested node feeds back to the source node, over the data link connected to the source node, indication information indicating that the first data stream output by the source node has caused congestion in the network system. The source node can therefore, when the network system is congested, convert the first data stream transmitted based on the first credit value (the uncontrolled data stream) into a first data stream transmitted based on the second credit value (a controlled data stream). Data streams that have been assigned credit values by the destination node are not affected and can continue to be transmitted based on those credit values. Congestion in the network system is usually caused by data streams that have not obtained a data traffic size from the destination node, whereas data traffic that has been assigned a credit value by the destination node does not cause congestion. The existing technology does not distinguish between the data flows that cause congestion and the non-congested data flows, and therefore disturbs the normal transmission of the non-congested data flows. By contrast, with the data transmission method provided by the embodiments of this application, when the network system is congested, data streams are transmitted only with the second credit value assigned by the destination node, which ensures that non-congested data is transmitted normally and greatly reduces congestion while maintaining a high throughput rate of the network system. In addition, compared with directly discarding newly arriving data packets as in the existing technology, the data transmission method provided by the embodiments of this application can also ensure the accuracy of data stream transmission.
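The switch from uncontrolled to controlled transmission can be sketched as a per-flow mode change. This is a minimal illustration under stated assumptions: the names `Mode`, `FlowState` and `on_first_information` are hypothetical, and the flow labels D11 and D12 are borrowed from the examples used elsewhere in this document.

```python
from enum import Enum


class Mode(Enum):
    UNCONTROLLED = 1  # sending under the preset first credit value (credit1)
    CONTROLLED = 2    # sending under the destination-granted second credit value (credit2)


class FlowState:
    """Hypothetical per-flow state held by the source node."""

    def __init__(self, flow_id: str, mode: Mode = Mode.UNCONTROLLED):
        self.flow_id = flow_id
        self.mode = mode


def on_first_information(flows: dict, congested_flow_id: str) -> None:
    """Switch only the flow named in the congestion feedback to controlled mode."""
    flow = flows[congested_flow_id]
    if flow.mode is Mode.UNCONTROLLED:
        flow.mode = Mode.CONTROLLED


flows = {
    "D11": FlowState("D11"),                   # uncontrolled flow that caused congestion
    "D12": FlowState("D12", Mode.CONTROLLED),  # flow that already holds destination credit
}
on_first_information(flows, "D11")
```

Only the flow named in the feedback changes mode; a flow that is already controlled, such as D12 here, is left untouched, which is the property the paragraph above emphasizes.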
In a possible implementation, after receiving the first information, the data transmission method further includes: stopping transmitting the first data stream based on the first credit value.
Stopping transmission based on the first credit value, that is, stopping the transmission of data streams that have not been assigned a credit value by the destination node, means that only controlled data streams are transmitted while the network system is congested. This greatly reduces congestion while maintaining a high throughput rate of the network system.
In a possible implementation, the first data stream further includes indication information requesting to obtain the second credit value; before transmitting the first data stream to the destination node based on the first information and the second credit value obtained from the destination node, the data transmission method further includes: receiving the second credit value from the destination node.
In a possible implementation, the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data stream exceeds a preset threshold of the destination node.
In a possible implementation, transmitting the first data stream to the destination node according to the preset first credit value includes: transmitting the first data stream to the destination node through a switching node; and the first information is used to indicate that the buffer capacity of the port through which the switching node receives the first data stream exceeds the preset threshold of the switching node.
In a possible implementation, the first information is carried by the switching node in the first data stream and transmitted to the destination node.
In the embodiments of this application, after the switching node receives the first data stream and detects that the buffer capacity of the port receiving the first data stream exceeds the preset threshold, it can store part of the first data stream in its buffer, add the first information to the first data stream, and transmit that part of the first data stream, with the first information added, to the destination node.
Adding the first information to the first data stream reduces the number of data streams that have to be sent, which helps save bandwidth in the network system.
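A switching node's marking step might look like the sketch below. The `Packet` fields, the `congestion_mark` flag standing in for the first information, and the byte-based threshold are assumptions for illustration; the application does not specify the encoding, and the buffering of part of the stream is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: str
    payload: bytes
    congestion_mark: bool = False  # stands in for the "first information"


def forward(packet: Packet, port_buffer_bytes: int, threshold_bytes: int) -> Packet:
    """Mark the packet in place if the receiving port's buffer occupancy exceeds the threshold."""
    if port_buffer_bytes > threshold_bytes:
        # Piggyback the congestion indication on the data packet itself,
        # so no separate notification message has to be sent.
        packet.congestion_mark = True
    return packet
```

Carrying the indication inside a packet that is being forwarded anyway is what saves the extra transmission mentioned above.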
In a possible implementation, receiving the first information includes: receiving a second data stream, where the second data stream carries the first information, the second credit value and an identifier indicating the first data stream corresponding to the second credit value.
In the embodiments of this application, the source node may receive the second data stream in two cases. In the first case, the destination node adds the first information to the second data stream and sends it to the source node. In the second case, the switching node adds the first information to the second data stream and sends it to the source node.
Adding the first information to the second data stream reduces the number of data streams that have to be sent, which helps save bandwidth in the network system.
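The three items carried by the second data stream can be modelled as a small record. The field names, the class name `SecondDataStream` and the helper `apply_second_data_stream` are hypothetical; the sketch only shows how a source node might consume such a message, not the actual message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecondDataStream:
    """Hypothetical layout of the message returned to the source node."""
    flow_id: str             # identifies the first data stream the credit applies to
    first_information: bool  # True when a receiving port exceeded its threshold
    second_credit: int       # credit granted by the destination node


def apply_second_data_stream(granted: dict, msg: SecondDataStream) -> bool:
    """Record the granted credit for the flow and report whether the source
    should stop sending that flow under the first credit value."""
    granted[msg.flow_id] = msg.second_credit
    return msg.first_information
```

Because the flow identifier travels with the credit, a source node that sends several streams can attribute each grant to the right stream.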
In a possible implementation, after stopping transmitting the first data stream based on the first credit value, the data transmission method further includes: receiving second information, where the second information is used to indicate that the buffer capacity of the port is lower than a preset threshold; and transmitting a third data stream to the destination node based on the second information and the first credit value, where the third data stream includes a plurality of data packets.
When the buffer capacity of the port that receives the first data stream is lower than the preset threshold, the congestion in the network system has been relieved. Transmitting the third data stream to the destination node using the first credit value at this point can improve the throughput rate and data transmission efficiency of the network system.
In a possible implementation, after stopping transmitting the first data stream based on the first credit value, the data transmission method further includes: after a preset period, transmitting a third data stream to the destination node based on the first credit value, where the third data stream includes a plurality of data packets.
Usually, some time after the buffer capacity of the port that receives the first data stream has exceeded the preset threshold, the congested node will have processed enough data for that buffer capacity to fall below the preset threshold. If the source node has not received, after the preset period, information indicating that the buffer capacity of the port receiving the first data stream is lower than the preset threshold, the indication may have been lost in transmission, or the congested node may have been delayed in sending it. Having the source node transmit the third data stream using the first credit value after the preset period can improve the throughput rate and data transmission efficiency of the network system.
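The two resume conditions described above (explicit second information, or expiry of the preset period) can be combined into a single check. The function name `may_resume_uncontrolled` and its parameters are assumptions for this sketch; times are plain floats in seconds.

```python
def may_resume_uncontrolled(stopped_at: float,
                            now: float,
                            preset_period: float,
                            second_information_received: bool) -> bool:
    """Return True if the source may again send a new stream under the first credit value.

    Resumption is allowed either when the congested node has reported that its
    port buffer fell below the threshold, or when the preset period has elapsed
    since uncontrolled sending stopped (covering a lost or delayed report).
    """
    if second_information_received:
        return True
    return (now - stopped_at) >= preset_period
```

The timeout branch acts as a fallback, so a lost notification cannot keep the source in controlled-only mode indefinitely.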
在一种可能的实现方式中,所述第二数据流为确认字符报文。In a possible implementation manner, the second data stream is a confirmation character message.
In a second aspect, embodiments of the present application provide a data transmission system. The data transmission system includes multiple chips; any one of the chips includes a port and communicates with the other chips through the port. A source chip among the multiple chips is configured to: receive a first data stream, where the first data stream includes multiple data packets; transmit the first data stream to a destination chip according to a preset first credit value, where the first credit value indicates a preset amount of data traffic transmitted at one time; receive first information, where the first information indicates that the buffer capacity of a port receiving the first data stream exceeds a preset threshold; and transmit the first data stream to the destination chip based on the first information and a second credit value obtained from the destination chip, where the second credit value indicates the maximum amount of data traffic, obtained from the destination chip, that may be transmitted at one time.
In a possible implementation, after receiving the first information, the source chip is further configured to stop transmitting the first data stream based on the first credit value.
In a possible implementation, the first data stream further includes indication information requesting the second credit value; before transmitting the first data stream to the destination chip based on the first information and the second credit value obtained from the destination chip, the source chip is further configured to receive the second credit value from the destination chip.
In a possible implementation, the first information indicates that the buffer capacity of the port through which the destination chip receives the first data stream exceeds a preset threshold of the destination chip.
In a possible implementation, the multiple chips further include a switching chip. When transmitting the first data stream to the destination chip according to the preset first credit value, the source chip is specifically configured to transmit the first data stream to the destination chip through the switching chip; and the first information indicates that the buffer capacity of the port through which the switching chip receives the first data stream exceeds the preset threshold of the switching chip.
In a possible implementation, the switching chip is configured to: receive the first data stream; and add the first information to the first data stream and transmit it to the destination chip.
In a possible implementation, the destination chip is specifically configured to: receive the first data stream; generate a second data stream based on the first information and the indication information, where the second data stream carries the first information, the second credit value, and an identifier indicating the first data stream to which the second credit value corresponds; and send the second data stream to the source chip.
In a possible implementation, when the first information indicates that the buffer capacity of the port through which the switching chip receives the first data stream exceeds the preset threshold of the switching chip, the destination chip is specifically configured to: generate a second data stream based on the indication information, where the second data stream carries the second credit value; and transmit the second data stream to the source chip through the switching chip. The switching chip is specifically configured to: add the first information to the second data stream, and transmit the second data stream to the source chip.
In a possible implementation, after stopping transmission of the first data stream based on the first credit value, the source chip is further configured to: receive second information, where the second information indicates that the buffer capacity of the port is below the preset threshold; and transmit a third data stream to the destination chip based on the second information and the first credit value, where the third data stream includes multiple data packets.
In a possible implementation, after stopping transmission of the first data stream based on the first credit value, the source chip is further configured to: after a preset period has elapsed, transmit a third data stream to the destination chip based on the first credit value, where the third data stream includes multiple data packets.
In a possible implementation, the second data stream is an acknowledgement (ACK) message.
In a third aspect, embodiments of the present application provide an apparatus, which may include: a first receiving unit, configured to receive a first data stream, where the first data stream includes multiple data packets; a first sending unit, configured to transmit the first data stream to a destination node according to a preset first credit value, where the first credit value indicates a preset amount of data traffic transmitted at one time; a second receiving unit, configured to receive first information, where the first information indicates that the buffer capacity of a port receiving the first data stream exceeds a preset threshold; and a second sending unit, configured to transmit the first data stream to the destination node based on the first information and a second credit value obtained from the destination node, where the second credit value indicates the maximum amount of data traffic, obtained from the destination node, that may be transmitted at one time.
In a possible implementation, the apparatus further includes a transmission stopping unit, configured to stop transmitting the first data stream based on the first credit value.
In a possible implementation, the first data stream further includes indication information requesting the second credit value.
In a possible implementation, the first information indicates that the buffer capacity of the port through which the destination node receives the first data stream exceeds a preset threshold of the destination node.
In a possible implementation, the first information is carried by the switching node in the first data stream and transmitted to the destination node.
In a possible implementation, the second receiving unit is specifically configured to receive a second data stream, where the second data stream carries the first information, the second credit value, and an identifier indicating the first data stream to which the second credit value corresponds.
In a possible implementation, the apparatus further includes, for use after transmission of the first data stream based on the first credit value has stopped: a third receiving unit, configured to receive second information, where the second information indicates that the buffer capacity of the port is below the preset threshold; and a third sending unit, configured to transmit a third data stream to the destination node based on the second information and the first credit value, where the third data stream includes multiple data packets.
In a possible implementation, the apparatus further includes, for use after transmission of the first data stream based on the first credit value has stopped: a third sending unit, configured to transmit, after a preset period has elapsed, a third data stream to the destination node based on the first credit value, where the third data stream includes multiple data packets.
In a fourth aspect, embodiments of the present application provide a chip. The chip includes a processor, a memory, a cache and a port. Under the control of the processor, the port is used to communicate with other chips outside the chip, so as to receive data streams from other chips or transmit data streams to other chips, and to store received data streams in the cache. The processor is configured to execute program instructions in the memory to implement the data transmission method of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by a controller, implements the data transmission method of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product which, when executed by a controller, implements the data transmission method of the first aspect.
It should be understood that the second to sixth aspects of the present application are consistent with the technical solution of the first aspect, and that the beneficial effects achieved by each aspect and its corresponding feasible implementations are similar and are not repeated here.
Description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a schematic structural diagram, provided by an embodiment of the present application, of a prior-art approach to relieving node congestion;
Figure 2 is a schematic architecture diagram of a network system provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the structure of a node of the network system shown in Figure 2, provided by an embodiment of the present application;
Figures 4A to 4C are schematic diagrams of scenarios, provided by an embodiment of the present application, in which the source node sends data streams based on credit1 and credit2;
Figure 5 is a schematic diagram of an interaction flow between the nodes of the network system shown in Figure 2, provided by an embodiment of the present application;
Figures 6A and 6B are schematic diagrams of scenarios of the interaction flow shown in Figure 5, provided by an embodiment of the present application;
Figure 7 is a schematic diagram of another interaction flow between the nodes of the network system shown in Figure 2, provided by an embodiment of the present application;
Figure 8 is a schematic diagram of a scenario of the interaction flow shown in Figure 7, provided by an embodiment of the present application;
Figure 9 is a schematic diagram of another interaction flow between the nodes of the network system shown in Figure 2, provided by an embodiment of the present application;
Figure 10 is a schematic diagram of a scenario of the interaction flow shown in Figure 8, provided by an embodiment of the present application;
Figure 11 is a schematic diagram of another interaction flow between the nodes of the network system shown in Figure 2, provided by an embodiment of the present application;
Figure 12 is a flow chart of the data transmission method provided by an embodiment of the present application;
Figure 13 is a schematic structural diagram of the apparatus provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
"First", "second" and similar words used herein do not indicate any order, quantity or importance; they are used only to distinguish different components. Likewise, "a", "one" and similar words do not indicate a limit on quantity, but rather the presence of at least one.
In the embodiments of this application, words such as "exemplary" or "for example" are used to introduce examples, illustrations or explanations. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, these words are intended to present a concept in a concrete manner. In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more.
In prior-art network systems, the source node usually pushes a data stream directly to the destination node to reduce the transmission delay of the data stream. With this transmission method, a surge in data traffic usually causes congestion at some nodes in the network system. Once a network node is congested, the prior art usually relieves the congestion through local flow control or source-side flow control. Figure 1 illustrates relieving congestion through local flow control, and shows a source node S1, a destination node D1, and switching nodes A1, B1, B2 and C1. With local flow control, assume that switching node C1 is congested: when its buffer is nearly full, C1 can simply discard newly arriving data packets; alternatively, it can use priority-based flow control (PFC), in which C1 notifies the upstream node B1 to stop sending data streams. With source-side flow control, assume again that C1 is congested: C1 can use early congestion notification (ECN) to send information to source node S1 so as to lower S1's data stream transmission rate. Neither approach distinguishes the data streams that cause congestion from those that do not, so the normal transmission of non-congesting data streams cannot be guaranteed. In addition, local flow control that discards newly arriving data packets readily causes packet loss, and source-side flow control cannot sustain high throughput when the network system is heavily loaded.
Given these shortcomings of the prior art, the industry has further proposed transmitting data traffic using hybrid push-pull scheduling. In this approach, source node S1 can obtain from destination node D1 a credit value (credit) for a given data stream; the credit value indicates the maximum amount of data traffic, obtained from destination node D1, that may be transmitted at one time. When source node S1 receives the credit, it sends the data stream to destination node D1 according to the credit, which reduces network congestion and allows the data network to achieve high throughput. Such a credit-based data stream can also be called a controlled data stream. Source node S1 can additionally send uncontrolled data streams to reduce transmission delay. An uncontrolled data stream is one for which the destination node has not allocated a credit; it is usually the newest data stream, sent by source node S1 to the destination node based on a credit preset by the system. Source node S1 can send controlled and uncontrolled data streams at the same time. For example, while sending a controlled data stream to destination node D1 using the credit allocated by D1, source node S1 also sends an uncontrolled data stream to D1. In this transmission method, the system-preset credit is usually set high to increase the transmission rate, and it usually does not take the data throughput of the network into account. In other words, it is usually the uncontrolled data streams that cause congestion at nodes in the network. When switching node C1 becomes congested, it does not check whether the congestion was caused by a controlled or an uncontrolled data stream. As long as the congesting data stream originates from source node S1, switching node C1 sends source node S1 information indicating that the data streams transmitted by S1 have caused congestion, and S1 then either stops transmitting data streams into the network or transmits data at reduced bandwidth. Because controlled data streams do not cause network nodes to become congested, the result is that controlled data streams from source node S1 are needlessly kept out of the network, which reduces the throughput of the network system.
The data transmission method and data transmission system provided by the embodiments of the present application build on the hybrid push-pull scheduling described above. When the source node causes some node to become congested by sending the destination node a data stream to which the destination node has not allocated a credit value (that is, the data stream the network node receives from the source node exceeds a preset threshold, or the buffer growth rate of a port of the network node connected to the source node exceeds a preset threshold), the congested node feeds back to the source node, over the data link connecting them, indication information indicating that the data stream output by the source node has congested the network system. The source node can then, while the network system is congested, transmit to the destination node only data streams to which the destination node has allocated a credit value. Congestion in the network system is usually caused by data traffic to which the destination node has not allocated a credit value; data traffic that has been allocated a credit value by the destination node does not congest the network system. In the prior art, congesting and non-congesting data streams are not distinguished, and the source node stops transmitting data streams to the destination node, which disrupts the normal transmission of non-congesting data streams. By contrast, the data transmission method provided by the embodiments of the present application transmits, during congestion, only the data streams to which the destination node has allocated a credit value. Non-congesting data can therefore continue to be transmitted normally, which greatly reduces congestion while preserving the high throughput of the network system. In the prior art, the source node may also respond to congestion by transmitting data at reduced bandwidth; because the embodiments of the present application can continue to transmit data based on the controlled credit value obtained from the destination, the bandwidth in the network does not need to be reduced and high throughput is preserved. Furthermore, compared with the prior-art approach of directly discarding newly arriving data packets, the data transmission method provided by the embodiments of the present application also safeguards the accuracy of data stream transmission. It should be noted that, in the embodiments of the present application, a data stream to which the destination node has not allocated a credit value is a data stream whose traffic size has not been indicated by the destination node.
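The source-node behaviour described above can be sketched as a small state machine. This is only an illustrative sketch under stated assumptions, not the implementation of the application: the class `SourceNode`, its method names, the byte units, and the rule that a granted credit is consumed by a single transmission are all hypothetical. It also models a single flow and ignores that, absent congestion, controlled and uncontrolled streams may be sent together.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceNode:
    """Hypothetical model of the source node's send decision for one flow."""

    first_credit: int    # preset credit: bytes per uncontrolled transmission
    resume_after: float  # preset period before falling back to first_credit
    congested_since: Optional[float] = None  # arrival time of the first information
    second_credit: int = 0  # bytes granted by the destination node (assumed additive)

    def on_first_info(self, now: float) -> None:
        # A port receiving the flow exceeded its threshold: stop using first_credit.
        self.congested_since = now

    def on_second_info(self) -> None:
        # The port's buffer fell below the threshold: first_credit is usable again.
        self.congested_since = None

    def on_grant(self, credit: int) -> None:
        # The destination node granted a second credit value.
        self.second_credit += credit

    def sendable(self, now: float) -> int:
        """Return how many bytes may be sent in the next transmission."""
        if (
            self.congested_since is not None
            and now - self.congested_since >= self.resume_after
        ):
            # Preset period elapsed without second information: resume anyway.
            self.congested_since = None
        if self.congested_since is None:
            return self.first_credit
        # Congested: send only what the destination has granted (assumed one-shot).
        granted, self.second_credit = self.second_credit, 0
        return granted
```

In this sketch the timeout fallback covers the case where the second information is lost or delayed, so a missing notification cannot leave the source limited to granted credits indefinitely.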
The data transmission method provided by the embodiments of this application can be applied to various network systems, for example data center network systems, high-performance computing cluster network systems, cloud network systems, or network-on-chip systems with co-packaged chiplets. The embodiments of the present application take a data center network system as an example to describe the network system in detail. Please refer to Figure 2, which is a schematic structural diagram of a network system 100 provided by an embodiment of the present application. In Figure 2, the network system 100 may use a leaf-spine network architecture. The network system 100 includes multiple leaf nodes and multiple spine nodes, and the leaf nodes and spine nodes are fully connected to one another. Leaf nodes and spine nodes can also be called switching nodes or routing nodes. Figure 2 schematically shows four leaf nodes, namely node a1, node a2, node a3 and node a4, and two spine nodes, namely node b1 and node b2. The downlink ports of a leaf node are connected to the servers that need to exchange data traffic, and its uplink ports are connected to the spine nodes. In the network system shown in Figure 2, servers connected to different leaf nodes exchange data through a spine node to which both leaf nodes are connected. For example, if the servers connected to node a1 and node a2 need to exchange data, node a1 can send the data stream of its connected server S11 to node b1, node b1 forwards it to node a2, and node a2 transmits the data stream to server S21.
In the embodiments of the present application, a leaf node of the network system 100 that sends a data stream may be called a source node, a leaf node that receives a data stream may be called a destination node, and a spine node that forwards a data stream may be called a switching node. For example, when server S11 transmits data to server S21, the leaf node a1 to which server S11 is attached can be called the source node, and the leaf node a2 to which server S21 is attached can be called the destination node. It should be noted that the embodiments of this application are described using a data center network system as an example, and this does not limit the solution. For example, in other scenarios a server may communicate with a terminal device through multiple switches, in which case the server can be the source node and the terminal device the destination node. As another example, when two terminal devices communicate through a server and switches, the terminal device that sends the data stream is the source node and the device that receives the data stream is the destination node.
In the network system 100 shown in Figure 2, each node may have the structure shown in Figure 3. Figure 3 is a schematic structural diagram of a node provided by an embodiment of the present application. The node can be a source node, a destination node or a switching node in the network system 100, such as a leaf node or spine node shown in Figure 2. In practical applications, the switching nodes in the network system 100 may be switches, routers, or other network devices. Figure 3 is only one example of a node. Alternatively, the node provided by the embodiments of the present application can be any type of device, such as a chip, a chipset, or a circuit board carrying a chip or chipset; this embodiment places no limit on this. The chip, chipset, or circuit board carrying a chip or chipset can operate under suitable software drivers. The embodiments of this application use a switch chip as an example to describe the structure of the node. Referring to Figure 3, the node includes a processor, a memory, and ports. Optionally, the processor, memory and ports may be integrated into one or more chips, which may be regarded as a chipset. The processor performs the various functions of the node by running or executing software programs stored in the memory and by calling instructions and data stored in the memory. The processor may include one or more modules, for example a central processing unit (CPU) and a network processor (NP); the network processor may be implemented by an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) chip. The memory can be used to store software programs, instructions and data, and can be implemented by any type of volatile or non-volatile memory or a combination thereof, for example one or more of static random access memory (SRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), erasable programmable read-only memory (EPROM), and read-only memory (ROM). A node may include multiple ports; the figure schematically shows n of them. Some of these ports are configured as input ports of the node, to receive data from other nodes, and others are configured as output ports of the node, to send data to other nodes or servers. It should be noted that the embodiments of the present application do not specifically limit the number of ports included in each node; the number is set according to the needs of the scenario.
In addition, in the embodiments of the present application, the memory may also include a cache, which is used to store data streams transmitted by other nodes. In a first possible implementation, the cache on each node in the network system 100 is a single-point cache: the cache in the node is divided into multiple buffer spaces, and each buffer space is dedicated to one port. For example, assuming that node a1 in the network system 100 shown in Figure 2 includes 10 ports, the cache in node a1 can be divided into 10 buffer spaces in one-to-one correspondence with the ports. When node a1 receives a data stream through port port1, it stores the received data stream in the buffer space corresponding to port1. In a second possible implementation, the cache on each node in the network system 100 is a dynamically shared cache: the cache in the node is divided into multiple buffer spaces, and the data streams received on multiple ports of the node can all be stored in the same buffer space. For example, node a1 receives data streams through ports port1 and port2 and stores all of the received data in the same buffer space. In this case, the capacity of each buffer space and the mapping between buffer spaces and ports can be adjusted dynamically according to the needs of the scenario. The embodiments of the present application are described below using the single-point cache as an example.
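The single-point cache scheme, combined with the per-port threshold used elsewhere in this application to detect congestion, can be illustrated with a short sketch. The class `PortBuffer`, the byte units, and the particular capacity and threshold values are assumptions made for illustration and do not appear in the application.

```python
class PortBuffer:
    """Hypothetical dedicated buffer space for one port (single-point cache)."""

    def __init__(self, capacity: int, threshold: int) -> None:
        self.capacity = capacity    # size of this port's buffer space, in bytes
        self.threshold = threshold  # preset congestion threshold, in bytes
        self.used = 0               # bytes currently buffered

    def enqueue(self, size: int) -> bool:
        """Buffer `size` bytes; return True if the port is now above the threshold."""
        if self.used + size > self.capacity:
            raise OverflowError("buffer space for this port is full")
        self.used += size
        return self.used > self.threshold

    def dequeue(self, size: int) -> bool:
        """Forward `size` bytes; return True if the port is now below the threshold."""
        self.used = max(0, self.used - size)
        return self.used < self.threshold


# One buffer space per port, mirroring the 10-port example for node a1.
buffers = {f"port{i}": PortBuffer(capacity=1000, threshold=800) for i in range(1, 11)}
```

A `True` result from `enqueue` corresponds to the condition under which a node would report the first information, and a `True` result from `dequeue` to the condition for the second information; traffic arriving on one port does not affect the buffer space of another.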
Based on the structure of each node described in Figure 3, please continue to refer to Figure 2, which schematically shows each node as including three ports. Node a1 includes port Pa11, port Pa12 and port Pa13; node a2 includes port Pa21, port Pa22 and port Pa23; node a3 includes port Pa31, port Pa32 and port Pa33; node a4 includes port Pa41, port Pa42 and port Pa43; node b1 includes port Pb11, port Pb12 and port Pb13; and node b2 includes port Pb21, port Pb22 and port Pb23. In Figure 2, port Pa11 of node a1 is connected to server S11 and server S12, port Pa12 of node a1 is connected to port Pb11 of node b1, and port Pa13 of node a1 is connected to port Pb21 of node b2. Port Pa21 of node a2 is connected to server S21 and server S22, port Pa22 of node a2 is connected to port Pb11 of node b1, and port Pa23 of node a2 is connected to port Pb22 of node b2. Port Pa31 of node a3 is connected to server S31 and server S32, port Pa32 of node a3 is connected to port Pb12 of node b1, and port Pa33 of node a3 is connected to port Pb23 of node b2. Port Pa41 of node a4 is connected to server S41 and server S42, port Pa42 of node a4 is connected to port Pb13 of node b1, and port Pa43 of node a4 is connected to port Pb23 of node b2.
It should be noted that the number of nodes, the number of ports included in each node, and the connections between the ports of the nodes shown in Figure 2 are all schematic. They are set based on the needs of the application scenario and are not specifically limited in the embodiments of the present application. For example, in other application scenarios, port Pa12 of node a1 can be connected to both port Pb11 of node b1 and port Pb21 of node b2.
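For reference, the switch-facing connections listed for Figure 2 can be written down as a small lookup table. This is only an illustrative sketch: the names LINKS and switches_between are hypothetical, and the server-facing ports (Pa11, Pa21, Pa31, Pa41) are omitted.

```python
# (node, port) -> (peer node, peer port), as listed in the text for Figure 2.
LINKS = {
    ("a1", "Pa12"): ("b1", "Pb11"),
    ("a1", "Pa13"): ("b2", "Pb21"),
    ("a2", "Pa22"): ("b1", "Pb11"),
    ("a2", "Pa23"): ("b2", "Pb22"),
    ("a3", "Pa32"): ("b1", "Pb12"),
    ("a3", "Pa33"): ("b2", "Pb23"),
    ("a4", "Pa42"): ("b1", "Pb13"),
    ("a4", "Pa43"): ("b2", "Pb23"),
}


def switches_between(src, dst):
    """Return the switching nodes directly connected to both src and dst."""
    src_peers = {peer for (node, _), (peer, _) in LINKS.items() if node == src}
    dst_peers = {peer for (node, _), (peer, _) in LINKS.items() if node == dst}
    return sorted(src_peers & dst_peers)
```

With this table, a stream from node a1 to node a3 can be relayed by either switching node b1 or switching node b2.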
Based on the network system 100 shown in Figure 2, in the embodiments of the present application, a source node that sends data streams can maintain two kinds of credit counters. The value of one counter (denoted credit1), that is, the preset credit value, is preset by the system and indicates the preset amount of data traffic transmitted at one time. The value of the other counter (denoted credit2), that is, the credit value obtained from the destination node, is set based on the credit obtained from the destination node and indicates the maximum amount of data traffic, granted by the destination node, that can be transmitted at one time. The source node can send data streams based on credit1 and credit2. A data stream sent to the destination node based on the count of credit1 is an uncontrolled data stream, and a data stream sent to the destination node based on the count of credit2 is a controlled data stream. Taking node a1 shown in Figure 2 as the source node and node a3 as the destination node as an example, the scenarios shown in Figures 4A to 4C are used below to describe in more detail how the values of credit1 and credit2 are configured and how the source node sends data streams based on the two credit counters.
In the scenario shown in Figure 4A, assume that node a1 is in its initial state. In this initial state, the count of credit1 in node a1 is full, assumed to be 10, and the count of credit2 in node a1 is 0. Node a1 receives data stream D11 from the server. Data stream D11 is a new data stream, so node a1 transmits data stream D11 to destination node a3 based on the value of credit1, and the amount of data stream D11 transmitted corresponds to the value of credit1. Because data stream D11 is sent based on credit1, it is an uncontrolled data stream. In addition, data stream D11 carries indication information for obtaining the credit value of data stream D11.
After the scenario shown in Figure 4A, the count of credit1 becomes 0. The value of credit1 is refilled at a rate preset by the system. Assume that, after a certain period of time, the count of credit1 has become 3 and node a1 has obtained the credit value of data stream D11 from node a3, and assume that this credit value is also 10.
If data stream D11 still has data to be transmitted after the scenario of Figure 4A, and the amount of data to be transmitted is exactly equal to the amount indicated by the credit value obtained from node a3, node a1 fills the credit value obtained from node a3 into credit2, so that the count of credit2 becomes 10, as shown in the scenario of Figure 4B. Node a1 then transmits data stream D11 to destination node a3 based on the value of credit2, and the amount of data stream D11 transmitted corresponds to the amount indicated by the value of credit2.
It should be noted that, in the scenarios shown in Figure 4A and Figure 4B, the value of credit1 is the same as the value of credit2, so the bandwidth at which node a1 transmits data stream D11 based on the value of credit1 is the same as the bandwidth at which node a1 transmits data stream D11 based on the value of credit2.
If data stream D11 still has data to be transmitted after the scenario of Figure 4A, and the amount of data to be transmitted is less than the amount indicated by the credit value obtained from node a3, assume that the data of data stream D11 still to be transmitted corresponds to 2 credits. Node a1 then fills 2 of the credits obtained from node a3 into credit2 and fills the other 8 credits obtained from node a3 into credit1, as shown in the scenario of Figure 4C. In Figure 4C, the count of credit1 becomes 10 and the count of credit2 becomes 2. Node a1 then transmits data stream D11 to destination node a3 based on the value of credit2, and the amount of data stream D11 transmitted corresponds to the amount indicated by the value of credit2.
If node a1 receives data stream D13 from the server and data stream D13 is a new data stream, data stream D13 is an uncontrolled data stream. In the scenario of Figure 4B, node a1 transmits data stream D13 to node a3 based on the credit1 count of 3, that is, using the amount of data traffic corresponding to a credit1 count of 3. In the scenario of Figure 4C, node a1 transmits data stream D13 to node a3 based on the credit1 count of 10, that is, using the amount of data traffic corresponding to a credit1 count of 10.
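The counter values in the scenarios of Figures 4A to 4C can be reproduced with the following minimal sketch. The class SourceCredits and its method names are hypothetical. The sketch also assumes that credit1 saturates at its full value of 10, which is one reading of why credit1 becomes 10 rather than 11 in Figure 4C after 8 credits are added to a count of 3; the text itself does not state the rule.

```python
class SourceCredits:
    """Credit counters kept by a source node for one data stream (sketch)."""

    def __init__(self, credit1_full=10):
        self.credit1_full = credit1_full
        self.credit1 = credit1_full  # preset credit, full in the initial state
        self.credit2 = 0             # credit granted by the destination node

    def send_uncontrolled(self):
        """Send as much as credit1 allows and return the amount sent."""
        sent, self.credit1 = self.credit1, 0
        return sent

    def refill_credit1(self, amount):
        """Refill credit1; capping at the full value is an assumption."""
        self.credit1 = min(self.credit1_full, self.credit1 + amount)

    def on_grant(self, granted, pending):
        """Split a grant: the pending demand goes to credit2, the rest to credit1."""
        to_credit2 = min(granted, pending)
        self.credit2 += to_credit2
        self.refill_credit1(granted - to_credit2)

    def send_controlled(self):
        """Send as much as credit2 allows and return the amount sent."""
        sent, self.credit2 = self.credit2, 0
        return sent
```

For example, after send_uncontrolled() (Figure 4A) and refill_credit1(3), calling on_grant(granted=10, pending=10) leaves credit1 at 3 and credit2 at 10 (Figure 4B), whereas on_grant(granted=10, pending=2) leaves credit1 at 10 and credit2 at 2 (Figure 4C).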
Based on the network system 100 shown in Figure 2 and the principle by which a source node sends data streams shown in Figures 4A to 4C, in the embodiments of the present application, each node can send or receive multiple data streams, for example eight data streams, through one of its ports. Taking node a1 and node a2 in Figure 2 as source nodes, node a3 and node a4 as destination nodes, and node b1 and node b2 as switching nodes as an example, the data transmission method provided by the embodiments of the present application is described in more detail below with reference to the interaction process 500 shown in Figure 5 and the application scenarios shown in Figures 6A and 6B. Please refer to Figure 5, which shows an interaction process 500 between node a1, node b1 and node a3 provided by an embodiment of the present application. The interaction process 500 includes:
Step 501: For the received data stream D11, node a1 transmits data stream D11 to node b1 based on the preset credit value 1.
In this step, node a1 can receive data streams from the servers connected to it and then prioritize the received data streams; priority here determines which data stream is sent into the network first. Assume that node a1 receives multiple data streams and that, after prioritization, data stream D11 is sent first. Assume that data stream D11 is a new data stream, that is, node a1 has not yet communicated with the destination node to which data stream D11 is to be transmitted, so data stream D11 has not obtained a credit value from its destination node. Data stream D11 is therefore currently an uncontrolled data stream. To reduce the transmission delay of the data stream, node a1 transmits data stream D11 through port Pa12 directly to port Pb11 of node b1 based on credit value 1, that is, the value of credit1 shown in Figure 4A, as shown in Figure 6A. It should be noted that node a1 can split data stream D11 into multiple data packets for transmission, based on the frame format agreed in the data transmission protocol of the network system 100. In addition to a field carrying service data, each data packet includes a field indicating destination node a3, a field indicating the identity (ID) of node a1, a field indicating whether data stream D11 is a controlled data stream, a field indicating the ID of data stream D11, and a field indicating the port number of node a1. The data packet may also include more or fewer fields, which is not specifically limited in the embodiments of the present application. In a possible implementation, while transmitting data stream D11 to node a3, node a1 may also transmit data stream D12. Data stream D12 is a controlled data stream, that is, a data stream for which a credit value has been obtained from node a3 in advance; this credit value is the maximum amount of data stream D12 that node a1 transmits at one time.
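The packet fields listed above can be sketched as a simple record. The class DataPacket, the helper packetize, the field names, the string-typed identifiers, and the mtu parameter are all hypothetical; the actual frame format is whatever the data transmission protocol of the network system 100 defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPacket:
    dst_node: str     # destination node, e.g. "a3"
    src_node_id: str  # ID of the source node, e.g. "a1"
    controlled: bool  # whether the stream is a controlled data stream
    flow_id: str      # ID of the data stream, e.g. "D11"
    src_port: str     # port number of the source node, e.g. "Pa12"
    payload: bytes    # service data


def packetize(data, mtu, **header):
    """Split `data` into packets carrying at most `mtu` payload bytes each."""
    return [DataPacket(payload=data[i:i + mtu], **header)
            for i in range(0, len(data), mtu)]
```

A new stream such as D11 would be packetized with controlled=False; once a credit value has been obtained from the destination node, later packets of the same stream would carry controlled=True.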
Step 502: Node b1 saves part of data stream D11 in the cache corresponding to port Pb11 and transmits the other part of data stream D11 to node a3.
Step 503: Node b1 detects that the cache capacity of port Pb11, which receives data stream D11, exceeds the preset threshold, and transmits information I1 to node a1. Information I1 indicates that the cache capacity of port Pb11, which receives data stream D11, exceeds the preset threshold.
Based on the bandwidth of the transmission path, node b1 transmits part of data stream D11 to node a3 through port Pb12 and saves the other part of data stream D11 in the cache corresponding to port Pb11. In addition, as can be seen from the scenario shown in Figure 6A, port Pb11 of node b1 receives not only data stream D11 from node a1 but also data stream D21 from node a2. Data stream D11 received from node a1 and data stream D21 received from node a2 are both temporarily stored in the same cache corresponding to port Pb11. Assume that data stream D21 has been assigned a credit value by the destination node; node a2 obtained this credit value from node a3 by communicating with node a3 in advance. In other words, data stream D21 is a controlled data stream. Because the amount of data stream D21 transmitted at one time is configured based on the credit value assigned by the destination node, data stream D21 usually does not cause congestion at node b1. Data stream D11, by contrast, is an uncontrolled data stream: the amount transmitted at one time is usually preconfigured by the network system, the size of the data stream is usually not fixed, and, to reduce transmission delay, the amount transmitted at one time is usually large. Uncontrolled data streams therefore usually cause congestion at node b1. When node b1 detects that the cache capacity corresponding to port Pb11 exceeds the preset threshold, that is, node b1 is congested, node b1 can transmit information I1 to node a1 to indicate that the cache capacity occupied by data stream D11 exceeds the preset threshold. In a specific implementation, information I1 may be a push congestion notification (PCN) message, and the format of the PCN message may be set based on the data transmission protocol of the network system 100. The PCN message may include a field indicating the ID of node a1 and a field indicating the identifier of data stream D11, which caused the congestion. The PCN message may also include more or fewer fields, which is not specifically limited in the embodiments of the present application.
It should be noted that the preset threshold against which the cache capacity corresponding to port Pb11 is compared is, for example, 60% or 80% of the cache capacity. Usually, limited by the data transmission bandwidth between node b1 and node a3, node b1 cannot send all of the received data stream D11 at once. Node b1 sends part of the data stream and caches the rest, and then transmits the cached part of data stream D11 to node a3 in one or more transmissions, based on the amount of data that the transmission bandwidth can carry at one time.
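The threshold check of step 503 can be sketched as follows. The names PCN, PortBuffer, enqueue and dequeue are hypothetical. Reporting only uncontrolled streams is an assumption drawn from the statement that congestion is usually caused by streams without a destination-granted credit; the embodiments do not fix this rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PCN:
    """Push congestion notification (information I1)."""
    src_node_id: str  # source node to notify, e.g. "a1"
    flow_id: str      # data stream that caused the congestion, e.g. "D11"


class PortBuffer:
    """Cache space dedicated to one input port of a switching node."""

    def __init__(self, capacity, threshold_ratio=0.8):
        self.capacity = capacity
        self.threshold = capacity * threshold_ratio  # e.g. 60% or 80%
        self.used = 0

    def enqueue(self, src_node_id, flow_id, controlled, size):
        """Buffer `size` units; return a PCN if the threshold is now exceeded.

        Assumption: only uncontrolled streams are reported to their source.
        """
        self.used += size
        if self.used > self.threshold and not controlled:
            return PCN(src_node_id, flow_id)
        return None

    def dequeue(self, size):
        """Forward up to `size` units downstream and free the space."""
        sent = min(size, self.used)
        self.used -= sent
        return sent
```

With a capacity of 100 and a 60% threshold, buffering 50 units of controlled stream D21 produces no notification, while a further 20 units of uncontrolled stream D11 push the occupancy to 70 and produce a PCN addressed to node a1 for stream D11.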
Step 504: Node a3 sends, to node a1, indication information indicating credit value 2 of data stream D11.
In the embodiments of the present application, node a1 can transmit request information for obtaining credit value 2 to node a3 in various ways. In a first possible implementation, node a1 can add the request information for obtaining credit value 2 to data stream D11 and transmit it to node a3. In a second possible implementation, node a1 can transmit the request information for obtaining credit value 2 to node a3 independently of data stream D11, before or after transmitting data stream D11. The embodiments of the present application do not specifically limit the way in which the request information for obtaining credit value 2 is sent.
Optionally, credit value 2 can be obtained by node a1 sending a request (req) message to node a3 and receiving an acknowledgment (ACK) message from node a3. That is, in the first possible implementation above, node a1 can add a req message to data stream D11, and the req message carries the request information for obtaining credit value 2. In the second possible implementation above, node a1 transmits the req message directly to node a3, and the req message carries the request information for obtaining credit value 2.
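The req/ACK exchange can be sketched from the destination node's side as follows. The embodiments do not specify how the destination node chooses the credit value, so the grant policy here (grant the requested amount, limited by free buffer space) is purely an assumption, as are the class Destination, the method on_req, and the message keys such as "wanted".

```python
class Destination:
    """Destination node that grants per-stream credit on request (sketch)."""

    def __init__(self, node_id, free_buffer):
        self.node_id = node_id
        self.free_buffer = free_buffer  # buffer units not yet promised

    def on_req(self, req):
        """Answer a req message with an ACK carrying the granted credit."""
        # Assumed policy: never promise more than the remaining free buffer.
        grant = min(req["wanted"], self.free_buffer)
        self.free_buffer -= grant
        return {
            "type": "ACK",
            "src_node_id": req["src_node_id"],
            "dst_node_id": self.node_id,
            "flow_id": req["flow_id"],  # identifies which stream the credit is for
            "credit": grant,
        }
```

Because each ACK carries the stream identifier, a source node that has requested credit for several streams can attribute each grant to the correct stream.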
Step 505: Node a1 transmits data stream D11 to node a3 based on credit value 2. In this step, node a1 can also continue to transmit data stream D12 to node a3 based on the credit value of data stream D12 obtained from node a3 in advance.
After node a1 receives information I1 from node b1, it can first check whether credit value 2 of data stream D11 has been obtained from node a3, that is, check whether the count of credit2 corresponding to data stream D11, as shown in Figure 4A, is 0. When the count of credit2 is not 0, credit value 2 of data stream D11 has already been obtained from node a3. Node a1 then stops transmitting data stream D11 as an uncontrolled data stream based on the preset credit value (that is, based on the count of credit1 shown in Figure 4A) and instead transmits it to node a3 as a controlled data stream based on credit value 2 obtained from node a3. In addition, node a1 can also transmit controlled data stream D12, as shown in Figure 6B. Data stream D12 is an existing data stream: node a1 has already communicated with the destination node to which data stream D12 is to be transmitted and obtained a credit value from it, so data stream D12 is currently a controlled data stream. The credit value of data stream D12 indicates the cache capacity allocated by node a3 to data stream D12 and was obtained by node a1 from node a3 in advance. It should be noted that data stream D11 and data stream D12 are independent data streams. If both are to become controlled data streams, node a1 needs to apply to node a3 for a credit value for data stream D11 and, separately, for a credit value for data stream D12. Accordingly, when node a3 sends the credit value of each data stream to node a1, it also needs to carry a data stream identifier indicating the data stream to which the credit value corresponds.
In the scenario shown in Figure 6B, node a1 transmits data stream D11 through port Pa12 to port Pb11 of node b1 based on credit value 2, and node b1 forwards it through port Pb12 to port Pa32 of node a3, so that node a1 transmits data stream D11 to node a3. In addition, node a1 transmits data stream D12 through port Pa12 to port Pb21 of node b2, and node b2 forwards it through port Pb23 to port Pa33 of node a3, so that node a1 transmits data stream D12 to node a3.
As can be seen from the interaction process 500 shown in Figure 5, when node a1 sends data stream D11, which has not been assigned a credit value, to node b1 and thereby causes congestion at node b1, node b1 can transmit to node a1 information I1 indicating that the cache capacity of port Pb11, which receives data stream D11, exceeds the preset threshold, that is, indication information that data stream D11 is causing congestion. Thus, when the network system is congested, node a1 can stop transmitting data stream D11 based on credit value 1 regardless of whether credit value 1 has any remaining count (that is, regardless of whether the count of credit1 shown in Figure 4A is zero), and transmit into the network system 100 only data stream D12 and data stream D11 to which credit values have been assigned. Congestion in the network system is usually caused by data traffic that has not been assigned a credit value, whereas data traffic that has been assigned a credit value does not cause congestion. The prior art does not distinguish between congestion-causing data streams and non-congested data streams and therefore affects the normal transmission of non-congested data streams. In contrast, when the network system is congested, the network system 100 provided by the embodiments of the present application transmits only data streams that have been assigned credit values. This ensures that non-congested data streams are transmitted normally in the network system and greatly reduces congestion while maintaining a high throughput of the network system. In addition, compared with the prior art, which directly discards newly arriving data packets, the data transmission method provided by the embodiments of the present application can also ensure the accuracy of data stream transmission.
Figure 5 above shows the process by which the first data stream is converted from an uncontrolled data stream into a controlled data stream, so that node a1 can continue to transmit controlled data stream D11 to node a3. It should be noted that step 504 can also occur before step 503. That is, node a1 may first receive from node a3 the indication information indicating credit value 2 of data stream D11 and then receive information I1 sent by node b1; in other words, node a3 first sends the indication information indicating credit value 2 of data stream D11 to node a1, and node b1 then sends information I1 to node a1. In addition, in a possible implementation of the embodiments of the present application, node a1 may receive information I1 sent by node b1 before it has received the indication information indicating credit value 2 of data stream D11 from node a3. In this case, the count of credit2 corresponding to data stream D11, as shown in Figure 4A, is 0, and node a1 can temporarily stop transmitting data stream D11 to node a3.
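The source node's handling of the two possible arrival orders (credit grant first, or congestion notification first) can be sketched as a small per-stream state machine. The names Mode, FlowState, on_pcn and on_grant are hypothetical, and switching to controlled transmission as soon as a grant arrives follows the scenario of Figure 4B rather than any explicit rule in the text.

```python
from enum import Enum


class Mode(Enum):
    UNCONTROLLED = "uncontrolled"  # sending based on credit1
    CONTROLLED = "controlled"      # sending based on credit2
    PAUSED = "paused"              # congested and no grant yet


class FlowState:
    """Per-stream sending mode at the source node (sketch)."""

    def __init__(self):
        self.mode = Mode.UNCONTROLLED
        self.credit2 = 0

    def on_pcn(self):
        """Information I1 received: stop using credit1 for this stream."""
        self.mode = Mode.CONTROLLED if self.credit2 > 0 else Mode.PAUSED

    def on_grant(self, credit):
        """Credit value 2 received from the destination node."""
        self.credit2 += credit
        if self.credit2 > 0:
            self.mode = Mode.CONTROLLED
```

Either order ends in the same state: the stream is sent only as a controlled data stream, and it is paused only during the interval in which the congestion notification has arrived but the grant has not.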
In addition, the interaction process 500 shown in Figure 5 schematically shows node b1 transmitting information I1 directly to node a1 when it detects congestion at port Pb11. In a second possible implementation, when node a1 transmits data stream D11 to node a3 through node b1, node b1 can add information I1 to the req message in data stream D11, and the req message is transmitted to node a3. After node a3 generates an ACK message based on the req message, it can add information I1 to the ACK message and transmit information I1 together with the indication information indicating credit value 2 of data stream D11 to node a1. In a third possible implementation, when node a3 transmits the ACK message to node a1 through node b1, node b1 can add information I1 directly to the ACK message, so that node b1 transmits to node a1 a message containing both information I1 and the indication information indicating credit value 2 of data stream D11. Still taking node a1 and node a2 in Figure 2 as source nodes, node a3 and node a4 as destination nodes, and node b1 and node b2 as switching nodes as an example, the second and third possible implementations are described in more detail below through the interaction processes shown in Figure 7 and Figure 9, respectively.
Please refer to Figure 7, which shows an interaction process 700 between node a1, node b1 and node a3 provided by an embodiment of the present application. Figure 8 is a schematic diagram of an application scenario of the interaction process shown in Figure 7. The interaction process 700 includes:
Step 701: For the received data stream D11, node a1 transmits data stream D11 to node b1 based on the preset credit value 1. Data stream D11 includes the multiple data packets to be sent as well as a req message, which carries the request information for obtaining credit value 2. In this step, the req message may be set based on the data transmission protocol of the network system 100. In a specific implementation, the req message may include a field indicating destination node a3, a field indicating the ID of node a1, a field indicating the ID of data stream D11, a field indicating the request to obtain a credit value, and so on. The req message may also include more or fewer fields, which is not specifically limited in the embodiments of the present application. It should be noted that an empty field can also be reserved in the req message so that switching node b1 can add information I1 to it.
Step 702: Node b1 saves part of data stream D11 in the cache corresponding to port Pb11.
The specific implementation of steps 701 and 702 is the same as that of steps 501 and 502 shown in Figure 5. For details, refer to the descriptions of steps 501 and 502, which are not repeated here.
Step 703: Node b1 detects that the cache capacity of port Pb11, which receives data stream D11, exceeds the preset threshold, extracts the req message from data stream D11, and adds information I1 to the req message.
Step 704: Node b1 transmits data stream D11, with information I1 added, to node a3.
The way in which node b1 detects that the cache capacity of port Pb11, which receives data stream D11, exceeds the preset threshold is the same as in step 503 shown in Figure 5. For details, refer to the description of step 503, which is not repeated here. After node b1 detects that the cache capacity of port Pb11 exceeds the preset threshold, it can add information I1 to the req message. The position at which information I1 is added in the req message can be agreed in advance based on the data transmission protocol. For example, information I1 may include a field indicating the ID of node a1, a field indicating the ID of data stream D11, and a field indicating PCN information. For example, it may be agreed that the ID of node a1 is represented by sixteen bits, the ID of data stream D11 is represented by sixteen bits, and the PCN information is represented by one bit. When the field indicating the PCN information is 1, the cache capacity of port Pb11, which receives data stream D11, exceeds the preset threshold. The req message thus carries information I1. Node b1 transmits data stream D11 carrying information I1 to node a3.
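The example field widths for information I1 (sixteen bits for the node ID, sixteen bits for the data stream ID, one bit for the PCN flag) can be illustrated with the standard-library struct module. The function names pack_i1 and unpack_i1, the field order, the network byte order, the numeric IDs, and the padding of the one-bit flag to a whole byte are all assumptions made for this sketch.

```python
import struct

# "!HHB": network byte order, two unsigned 16-bit fields, one unsigned byte.
_I1_FORMAT = "!HHB"


def pack_i1(node_id, flow_id, pcn):
    """Encode information I1; the 1-bit PCN flag occupies the low bit of a byte."""
    return struct.pack(_I1_FORMAT, node_id, flow_id, 1 if pcn else 0)


def unpack_i1(raw):
    """Decode information I1 into (node_id, flow_id, pcn_flag)."""
    node_id, flow_id, flag = struct.unpack(_I1_FORMAT, raw)
    return node_id, flow_id, bool(flag & 1)
```

A switching node would write these bytes into the reserved field of the req message, and the source node would decode them after they are returned in the ACK message.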
Step 705: Based on the req message, node a3 generates an ACK message carrying information I1 and the indication information indicating credit value 2 of data stream D11, and transmits the ACK message to node b1. In this step, after receiving the req message, node a3 parses the fields of the req message to obtain information I1 and the request for credit value 2 of data stream D11. Node a3 then generates the ACK message, which may include a field indicating the ID of node a1, a field indicating the ID of node a3, a field indicating the ID of data stream D11, a field indicating the credit value of data stream D11, a field carrying information I1, and so on. It can be understood that the ACK message may also include more or fewer fields, which is not specifically limited in the embodiments of the present application. The fields included in information I1 here are the same as the fields of information I1 described in step 704.
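Steps 703 to 705 can be sketched as two small functions, one for the switching node and one for the destination node. The function names stamp_req and build_ack, the dictionary keys, and the use of dictionaries instead of a binary frame format are hypothetical and serve only to show how information I1 travels from the req message into the ACK message.

```python
def stamp_req(req, congested, node_id, flow_id):
    """Switching node: copy the req message and, if congested, add I1 to it."""
    stamped = dict(req)  # leave the caller's message untouched
    if congested:
        stamped["i1"] = {"node_id": node_id, "flow_id": flow_id, "pcn": 1}
    return stamped


def build_ack(req, dst_node_id, credit):
    """Destination node: build the ACK and carry over I1 if the req has it."""
    ack = {
        "type": "ACK",
        "src_node_id": req["src_node_id"],
        "dst_node_id": dst_node_id,
        "flow_id": req["flow_id"],
        "credit": credit,  # credit value 2 for this data stream
    }
    if "i1" in req:
        ack["i1"] = req["i1"]
    return ack
```

The source node then learns from a single ACK message both the granted credit value and the fact that its stream caused congestion at the switching node.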
步骤706,节点b1将ACK报文传输至节点a1。Step 706: Node b1 transmits the ACK message to node a1.
步骤707,节点a1基于ACK报文以及信用值2,向节点a3传输数据流D11。该步骤中,节点a1 接收到ACK报文之后,从ACK报文中获得指示数据流D11的信用值2的信息、以及指示接收数据流D11的端口Pb11的缓存容量超过预设阈值的信息。从而,节点a1采用节点a3分配的信用值2,通过节点b2向节点a3传输数据流D11。Step 707: Node a1 transmits data flow D11 to node a3 based on the ACK message and credit value 2. In this step, node a1 After receiving the ACK message, information indicating the credit value 2 of the data flow D11 and information indicating that the buffer capacity of the port Pb11 receiving the data flow D11 exceeds the preset threshold are obtained from the ACK message. Therefore, node a1 uses the credit value 2 assigned by node a3 to transmit the data flow D11 to node a3 through node b2.
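The field layout agreed for information I1 in the flow above (a sixteen-bit node ID, a sixteen-bit data stream ID, and a one-bit PCN flag) could be encoded roughly as follows. This is a minimal Python sketch, not the encoding defined by the application: the function names pack_info_i1 and unpack_info_i1, the network byte order, and the choice to store the PCN bit in the lowest bit of a fifth byte are all assumptions made for illustration.

```python
import struct
from typing import Tuple


def pack_info_i1(node_id: int, stream_id: int, pcn: int) -> bytes:
    """Pack I1 as a 16-bit node ID, a 16-bit stream ID, and a 1-bit PCN flag.

    Assumption: the PCN bit occupies the lowest bit of a trailing byte;
    the text only fixes the field widths, not the byte alignment.
    """
    if not (0 <= node_id <= 0xFFFF and 0 <= stream_id <= 0xFFFF):
        raise ValueError("node_id and stream_id must fit in 16 bits")
    if pcn not in (0, 1):
        raise ValueError("pcn must be 0 or 1")
    # "!" selects network byte order with no padding: 2 + 2 + 1 bytes.
    return struct.pack("!HHB", node_id, stream_id, pcn)


def unpack_info_i1(data: bytes) -> Tuple[int, int, int]:
    """Return (node_id, stream_id, pcn) from the first five bytes of data."""
    node_id, stream_id, flags = struct.unpack("!HHB", data[:5])
    return node_id, stream_id, flags & 0x01
```

A PCN value of 1 here corresponds to the case in which the buffer capacity of the receiving port exceeds the preset threshold.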
Figures 7 and 8 above show that information I1 can be added to the req message in data stream D11; after node a3 generates an ACK message based on the req message, information I1 is carried in the ACK message and transmitted to node a1. Figures 9 and 10 below describe the case in which information I1 is added directly to the ACK message, so that node b1 transmits the ACK message with information I1 added to node a1. Refer to Figure 9, which shows an interaction process 900 between node a1, node b1, and node a3 provided by an embodiment of the present application; Figure 10 is a schematic diagram of an application scenario of the interaction process shown in Figure 9. The interaction process 900 includes:
Step 901: Node a1 transmits the received data stream D11 to node b1 based on the preset credit value 1. Data stream D11 includes both the multiple data packets to be sent and a req message, which carries request information for obtaining credit value 2.
Step 902: Node b1 saves part of data stream D11 in the buffer corresponding to port Pb11 and transmits the other part of data stream D11 to node a3.
Steps 901 and 902 are implemented in the same way as steps 501 and 502 shown in Figure 5; for details, refer to the descriptions of steps 501 and 502, which are not repeated here. The content of the req message is the same as that of the req message in step 701 shown in Figure 7; for details, refer to the description of step 701.
Step 903: Node a3 generates an ACK message based on the req message in data stream D11 and transmits the ACK message to node b1. The ACK message may include a field indicating the ID of node a1, a field indicating the ID of node a3, a field indicating the ID of data stream D11, and a field indicating the credit value of data stream D11. The generated ACK message does not carry information I1. The ACK message may also contain empty fields in which node b1 can add information I1.
Step 904: Node b1 detects that the buffer capacity of port Pb11, which receives data stream D11, exceeds the preset threshold, adds information I1 to the ACK message, and transmits the ACK message to node a1. After this step, the ACK message includes a field carrying information I1 in addition to the fields described in step 903.
Step 905: Based on the ACK message and credit value 2, node a1 transmits data stream D11 to node a3. This step is implemented in the same way as step 707 shown in Figure 7; refer to the description of step 707, which is not repeated here.
As the embodiment shown in Figure 9 shows, the difference from the embodiment shown in Figure 7 is that node b1 adds information I1 to the ACK message when forwarding it. Therefore, when a switching node in the network system 100 provided by the embodiments of the present application is congested, the source node of the congested data stream can be notified in several ways and at more points in time, which can further improve the efficiency of the network system 100.
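The switch-side behaviour of interaction process 900, in which the destination leaves an empty field in the ACK message and the switching node fills it while forwarding, can be sketched as below. The Ack dataclass, its field names, and the forward_ack function are hypothetical and chosen only for illustration; the application does not define a concrete message format.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ack:
    src_node_id: int                # ID of the source node (a1 in the text)
    dst_node_id: int                # ID of the destination node (a3)
    stream_id: int                  # ID of data stream D11
    credit: int                     # credit value 2 granted by the destination
    info_i1: Optional[int] = None   # empty field reserved for the switch


def forward_ack(ack: Ack, buffered_bytes: int, threshold_bytes: int) -> Ack:
    """Return the ACK that the switching node forwards to the source node.

    If the buffer of the port receiving the stream exceeds the threshold,
    the reserved field is filled with a PCN flag of 1; otherwise the ACK
    is forwarded unchanged.
    """
    if buffered_bytes > threshold_bytes:
        return replace(ack, info_i1=1)
    return ack
```

Keeping the message immutable and returning a marked copy is a design choice of this sketch; an implementation in a switch would more likely rewrite the field in place.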
The embodiments shown in Figures 5 to 10 above show that, when a switching node is congested, it transmits information indicating the congestion to the source node of the congested data stream. In other possible implementations, the destination node may also become congested. In that case, the destination node can likewise transmit information indicating the congestion to the source node of the congested data stream, either by transmitting the congestion information directly or by means of an ACK message. Refer to Figure 11, which shows an interaction process 1100 between node a1, node b1, and node a3 provided by an embodiment of the present application. The interaction process 1100 includes:
Step 1101: Node a1 transmits the received data stream D11 to node b1 based on the preset credit value 1.
Step 1102: Node b1 transmits data stream D11 to node a3.
Step 1103: Node a3 saves data stream D11 in its buffer.
Step 1104: Node a3 detects that the buffer capacity of port Pa32, which receives data stream D11, exceeds the preset threshold, and transmits information I1 to node b1. In this step, node a3 checks the buffer capacity corresponding to the port used to receive data stream D11 (for example, port Pa32 shown in Figure 2). When the buffer capacity exceeds the preset threshold, node a3 is congested, and the data stream causing the congestion is data stream D11. Node a3 can therefore transmit information I1 to node b1. Information I1 may be a PCN message, which may include a field indicating the ID of node a1 and a field indicating the ID of the data stream D11 that causes the congestion. The PCN message may also include more or fewer fields, which is not specifically limited in the embodiments of the present application.
Step 1105: Node b1 transmits information I1 to node a1.
Step 1106: Based on information I1 and credit value 2 obtained in advance from node a3, node a1 transmits data stream D11 to node a3.
Figure 11 shows, by way of example, that destination node a3 transmits information I1 directly to node a1 when the buffer capacity of port Pa32, which receives data stream D11, exceeds the preset threshold. In other possible implementations, when the buffer capacity of port Pa32 exceeds the preset threshold, node a3 can instead add information I1 to an ACK message and transmit it to node a1 in that ACK message. In this implementation, before detecting that the buffer capacity of port Pa32 exceeds the preset threshold, node a3 needs to receive a req message from node a1; the req message requests a credit value for certain data streams in node a1 that are to be sent to node a3. Node a3 then generates an ACK message based on the req message, adds information I1 to the ACK message, and transmits it to node a1.
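The detection performed by the destination node in step 1104 amounts to comparing the occupancy of a receive-port buffer with a preset threshold and, when it is exceeded, emitting a PCN message that names the source node and the data stream. A minimal sketch follows; the class names ReceivePort and PcnMessage, the byte-based accounting, and the decision to blame the stream whose packet crossed the threshold are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PcnMessage:
    src_node_id: int   # ID of the source node to notify (a1 in the text)
    stream_id: int     # ID of the data stream that causes the congestion


@dataclass
class ReceivePort:
    threshold_bytes: int      # preset threshold for this port's buffer
    buffered_bytes: int = 0   # bytes currently held in the buffer

    def enqueue(self, src_node_id: int, stream_id: int,
                size: int) -> Optional[PcnMessage]:
        """Buffer a packet; return a PCN message if the threshold is exceeded."""
        self.buffered_bytes += size
        if self.buffered_bytes > self.threshold_bytes:
            return PcnMessage(src_node_id, stream_id)
        return None

    def dequeue(self, size: int) -> None:
        """Release buffered bytes once they have been consumed."""
        self.buffered_bytes = max(0, self.buffered_bytes - size)
```

The same check applies to a switching node such as b1; only the port being monitored differs.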
The embodiments shown in Figures 5 to 11 above schematically show that, when switching node b1 or destination node a3 becomes congested because it receives data stream D11, to which no credit value has been assigned, it transmits information indicating the congestion to node a1, which sends data stream D11, so that node a1 stops sending data stream D11 into the network system 100 or sends only data stream D11 to which a credit value has been assigned. Based on the interaction processes shown in Figures 5, 7, 9, and 11, in a possible implementation of the embodiments of the present application, on the basis of step 505, step 707, step 905, or step 1106, after node a1 has stopped sending data streams without an assigned credit value for a preset time, if it has still not received information indicating that the congestion has been relieved, that is, information indicating that the buffer capacity of the port receiving data stream D11 is below the preset threshold, it can transmit a new data stream, such as data stream D13, into the network system 100 based on the preset credit value 1 (that is, the count of credit1 shown in Figure 4A).
In a possible implementation, on the basis of step 505, step 707, step 905, or step 1106, when node a1 receives indication information I2 sent by node b1 or node a3, where the indication information I2 indicates that the buffer capacity of the port receiving data stream D11 is below the preset threshold, node a1 can transmit a new data stream, such as data stream D13, into the network system 100 based on the preset credit value 1 (that is, the count of credit1 shown in Figure 4A). Node b1 or node a3 can send the indication information I2 on its own; alternatively, node b1 adds the indication information I2 to the req message, and node a3 generates, based on the req message, an ACK message with information I2 added and transmits it to node a1; alternatively, node b1 or node a3 adds the indication information I2 directly to the ACK message and transmits it to node a1. The format of each message, the format of information I2, and the way information I2 is added to each message are similar to the message formats, the format of information I1, and the way information I1 is added to each message in the above embodiments; for details, refer to the related descriptions, which are not repeated here.
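The two ways in which node a1 may resume pushing uncredited data streams, namely receiving indication information I2 or waiting out a preset period without receiving it, can be combined in one small state holder. The sketch below is illustrative only: the class name UncreditedSender, its method names, and the use of caller-supplied timestamps in seconds are assumptions, not part of the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UncreditedSender:
    resume_after: float                  # preset period, in seconds
    stopped_at: Optional[float] = None   # time at which I1 was received

    def on_i1(self, now: float) -> None:
        """Congestion reported: stop pushing streams based on credit value 1."""
        self.stopped_at = now

    def on_i2(self) -> None:
        """Congestion relieved: pushing based on credit value 1 may resume."""
        self.stopped_at = None

    def may_push(self, now: float) -> bool:
        """Return True if a new stream may be sent based on credit value 1."""
        if self.stopped_at is None:
            return True
        if now - self.stopped_at >= self.resume_after:
            # The preset period elapsed without I2, so resume anyway.
            self.stopped_at = None
            return True
        return False
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to check.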
It should be noted that the embodiments of the present application are described through specific scenarios in which node a1 and node a2 shown in Figure 2 are source nodes, node b1 and node b2 are switching nodes, and node a3 and node a4 are destination nodes. It can be understood that, in other scenarios, node a3 can be a source node and node a1 can be a destination node; the embodiments of the present application do not specifically limit the source node, switching node, and destination node. In addition, each embodiment of the present application is described using one data stream D11 and one data stream D12 to which the destination node has assigned a credit value. It can be understood that the network system 100 may contain multiple data streams to which the destination node has not assigned a credit value, and multiple data streams to which the destination node has assigned a credit value. The data streams without an assigned credit value may come from the same source node or from different source nodes, and the same holds for the data streams with an assigned credit value. Some or all of the data streams without an assigned credit value may cause congestion at a switching node or destination node, depending on the specific scenario. Furthermore, the embodiments of the present application are described using a single layer of switching nodes as an example. In other application scenarios, there may be multiple layers of switching nodes between the source node and the destination node, and the source node may communicate with the destination node through multiple switching nodes; any one of these switching nodes may become congested by receiving data streams without an assigned credit value.
Based on the same inventive concept, an embodiment of the present application provides a data transmission method 1200, which is applied to a source node in any network system. For example, it can be applied to node a1, node a2, node a3, or node a4 in the data center network shown in Figure 2, in which case any one of node a1, node a2, node a3, or node a4 is the source node. The data transmission method 1200 includes the following steps:
Step 1201: Receive a first data stream, where the first data stream includes a plurality of data packets.
Step 1202: Transmit the first data stream to a destination node according to a preset first credit value, where the first credit value indicates a preset amount of data traffic transmitted at one time.
Step 1203: Receive first information, where the first information indicates that the buffer capacity of the port receiving the first data stream exceeds a preset threshold.
Step 1204: Based on the first information and a second credit value obtained from the destination node, transmit the first data stream to the destination node, where the second credit value indicates the maximum amount of data traffic, obtained from the destination node, that is transmitted at one time.
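Steps 1201 to 1204 can be read as a source node that sizes each transmission by the first credit value until the first information arrives, and by the second credit value afterwards. The following sketch expresses that reading; the SourceNode class, its attribute names, and the interpretation of a credit value as a number of bytes per transmission are assumptions made for this example rather than definitions taken from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceNode:
    first_credit: int                     # preset credit value, bytes per transmission
    second_credit: Optional[int] = None   # credit value granted by the destination
    congested: bool = False               # set once the first information arrives

    def on_second_credit(self, credit: int) -> None:
        """Record the second credit value received from the destination node."""
        self.second_credit = credit

    def on_first_info(self) -> None:
        """Record that a receiving port's buffer exceeded its threshold."""
        self.congested = True

    def next_burst(self, pending_bytes: int) -> int:
        """Return how many bytes of the first data stream to send now."""
        if not self.congested:
            # Step 1202: push based on the preset first credit value.
            return min(pending_bytes, self.first_credit)
        if self.second_credit is None:
            # No grant yet: stop pushing until the destination assigns credit.
            return 0
        # Step 1204: send at most what the destination has granted.
        return min(pending_bytes, self.second_credit)
```

For example, a node created with first_credit=1500 sends 1500 bytes of a 4000-byte backlog, sends nothing after the first information arrives, and sends at most the granted amount once the second credit value is known.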
In a possible implementation, after receiving the first information, the data transmission method 1200 further includes: stopping transmission of the first data stream based on the first credit value.
In a possible implementation, the first data stream further includes indication information requesting the second credit value; before the first data stream is transmitted to the destination node based on the first information and the second credit value obtained from the destination node, the data transmission method 1200 further includes: receiving the second credit value from the destination node.
In a possible implementation, the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data stream exceeds a preset threshold of the destination node.
In a possible implementation, the first data stream is transmitted to the destination node through a switching node, and the first information is used to indicate that the buffer capacity of the port through which the switching node receives the first data stream exceeds the preset threshold of the switching node.
In a possible implementation, the first information is carried by the switching node in the first data stream and transmitted to the destination node.
In a possible implementation, receiving the first information includes: receiving a second data stream, where the second data stream carries the first information, the second credit value, and an identifier of the first data stream corresponding to the second credit value.
In a possible implementation, after stopping transmission of the first data stream based on the first credit value, the data transmission method 1200 further includes: receiving second information, where the second information indicates that the buffer capacity of the port is below the preset threshold; and, based on the second information and the first credit value, transmitting a third data stream to the destination node, where the third data stream includes a plurality of data packets.
In a possible implementation, after stopping transmission of the first data stream based on the first credit value, the data transmission method 1200 further includes: after a preset period has elapsed, transmitting a third data stream to the destination node based on the first credit value, where the third data stream includes a plurality of data packets.
It can be understood that, to implement the above functions, the source node (for example, node a1, node a2, node a3, or node a4 shown in Figure 2) includes hardware and/or software modules corresponding to each function. In combination with the example steps described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may, in combination with the embodiments, use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
This embodiment may divide the source node into functional modules according to the above method examples. For example, each function may be assigned its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a division by logical function; other divisions are possible in an actual implementation.
Where each functional module corresponds to one function, Figure 13 shows a possible schematic diagram of the device 1300 involved in the above embodiments, which further extends the device mentioned previously. For example, the device 1300 of Figure 13 may be a software device running on the source node, or it may be a combined software and hardware device embedded in the source node. As shown in Figure 13, the device 1300 may include: a first receiving unit 1301, configured to receive a first data stream, where the first data stream includes a plurality of data packets; a first sending unit 1302, configured to transmit the first data stream to a destination node according to a preset first credit value, where the first credit value indicates a preset amount of data traffic transmitted at one time; a second receiving unit 1303, configured to receive first information, where the first information indicates that the buffer capacity of the port receiving the first data stream exceeds a preset threshold; and a second sending unit 1304, configured to transmit the first data stream to the destination node based on the first information and a second credit value obtained from the destination node, where the second credit value indicates the maximum amount of data traffic, obtained from the destination node, that is transmitted at one time.
In a possible implementation, the device 1300 further includes: a transmission stopping unit (not shown in the figure), configured to stop transmitting the first data stream based on the first credit value.
In a possible implementation, the first data stream further includes indication information requesting the second credit value.
In a possible implementation, the first information is used to indicate that the buffer capacity of the port through which the destination node receives the first data stream exceeds a preset threshold of the destination node.
In a possible implementation, the first data stream is transmitted to the destination node through a switching node, and the first information is used to indicate that the buffer capacity of the port through which the switching node receives the first data stream exceeds the preset threshold of the switching node.
In a possible implementation, the first information is carried by the switching node in the first data stream and transmitted to the destination node.
In a possible implementation, the second receiving unit is specifically configured to: receive a second data stream, where the second data stream carries the first information, the second credit value, and an identifier of the first data stream corresponding to the second credit value.
In a possible implementation, after transmission of the first data stream based on the first credit value is stopped, the device 1300 further includes: a third receiving unit (not shown in the figure), configured to receive second information, where the second information indicates that the buffer capacity of the port is below the preset threshold; and a third sending unit (not shown in the figure), configured to transmit a third data stream to the destination node based on the second information and the first credit value, where the third data stream includes a plurality of data packets.
In a possible implementation, after transmission of the first data stream based on the first credit value is stopped, the device 1300 further includes: a third sending unit (not shown in the figure), configured to transmit, after a preset period has elapsed, a third data stream to the destination node based on the first credit value, where the third data stream includes a plurality of data packets.
Exemplarily, the above source node may further include at least one processor, a memory, and a port. The at least one processor can call all or part of the computer program stored in the memory to control and manage the actions of the source node; for example, it can support the source node in performing the steps performed by the units shown in Figure 13. The memory can be used to store program code, data, and the like for the node; the memory includes, but is not limited to, at least part of the storage space of the above memory, a cache, or registers. The at least one processor may implement or execute the various exemplary logic modules described in connection with the disclosure of the present application, and may be a combination of one or more microprocessors that implement computing functions. In addition, the at least one processor may include other programmable logic devices, transistor logic devices, or discrete hardware components.
This embodiment further provides a computer-readable storage medium that stores computer instructions. When the computer instructions are run on a computer, they cause the computer to perform the above related method steps to implement the data transmission method in the above embodiments.
This embodiment further provides a computer program product. When the computer program product is run on a computer, it causes the computer to perform the above related steps to implement the data transmission method in the above embodiments.
The computer-readable storage medium and the computer program product provided by this embodiment are both used to perform the corresponding method provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method provided above, which are not repeated here.
From the description of the above implementations, those skilled in the art can understand that the division into the above functional modules is used only as an example, for convenience and brevity of description. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to perform all or part of the functions described above.
In addition, the functional units in the embodiments of the present application may be integrated into one product, each unit may exist physically on its own, or two or more units may be integrated into one product. Corresponding to Figure 12, if the above modules are implemented in the form of software functional units and sold or used as independent products, they can be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a microcontroller, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, and some or all of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (21)

  1. A data transmission method, characterized by comprising:
    receiving a first data stream, the first data stream including a plurality of data packets;
    transmitting the first data stream to a destination node according to a preset first credit value, wherein the first credit value is used to indicate a preset amount of data traffic transmitted at one time;
    receiving first information, wherein the first information is used to indicate that the buffer capacity of the port receiving the first data stream exceeds a preset threshold;
    transmitting the first data stream to the destination node based on the first information and a second credit value obtained from the destination node, wherein the second credit value is used to indicate the maximum amount of data traffic, obtained from the destination node, that is transmitted at one time.
  2. The data transmission method according to claim 1, characterized in that, after the receiving of the first information, the data transmission method further comprises:
    stopping transmission of the first data stream based on the first credit value.
  3. The data transmission method according to claim 1, characterized in that the first data stream further comprises indication information requesting the second credit value; and before the transmitting of the first data stream to the destination node based on the first information and the second credit value obtained from the destination node, the data transmission method further comprises:
    receiving the second credit value from the destination node.
  4. The data transmission method according to any one of claims 1 to 3, characterized in that the first information indicates that the buffer capacity of the port of the destination node that receives the first data stream exceeds the preset threshold of the destination node.
  5. The data transmission method according to any one of claims 1 to 3, characterized in that the transmitting of the first data stream to the destination node according to the preset first credit value comprises:
    transmitting the first data stream to the destination node through a switching node; and
    the first information indicates that the buffer capacity of the port of the switching node that receives the first data stream exceeds the preset threshold of the switching node.
  6. The data transmission method according to claim 5, characterized in that:
    the first information is carried by the switching node in the first data stream and transmitted to the destination node.
  7. The data transmission method according to any one of claims 1 to 6, characterized in that the receiving of the first information comprises:
    receiving a second data stream, wherein the second data stream carries the first information, the second credit value, and an identifier indicating the first data stream to which the second credit value corresponds.
  8. The data transmission method according to claim 2, characterized in that, after the stopping of transmission of the first data stream based on the first credit value, the data transmission method further comprises:
    receiving second information, wherein the second information indicates that the buffer capacity of the port is lower than the preset threshold;
    transmitting a third data stream to the destination node based on the second information and the first credit value, the third data stream comprising a plurality of data packets.
  9. The data transmission method according to claim 2, characterized in that, after the stopping of transmission of the first data stream based on the first credit value, the data transmission method further comprises:
    after a preset period has elapsed, transmitting a third data stream to the destination node based on the first credit value, the third data stream comprising a plurality of data packets.
  10. The data transmission method according to claim 7, characterized in that:
    the second data stream is an acknowledgement (ACK) packet.
  11. A data transmission system, characterized in that the data transmission system comprises a plurality of chips, wherein any chip of the plurality of chips comprises a port and communicates with other chips through the port;
    a source chip among the plurality of chips is configured to:
    receive a first data stream, the first data stream comprising a plurality of data packets;
    transmit the first data stream to a destination chip according to a preset first credit value, wherein the first credit value indicates a preset amount of data traffic transmitted at one time;
    receive first information, wherein the first information indicates that the buffer capacity of a port receiving the first data stream exceeds a preset threshold;
    transmit the first data stream to the destination chip based on the first information and a second credit value obtained from the destination chip, wherein the second credit value indicates the maximum amount of data traffic, obtained from the destination chip, that is transmitted at one time.
  12. The data transmission system according to claim 11, characterized in that, after receiving the first information, the source chip is further configured to:
    stop transmitting the first data stream based on the first credit value.
  13. The data transmission system according to claim 11, characterized in that the first data stream further comprises indication information requesting the second credit value; and before transmitting the first data stream to the destination chip based on the first information and the second credit value obtained from the destination chip, the source chip is further configured to:
    receive the second credit value from the destination chip.
  14. The data transmission system according to any one of claims 11 to 13, characterized in that the first information indicates that the buffer capacity of the port of the destination chip that receives the first data stream exceeds the preset threshold of the destination chip.
  15. The data transmission system according to any one of claims 11 to 13, characterized in that the plurality of chips further comprises a switching chip, and when transmitting the first data stream to the destination chip according to the preset first credit value, the source chip is specifically configured to:
    transmit the first data stream to the destination chip through the switching chip; and
    the first information indicates that the buffer capacity of the port of the switching chip that receives the first data stream exceeds the preset threshold of the switching chip.
  16. The data transmission system according to claim 15, characterized in that the switching chip is configured to:
    receive the first data stream;
    add the first information to the first data stream and transmit the first data stream to the destination chip.
  17. The data transmission system according to any one of claims 11 to 16, characterized in that the destination chip is specifically configured to:
    receive the first data stream;
    generate a second data stream based on the first information and the indication information, wherein the second data stream carries the first information, the second credit value, and an identifier indicating the first data stream to which the second credit value corresponds;
    send the second data stream to the source chip.
  18. The data transmission system according to claim 15, characterized in that, when the first information indicates that the buffer capacity of the port of the switching chip that receives the first data stream exceeds the preset threshold of the switching chip, the destination chip is specifically configured to:
    generate a second data stream based on the indication information, wherein the second data stream carries the second credit value, and transmit the second data stream to the source chip through the switching chip;
    the switching chip is specifically configured to:
    add the first information to the second data stream, and transmit the second data stream to the source chip.
  19. The data transmission system according to claim 12, characterized in that, after stopping transmission of the first data stream based on the first credit value, the source chip is further configured to:
    receive second information, wherein the second information indicates that the buffer capacity of the port is lower than a preset threshold;
    transmit a third data stream to the destination chip based on the second information and the first credit value, the third data stream comprising a plurality of data packets.
  20. The data transmission system according to claim 12, characterized in that, after stopping transmission of the first data stream based on the first credit value, the source chip is further configured to:
    after a preset period has elapsed, transmit a third data stream to the destination chip based on the first credit value, the third data stream comprising a plurality of data packets.
  21. The data transmission system according to claim 17 or 18, characterized in that:
    the second data stream is an acknowledgement (ACK) packet.
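
The source-side behavior recited in claims 1, 2, 8 and 9 can be pictured as a small two-mode state machine: the source pushes data using the preset first credit value until it is told that a port buffer has crossed its threshold, and then sends only against the second credit value granted by the destination. The Python below is an illustrative sketch under stated assumptions, not the applicant's implementation. The names `SourceNode`, `Feedback` and `Mode`, the use of packets as the credit unit, and the choice to accumulate granted credit across feedback messages are all hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PUSH = "push"  # send using the preset first credit value
    PULL = "pull"  # send only against credit granted by the destination


@dataclass
class Feedback:
    """Hypothetical contents of the second data stream (for example an ACK)."""
    flow_id: int             # identifies the first data stream the credit belongs to
    congested: bool          # True ~ first information, False ~ second information
    granted_credit: int = 0  # second credit value, counted in packets (assumption)


class SourceNode:
    def __init__(self, flow_id: int, first_credit: int) -> None:
        self.flow_id = flow_id
        self.first_credit = first_credit  # preset burst size, in packets
        self.mode = Mode.PUSH
        self.granted = 0                  # unused credit granted by the destination

    def on_feedback(self, fb: Feedback) -> None:
        """Handle feedback carried back from the destination or a switch."""
        if fb.flow_id != self.flow_id:
            return  # credit for a different flow is ignored
        if fb.congested:
            # Claims 1 and 2: stop sending on the first credit value and
            # rely on the second credit value instead.
            self.mode = Mode.PULL
            self.granted += fb.granted_credit
        else:
            # Claim 8: buffer is back below the threshold, resume pushing.
            self.mode = Mode.PUSH
            self.granted = 0

    def on_timer_expired(self) -> None:
        """Claim 9: resume pushing once a preset period has elapsed."""
        self.mode = Mode.PUSH
        self.granted = 0

    def next_burst(self, pending: int) -> int:
        """Return how many of `pending` packets may be sent now."""
        if self.mode is Mode.PUSH:
            return min(pending, self.first_credit)
        burst = min(pending, self.granted)
        self.granted -= burst  # granted credit is consumed as it is used
        return burst
```

In this sketch a caller would invoke `next_burst` whenever packets are queued, and `on_feedback` whenever a second data stream arrives. Whether granted credit accumulates or is overwritten by each new grant is a design choice that the claims leave open.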
PCT/CN2023/118075 2022-09-20 2023-09-11 Data transmission method and data transmission system WO2024061042A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211145199.8A CN117793009A (en) 2022-09-20 2022-09-20 Data transmission method and data transmission system
CN202211145199.8 2022-09-20

Publications (1)

Publication Number Publication Date
WO2024061042A1 true WO2024061042A1 (en) 2024-03-28

Family

ID=90382087

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118075 WO2024061042A1 (en) 2022-09-20 2023-09-11 Data transmission method and data transmission system

Country Status (2)

Country Link
CN (1) CN117793009A (en)
WO (1) WO2024061042A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6724721B1 (en) * 1999-05-07 2004-04-20 Cisco Technology, Inc. Approximated per-flow rate limiting
EP2063580A1 (en) * 2007-11-20 2009-05-27 Lucent Technologies Inc. Low complexity scheduler with generalized processor sharing GPS like scheduling performance
WO2019232760A1 (en) * 2018-06-07 2019-12-12 华为技术有限公司 Data exchange method, data exchange node and data center network
CN111416775A (en) * 2019-01-04 2020-07-14 阿里巴巴集团控股有限公司 Data receiving and sending method, device and system

Also Published As

Publication number Publication date
CN117793009A (en) 2024-03-29
