WO2023123075A1 - Data exchange control method and device - Google Patents

Data exchange control method and device

Info

Publication number
WO2023123075A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
edge node
flow
exchange
information
Application number
PCT/CN2021/142564
Inventor
林云
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to CN202180060636.3A (CN116686332A)
Priority to PCT/CN2021/142564 (WO2023123075A1)
Publication of WO2023123075A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W28/00: Network traffic management; Network resource management
    • H04W28/02: Traffic management, e.g. flow control or congestion control
    • H04W28/10: Flow control between communication endpoints

Definitions

  • the embodiments of the present application relate to the field of communication technologies, and in particular, to a data exchange control method and device.
  • a data center network (DCN) is a network applied in a data center, which can be used to provide a fully connected network for the many servers in the data center and to exchange data between different servers.
  • multiple switching nodes in the DCN are interconnected through standard protocols such as the Ethernet protocol or the Internet protocol (IP), so that the processing in the multiple switching nodes is independent of each other.
  • the present application provides a data exchange control method and device, which are used to avoid congestion in a data exchange network and improve exchange efficiency.
  • a data exchange control method is provided, which is applied to a data exchange network including a first edge node and a second edge node, and the method includes: the first edge node counts a first data amount of the data to be exchanged in an input data flow; the first edge node acquires indication information from the second edge node, where the indication information is used to indicate a second data amount, and the second edge node is the edge node that outputs the data flow; the first edge node controls the exchange of the data flow according to the first data amount and the second data amount.
  • obtaining the indication information by the first edge node may refer to obtaining the indication information by a processor in the first edge node, for example, the first edge node may receive the indication information from the second edge node through a communication interface or a transceiver. At this time, the processor in the first edge node may acquire the indication information from the communication interface or the transceiver.
  • when data is exchanged between the first edge node and the second edge node, the first edge node can count the first data amount of the data to be exchanged in the data flow, and the second edge node can use the indication information to indicate to the first edge node a second data amount that the first edge node is allowed to send. The second data amount may be determined by the second edge node according to parameters such as its state information and data flow information, so that the first edge node can control the exchange of the data flow based on the first data amount and the second data amount. In this way, the first edge node controls the ingress flow and the second edge node controls the egress flow, thereby avoiding congestion in the data exchange network and improving exchange efficiency.
  • the first edge node controls the exchange of the data flow according to the first data amount and the second data amount, including: the first edge node determines a third data amount according to the first data amount and the second data amount; if the third data amount is greater than a preset threshold, the first edge node controls the exchange of the data flow.
  • when the third data amount is greater than the preset threshold, the amount of data to be exchanged in the first edge node is large while the amount of data that the second edge node can output is limited, so the first edge node controls the exchange of the data flow. In this way, congestion in the data exchange network can be avoided and the exchange efficiency can be improved.
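  • as an illustration only, the threshold decision described above might be sketched as follows. The application does not state how the third data amount is derived from the first and second data amounts; this sketch assumes it is the backlog that exceeds the permitted amount (first minus second, floored at zero). The function names and the example values are hypothetical.

```python
def third_data_amount(first_amount: int, second_amount: int) -> int:
    """Assumed derivation: the backlog beyond what the egress node permits.

    first_amount  -- bytes waiting at the first (ingress) edge node
    second_amount -- bytes the second (egress) edge node allows to be sent
    """
    return max(first_amount - second_amount, 0)


def should_control_exchange(first_amount: int, second_amount: int, threshold: int) -> bool:
    """Return True when the ingress node should apply flow control."""
    return third_data_amount(first_amount, second_amount) > threshold


# Example: 1000 bytes pending, 400 bytes permitted, threshold of 500 bytes.
print(should_control_exchange(1000, 400, 500))
```

  • other derivations (for example a ratio of the two amounts) would fit the same description; only the comparison against a preset threshold is taken from the text.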
  • the first edge node controls the exchange of the data flow, including: the first edge node performs flow control on the data to be exchanged; and/or, the first edge node performs flow control on the source server corresponding to the data flow.
  • the first edge node may perform flow control on the data to be exchanged, to prevent the data to be exchanged from congesting the data exchange network, and may also perform flow control on the source server corresponding to the data flow, to prevent the source server from sending more data that would congest the data exchange network.
  • the first edge node performs flow control on the data to be exchanged, including: the first edge node buffers or discards the data to be exchanged, or controls the rate at which the data to be exchanged is sent to the second edge node; the first edge node performs flow control on the source server, including: the first edge node sends flow control information to the source server, where the flow control information is used to indicate at least one of the following: that the source server is to suspend sending the data flow; the rate at which the source server is to send the data flow; the amount of data of the data flow that the source server is allowed to send.
  • the first edge node may perform flow control on the data flow in various ways, thereby improving the flexibility and effectiveness of flow control.
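  • a minimal sketch of how the ingress-side options above (send, buffer, discard, pause the source server) could be combined is given below. The policy is an assumption made for illustration, not the method of the application: send up to the permitted amount, buffer the excess, discard what does not fit, and ask the source server to pause once the buffer is completely filled. `Decision` and `control_pending` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    send: int           # bytes forwarded to the second edge node now
    buffered: int       # bytes held in the ingress buffer
    dropped: int        # bytes discarded because the buffer is full
    pause_source: bool  # whether to send "suspend" flow control information


def control_pending(pending: int, allowed: int, buffer_free: int) -> Decision:
    """Split pending bytes into sent, buffered and dropped parts (assumed policy)."""
    send = min(pending, allowed)
    excess = pending - send
    buffered = min(excess, buffer_free)
    dropped = excess - buffered
    # Assumed trigger: pause the source once the excess fills the free buffer space.
    pause = excess > 0 and buffered == buffer_free
    return Decision(send, buffered, dropped, pause)


print(control_pending(pending=1000, allowed=400, buffer_free=500))
```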
  • the second data amount is related to the state information of the second edge node; and/or, at least one of the time point, interval and granularity at which the second edge node sends the indication information is related to the state information of the second edge node.
  • the second edge node determines, according to its own state information, the second data amount that the first edge node is allowed to send, as well as the time point, interval and granularity for sending the corresponding indication information, which can improve the accuracy of the flow control performed by the first edge node on the data flow.
  • the state information of the second edge node includes at least one of the following: the amount of data that has been output for the data flow, the cache state of the second edge node, and the state of the destination server of the data flow.
  • the second edge node may determine the second data volume and indication information according to a plurality of different status information.
  • the method further includes: the first edge node sends request information to the second edge node, where the request information is used to determine the second data amount, and the request information includes at least one of the following: the number of remaining data packets, the amount of remaining data, and service level agreement (SLA) information.
  • the request information is carried in a data packet of the data flow.
  • the first edge node sends request information to the second edge node, so that the second edge node determines the second data amount and the indication information according to the information carried in the request information and its own state information. This improves the accuracy of the second data amount and the indication information determined by the second edge node for different users, different data flows or different transmission states, and further improves the accuracy of the flow control performed by the first edge node on the data flow.
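  • the following sketch illustrates how a second edge node might combine the request information with its own state information. The formula is an assumption for illustration only: the application lists the inputs but does not give a formula. `RequestInfo`, `EgressState`, `sla_weight` and `second_data_amount` are hypothetical names, and the SLA information is reduced to a weight between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class RequestInfo:
    remaining_packets: int
    remaining_bytes: int
    sla_weight: float = 1.0  # hypothetical numeric stand-in for SLA information


@dataclass
class EgressState:
    buffer_free: int         # free space in the egress cache, in bytes
    destination_ready: bool  # whether the destination server can accept data


def second_data_amount(req: RequestInfo, state: EgressState) -> int:
    """Assumed policy: grant the smaller of the demand and an SLA-weighted buffer share."""
    if not state.destination_ready:
        return 0
    weight = min(max(req.sla_weight, 0.0), 1.0)  # clamp the weight to [0, 1]
    share = int(state.buffer_free * weight)
    return min(req.remaining_bytes, share)


print(second_data_amount(RequestInfo(10, 15000, 0.5), EgressState(8000, True)))
```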
  • before the first edge node obtains the indication information from the second edge node, the method further includes: the first edge node sends the data to be exchanged to the second edge node according to a preset data amount.
  • because the first edge node directly uses the preset data amount to send part or all of the data to be exchanged to the second edge node, that part or all of the data can be sent without waiting for a round-trip delay. This reduces the signaling interaction between the first edge node and the second edge node and improves data transmission efficiency.
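  • the effect of the preset data amount can be sketched as below, assuming (hypothetically) that it acts as an initial budget that may be sent before any indication information arrives. `initial_burst` is an illustrative name.

```python
from typing import Tuple


def initial_burst(pending: int, preset_amount: int) -> Tuple[int, int]:
    """Return (bytes sent immediately, bytes left waiting for indication information)."""
    sent = min(pending, preset_amount)
    return sent, pending - sent


# A short flow fits entirely in the preset amount and never waits for a round trip.
print(initial_burst(pending=1200, preset_amount=4096))
# A long flow sends the preset amount first, then waits for the second data amount.
print(initial_burst(pending=100000, preset_amount=4096))
```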
  • the data flow is a data flow divided according to a network output port or a destination network card; and/or, the data exchange network includes one or more virtual extensible local area networks (VXLANs), and the data flow is a data flow divided according to VXLAN; and/or, the data flow is a data flow divided according to a quintuple. Further, the division of the data flow is also related to priority.
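  • the division options above amount to choosing a key under which packets are grouped into a data flow. A minimal sketch follows; the `Packet` fields, the mode names and `flow_key` are hypothetical and only illustrate the three granularities plus the optional priority.

```python
from typing import NamedTuple, Tuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    egress_port: int  # network output port (or destination network card)
    vni: int          # VXLAN network identifier
    priority: int


def flow_key(pkt: Packet, mode: str, use_priority: bool = False) -> Tuple:
    """Build the key that identifies the data flow a packet belongs to."""
    if mode == "egress_port":
        key = (pkt.egress_port,)
    elif mode == "vxlan":
        key = (pkt.vni,)
    elif mode == "five_tuple":
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
    else:
        raise ValueError(f"unknown division mode: {mode}")
    # Optionally refine the division by priority.
    return key + (pkt.priority,) if use_priority else key


pkt = Packet("10.0.0.1", "10.0.1.5", 1234, 80, 6, 3, 5001, 2)
print(flow_key(pkt, "vxlan"), flow_key(pkt, "egress_port", use_priority=True))
```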
  • a data exchange control method is provided, which is applied to a data exchange network including a first edge node and a second edge node, and the method includes: the second edge node determines indication information corresponding to a data flow, where the indication information is used to indicate a second data amount, and the second edge node is the edge node that outputs the data flow; the second edge node sends the indication information to the first edge node, so that the first edge node controls the exchange of the data to be exchanged in the data flow according to a first data amount of the data to be exchanged and the second data amount, where the first data amount is obtained through statistics by the first edge node.
  • when data is exchanged between the first edge node and the second edge node, the first edge node can count the first data amount of the data to be exchanged in the data flow, and the second edge node can indicate to the first edge node the second data amount that the first edge node is allowed to send. The second data amount can be determined by the second edge node according to parameters such as its state information and data flow information, so that the first edge node can control the exchange of the data flow based on the first data amount and the second data amount. In this way, the first edge node controls the ingress flow and the second edge node controls the egress flow, thereby avoiding congestion in the data exchange network and improving exchange efficiency.
  • the second data amount is related to the state information of the second edge node; and/or, at least one of the time point, interval and granularity at which the second edge node sends the indication information is related to the state information of the second edge node.
  • the second edge node determines, according to its own state information, the second data amount that the first edge node is allowed to send, as well as the time point, interval and granularity for sending the corresponding indication information, which can improve the accuracy of the flow control performed by the first edge node on the data flow.
  • the state information of the second edge node includes at least one of the following: the amount of data that has been output for the data flow, the cache state of the second edge node, and the state of the destination server of the data flow.
  • the second edge node may determine the second data volume and indication information according to a plurality of different status information.
  • before the second edge node determines the indication information of the data flow, the method further includes: the second edge node acquires request information from the first edge node, where the request information is used to determine the second data amount, and the request information includes at least one of the following: the number of remaining data packets, the amount of remaining data, and SLA information.
  • the request information is carried in a data packet of the data flow.
  • the first edge node sends request information to the second edge node, so that the second edge node determines the second data amount and the indication information according to the information carried in the request information and its own state information. This improves the accuracy of the second data amount and the indication information determined by the second edge node for different users, different data flows or different transmission states, and further improves the accuracy of the flow control performed by the first edge node on the data flow.
  • before the second edge node determines the indication information of the data flow, the method further includes: the second edge node acquires the data to be exchanged that is sent by the first edge node according to a preset data amount.
  • because the first edge node directly uses the preset data amount to send part or all of the data to be exchanged to the second edge node, that part or all of the data can be sent without waiting for a round-trip delay. This reduces the signaling interaction between the first edge node and the second edge node and improves data transmission efficiency.
  • the data flow is a data flow divided according to a network output port or a destination network card; and/or, the data exchange network includes one or more virtual extensible local area networks (VXLANs), and the data flow is a data flow divided according to VXLAN; and/or, the data flow is a data flow divided according to a quintuple. Further, the division of the data flow is also related to priority.
  • the above-mentioned possible implementation method improves the flexibility and diversity of the data flow division, so that flow control can be implemented for data flows with different granularities and priorities, thereby avoiding congestion in the data exchange network and improving exchange efficiency.
  • a data exchange control device which is applied to a data exchange network including a first edge node and a second edge node, and the device, as the first edge node, includes: a processing unit for counting input data The first data amount of the data to be exchanged in the flow; the receiving unit is configured to obtain indication information from a second edge node, the indication information is used to indicate the second data amount, and the second edge node is an edge node that outputs the data flow; The processing unit is further configured to control the exchange of the data flow according to the first data volume and the second data volume.
  • the processing unit is further configured to: determine a third data amount according to the first data amount and the second data amount; if the third data amount is greater than a preset threshold, control the data flow exchange.
  • the processing unit is further configured to: control the flow of the data to be exchanged; and/or control the flow of the source server corresponding to the data flow.
  • the processing unit is further configured to: buffer or discard the data to be exchanged, or control the rate at which the data to be exchanged is sent to the second edge node; the device further includes a sending unit, configured to send flow control information to the source server, where the flow control information is used to indicate at least one of the following: that the source server is to suspend sending the data flow; the rate at which the source server is to send the data flow; the amount of data of the data flow that the source server is allowed to send.
  • the second data amount is related to the state information of the second edge node; and/or, at least one of the time point, interval and granularity at which the second edge node sends the indication information is related to the state information of the second edge node.
  • the state information of the second edge node includes at least one of the following: the amount of data that has been output for the data flow, the cache state of the second edge node, and the state of the destination server of the data flow.
  • the device includes: a sending unit, further configured to send request information to the second edge node, where the request information is used to determine the second data amount, and the request information includes at least one of the following: the number of remaining data packets, the amount of remaining data, and SLA information.
  • the request information is carried in a data packet of the data flow.
  • the apparatus includes: a sending unit, further configured to send the data to be exchanged to the second edge node according to a preset data amount.
  • the data flow is a data flow divided according to a network output port or a destination network card; and/or, the data exchange network includes one or more virtual extensible local area networks (VXLANs), and the data flow is a data flow divided according to VXLAN; and/or, the data flow is a data flow divided according to a quintuple. Further, the division of the data flow is also related to priority.
  • a device for controlling data exchange is provided, which is applied to a data exchange network including a first edge node and a second edge node. The device, as the second edge node, includes: a processing unit, configured to determine indication information corresponding to a data flow, where the indication information is used to indicate a second data amount, and the second edge node is the edge node that outputs the data flow; and a sending unit, configured to send the indication information to the first edge node, so that the first edge node controls the exchange of the data to be exchanged in the data flow according to a first data amount of the data to be exchanged and the second data amount, where the first data amount is obtained through statistics by the first edge node.
  • the second data amount is related to the state information of the second edge node; and/or, at least one of the time point, interval and granularity at which the second edge node sends the indication information is related to the state information of the second edge node.
  • the state information of the second edge node includes at least one of the following: the amount of data that has been output for the data flow, the cache state of the second edge node, and the state of the destination server of the data flow.
  • the apparatus further includes: a receiving unit, configured to obtain request information from the first edge node, where the request information is used to determine the second data amount, and the request information includes at least one of the following: the number of remaining data packets, the amount of remaining data, and service level agreement (SLA) information.
  • the request information is carried in a data packet of the data flow.
  • the apparatus further includes: a receiving unit, further configured to obtain the data to be exchanged that is sent by the first edge node according to a preset data amount.
  • the data flow is a data flow divided according to a network output port or a destination network card; and/or, the data exchange network includes one or more virtual extensible local area networks (VXLANs), and the data flow is a data flow divided according to VXLAN; and/or, the data flow is a data flow divided according to a quintuple. Further, the division of the data flow is also related to priority.
  • a control device for data exchange is provided, comprising: a processor, a memory, a communication interface and a bus, where the processor, the memory and the communication interface are connected through the bus; the memory is used to store program code, and when the program code is executed by the processor, the device executes the data exchange control method provided in the first aspect or any possible implementation manner of the first aspect.
  • a control device for data exchange is provided, comprising: a processor, a memory, a communication interface and a bus, where the processor, the memory and the communication interface are connected through the bus; the memory is used to store program code, and when the program code is executed by the processor, the device executes the data exchange control method provided in the second aspect or any possible implementation manner of the second aspect.
  • in yet another aspect of the present application, a data exchange network is provided, which includes a first edge node and a second edge node; the first edge node includes the data exchange control device provided in the third aspect, any possible implementation manner of the third aspect, or the fifth aspect, and the second edge node includes the data exchange control device provided in the fourth aspect, any possible implementation manner of the fourth aspect, or the sixth aspect.
  • a computer-readable storage medium is provided, in which a computer program or instruction is stored; when the computer program or instruction is executed, the data exchange control method provided by the first aspect or any possible implementation manner of the first aspect is implemented.
  • a computer-readable storage medium is provided, in which a computer program or instruction is stored; when the computer program or instruction is executed, the data exchange control method provided by the second aspect or any possible implementation manner of the second aspect is implemented.
  • a computer program product is provided, comprising: a computer program (also referred to as code, or an instruction); when the computer program is executed, the computer executes the data exchange control method provided by the first aspect or any possible implementation manner of the first aspect.
  • a computer program product is provided, comprising: a computer program (also referred to as code, or an instruction); when the computer program is executed, the computer executes the data exchange control method provided by the second aspect or any possible implementation manner of the second aspect.
  • FIG. 1 shows a schematic structural diagram of a switching frame
  • FIG. 2 is a schematic structural diagram of a data exchange network provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of another data exchange network provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another data exchange network provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a switching system provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a source node provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a destination node provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a data exchange control method provided in an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another data exchange control method provided by the embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a first edge node provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another first edge node provided by the embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a second edge node provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of another second edge node provided by an embodiment of the present application.
  • At least one (unit) of a, b or c can represent: a, b, c, a and b, a and c, b and c, or a, b and c, where a, b and c can each be single or multiple.
  • words such as "first" and "second" do not limit the quantity and order.
  • the data exchange network can include a physical network such as a data center network (DCN), a high performance computing (HPC) network or a cloud network, and may also include a virtual network running on a physical network.
  • the data switching network may include multiple switching nodes, and the switching nodes may also be referred to as nodes.
  • the switching node may be a switching device such as a switch or a router, or may be a switching frame, a switching board, or a switching element (switch element, SE).
  • a switching frame can include multiple line cards and multiple switching boards, which can be arranged on a backplane, and each switching board can include multiple switching elements.
  • the switching board may also be called a switching card (switch card, SC), and the network card may also be called a network interface card (network interface card, NIC).
  • FIG. 1 shows a schematic structural diagram of a switching frame, and multiple line cards are represented as LC1-LCi, and switching units included in multiple switching boards are represented as SE1-SEk, and i and k are positive integers.
  • the structure of the data exchange network will be specifically described below.
  • Fig. 2 is a schematic structural diagram of a data exchange network provided by an embodiment of the present application.
  • the data exchange network may include at least two exchange layers, and the at least two exchange layers include a plurality of nodes.
  • the multiple nodes may all be optical nodes.
  • a node at the edge of the data exchange network may be referred to as an edge node, and the edge node may be a switching device used to access a server, or the edge node may be a network card of a server in the data exchange network. In FIG. 2, the edge node being a switching device is used as an example for illustration.
  • among the plurality of nodes, nodes other than the edge nodes may be referred to as intermediate nodes, and the intermediate nodes are not shown in FIG. 2.
  • when an edge node is used to receive a data flow input to the data exchange network, the edge node may be called a source node; when an edge node is used to output a data flow to the outside of the data exchange network, the edge node may be called a destination node.
  • the source node and the destination node can be used to manage the traffic input to and output from the data exchange network, so that the source node can be used as an ingress traffic manager (ITM) of the data exchange network, and the destination node can be used as an egress traffic manager of the data exchange network. Only 3 edge nodes in the data exchange network are shown in FIG. 2, and the case in which the 3 edge nodes include a destination node (denoted D1) and two source nodes (denoted S1 and S2) that exchange data to the destination node is described as an example. FIG. 2 does not limit the embodiments of the present application.
  • the data switching network includes three switching layers: an access layer, an aggregation layer and a core layer. The access layer includes a plurality of access nodes, the aggregation layer includes a plurality of aggregation nodes, and the core layer includes a plurality of core nodes. The downlink port of an access node is connected to a server that needs to exchange data traffic, the uplink port of the access node is connected to the downlink port of an aggregation node, and the uplink port of the aggregation node is connected to a core node.
  • the aggregation layer and the access layer can be divided into multiple groups (pods); a group can include multiple access nodes and multiple aggregation nodes, and each access node is fully connected to the multiple aggregation nodes.
  • multiple core nodes connected to the same aggregation node may be referred to as a core plane, and each core plane is connected to a different aggregation node in each group.
  • in FIG. 3, the case in which the data exchange network includes 3 groups, each group includes 3 access nodes and 4 aggregation nodes, and each core plane includes two core nodes is used as an example for illustration. The access nodes in FIG. 3 can be represented as A1-A9, the aggregation nodes as B1-B12, and the core nodes as C1-C8, and the three groups are represented as P1-P3.
  • when data traffic is exchanged between servers connected to different access nodes in the same group, the exchange can be realized through an aggregation node in the same group as the access nodes. For example, if the servers connected to access node A1 and access node A3 need to exchange data traffic, access node A1 can send the data flow of its connected server to access node A3 through aggregation node B1.
  • when data traffic is exchanged between servers connected to access nodes in different groups, the exchange can be realized through aggregation nodes in the same groups as the access nodes and a core node connected to those aggregation nodes. For example, if the servers connected to access node A1 and access node A5 need to exchange data traffic, access node A1 can send the data flow of its connected server to aggregation node B1, aggregation node B1 forwards it to core node C1, and C1 then sends it to access node A5 through aggregation node B5.
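  • the two forwarding cases above can be sketched as a small path function. The node numbering is an assumption chosen to be consistent with the examples in the text (A1-A3 and B1-B4 in P1, A4-A6 and B5-B8 in P2, A7-A9 and B9-B12 in P3; core plane j containing C(2j-1) and C(2j) and attaching to the j-th aggregation node of each group); the actual wiring in FIG. 3 may differ. `path` and its parameters are hypothetical names.

```python
from typing import List

ACCESS_PER_POD = 3   # access nodes per group (pod)
AGG_PER_POD = 4      # aggregation nodes per group (pod)
CORES_PER_PLANE = 2  # core nodes per core plane


def pod_of_access(a: int) -> int:
    """0-based group index of access node A<a> under the assumed numbering."""
    return (a - 1) // ACCESS_PER_POD


def path(src: int, dst: int, plane: int = 0, core: int = 0) -> List[str]:
    """Nodes traversed from access node A<src> to A<dst> via the chosen core plane."""
    if src == dst:
        return [f"A{src}"]
    sp, dp = pod_of_access(src), pod_of_access(dst)
    up = f"B{sp * AGG_PER_POD + plane + 1}"
    if sp == dp:
        # Same group: one aggregation node is enough.
        return [f"A{src}", up, f"A{dst}"]
    # Different groups: go up to a core node and down through the peer aggregation node.
    c = f"C{plane * CORES_PER_PLANE + core + 1}"
    down = f"B{dp * AGG_PER_POD + plane + 1}"
    return [f"A{src}", up, c, down, f"A{dst}"]


print(path(1, 3))
print(path(1, 5))
```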
  • each switching layer of the data switching network may also include more or fewer nodes than shown in the figure, or the data switching network may be a network including two switching layers, or the plurality of core nodes in the core layer may or may not be divided into multiple core planes, which is not specifically limited in this embodiment of the present application.
  • the data exchange network includes a physical network and a virtual network running on the physical network.
  • the physical network may also be referred to as an underlying network (underlay network), and is used to provide an underlying control plane (underlay control plane).
  • the physical network may include a plurality of switching devices; among them, a switching device located at the edge of the physical network may be called an edge device, and the edge device may be used to access a server (for example, a terminal or a host).
  • the virtual network may also be referred to as an overlay network, and is used to provide an overlay control plane.
  • the overlay network may include one or more virtual extensible local area networks (VXLANs).
  • a VXLAN can be formed on the physical network through VXLAN tunnel end points (VTEPs).
  • different VXLANs can be distinguished by different VXLAN network identifiers (VNI/VNID). Users in the same VXLAN are in effect directly interconnected through that VXLAN; that is, they are not aware of the existence of other VXLANs or of the physical network.
  • in FIG. 4, the virtual network being a VXLAN created based on the VXLAN protocol is used as an example. The virtual network can also be an overlay network created based on other overlay protocols. FIG. 4 does not constitute a limitation on the embodiments of this application.
  • the structure of the physical network in the above-mentioned Figure 4 is similar to the structure of the data exchange network shown in the above-mentioned Figure 3, and the physical network can also include multiple switching layers.
  • the multiple switching layers can be two switching layers, three switching layers, and so on.
  • n×n switching network (switch fabric, SF)
  • the n×n SF includes n source nodes (source, S), n destination nodes (destination, D), and m switching nodes SN at the intermediate stage.
  • Si and Di in Figure 5 are the same sink node (i takes the values 1 to n in sequence), that is, the n source nodes and n destination nodes are the n sink nodes connected to the same core plane, distinguished by function according to whether a sink node acts as a sending node or as a receiving node.
  • Each of the n sink nodes may include multiple ports.
  • when a sink node acts as S, the multiple ports are input ports
  • when a sink node acts as D, the multiple ports are output ports.
  • the data switching network can be regarded as composed of multiple switching networks, and each switching network adopts the same control mechanism.
  • a data switching network including two switching layers can also be regarded as composed of multiple switching networks; the difference is that the n source nodes and n destination nodes are the n access nodes connected to the same core plane, distinguished by function according to whether they act as sending nodes or receiving nodes, which will not be described in detail in this embodiment of the present application.
  • the switching network can complete the operation of switching the data packet (packet) received from S to D.
  • when the data packet passes through the SE, it can keep the format of the original variable-length packet (variable-length packet), or it can first be cut by S into cells that are sent separately, and after D receives all the cells, they are reassembled into a complete data packet.
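  • the cutting and reassembly described above can be illustrated with a minimal sketch. The cell header fields (`packet_id`, `index`, `total`) and the function names are assumptions made for illustration only; this application does not specify a cell format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    packet_id: int  # hypothetical header field: packet the cell belongs to
    index: int      # hypothetical header field: position within the packet
    total: int      # hypothetical header field: number of cells in the packet
    payload: bytes


def segment(packet_id: int, packet: bytes, cell_size: int) -> List[Cell]:
    """Cut a variable-length packet into cells of at most cell_size bytes."""
    chunks = [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)]
    if not chunks:  # an empty packet still travels as one empty cell
        chunks = [b""]
    return [Cell(packet_id, i, len(chunks), c) for i, c in enumerate(chunks)]


def reassemble(cells: List[Cell]) -> Optional[bytes]:
    """Return the packet once every cell has arrived, otherwise None."""
    if not cells:
        return None
    by_index = {c.index: c for c in cells}
    if set(by_index) != set(range(cells[0].total)):
        return None  # at least one cell is still missing
    return b"".join(by_index[i].payload for i in range(cells[0].total))
```

  • in this sketch D can reassemble cells that arrive out of order, which matters when S spreads cells over several switching nodes.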
  • S can usually distribute received data packets to each SN as evenly as possible.
  • the data packet sent by S usually carries the information of D, and the SN forwards the data packet to the corresponding D according to the carried information.
  • S receives data packets from the outside of the system through the input port.
  • S has multiple built-in virtual output queues (virtual output queue, VOQ) for caching data packets going to different Ds (or for caching data packets going to different output ports of different Ds, or multiple VOQs corresponding to finer-grained flows, that is, multiple VOQs are used to cache data packets of flows of different granularities). VOQs can be used to ensure end-to-end quality of service (quality of service, QoS) and are a means of preventing head-of-line blocking (HOL blocking).
  • for an n×n switching network, there are generally at least n VOQs in each S, corresponding to the n Ds, and more VOQs can be included if the queues are further subdivided according to the output ports of the Ds or a finer granularity is required.
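  • a minimal sketch of per-destination VOQs follows, assuming one queue per D and integer destination indices; the class and method names are hypothetical. It shows why a blocked destination does not hold up packets for other destinations.

```python
from collections import deque


class SourceNode:
    """Source node S with one virtual output queue (VOQ) per destination D."""

    def __init__(self, num_destinations: int):
        self.voqs = [deque() for _ in range(num_destinations)]

    def enqueue(self, dest: int, packet) -> None:
        """Cache a packet in the VOQ of its destination."""
        self.voqs[dest].append(packet)

    def dequeue(self, dest: int):
        """Take the oldest packet for dest, or None if that VOQ is empty."""
        queue = self.voqs[dest]
        return queue.popleft() if queue else None
```

  • if D0 is congested and its VOQ is not served, packets queued for D1 can still be dequeued, which a single shared FIFO would not allow.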
  • FIG. 6 is a schematic structural diagram of a source node (S).
  • the S includes multiple input ports, a queue manager (queue manager, QM), an ingress scheduler (ingress scheduler, ISC) and a network interface (fabric interface).
  • QM can be used to manage K VOQs (K is a positive integer)
  • ISC can be used to schedule K VOQs in QM
  • the scheduled output data packets can be cut into cells through the network interface and sent to the SE after a cell header (header) is added, or sent to the SE directly as data packets.
  • FIG. 7 is a schematic structural diagram of a destination node (D).
  • the D includes multiple output ports, a queue manager (queue manager, QM), an egress scheduler (egress scheduler, ESC) and a network interface (fabric interface).
  • QM can be used to manage L output queues (output queue, OQ), where L is a positive integer and the L OQs are used to cache data packets destined for different outputs
  • ESC can be used to schedule L OQs in QM.
  • the ISC in S can send a request to the corresponding D according to the state of the VOQ in S, and the ESC in D completes the scheduling after receiving the request, and notifies the ISC of the scheduling result.
  • the ISC schedules the data packets in the VOQ in S to be dequeued according to the scheduling result.
  • the ESC in D may consider the QoS characteristics of different requests and the congestion degree of each OQ in D during the scheduling process.
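  • the request and scheduling exchange between the ISC and the ESC can be sketched as follows. This is only an illustration under stated assumptions: the ESC here grants by free OQ space alone (QoS is ignored), amounts are counted in packets, and all class and method names are hypothetical.

```python
from collections import deque


class EgressScheduler:
    """ESC in D: grants requests according to free space in one output queue."""

    def __init__(self, oq_capacity: int):
        self.free = oq_capacity  # free OQ space, in packets (assumed unit)

    def handle_request(self, requested: int) -> int:
        """Return how many packets the requesting ISC may send."""
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def packet_output(self, count: int = 1) -> None:
        """Free OQ space once packets leave through the output port."""
        self.free += count


class IngressScheduler:
    """ISC in S: requests according to VOQ state and dequeues what is granted."""

    def __init__(self, voq: deque):
        self.voq = voq

    def request(self) -> int:
        return len(self.voq)

    def dequeue_granted(self, granted: int) -> list:
        return [self.voq.popleft() for _ in range(min(granted, len(self.voq)))]
```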
  • Fig. 8 is a data exchange control method provided by the embodiment of the present application, which can be applied to the data exchange network provided above, and the method includes the following steps.
  • the first edge node counts a first data volume of the data to be exchanged in the input data stream.
  • the first edge node may be a node located at an edge position in the data exchange network and configured to receive a data stream input from the outside into the data exchange network.
  • the first edge node may be a switching device in the data switching network, for example, the switching device may be a switch or a router; the first edge node may also be a source network card (SNIC), and the source network card may refer to the network card in the source server.
  • the input data flow may refer to a data flow input to the first edge node.
  • the first edge node may receive one or more data streams from the outside, and the input data stream may be any one of the one or more data streams.
  • the second edge node in the data switching network can be used to output the data stream to the outside of the data switching network; the second edge node can be a switching device in the data switching network, or it can be a destination network card (DNIC), and the destination network card can be the network card in the destination server.
  • the data to be exchanged in the data stream may refer to the data of the data stream stored in the first edge node, that is, the data to be exchanged includes the data received by the first edge node but not output to the downstream node.
  • the data to be exchanged may also include the data of the data stream that has been output by the first edge node to the downstream node but not yet output by the second edge node.
  • the data to be exchanged is the data of the data flow stored in the first edge node.
  • the first edge node includes a first counter (counter), and the first edge node may use the first counter to count the first data volume of the data to be exchanged in the data flow.
  • the first data amount may be the number of data bits/bytes, the number of data packets, the number of cells, or the number of data blocks in the data to be exchanged, which is not specifically limited in this embodiment of the present application.
  • the first edge node may add one to the value of the first counter each time a data packet of the data flow is received, and subtract one from the value of the first counter each time a data packet of the data flow is output, so that the value of the first counter is the first data amount.
  • the first edge node may also decrement the value of the first counter by one each time a data packet of the data flow is received, and increase the value of the first counter by one each time a data packet of the data flow is output, which is not specifically limited in this embodiment of the present application.
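  • a minimal sketch of the first counter follows, using the increment-on-receive convention. The class name is hypothetical, and the `amount` parameter is an assumption that lets the same counter count packets, cells, or bytes.

```python
class FlowCounter:
    """Counts the data of one flow that has been received but not yet output."""

    def __init__(self):
        self.value = 0  # the first data amount

    def on_receive(self, amount: int = 1) -> None:
        """Add received data (one packet by default)."""
        self.value += amount

    def on_output(self, amount: int = 1) -> None:
        """Subtract data that has been output to the downstream node."""
        if amount > self.value:
            raise ValueError("cannot output more data than is pending")
        self.value -= amount
```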
  • the aforementioned data flow may refer to data sent from the same source and destined for the same destination.
  • different data streams can be divided according to sources of different granularities or purposes of different granularities. Several possible division methods are described below.
  • the first type is to divide the data flow according to the network output port or the destination network card. Specifically, taking the first edge node as an example, all the data received by the first edge node and destined for the same network output port or the same destination network card are considered as data in the same data stream.
  • the network output port may refer to a port for outputting data to the outside of the data exchange network
  • the destination network card may refer to a network card of the destination server.
  • when the data exchange network includes a bearer network, the data in the bearer network can also be divided into data streams according to the network output port or the destination network card.
  • the data switching network includes at least one VXLAN
  • when the first edge node uses multiple counters to count the amount of data to be exchanged in different data streams, there may be a correspondence between the multiple counters and multiple network output ports, or between the multiple counters and multiple destination network cards.
  • each counter in the first edge node may be used to count the amount of data corresponding to a network output port or a destination network card.
  • the second type is to divide the data flow according to the bearer network, for example, the bearer network may be VXLAN.
  • the data of the same bearer network (for example, the same VXLAN) received by the first edge node is regarded as data in the same data flow.
  • the data of the physical network can also be divided into data streams according to the bearer network.
  • the data stream corresponding to the data in the physical network can be determined according to the correspondence between the at least one VXLAN and the network output port, or the correspondence between the at least one VXLAN and the destination network card.
  • when the first edge node uses multiple counters to count the amount of data to be exchanged in different data flows, there may be a correspondence between the multiple counters and the at least one VXLAN included in the data switching network.
  • each counter in the first edge node may be used to count the amount of data corresponding to one VXLAN.
  • the third type is to divide the data stream according to the quintuple.
  • the data corresponding to the same quintuple received by the first edge node is regarded as data in the same data stream.
  • the five-tuple may include: source IP address, source port, destination IP address, destination port and transport layer protocol.
  • data streams may also be divided according to four-tuples or seven-tuples, which will not be repeated in this embodiment of the present application.
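  • the five-tuple division can be sketched as a lookup key for per-flow counters. The field names and the `(key, size)` input format are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # transport layer protocol, e.g. "TCP" or "UDP"


def count_pending(packets: Iterable[Tuple[FiveTuple, int]]) -> Counter:
    """Sum the pending bytes of each flow; packets are (five-tuple, size) pairs."""
    pending = Counter()
    for key, size in packets:
        pending[key] += size
    return pending
```

  • packets that share all five fields fall into the same counter; changing any single field, such as the source port, yields a different data stream.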
  • when the data exchange network includes a bearer network, the data in the bearer network can also be divided into data flows according to the quintuple.
  • each of the multiple counters can be used, at different times, to count the data volume of different data flows obtained by the five-tuple division, while counting the data volume of only one data flow at any given time.
  • the division may also be performed in combination with the priority of the data.
  • the priority may refer to information used to distinguish the priority or type of the data packet, and the priority is usually transmitted together with the data; for example, the priority may be derived from the differentiated services code point (DSCP) field, which may also be called the type of service (ToS) field.
  • when dividing by DNIC and priority, the data received by the first edge node that is destined for the same DNIC and corresponds to the same priority can be regarded as data of the same data flow, so that the number of counters in the first edge node can also be set according to the dimension {DNIC, priority}.
  • when dividing by VXLAN and priority, the data received by the first edge node that is destined for the same VXLAN (that is, corresponds to the same VXLAN network identifier VNI) and corresponds to the same priority can be regarded as data of the same data flow, so that the number of counters in the first edge node can also be set according to the dimension {VNI, priority}.
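  • a minimal sketch of counters keyed by {VNI, priority} follows. It assumes the priority is taken directly from the DSCP value, which occupies the upper six bits of the IPv4 ToS byte; the class and function names are hypothetical.

```python
from collections import defaultdict


def dscp_from_tos(tos_byte: int) -> int:
    """Extract the 6-bit DSCP value from the 8-bit ToS / traffic class byte."""
    return (tos_byte & 0xFF) >> 2


class PerFlowCounters:
    """One pending-data counter per (VNI, priority) pair."""

    def __init__(self):
        self.pending = defaultdict(int)

    def on_receive(self, vni: int, tos_byte: int, size: int):
        """Account for a received packet and return the flow key used."""
        key = (vni, dscp_from_tos(tos_byte))  # assumption: priority == DSCP
        self.pending[key] += size
        return key
```

  • the same structure applies to the {DNIC, priority} dimension by replacing the VNI with an identifier of the destination network card.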
  • the second edge node sends indication information to the first edge node, where the indication information is used to indicate the second data amount.
  • the second data volume may be the data volume determined by the second edge node and allowed to be sent by the first edge node.
  • the indication information may include information for directly indicating the second data volume, or may include information for indirectly indicating the second data volume.
  • the indication information may directly be a numerical value corresponding to the second data volume. For example, if the indication information includes a value of 200, it is used to indicate that the second data volume is 200KB.
  • the indication information may include the sequence number of the data packet corresponding to the second data amount (the difference between this sequence number and the sequence number of the last data packet currently exchanged can be used to indicate the second data amount), or the indication information includes a first value, and the product of the first value and the unit data amount is used to indicate the second data amount; for example, if the first value is 6, the second data amount may be 6×ΔF, where ΔF represents the unit data amount, for example, ΔF may be 4KB or 8KB, and the unit data amount may be set or configured in advance.
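  • the three ways of carrying the second data amount can be sketched as decoding functions on the first edge node. The function names are hypothetical; amounts are kept in KB, and the sequence-number form is assumed to count packets.

```python
def from_direct(value_kb: int) -> int:
    """Direct indication: the value itself is the second data amount in KB."""
    return value_kb


def from_sequence(granted_seq: int, last_exchanged_seq: int) -> int:
    """Indirect indication: packets allowed = granted seq - last exchanged seq."""
    return granted_seq - last_exchanged_seq


def from_multiple(first_value: int, unit_kb: int) -> int:
    """Indirect indication: second data amount = first value x unit data amount."""
    return first_value * unit_kb
```

  • for example, a first value of 6 with a 4KB unit gives `from_multiple(6, 4)`, that is 24KB, while carrying only a small integer in the indication information.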
  • the second edge node may generate the indication information according to the relevant information of the data flow and/or the state information of the second edge node, and send the indication information to the first edge node.
  • the indication information may also be used to indicate the data volume (for example, the length of the data packet) of the data flow just sent by the second edge node, and other possible information such as status information of the second edge node.
  • the second edge node may also determine other possible information, such as the time point, interval, and granularity for sending the indication information, according to the relevant information of the data flow and/or the state information of the second edge node.
  • the granularity of sending the indication information may refer to the granularity of the correspondingly sent data when the second edge node is triggered to send the indication information.
  • the status information of the second edge node includes at least one of the following items: the amount of data that has been output for the data flow, the cache status of the second edge node, and the output port status of the second edge node.
  • the cache status of the second edge node may include the cache status of the data flow in the second edge node, and may also include the cache status of other data flows.
  • the output port status of the second edge node may include the bandwidth and rate of the output port corresponding to the data flow in the second edge node, and may also include the bandwidth and rate of other output ports, and the like.
  • the relevant information of the data flow may include at least one of the following: the state of the destination server of the data flow, the state of the source server, information of remaining data, and service-level agreement (service-level agreement, SLA) information of the data flow.
  • the status of the destination server may include the receiving bandwidth and rate of the destination server, etc.; the status of the source server may include the first data volume, sending bandwidth and rate, etc.; the remaining data information may include the number of remaining data packets or the remaining data volume; the SLA information can include user level or priority, data type, business type, etc.
  • the first edge node may send request information to the second edge node, and the request information is used to determine the second data amount
  • the request information may include at least one of the following information: expected data amount (that is, the amount of data that the first edge node expects to send), the first data amount, the status of the source server, information about remaining data, and the like.
  • the second edge node may generate indication information according to the above manner, and send the indication information to the first edge node.
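  • this application does not fix a formula for the second data amount, so the following is only one possible policy, stated as an assumption: the second edge node grants the smaller of the expected data amount and its free cache space, rounded down to a whole number of unit data amounts. The function name and parameters are hypothetical.

```python
def make_indication(expected_kb: int, free_buffer_kb: int, unit_kb: int = 4) -> int:
    """Return the first value to carry in the indication information.

    Assumed policy: allow min(expected amount, free cache space), then
    express it as a whole number of unit data amounts (rounded down).
    """
    allowed_kb = max(0, min(expected_kb, free_buffer_kb))
    return allowed_kb // unit_kb
```

  • a real second edge node could also weigh output port rate, destination server state, or SLA information, as listed above; those inputs are omitted here to keep the sketch short.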
  • the first edge node may send the request information to the second edge node through control signaling; or, the first edge node carries the request information in a data packet of the data flow, that is, the request information travels along with the packets of that data stream.
  • the first edge node may not send any request information, and after the second edge node sends the data transmitted by the first edge node, the second edge node feeds back an indication information to the first edge node.
  • the first edge node sends a request message to the second edge node, and the second edge node feeds back instruction information to the first edge node after sending the data transmitted by the first edge node.
  • obtaining the indication information by the first edge node may refer to obtaining the indication information by a processor in the first edge node.
  • the first edge node may receive the indication information sent by the second edge node through the communication interface or the transceiver, and at this time, the processor in the first edge node can obtain the indication information from the communication interface or the transceiver.
  • the first edge node may determine a third data amount according to the first data amount and the second data amount, for example, the third data amount is the difference between the first data amount and the second data amount; if the third data amount is greater than a preset threshold, the first edge node can control the exchange of the data flow, and the preset threshold can be set in advance. Controlling the exchange of the data flow by the first edge node may include performing flow control on the data to be exchanged, and/or performing flow control on the source server corresponding to the data flow.
  • controlling the exchange of the data flow by the first edge node may include: the first edge node caches or discards at least part of the data to be exchanged, or controls the sending rate of the data to be exchanged to the second edge node to be lower than a preset rate, and the preset rate can be set in advance.
  • the flow control of the source server by the first edge node may include: the first edge node sends flow control information to the source server, and the flow control information is used to indicate at least one of the following: instructing the source server to suspend sending the data flow, for example, by sending a close (Xoff) signal or a pause (pause) signal; indicating the rate at which the source server sends the data flow; indicating the amount of data of the data flow that the source server is allowed to send.
  • the first edge node may also instruct the source server to resume sending the data flow, for example, by sending an open (Xon) signal to the source server.
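  • the threshold check and the Xoff/Xon signalling can be sketched as follows. The third data amount is computed as the first data amount minus the second data amount, as described above; resuming once the third data amount falls back to or below the threshold is an assumption, and the class name is hypothetical.

```python
from typing import Optional


class IngressController:
    """Decides which flow control signal to send to the source server."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # preset threshold, same unit as amounts
        self.paused = False

    def update(self, first_amount: int, second_amount: int) -> Optional[str]:
        """Return "XOFF", "XON", or None when no signal needs to be sent."""
        third_amount = first_amount - second_amount
        if third_amount > self.threshold and not self.paused:
            self.paused = True
            return "XOFF"  # ask the source server to suspend the data flow
        if third_amount <= self.threshold and self.paused:
            self.paused = False
            return "XON"   # assumed resume condition
        return None
```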
  • when data exchange is performed between the first edge node and the second edge node, the first edge node can count the first data volume of the data to be exchanged in the data flow, and the second edge node can send the indication information to the first edge node to indicate the second data amount that the first edge node is allowed to send.
  • the second data amount can be determined by the second edge node according to parameters such as state information and data flow information, so that the first edge node can control the exchange of the data flow according to the first data amount and the second data amount, realizing the first edge node's control of the ingress flow and the second edge node's control of the egress flow, thereby avoiding congestion in the data exchange network and improving exchange efficiency.
  • the method may further include S204, and S204 may be located after S201.
  • the first edge node sends the data to be exchanged to the second edge node according to a preset amount of data.
  • when the first edge node has a preset data amount, the first edge node can directly use the preset data amount to send the data to be exchanged to the second edge node, so that the second edge node can receive the data to be exchanged that the first edge node sends using the preset data amount.
  • the specific data to be sent may be part or all of the data to be exchanged, and is specifically related to the first data amount and the size of the preset data amount.
  • the preset data volume may be set in advance, for example, the preset data volume may be 50KB, 100KB, or 150KB, etc., which is not specifically limited in this embodiment of the present application.
  • the first edge node and the second edge node can also transfer all the data of the data flow to the second edge node in the manner provided above.
  • when the preset data volume is greater than the first data volume, for the data of the data stream subsequently received by the first edge node, the first edge node can use the difference between the preset data volume and the first data volume, plus the second data amount indicated by the indication information sent by the second edge node, as the data amount that the second edge node allows to be sent, for transmitting the subsequently received data of the data flow.
  • the part or all of the data does not need to wait for one round-trip time (round-trip time, RTT) before being sent, so that the signaling interaction between the first edge node and the second edge node can be reduced, and the data transmission efficiency can be improved.
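  • the arithmetic around the preset data amount can be sketched as follows. The function names are hypothetical; the behaviour when the preset data amount does not exceed the first data amount (no leftover is carried over) is an assumption, since the text above only describes the opposite case.

```python
def initial_send(first_amount: int, preset_amount: int) -> int:
    """Data that can be sent at once, without waiting for indication information."""
    return min(first_amount, preset_amount)


def allowance_for_later_data(first_amount: int, preset_amount: int,
                             second_amount: int) -> int:
    """Allowance for data received later: unused preset amount plus the grant."""
    leftover = max(0, preset_amount - first_amount)  # assumed 0 if preset is used up
    return leftover + second_amount
```

  • for example, with a preset data amount of 100KB, a first data amount of 30KB, and a second data amount of 200KB, 30KB is sent immediately and later data may use up to 70KB + 200KB = 270KB.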
  • each network element such as the first edge node and the second edge node, includes a corresponding hardware structure and/or software module for performing each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software in combination with the units and algorithm steps of each example described in the embodiments disclosed herein. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • the function modules of the first edge node and the second edge node can be divided according to the above method example.
  • each function module can be divided corresponding to each function, or two or more functions can be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic, and is only a logical function division, and there may be other division methods in actual implementation. The following is an example of dividing each function module by corresponding function:
  • FIG. 10 shows a possible structural diagram of the data exchange control device involved in the above embodiment.
  • the device may be a first edge node or a built-in chip of the first edge node, and the device includes: a processing unit 301 and a receiving unit 302 .
  • the processing unit 301 is used to support the apparatus to execute S201 and/or S203 in the method embodiment;
  • the receiving unit 302 supports the apparatus to execute the step of receiving the indication information sent by S202 in the method embodiment.
  • the device may further include a sending unit 303, and the sending unit 303 is configured to support the device in performing the step of sending the request information. All relevant content of the steps involved in the above method embodiments can be referred to the function descriptions of the corresponding functional modules, and will not be repeated here.
  • the processing unit 301 in this application can be the processor of the data exchange control device, the receiving unit 302 can be the receiver of the device, and the sending unit 303 can be the transmitter of the device.
  • the transmitter can be integrated with the receiver as a transceiver, and the transceiver can also be called a communication interface.
  • FIG. 11 is a schematic diagram of a possible logical structure of the device for controlling data exchange involved in the above-mentioned embodiments provided by the embodiments of the present application.
  • the device may be the first edge node or a built-in chip of the first edge node, and the device includes: a processor 312 and a communication interface 313 .
  • the processor 312 is used to control and manage the actions of the device; for example, the processor 312 is used to support the device in generating request information, parsing indication information, and counting data to be exchanged as in the method embodiment, and/or other processes of the technology described herein.
  • the device can also include a memory 311 and a bus 314, the processor 312, the communication interface 313 and the memory 311 are connected to each other through the bus 314; the communication interface 313 is used to support the device to communicate; the memory 311 is used to store the program code of the device and data.
  • the processor 312 may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the bus 314 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus or the like.
  • FIG. 12 shows a possible structural diagram of the data exchange control device involved in the above embodiment.
  • the device may be a second edge node or a built-in chip of the second edge node, and the device includes: a processing unit 401 and a sending unit 402 .
  • the processing unit 401 is used to support the device in the steps of determining the second data amount and generating indication information;
  • the sending unit 402 supports the device in performing S202 in the method embodiment.
  • the device may further include a receiving unit 403, and the receiving unit 403 is configured to support the device in performing the step of receiving request information. All relevant content of the steps involved in the above method embodiments can be referred to the function descriptions of the corresponding functional modules, and will not be repeated here.
  • the processing unit 401 in this application can be the processor of the data exchange control device, the sending unit 402 can be the transmitter of the device, and the receiving unit 403 can be the receiver of the device.
  • the transmitter can be integrated with the receiver as a transceiver, and the transceiver can also be called a communication interface.
  • FIG. 13 is a schematic diagram of a possible logical structure of the data exchange control device involved in the above-mentioned embodiments provided by the embodiments of the present application.
  • the device may be the second edge node or a built-in chip of the second edge node, and the device includes: a processor 412 and a communication interface 413 .
  • the processor 412 is used to control and manage the actions of the device; for example, the processor 412 is used to support the device in parsing request information, determining the second data amount, and generating indication information, and/or other processes of the technology described herein.
  • the device can also include a memory 411 and a bus 414, the processor 412, the communication interface 413 and the memory 411 are connected to each other through the bus 414; the communication interface 413 is used to support the device to communicate; the memory 411 is used to store the program code of the device and data.
  • the processor 412 may be a central processing unit, a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. It can implement or execute the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the bus 414 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the first edge node and the second edge node in the device embodiment of the present application may respectively correspond to the first edge node and the second edge node in the method embodiment of the present application.
  • the modules and other operations and/or functions of the first edge node and the second edge node are respectively intended to implement the corresponding processes of the above-mentioned method embodiments.
  • the description of the method embodiments of the present application can be applied to this device embodiment, and details are not repeated here.
  • an embodiment of the present application further provides a data exchange network, where the data exchange network includes a first edge node and a second edge node.
  • the first edge node may be the first edge node provided by the above-mentioned device embodiment, and is used to perform the steps of the first edge node in the above-mentioned method embodiment; the second edge node may be the second edge node provided by the above-mentioned device embodiment, and is configured to execute the steps of the second edge node in the above method embodiment.
  • the disclosed data exchange network, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions described above are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • a readable storage medium is also provided, and computer-executable instructions are stored in the readable storage medium; when a device (such as a single-chip microcomputer or a chip) or a processor executes the instructions, the steps of the first edge node in the data exchange control method provided by the above method embodiments are performed.
  • the above-mentioned readable storage medium may include various mediums capable of storing program codes such as U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk.
  • a readable storage medium is also provided, and computer-executable instructions are stored in the readable storage medium; when a device (such as a single-chip microcomputer or a chip) or a processor executes the instructions, the steps of the second edge node in the data exchange control method provided by the above method embodiments are performed.
  • the above-mentioned readable storage medium may include various mediums capable of storing program codes such as U disk, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk.
  • in another embodiment, a computer program product is also provided; the computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to make the device perform the steps of the first edge node in the data exchange control method provided by the above-mentioned method embodiments.
  • in another embodiment, a computer program product is also provided; the computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to make the device perform the steps of the second edge node in the data exchange control method provided by the above-mentioned method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

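The present application provides a data exchange control method and device, relates to the field of communication technologies, and is used to avoid congestion in a data exchange network and improve exchange efficiency. The method is applied to a data exchange network including a first edge node and a second edge node, and includes: the first edge node counts a first data amount of to-be-exchanged data in an input data flow; the first edge node obtains indication information from the second edge node, where the indication information indicates a second data amount and the second edge node is the edge node that outputs the data flow; and the first edge node controls the exchange of the data flow according to the first data amount and the second data amount.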

Description

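Data exchange control method and device
Technical Field
The embodiments of the present application relate to the field of communication technologies, and in particular, to a data exchange control method and device.
Background
A data center network (DCN) is a network applied in a data center. It can provide a fully connected network for the many servers in the data center and exchange data between different servers.
At present, multiple switching nodes in a DCN are interconnected through standard protocols such as the Ethernet protocol or the Internet Protocol (IP), so that the processing of the switching nodes is independent of each other. However, as the bandwidth and scale of DCNs keep growing, the number of data flows in the network increases, which easily causes problems such as network congestion and failures and thus lowers the exchange efficiency of the DCN.
Summary
The present application provides a data exchange control method and device, which are used to avoid congestion in a data exchange network and improve exchange efficiency.
To achieve the above objective, the embodiments of the present application adopt the following technical solutions: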
第一方面,提供一种数据交换的控制方法,应用于包括第一边缘节点和第二边缘节点的数据交换网络中,该方法包括:第一边缘节点统计输入的数据流中待交换数据的第一数据量;第一边缘节点获取来自第二边缘节点的指示信息,该指示信息用于指示第二数据量,第二边缘节点为输出该数据流的边缘节点;第一边缘节点根据第一数据量和第二数据量,控制该数据流的交换。其中,第一边缘节点获取该指示信息可以是指第一边缘节点中的处理器获取该指示信息,比如,第一边缘节点可以通过通信接口或者收发器接收来自第二边缘节点的指示信息,此时第一边缘节点中的处理器可以从该通信接口或者收发器获取该指示信息。
上述技术方案中,当第一边缘节点与第二边缘节点之间进行数据交换时,第一边缘节点可以统计数据流中待交换数据的第一数据量,第二边缘节点可以通过指示信息向第一边缘节点指示允许第一边缘节点发送的第二数据量,第二数据量可以是第二边缘节点根据状态信息和数据流信息等参数确定的,这样第一边缘节点可以根据第一数据量和第二数据量控制该数据流的交换,从而实现第一边缘节点对入口流量的控制,第二边缘节点对出口流量的控制,进而避免数据交换网络中出现拥塞,提高交换效率。
在第一方面的一种可能的实现方式中,第一边缘节点根据第一数据量和第二数据量,控制该数据流的交换,包括:第一边缘节点根据第一数据量和第二数据量,确定第三数据量;若第三数据量大于预设阈值,第一边缘节点控制该数据流的交换。上述可能的实现方式中,当第三数据量大于预设阈值时,表示第一边缘节点中的待交换数据的数据量较多,第二边缘节点能够输出的数据量有限,从而第一边缘节点通过控制该数据流的交换,可以避免数据交换网络中出现拥塞,提高交换效率。
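In a possible implementation of the first aspect, the first edge node controlling the exchange of the data flow includes: the first edge node performs flow control on the to-be-exchanged data; and/or the first edge node performs flow control on the source server corresponding to the data flow. In this implementation, the first edge node may perform flow control on the to-be-exchanged data to keep that data from congesting the data exchange network, and may also perform flow control on the source server corresponding to the data flow to keep the server from sending so much data that the network becomes congested.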
在第一方面的一种可能的实现方式中,第一边缘节点对该待交换数据做流量控制,包括:第一边缘节点缓存或丢弃该待交换数据,或者控制向第二边缘节点发送该待交换数据的发送速率;第一边缘节点对源服务器做流量控制,包括:第一边缘节点向该源服务器发送流量控制信息,该流量控制信息用于指示以下至少一项:指示该源服务器暂停发送该数据流;指示该源服务器发送该数据流的发送速率;指示允许该源服务器发送该数据流的数据量。上述可能的实现方式中,第一边缘节点可以通过多种方式对数据流做流量控制,从而提高流量控制的灵活性和有效性。
在第一方面的一种可能的实现方式中,第二数据量与第二边缘节点的状态信息有关;和/或,第二边缘节点发送该指示信息的时间点、间隔和粒度中的至少一个与第二边缘节点的状态信息有关。上述可能的实现方式中,第二边缘节点根据自身的状态信息确定允许第一边缘节点发送的第二数据量、以及对应发送该指示信息的时间点、间隔和粒度等,可以提高第一边缘节点对该数据流做流量控制的准确性。
在第一方面的一种可能的实现方式中,第二边缘节点的状态信息包括以下至少一项:已输出该数据流的数据量、第二边缘节点的缓存状态、该数据流的目的服务器的状态。上述可能的实现方式中,第二边缘节点可以根据多个不同的状态信息确定第二数据量和指示信息。
在第一方面的一种可能的实现方式中,该方法还包括:第一边缘节点向第二边缘节点发送请求信息,该请求信息用于确定第二数据量,该请求信息包括以下至少一种信息:剩余数据包的数量、剩余数据量、服务等级协议SLA信息。可选的,该请求信息承载在该数据流的数据包中。上述可能的实现方式中,第一边缘节点向第二边缘节点发送请求信息,以使第二边缘节点根据该请求信息携带的信息和自身的状态信息确定第二数据量和指示信息,从而可以提高第二边缘节点在不同用户、不同数据流或不同传输状态等情况下确定的第二数据量和指示信息的准确性,进而提高第一边缘节点对该数据流做流量控制的准确性。
在第一方面的一种可能的实现方式中,第一边缘节点获取来自第二边缘节点的指示信息之前,该方法还包括:第一边缘节点根据预设数据量,向第二边缘节点发送该待交换数据。上述可能的实现方式中,第一边缘节点直接使用该预设数据量向第二边缘节点发送该待交换数据中的部分或者全部数据时,该部分或者全部数据无需等待往返时延即可发送,从而减少第一边缘节点与第二边缘节点间的信令交互,提高数据传输效率。
在第一方面的一种可能的实现方式中,该数据流是根据网络输出端口或者目的网卡划分的数据流;和/或,该数据交换网络包括一个或者多个虚拟扩展局域网VXLAN,该数据流是根据VXLAN划分的数据流;和/或,该数据流是根据五元组划分的数据流。进一步的,该数据流的划分还与优先级有关。上述可能的实现方式,提高了该数据流划分的灵活性和多样性,从而对于不同粒度和优先级的数据流均可实现流量控制,进而避免数据交换网络中出现拥塞,提高交换效率。
第二方面,提供一种数据交换的控制方法,应用于包括第一边缘节点和第二边缘节点的数据交换网络中,该方法包括:第二边缘节点确定数据流对应的指示信息,该指示信息用于指示第二数据量,第二边缘节点为输出该数据流的边缘节点;第二边缘节点向第一边缘节点发送该指示信息,以使第一边缘节点根据该数据流中待交换数据的第一数据量和第二数据量控制该待交换数据的交换,第一数据量是第一边缘节点统计得到的。
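In the above technical solution, when data is exchanged between the first edge node and the second edge node, the first edge node can count the first data amount of the to-be-exchanged data in the data flow, and the second edge node can use the indication information to indicate to the first edge node the second data amount that the first edge node is allowed to send. The second data amount may be determined by the second edge node according to parameters such as state information and data flow information. The first edge node can then control the exchange of the data flow according to the first data amount and the second data amount, so that the first edge node controls ingress traffic and the second edge node controls egress traffic, which avoids congestion in the data exchange network and improves exchange efficiency.
In a possible implementation of the second aspect, the second data amount is related to the state information of the second edge node; and/or at least one of the time point, interval, and granularity at which the second edge node sends the indication information is related to the state information of the second edge node. In this implementation, the second edge node determines, according to its own state information, the second data amount that the first edge node is allowed to send and the time point, interval, and granularity for sending the indication information, which can improve the accuracy with which the first edge node performs flow control on the data flow.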
在第二方面的一种可能的实现方式中,第二边缘节点的状态信息包括以下至少一项:已输出该数据流的数据量、第二边缘节点的缓存状态、该数据流的目的服务器的状态。上述可能的实现方式中,第二边缘节点可以根据多个不同的状态信息确定第二数据量和指示信息。
在第二方面的一种可能的实现方式中,第二边缘节点确定数据流的指示信息之前,该方法还包括:第二边缘节点获取来自第一边缘节点的请求信息,该请求信息用于确定第二数据量,该请求信息包括以下至少一种信息:剩余数据包的数量、剩余数据量、服务等级协议SLA信息。可选的,该请求信息承载在该数据流的数据包中。上述可能的实现方式中,第一边缘节点向第二边缘节点发送请求信息,以使第二边缘节点根据该请求信息携带的信息和自身的状态信息确定第二数据量和指示信息,从而可以提高第二边缘节点在不同用户、不同数据流或不同传输状态等情况下确定的第二数据量和指示信息的准确性,进而提高第一边缘节点对该数据流做流量控制的准确性。
在第二方面的一种可能的实现方式中,第二边缘节点确定数据流的指示信息之前,该方法还包括:第二边缘节点获取第一边缘节点根据预设数据量发送的该待交换数据。上述可能的实现方式中,第一边缘节点直接使用该预设数据量向第二边缘节点发送该待交换数据中的部分或者全部数据时,该部分或者全部数据无需等待往返时延即可发送,从而减少第一边缘节点与第二边缘节点间的信令交互,提高数据传输效率。
在第二方面的一种可能的实现方式中,该数据流是根据网络端口或者目的网卡划分的数据流;和/或,该数据交换网络包括一个或者多个虚拟扩展局域网VXLAN,该数据流是根据VXLAN划分的数据流;和/或,该数据流是根据五元组划分的数据流。进一步的,该数据流的划分还与优先级有关。上述可能的实现方式,提高了该数据流划分的灵活性和多样性,从而对于不同粒度和优先级的数据流均可实现流量控制,进而避免数据交换网络中出现拥塞,提高交换效率。
第三方面,提供一种数据交换的控制装置,应用于包括第一边缘节点和第二边缘节点的数据交换网络中,该装置作为第一边缘节点,包括:处理单元,用于统计输入的数据流中待交换数据的第一数据量;接收单元,用于获取来自第二边缘节点的指示信息,该指示信息用于指示第二数据量,第二边缘节点为输出该数据流的边缘节点;处理单元,还用于根据第一数据量和第二数据量,控制该数据流的交换。
在第三方面的一种可能的实现方式中,处理单元还用于:根据第一数据量和第二数据量,确定第三数据量;若第三数据量大于预设阈值,控制该数据流的交换。
在第三方面的一种可能的实现方式中,处理单元还用于:对该待交换数据做流量控制;和/或,对该数据流对应的源服务器做流量控制。
在第三方面的一种可能的实现方式中,处理单元还用于:缓存或丢弃该待交换数据,或者控制向第二边缘节点发送该待交换数据的发送速率;该装置包括:发送单元,用于向该源服务器发送流量控制信息,该流量控制信息用于指示以下至少一项:指示该源服务器暂停发送该数据流;指示该源服务器发送该数据流的发送速率;指示允许该源服务器发送该数据流的数据量。
在第三方面的一种可能的实现方式中,第二数据量与第二边缘节点的状态信息有关;和/或,第二边缘节点发送该指示信息的时间点、间隔和粒度中的至少一个与第二边缘节点的状态信息有关。
在第三方面的一种可能的实现方式中,第二边缘节点的状态信息包括以下至少一项:已输出该数据流的数据量、第二边缘节点的缓存状态、该数据流的目的服务器的状态。
在第三方面的一种可能的实现方式中,该装置包括:发送单元,还用于向第二边缘节点发送请求信息,该请求信息用于确定第二数据量,该请求信息包括以下至少一种信息:剩余数据包的数量、剩余数据量、服务等级协议SLA信息。可选的,该请求信息承载在该数据流的数据包中。
在第三方面的一种可能的实现方式中,该装置包括:发送单元,还用于根据预设数据量,向第二边缘节点发送该待交换数据。
在第三方面的一种可能的实现方式中,该数据流是根据网络端口或者目的网卡划分的数据流;和/或,该数据交换网络包括一个或者多个虚拟扩展局域网VXLAN,该数据流是根据VXLAN划分的数据流;和/或,该数据流是根据五元组划分的数据流。进一步的,该数据流的划分还与优先级有关。
第四方面,提供一种数据交换的控制装置,应用于包括第一边缘节点和第二边缘节点的数据交换网络中,该装置作为第二边缘节点,包括:处理单元,用于确定数据流对应的指示信息,该指示信息用于指示第二数据量,第二边缘节点为输出该数据流的边缘节点;发送单元,用于向第一边缘节点发送该指示信息,以使第一边缘节点根据该数据流中待交换数据的第一数据量和第二数据量控制该待交换数据的交换,第一数据量是第一边缘节点统计得到的。
在第四方面的一种可能的实现方式中,第二数据量与第二边缘节点的状态信息有关;和/或,第二边缘节点发送该指示信息的时间点、间隔和粒度中的至少一个与第二边缘节点的状态信息有关。
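In a possible implementation of the fourth aspect, the state information of the second edge node includes at least one of the following: the data amount of the data flow that has been output, the buffer status of the second edge node, and the status of the destination server of the data flow.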
在第四方面的一种可能的实现方式中,该装置还包括:接收单元,用于获取来自第一边缘节点的请求信息,该请求信息用于确定第二数据量,该请求信息包括以下至少一种信息:剩余数据包的数量、剩余数据量、服务等级协议SLA信息。可选的,该请求信息承载在该数据流的数据包中。
在第四方面的一种可能的实现方式中,该装置还包括:接收单元,还用于获取第一边缘节点根据预设数据量发送的该待交换数据。
在第四方面的一种可能的实现方式中,该数据流是根据网络端口或者目的网卡划分的数据流;和/或,该数据交换网络包括一个或者多个虚拟扩展局域网VXLAN,该数据流是根据VXLAN划分的数据流;和/或,该数据流是根据五元组划分的数据流。进一步的,该数据流的划分还与优先级有关。
第五方面,提供一种数据交换的控制装置,该装置包括:处理器、存储器、通信接口和总线,该处理器、该存储器和该通信接口通过总线连接;该存储器用于存储程序代码,当该程序代码被该处理器执行时,使得该装置执行如第一方面或第一方面的任一种可能的实现方式所提供的数据交换的控制方法。
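In a sixth aspect, a data exchange control device is provided. The device includes a processor, a memory, a communication interface, and a bus, where the processor, the memory, and the communication interface are connected through the bus. The memory is used to store program code, and when the program code is executed by the processor, the device is caused to execute the data exchange control method provided by the second aspect or any possible implementation of the second aspect.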
在本申请的又一方面,提供一种数据交换网络,该数据交换网络包括第一边缘节点和第二边缘节点,第一边缘节点包括第三方面、第三方面的任一种可能的实现方式或者第五方面所提供的数据交换的控制装置,第二边缘节点包括第四方面、第四方面的任一种可能的实现方式或者第六方面所提供的数据交换的控制装置。
在本申请的又一方面,提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序或指令,当该计算机程序或指令被运行时,实现如第一方面或第一方面的任一种可能的实现方式所提供的数据交换的控制方法。
在本申请的又一方面,提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序或指令,当该计算机程序或指令被运行时,实现如第二方面或第二方面的任一种可能的实现方式所提供的数据交换的控制方法。
在本申请的又一方面,提供了一种计算机程序产品,该计算机程序产品包括:计算机程序(也可以称为代码,或指令),当该计算机程序被运行时,使得计算机执行如第一方面或者第一方面的任一种可能的实现方式所提供的数据交换的控制方法。
在本申请的又一方面,提供了一种计算机程序产品,该计算机程序产品包括:计算机程序(也可以称为代码,或指令),当该计算机程序被运行时,使得计算机执行如第二方面或者第二方面的任一种可能的实现方式所提供的数据交换的控制方法。
可以理解地,上述提供的任一种数据交换的控制装置、数据交换网络、计算机可读存储介质和计算机程序产品,其所能达到的有益效果可对应参考上文所提供的数据交换的控制方法中的有益效果,此处不再赘述。
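Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a switching frame;
FIG. 2 is a schematic structural diagram of a data exchange network provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of another data exchange network provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of yet another data exchange network provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a switching system provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a source node provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a destination node provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of a data exchange control method provided by an embodiment of the present application;
FIG. 9 is a schematic flowchart of another data exchange control method provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a first edge node provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of another first edge node provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a second edge node provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of another second edge node provided by an embodiment of the present application.
Detailed Description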
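The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. In the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may mean: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c may be single or multiple. In addition, in the embodiments of the present application, words such as "first" and "second" do not limit quantity or order.
It should be noted that, in the present application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the present application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, such words are intended to present the related concept in a concrete manner.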
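The technical solution provided by the present application can be applied to many different data exchange networks. For example, the data exchange network may include physical networks such as a data center network (DCN), a high performance computing (HPC) network, and a cloud network, and may also include a virtual network running on a physical network. The data exchange network may include multiple switching nodes, which may also be called nodes. In practice, a switching node may be a switching device such as a switch or a router, or may be a switching frame, a switching board, or a switch element (SE). A switching frame may include multiple line cards and multiple switching boards, which may be mounted on a backplane, and each switching board may include multiple switch elements. The switching board may also be called a switch card (SC), and the network card may also be called a network interface card (NIC). FIG. 1 shows a schematic structural diagram of a switching frame, in which the line cards are denoted LC1 to LCi and the switch elements included in the switching boards are denoted SE1 to SEk, where i and k are positive integers. The structure of the data exchange network is described in detail below.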
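FIG. 2 is a schematic structural diagram of a data exchange network provided by an embodiment of the present application. The data exchange network may include at least two switching layers, and the at least two switching layers include multiple nodes. Optionally, the multiple nodes may all be optical nodes. A node located at the edge of the data exchange network may be called an edge node. The edge node may be a switching device used to connect servers, or a network card in a server connected to the data exchange network; FIG. 2 takes the case where the edge node is a switching device as an example. Nodes other than the edge nodes may be called intermediate nodes, which are not shown in FIG. 2.
When an edge node is used to receive a data flow entering the data exchange network, it may be called a source node; when an edge node is used to output a data flow out of the data exchange network, it may be called a destination node. In the embodiments of the present application, the source node and the destination node can manage the traffic entering and leaving the data exchange network, so the source node can serve as the ingress traffic manager (ITM) of the data exchange network and the destination node can serve as its egress traffic manager. FIG. 2 shows only three edge nodes in the data exchange network, namely one destination node (denoted D1) and two source nodes (denoted S1 and S2) that exchange data to the destination node; FIG. 2 does not limit the embodiments of the present application.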
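With reference to FIG. 2, FIG. 3 is a schematic structural diagram of another data exchange network provided by an embodiment of the present application; this network includes three switching layers. Referring to FIG. 3, the data exchange network includes an access layer, an aggregation layer, and a core layer. The access layer includes multiple access nodes, the aggregation layer includes multiple aggregation nodes, and the core layer includes multiple core nodes. The downlink ports of an access node are connected to the servers that need to exchange data traffic, the uplink ports of an access node are connected to the downlink ports of aggregation nodes, and the uplink ports of an aggregation node are connected to core nodes.
The aggregation layer and the access layer can be divided into multiple groups (pods). A pod may include multiple access nodes and multiple aggregation nodes, and each access node is fully connected to the aggregation nodes. The core nodes connected to the same aggregation node may be called a core plane, and each core plane is connected to a different aggregation node in each pod. FIG. 3 takes as an example a data exchange network with 3 pods, each containing 3 access nodes and 4 aggregation nodes, and core planes of two core nodes each. In FIG. 3 the access nodes are denoted A1 to A9, the aggregation nodes B1 to B12, the core nodes C1 to C8, and the 3 pods P1 to P3.
When servers connected to different access nodes in the same pod exchange data traffic, the exchange can go through an aggregation node in the same pod as the access nodes. For example, if the servers connected to access node A1 and access node A3 need to exchange data traffic, access node A1 can send the data flow of its server to access node A3 through aggregation node B1. When servers connected to access nodes in different pods exchange data traffic, the exchange can go through an aggregation node in the same pod as the access node and a core node connected to that aggregation node. For example, if the servers connected to access node A1 and access node A5 need to exchange data traffic, access node A1 can send the data flow of its server to aggregation node B1, which forwards it to core node C1, and C1 then sends it to access node A5 through aggregation node B5.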
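The two forwarding examples for FIG. 3 can be reproduced with a small sketch. It assumes, purely for illustration, that nodes are numbered pod by pod (A1 to A3 and B1 to B4 in the first pod, and so on) and that core plane j holds core nodes C(2j+1) and C(2j+2); the text only gives the label ranges and the two example paths, so the helper names and the numbering rule are hypothetical.

```python
# Hypothetical model of the FIG. 3 example: 3 access nodes and 4 aggregation
# nodes per pod, and core planes with 2 core nodes each.
ACCESS_PER_POD = 3
AGG_PER_POD = 4
CORE_PER_PLANE = 2


def pod_of_access(access_id: int) -> int:
    """Return the 0-based pod index of access node A<access_id>."""
    return (access_id - 1) // ACCESS_PER_POD


def agg_in_pod(pod: int, plane: int) -> int:
    """Return the id of the aggregation node in `pod` attached to core plane `plane`."""
    return pod * AGG_PER_POD + plane + 1


def path(src_access: int, dst_access: int, plane: int = 0, core_slot: int = 0) -> list:
    """List the hops from one access node to another through the chosen core plane."""
    src_pod, dst_pod = pod_of_access(src_access), pod_of_access(dst_access)
    hops = [f"A{src_access}", f"B{agg_in_pod(src_pod, plane)}"]
    if src_pod != dst_pod:
        # Different pods: go up to a core node, then down to the peer pod.
        core_id = plane * CORE_PER_PLANE + core_slot + 1
        hops += [f"C{core_id}", f"B{agg_in_pod(dst_pod, plane)}"]
    hops.append(f"A{dst_access}")
    return hops
```

With the default plane, `path(1, 3)` stays inside the pod and `path(1, 5)` crosses the core layer, matching the two examples in the text.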
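It should be noted that the structure of the data exchange network shown in FIG. 3 is only an example and does not limit its structure. In practice, each switching layer may include more or fewer nodes than shown, the data exchange network may instead have two switching layers, and the core nodes in the core layer may or may not be divided into multiple core planes; the embodiments of the present application do not specifically limit this.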
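With reference to FIG. 2, FIG. 4 is a schematic structural diagram of yet another data exchange network provided by an embodiment of the present application. The data exchange network includes a physical network and a virtual network running on the physical network. The physical network may also be called an underlay network and provides the underlay control plane. The physical network may include multiple switching devices; a switching device located at the edge of the physical network may be called an edge device, which can be used to connect servers (for example, terminals or hosts). The virtual network may also be called an overlay network and provides the overlay control plane.
In one example, the overlay network may include one or more virtual extensible local area networks (VXLANs). A VXLAN can be formed on the physical network through VXLAN tunnel end points (VTEPs). Different VXLANs can be distinguished by different VXLAN network identifiers (VNI/VNID). Users in the same VXLAN are, in effect, interconnected through the VXLAN they access; that is, users in a VXLAN are unaware of other VXLANs and of the physical network. When different users in a VXLAN exchange data traffic, the exchange can be carried out through the VTEPs corresponding to that VXLAN and the physical network.
It should be noted that the above description takes a VXLAN created based on the VXLAN protocol as an example of the virtual network. In practice, the virtual network may also be an overlay network created based on another overlay protocol; FIG. 4 does not limit the embodiments of the present application.
In addition, the structure of the physical network in FIG. 4 is similar to that of the data exchange network shown in FIG. 3. The physical network may also include multiple switching layers, for example two or three switching layers.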
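For ease of understanding, the multiple aggregation nodes (for example, n of them) connected to the same core plane in a data exchange network with three switching layers, together with the core nodes in that core plane, are regarded here as an n×n switch fabric (SF) system. As shown in FIG. 5, the n×n SF includes n source nodes (S), n destination nodes (D), and m intermediate-stage switching nodes (SN). Si and Di in FIG. 5 are the same aggregation node (i takes the values 1 to n); that is, the n source nodes and n destination nodes are the n aggregation nodes connected to the same core plane, divided by function into sending nodes and receiving nodes. Each of the n aggregation nodes may include multiple ports, which are input ports for S and output ports for D. The data exchange network can be regarded as composed of multiple switch fabrics, each of which uses the same control mechanism.
Similarly, a data exchange network with two switching layers can also be regarded as composed of multiple switch fabrics. The difference is that the n source nodes and n destination nodes are the n access nodes connected to the same core plane, divided by function into sending nodes and receiving nodes; details are not repeated here.
The switch fabric exchanges data packets received from S to D. When a data packet passes through an SE, it can keep its original variable-length packet format, or S can first cut it into cells for transmission, and D reassembles the complete packet after receiving all the cells. In such a switch fabric, S usually distributes the received data packets to the SNs as evenly as possible. A data packet sent by S usually carries information about D, and the SN forwards the packet to the corresponding D according to that information.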
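The cut-into-cells and reassemble behaviour can be illustrated with a short sketch. The cell header fields (`packet_id`, `index`, `total`) and the cell size are assumptions made for the example; the text does not define a cell format.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell header fields, named for illustration only.
    packet_id: int   # which packet the cell belongs to
    index: int       # position of the cell inside the packet
    total: int       # number of cells the packet was cut into
    payload: bytes


def segment(packet_id: int, packet: bytes, cell_size: int) -> list:
    """Cut a variable-length packet into cells of at most `cell_size` bytes (done at S)."""
    chunks = [packet[i:i + cell_size] for i in range(0, len(packet), cell_size)] or [b""]
    return [Cell(packet_id, i, len(chunks), chunk) for i, chunk in enumerate(chunks)]


def reassemble(cells: list) -> bytes:
    """Rebuild the packet at D once every cell has arrived, whatever the arrival order."""
    total = cells[0].total
    if len({cell.index for cell in cells}) != total:
        raise ValueError("not all cells have arrived yet")
    return b"".join(cell.payload for cell in sorted(cells, key=lambda cell: cell.index))
```

Because S may spread cells across several SNs, they can arrive out of order, which is why the sketch sorts by `index` before joining.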
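S receives data packets from outside the system through its input ports. S usually has multiple built-in virtual output queues (VOQs) to buffer data packets destined for different Ds (or for different output ports of different Ds; or the VOQs correspond to finer-grained flows, that is, the VOQs buffer data packets of flows of different granularities). VOQs can be used to guarantee end-to-end quality of service (QoS) and are a means of preventing head-of-line (HOL) blocking. For an n×n switch fabric, each S generally has at least n VOQs corresponding to the n Ds; if the queues are further subdivided by the output ports of D or by a finer granularity, more VOQs can be included.
FIG. 6 is a schematic structural diagram of a source node (S). S includes multiple input ports, a queue manager (QM), an ingress scheduler (ISC), and a fabric interface. The QM manages K VOQs (K is a positive integer), and the ISC schedules the K VOQs in the QM. The scheduled data packets can be cut into cells at the fabric interface and sent to the SE after a cell header is added, or sent to the SE directly as data packets.
FIG. 7 is a schematic structural diagram of a destination node (D). D includes multiple output ports, a queue manager (QM), an egress scheduler (ESC), and a fabric interface. The QM manages L output queues (OQs), where L is a positive integer and the L OQs buffer data packets destined for different outputs, and the ESC schedules the L OQs in the QM.
For example, the ISC in S can send a request to the corresponding D according to the status of the VOQs in S. After receiving the request, the ESC in D completes scheduling and notifies the ISC of the scheduling result. The ISC then dequeues data packets from the VOQs in S according to the scheduling result. During scheduling, the ESC in D can take into account the QoS characteristics of the different requests and the congestion level of each OQ in D.
Those skilled in the art can understand that the structures of the data exchange network, source node, and destination node described in the embodiments of the present application are intended to explain the technical solutions more clearly and do not limit the technical solutions provided by the embodiments of the present application.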
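FIG. 8 shows a data exchange control method provided by an embodiment of the present application. The method can be applied to the data exchange network described above and includes the following steps.
S201: The first edge node counts the first data amount of the to-be-exchanged data in an input data flow.
The first edge node may be a node that is located at the edge of the data exchange network and receives data flows entering the network from outside. Optionally, the first edge node may be a switching device in the data exchange network, such as a switch or a router; the first edge node may also be a source network card (SNIC), that is, a network card in the source server.
In addition, the input data flow refers to a data flow input to the first edge node. The first edge node may receive one or more data flows from outside, and the input data flow may be any one of them. The second edge node in the data exchange network outputs the data flow out of the data exchange network; it may be a switching device in the data exchange network or a destination network card (DNIC), that is, a network card in the destination server.
Furthermore, the to-be-exchanged data in the data flow may refer to the data of the flow stored in the first edge node, that is, data of the flow that the first edge node has received but not yet output to a downstream node. The to-be-exchanged data may further include data of the flow that the first edge node has already output to a downstream node but that the second edge node has not yet output. The following description takes the case where the to-be-exchanged data is the data of the flow stored in the first edge node as an example.
In one embodiment, the first edge node includes a first counter, through which it counts the first data amount of the to-be-exchanged data in the data flow. Optionally, the first data amount may be the number of data bits or bytes, data packets, cells, or data blocks in the to-be-exchanged data; the embodiments of the present application do not specifically limit this.
For example, if the first data amount is the number of data packets in the to-be-exchanged data, the first edge node can increment the first counter by one each time it receives a data packet of the flow and decrement it by one each time it outputs a data packet of the flow, so that the value of the first counter is the first data amount. Of course, the first edge node can also decrement the first counter on each received packet and increment it on each output packet; the embodiments of the present application do not specifically limit this.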
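The counter behaviour in S201 can be sketched as follows, assuming the first data amount is measured in packets and using the increment-on-receive convention. The class and method names are illustrative and do not come from the application.

```python
class FlowCounter:
    """Tracks the first data amount (here: packets received but not yet output) of one flow."""

    def __init__(self) -> None:
        self.value = 0

    def on_receive(self, packets: int = 1) -> None:
        # A packet of the flow entered the first edge node.
        self.value += packets

    def on_output(self, packets: int = 1) -> None:
        # A packet of the flow left the first edge node toward the downstream node.
        if packets > self.value:
            raise ValueError("cannot output more packets than are stored")
        self.value -= packets
```

After three received packets and one output packet, `value` is 2, which is the first data amount at that moment. Counting bytes or cells instead would only change the unit passed to the two methods.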
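A data flow refers to data sent from the same source to the same destination. In practice, different data flows can be obtained by dividing according to sources or destinations of different granularities. Several possible ways of dividing are described below.
First, data flows are divided by network output port or destination network card. Specifically, taking the first edge node as an example, all data received by the first edge node that is destined for the same network output port or the same destination network card is regarded as data of the same data flow. The network output port is a port used to output data out of the data exchange network, and the destination network card is the network card of the destination server.
When the data exchange network includes an overlay network, there is a correspondence between the overlay network and the network output ports or destination network cards, so data in the overlay network can also be divided into data flows by network output port or destination network card. For example, when the data exchange network includes at least one VXLAN, the data flow corresponding to the data of the at least one VXLAN can be determined according to the correspondence between the at least one VXLAN and the network output ports, or between the at least one VXLAN and the destination network cards.
Optionally, when the first edge node uses multiple counters to count the data amounts of the to-be-exchanged data in different data flows, the counters may correspond to the network output ports or to the destination network cards. For example, each counter in the first edge node counts the data amount for one network output port or one destination network card.
Second, data flows are divided by overlay network; for example, the overlay network may be a VXLAN. Specifically, taking the first edge node as an example, data of the same overlay network (for example, the same VXLAN) received by the first edge node is regarded as data of the same data flow.
For data of the physical network in the data exchange network, because there is a correspondence between the overlay network and the network output ports or destination network cards, data of the physical network can also be divided into data flows by overlay network. For example, when the data exchange network includes at least one VXLAN, the data flow corresponding to data in the physical network can be determined according to the correspondence between the at least one VXLAN and the network output ports, or between the at least one VXLAN and the destination network cards.
Optionally, when the first edge node uses multiple counters to count the data amounts of the to-be-exchanged data in different data flows, the counters may correspond to the at least one VXLAN included in the data exchange network. For example, each counter in the first edge node counts the data amount for one VXLAN.
Third, data flows are divided by five-tuple. Specifically, taking the first edge node as an example, data received by the first edge node that corresponds to the same five-tuple is regarded as data of the same data flow. The five-tuple may include the source IP address, source port, destination IP address, destination port, and transport layer protocol. Similarly, in practice, data flows can also be divided by a four-tuple or a seven-tuple; details are not repeated here.
When the data exchange network includes an overlay network, there is a correspondence between the overlay network and the five-tuples, so data in the overlay network can also be divided into data flows by five-tuple.
Optionally, when the first edge node uses multiple counters to count the data amounts of the to-be-exchanged data in different data flows, each counter can count the data amounts of different five-tuple flows at different times, and counts the data amount of a single data flow at any one time.
Further, when data flows are divided in any of the above ways, the priority of the data can also be taken into account. The priority refers to information used to distinguish the priority or type of data packets and is usually transmitted together with the data. For example, the priority may come from the differentiated services code point (DSCP) field in an IPv4 data packet; the DSCP field may also be called the type of service (ToS).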
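As a concrete illustration of reading the priority from a packet, the DSCP value occupies the upper six bits of the second byte of the IPv4 header (the byte historically called ToS). The sketch below extracts it from raw header bytes; the function name is illustrative, and how an edge node maps DSCP values to its own priority levels is not specified in the text.

```python
def dscp_from_ipv4_header(header: bytes) -> int:
    """Return the 6-bit DSCP value carried in a raw IPv4 header."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Byte 1 holds DSCP in its upper six bits and ECN in its lower two bits.
    return header[1] >> 2
```

For a header whose second byte is `0xB8`, the function returns 46.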
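For example, when dividing by destination network card (DNIC) and priority, data received by the first edge node that is destined for the same DNIC and has the same priority is regarded as data of the same data flow, so the number of counters in the first edge node can be set according to the {DNIC, priority} dimension. Alternatively, when dividing by VXLAN and priority, data received by the first edge node that is destined for the same VXLAN (that is, corresponds to the same VXLAN network identifier, VNI) and has the same priority is regarded as data of the same data flow, so the number of counters in the first edge node can be set according to the {VNI, priority} dimension.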
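The division options above amount to choosing a key under which packets are grouped, with one counter per key. The sketch below is one possible way to express that; the `Packet` fields and the `mode` names are hypothetical and stand in for whatever the edge node actually parses from a packet.

```python
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    # Hypothetical parsed-packet fields, named for illustration only.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    dnic: str             # destination network card
    vni: Optional[int]    # VXLAN network identifier, if any
    priority: int


def flow_key(pkt: Packet, mode: str, with_priority: bool = False) -> tuple:
    """Build the key that decides which data flow (and counter) a packet belongs to."""
    if mode == "dnic":
        key = (pkt.dnic,)
    elif mode == "vni":
        key = (pkt.vni,)
    elif mode == "five_tuple":
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Appending the priority gives the {DNIC, priority} or {VNI, priority} dimension.
    return key + (pkt.priority,) if with_priority else key
```

A dictionary keyed by `flow_key(...)` can then hold one counter per flow, so the number of counters follows the chosen dimension.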
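It should be noted that the ways of dividing data flows shown above are only examples. In practice, data flows can also be divided at other granularities, for example according to one or more key fields in the packet header specified by the user; the above examples do not limit the embodiments of the present application.
S202: The second edge node sends indication information to the first edge node, where the indication information indicates the second data amount.
The second data amount may be the amount of data that the second edge node determines the first edge node is allowed to send. The indication information may include information that directly indicates the second data amount or information that indirectly indicates it. In the direct case, the information may simply be the value corresponding to the second data amount; for example, if the indication information includes the value 200, it indicates that the second data amount is 200 KB. In the indirect case, the indication information may include the sequence number of a data packet corresponding to the second data amount (the difference between this sequence number and the sequence number of the last data packet exchanged so far can indicate the second data amount), or the indication information may include a first value whose product with a unit data amount indicates the second data amount. For example, if the first value is 6, the second data amount may be 6×ΔF, where ΔF is the unit data amount, for example 4 KB or 8 KB; the unit data amount can be set or configured in advance.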
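The three encodings of the second data amount can be decoded as sketched below. The sketch assumes 1 KB = 1024 bytes and a preconfigured ΔF of 4 KB, and it treats the sequence-number variant as a count of packets; these choices and the function names are assumptions, not part of the application.

```python
UNIT_BYTES = 4 * 1024   # ΔF, assumed to be preconfigured as 4 KB


def second_amount_direct(value_kb: int) -> int:
    """Direct indication: the value itself is the second data amount in KB."""
    return value_kb * 1024


def second_amount_from_units(first_value: int, unit_bytes: int = UNIT_BYTES) -> int:
    """Indirect indication: first value multiplied by the unit data amount ΔF."""
    return first_value * unit_bytes


def second_amount_from_sequence(granted_seq: int, last_exchanged_seq: int) -> int:
    """Indirect indication: packets allowed = granted sequence number minus last exchanged one."""
    if granted_seq < last_exchanged_seq:
        raise ValueError("granted sequence number is behind the last exchanged packet")
    return granted_seq - last_exchanged_seq
```

With these assumptions, the value 200 decodes to 204800 bytes and a first value of 6 decodes to 24576 bytes.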
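In one embodiment, while the first edge node exchanges the data flow to the second edge node, the second edge node can generate the indication information according to information about the data flow and/or the state information of the second edge node, and send the indication information to the first edge node. Further, the indication information can also indicate the data amount of the data flow that the second edge node has just sent (for example, the length of a data packet), the state information of the second edge node, and other possible information.
The second edge node can also determine, according to the information about the data flow and/or its own state information, the time point, interval, and granularity for sending the indication information, among other possible information. The granularity of sending the indication information refers to the granularity of the data sent out that triggers the second edge node to send the indication information.
Optionally, the state information of the second edge node includes at least one of the following: the data amount of the data flow that has been output, the buffer status of the second edge node, and the output port status of the second edge node. The buffer status may include the buffer status of this data flow in the second edge node and may also include the buffer status of other data flows. The output port status may include the bandwidth and rate of the output port corresponding to this data flow in the second edge node and may also include the bandwidth and rate of the node's output ports.
The information about the data flow may include at least one of the following: the status of the destination server of the data flow, the status of the source server, information about the remaining data, and the service-level agreement (SLA) information of the data flow. The status of the destination server may include its receiving bandwidth and rate; the status of the source server may include the first data amount and its sending bandwidth and rate; the information about the remaining data may include the number of remaining data packets or the remaining data amount; and the SLA information may include the user level or priority, the data type, and the service type.
In another embodiment, before the second edge node sends the indication information to the first edge node (that is, before S202), the first edge node can send request information to the second edge node. The request information is used to determine the second data amount and may include at least one of the following: the expected data amount (that is, the amount of data the first edge node expects to send), the first data amount, the status of the source server, and information about the remaining data. When the second edge node receives the request information, it can generate the indication information in the manner described above and send it to the first edge node.
Optionally, the first edge node can send the request information to the second edge node through control signaling, or it can carry the request information in a data packet of the data flow.
In summary, the first edge node may send no request information at all, in which case the second edge node feeds back indication information to the first edge node after sending out the data transmitted by the first edge node. Alternatively, the first edge node sends request information to the second edge node, and the second edge node feeds back indication information to the first edge node after sending out the data transmitted by the first edge node.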
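S203: When the first edge node obtains the indication information, it controls the exchange of the data flow according to the first data amount and the second data amount.
The first edge node obtaining the indication information may mean that the processor in the first edge node obtains it. For example, the first edge node can receive the indication information sent by the second edge node through a communication interface or transceiver, and the processor in the first edge node can then obtain the indication information from the communication interface or transceiver.
In one embodiment, when the first edge node receives the indication information, it can determine a third data amount according to the first data amount and the second data amount; for example, the third data amount is the difference between the first data amount and the second data amount. If the third data amount is greater than a preset threshold, which can be set in advance, the first edge node can control the exchange of the data flow. Controlling the exchange of the data flow may include performing flow control on the to-be-exchanged data and/or on the source server corresponding to the data flow.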
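The decision in S203 reduces to a subtraction and a comparison. The sketch below uses the difference mentioned in the text as the third data amount; all quantities are assumed to be in the same unit, and the function names are illustrative.

```python
def third_data_amount(first_amount: int, second_amount: int) -> int:
    """Backlog at the first edge node beyond what the second edge node allows."""
    return first_amount - second_amount


def needs_flow_control(first_amount: int, second_amount: int, threshold: int) -> bool:
    """True when the third data amount is strictly greater than the preset threshold."""
    return third_data_amount(first_amount, second_amount) > threshold
```

For instance, with a first data amount of 500, a second data amount of 200, and a threshold of 100, the third data amount is 300 and flow control is triggered; with a first data amount of 250 it is not.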
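Specifically, the first edge node performing flow control on the to-be-exchanged data may include: buffering or discarding at least part of the to-be-exchanged data, or keeping the rate at which the to-be-exchanged data is sent to the second edge node below a preset rate, which can be set in advance. The first edge node performing flow control on the source server may include: sending flow control information to the source server, where the flow control information indicates at least one of the following: that the source server should pause sending the data flow, for example through an Xoff signal or a pause signal; the rate at which the source server should send the data flow; or the amount of data of the data flow that the source server is allowed to send.
When the flow control information instructs the source server to pause sending the data flow, the first edge node can also instruct the source server to resume sending the data flow once the state of the first edge node meets a certain condition, for example by sending an Xon signal to the source server.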
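One way to realize the pause and resume behaviour toward the source server is a two-threshold (hysteresis) controller, sketched below. The text only says that the flow is resumed when a certain condition is met; using a lower Xon threshold on the third data amount is an assumption made for this example, as are the class and method names.

```python
from typing import Optional


class PauseController:
    """Decides when to send Xoff or Xon to the source server for one data flow."""

    def __init__(self, xoff_threshold: int, xon_threshold: int) -> None:
        if xon_threshold > xoff_threshold:
            raise ValueError("xon_threshold must not exceed xoff_threshold")
        self.xoff_threshold = xoff_threshold
        self.xon_threshold = xon_threshold   # assumed resume condition
        self.paused = False

    def update(self, third_amount: int) -> Optional[str]:
        """Return "XOFF", "XON", or None when no message needs to be sent."""
        if not self.paused and third_amount > self.xoff_threshold:
            self.paused = True
            return "XOFF"
        if self.paused and third_amount <= self.xon_threshold:
            self.paused = False
            return "XON"
        return None
```

Keeping the two thresholds apart avoids toggling between Xoff and Xon when the backlog hovers around a single value.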
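In the embodiments of the present application, when data is exchanged between the first edge node and the second edge node, the first edge node can count the first data amount of the to-be-exchanged data in the data flow, and the second edge node can use the indication information to indicate to the first edge node the second data amount that the first edge node is allowed to send. The second data amount may be determined by the second edge node according to parameters such as state information and data flow information. The first edge node can then control the exchange of the data flow according to the first data amount and the second data amount, so that the first edge node controls ingress traffic and the second edge node controls egress traffic, which avoids congestion in the data exchange network and improves exchange efficiency.
Further, as shown in FIG. 9, the method may also include S204 before S202; S204 may follow S201.
S204: The first edge node sends the to-be-exchanged data to the second edge node according to a preset data amount.
When a preset data amount exists in the first edge node, the first edge node can directly use it to send the to-be-exchanged data to the second edge node, and the second edge node can receive the to-be-exchanged data sent in this way. The data actually sent may be part or all of the to-be-exchanged data, depending on the sizes of the first data amount and the preset data amount.
The preset data amount can be set in advance, for example to 50 KB, 100 KB, or 150 KB; the embodiments of the present application do not specifically limit this.
Further, when the preset data amount is smaller than the first data amount, that is, when the first edge node has not sent all of the to-be-exchanged data to the second edge node using the preset data amount, the first edge node and the second edge node can use the method described above to send the rest of the data of the flow to the second edge node. When the preset data amount is larger than the first data amount, then for data of the flow that the first edge node receives later, the first edge node can use the difference between the preset data amount and the first data amount, plus the second data amount indicated by the indication information from the second edge node, as the amount of data that the second edge node allows it to send.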
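The bookkeeping around the preset data amount in S204 can be sketched as follows. All amounts are assumed to be in the same unit (for example KB), and the function names and the tuple layout are illustrative.

```python
def send_with_preset(first_amount: int, preset_amount: int) -> tuple:
    """Return (sent_now, remaining_backlog, leftover_preset) for the initial send."""
    sent = min(first_amount, preset_amount)
    return sent, first_amount - sent, preset_amount - sent


def allowed_after_indication(leftover_preset: int, second_amount: int) -> int:
    """Amount allowed for later data: unused preset plus the indicated second data amount."""
    return leftover_preset + second_amount
```

If the first data amount is 300 and the preset is 100, then 100 is sent at once and 200 waits for indication information. If the first data amount is 60 and the preset is 100, everything is sent and the unused 40 is added to the next second data amount, so an indication of 200 allows 240.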
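In the embodiments of the present application, when the first edge node directly uses the preset data amount to send part or all of the to-be-exchanged data to the second edge node, that data can be sent without waiting for a round-trip time (RTT), which reduces signaling between the first edge node and the second edge node and improves data transmission efficiency.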
上述主要从各个节点之间交互的角度对本申请实施例提供的方案进行了介绍。可以理解的是,各个网元,例如第一边缘节点和第二边缘节点等,为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
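The embodiments of the present application can divide the first edge node and the second edge node into functional modules according to the above method examples. For example, each functional module can correspond to one function, or two or more functions can be integrated into one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is illustrative and is only a division by logical function; other divisions are possible in actual implementation. The following description takes the division of one functional module per function as an example: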
在采用集成的单元的情况下,图10示出了上述实施例中所涉及的数据交换的控制装置的一种可能的结构示意图。该装置可以为第一边缘节点或者第一边缘节点内置的芯片,该装置包括:处理单元301和接收单元302。其中,处理单元301用于支持该装置执行方法实施例中的S201和/或S203;接收单元302支持该装置执行方法实施例中接收S202发送的指示信息的步骤。可选地,该装置还可以包括发送单元303,发送单元303用于支持该装置执行发送请求信息的步骤。上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用硬件实现的基础上,本申请中的处理单元301可以为数据交换的控制装置的处理器,接收单元302可以为该装置的接收器,发送单元303可以为该装置的发送器,发送器通常可以和接收器集成在一起用作收发器,具体的收发器还可以称为通信接口。
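FIG. 11 is a schematic diagram of a possible logical structure of the data exchange control device involved in the above embodiments. The device may be the first edge node or a chip built into the first edge node, and includes a processor 312 and a communication interface 313. The processor 312 controls and manages the actions of the device; for example, the processor 312 supports the device in generating request information, parsing indication information, and counting to-be-exchanged data as in the method embodiments, and/or in other processes of the techniques described herein. In addition, the device may include a memory 311 and a bus 314; the processor 312, the communication interface 313, and the memory 311 are connected to each other through the bus 314. The communication interface 313 supports communication by the device, and the memory 311 stores the program code and data of the device.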
其中,处理器312可以是中央处理器单元,通用处理器,数字信号处理器,专用集成电路,现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。所述处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。总线314可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图11中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在采用集成的单元的情况下,图12示出了上述实施例中所涉及的数据交换的控制装置的一种可能的结构示意图。该装置可以为第二边缘节点或者第二边缘节点内置的芯片,该装置包括:处理单元401和发送单元402。其中,处理单元401用于支持该装置确定第二数据量和生成指示信息的步骤;发送单元402支持该装置执行方法实施例中的S202。可选的,该装置还可以包括接收单元403,接收单元403用于支持该装置执行接收请求信息的步骤。上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
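When implemented in hardware, the processing unit 401 in the present application may be the processor of the data exchange control device, the sending unit 402 may be the transmitter of the device, and the receiving unit 403 may be the receiver of the device. The transmitter can usually be integrated with the receiver as a transceiver, and the transceiver may also be called a communication interface.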
图13所示,为本申请的实施例提供的上述实施例中所涉及的数据交换的控制装置的一种可能的逻辑结构示意图。该装置可以为第二边缘节点或者第二边缘节点内置的芯片,该装置包括:处理器412和通信接口413。处理器412用于对该装置动作进行控制管理,例如,处理器412用于支持该装置执行方法实施例中解析请求信息、确定第二数据量、生成指示信息,和/或用于本文所描述的技术的其他过程。此外,该装置还可以包括存储器411和总线414,处理器412、通信接口413以及存储器411通过总线414相互连接;通信接口413用于支持该装置进行通信;存储器411用于存储该装置的程序代码和数据。
其中,处理器412可以是中央处理器单元,通用处理器,数字信号处理器,专用集成电路,现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。所述处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。总线414可以是外设部件互连标准(PCI)总线或扩展工业标准结构(EISA)总线等。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图13中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
本申请装置实施例的第一边缘节点和第二边缘节点可分别对应于本申请方法实施例中的第一边缘节点和第二边缘节点。并且,第一边缘节点和第二边缘节点的各个模块和其它操作和/或功能分别为了实现上述方法实施例的相应流程,为了简洁,本申请方法实施例的描述可以适用于该装置实施例,在此不再赘述。
本申请装置实施例的有益效果可参考上述对应的方法实施例中的有益效果,此处不再赘述。另外,本申请装置实施例中相关内容的描述也可以参考上述对应的方法实施例。
基于此,本申请实施例还提供一种数据交换网络,该数据交换网络包括第一边缘节点和第二边缘节点。其中,第一边缘节点可以为上述装置实施例所提供的第一边缘节点,用于执行上述方法实施例中第一边缘节点的步骤;第二边缘节点可以为上述装置实施例所提供的第二边缘节点,用于执行上述方法实施例中第二边缘节点的步骤。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的数据交换网络、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
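In the several embodiments provided in the present application, it should be understood that the disclosed data exchange network, device, and method can be implemented in other ways. For example, the device embodiments described above are only illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.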
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
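In another embodiment of the present application, a readable storage medium is also provided. The readable storage medium stores computer-executable instructions; when a device (which may be a single-chip microcomputer, a chip, or the like) or a processor executes the instructions, the steps of the first edge node in the data exchange control method provided by the above method embodiments are performed. The readable storage medium may include a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or any other medium that can store program code.
In another embodiment of the present application, a readable storage medium is also provided. The readable storage medium stores computer-executable instructions; when a device (which may be a single-chip microcomputer, a chip, or the like) or a processor executes the instructions, the steps of the second edge node in the data exchange control method provided by the above method embodiments are performed. The readable storage medium may include a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or any other medium that can store program code.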
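In another embodiment of the present application, a computer program product is also provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium. At least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the instructions so that the device performs the steps of the first edge node in the data exchange control method provided by the above method embodiments.
In another embodiment of the present application, a computer program product is also provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium. At least one processor of a device can read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the instructions so that the device performs the steps of the second edge node in the data exchange control method provided by the above method embodiments.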
最后应说明的是:以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (27)

  1. 一种数据交换的控制方法,其特征在于,应用于包括第一边缘节点和第二边缘节点的数据交换网络中,所述方法包括:
    所述第一边缘节点统计输入的数据流中待交换数据的第一数据量;
    所述第一边缘节点获取来自第二边缘节点的指示信息,所述指示信息用于指示第二数据量,所述第二边缘节点为输出所述数据流的边缘节点;
    所述第一边缘节点根据所述第一数据量和所述第二数据量,控制所述数据流的交换。
  2. 根据权利要求1所述的方法,其特征在于,所述第一边缘节点根据所述第一数据量和所述第二数据量,控制所述数据流的交换,包括:
    所述第一边缘节点根据所述第一数据量和所述第二数据量,确定第三数据量;
    若所述第三数据量大于预设阈值,所述第一边缘节点控制所述数据流的交换。
  3. 根据权利要求2所述的方法,其特征在于,所述第一边缘节点控制所述数据流的交换,包括:
    所述第一边缘节点对所述待交换数据做流量控制;和/或,
    所述第一边缘节点对所述数据流对应的源服务器做流量控制。
  4. 根据权利要求2或3所述的方法,其特征在于,所述第一边缘节点对所述待交换数据做流量控制,包括:所述第一边缘节点缓存或丢弃所述待交换数据,或者控制向所述第二边缘节点发送所述待交换数据的发送速率;
    所述第一边缘节点对源服务器做流量控制,包括:所述第一边缘节点向所述源服务器发送流量控制信息,所述流量控制信息用于指示以下至少一项:指示所述源服务器暂停发送所述数据流;指示所述源服务器发送所述数据流的发送速率;指示允许所述源服务器发送所述数据流的数据量。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述第二数据量与所述第二边缘节点的状态信息有关;和/或,
    所述第二边缘节点发送所述指示信息的时间点、间隔和粒度中的至少一个与所述第二边缘节点的状态信息有关。
  6. 根据权利要求5所述的方法,其特征在于,所述第二边缘节点的状态信息包括以下至少一项:已输出所述数据流的数据量、所述第二边缘节点的缓存状态、所述数据流的目的服务器的状态。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述方法还包括:
    所述第一边缘节点向所述第二边缘节点发送请求信息,所述请求信息用于确定所述第二数据量,所述请求信息包括以下至少一种信息:剩余数据包的数量、剩余数据量、服务等级协议SLA信息。
  8. 根据权利要求7所述的方法,其特征在于,所述请求信息承载在所述数据流的数据包中。
  9. 根据权利要求1-8任一项所述的方法,其特征在于,所述第一边缘节点获取来自第二边缘节点的指示信息之前,所述方法还包括:
    所述第一边缘节点根据预设数据量,向所述第二边缘节点发送所述待交换数据。
  10. 根据权利要求1-9任一项所述的方法,其特征在于,所述数据流是根据网络输出端口或者目的网卡划分的数据流。
  11. 根据权利要求1-10任一项所述的方法,其特征在于,所述数据交换网络包括一个或者多个虚拟扩展局域网VXLAN,所述数据流是根据VXLAN划分的数据流。
  12. 根据权利要求1-11任一项所述的方法,其特征在于,所述数据流是根据五元组划分的数据流。
  13. 根据权利要求10-12任一项所述的方法,其特征在于,所述数据流的划分还与优先级有关。
  14. 一种数据交换的控制方法,其特征在于,应用于包括第一边缘节点和第二边缘节点的数据交换网络中,所述方法包括:
    所述第二边缘节点确定数据流对应的指示信息,所述指示信息用于指示第二数据量,所述第二边缘节点为输出所述数据流的边缘节点;
    所述第二边缘节点向所述第一边缘节点发送所述指示信息,以使所述第一边缘节点根据所述数据流中待交换数据的第一数据量和所述第二数据量控制所述待交换数据的交换,所述第一数据量是所述第一边缘节点统计得到的。
  15. 根据权利要求14所述的方法,其特征在于,所述第二数据量与所述第二边缘节点的状态信息有关;和/或,
    所述第二边缘节点发送所述指示信息的时间点、间隔和粒度中的至少一个与所述第二边缘节点的状态信息有关。
  16. 根据权利要求15所述的方法,其特征在于,所述第二边缘节点的状态信息包括以下至少一项:已输出所述数据流的数据量、所述第二边缘节点的缓存状态、所述数据流的目的服务器的状态。
  17. 根据权利要求14-16任一项所述的方法,其特征在于,所述第二边缘节点确定数据流的指示信息之前,所述方法还包括:
    所述第二边缘节点获取来自所述第一边缘节点的请求信息,所述请求信息用于确定所述第二数据量,所述请求信息包括以下至少一种信息:剩余数据包的数量、剩余数据量、服务等级协议SLA信息。
  18. 根据权利要求17所述的方法,其特征在于,所述请求信息承载在所述数据流的数据包中。
  19. 根据权利要求14-18任一项所述的方法,其特征在于,所述第二边缘节点确定数据流的指示信息之前,所述方法还包括:
    所述第二边缘节点获取所述第一边缘节点根据预设数据量发送的所述待交换数据。
  20. 根据权利要求14-19任一项所述的方法,其特征在于,所述数据流是根据网络端口或者目的网卡划分的数据流。
  21. 根据权利要求14-20任一项所述的方法,其特征在于,所述数据交换网络包括一个或者多个虚拟扩展局域网VXLAN,所述数据流是根据VXLAN划分的数据流。
  22. 根据权利要求14-21任一项所述的方法,其特征在于,所述数据流是根据五元组划分的数据流。
  23. The method according to any one of claims 14-22, wherein the division of the data flow is also related to priority.
  24. 一种数据交换的控制装置,其特征在于,所述装置包括:处理器、存储器、通信接口和总线,所述处理器、所述存储器和所述通信接口通过总线连接;所述存储器用于存储程序代码,当所述程序代码被所述处理器执行时,使得所述装置执行权利要求1-13任一项所述的数据交换的控制方法。
  25. 一种数据交换的控制装置,其特征在于,所述装置包括:处理器、存储器、通信接口和总线,所述处理器、所述存储器和所述通信接口通过总线连接;所述存储器用于存储程序代码,当所述程序代码被所述处理器执行时,使得所述装置执行权利要求14-23任一项所述的数据交换的控制方法。
  26. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序或指令,当所述计算机程序或指令被运行时,实现如权利要求1-13任一项所述的数据交换的控制方法。
  27. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序或指令,当所述计算机程序或指令被运行时,实现如权利要求14-23任一项所述的数据交换的控制方法。
PCT/CN2021/142564 2021-12-29 2021-12-29 一种数据交换的控制方法及装置 WO2023123075A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180060636.3A CN116686332A (zh) 2021-12-29 2021-12-29 一种数据交换的控制方法及装置
PCT/CN2021/142564 WO2023123075A1 (zh) 2021-12-29 2021-12-29 一种数据交换的控制方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/142564 WO2023123075A1 (zh) 2021-12-29 2021-12-29 一种数据交换的控制方法及装置

Publications (1)

Publication Number Publication Date
WO2023123075A1 true WO2023123075A1 (zh) 2023-07-06

Family

ID=86996841

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142564 WO2023123075A1 (zh) 2021-12-29 2021-12-29 一种数据交换的控制方法及装置

Country Status (2)

Country Link
CN (1) CN116686332A (zh)
WO (1) WO2023123075A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102025640A (zh) * 2010-12-24 2011-04-20 北京星网锐捷网络技术有限公司 流量控制方法、装置及网络设备
CN110856222A (zh) * 2018-08-20 2020-02-28 华为技术有限公司 一种流量控制的方法及装置
CN112512080A (zh) * 2020-10-22 2021-03-16 中兴通讯股份有限公司 流量控制、链路状态通知方法、装置、设备和存储介质
WO2021078936A1 (en) * 2019-10-23 2021-04-29 Telefonaktiebolaget Lm Ericsson (Publ) Edge nodes, ue and methods performed therein

Also Published As

Publication number Publication date
CN116686332A (zh) 2023-09-01

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202180060636.3

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21969444

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE