WO2019105360A1 - Data transmission method, related device and network (数据传输方法、相关装置及网络) - Google Patents

Data transmission method, related device and network

Info

Publication number
WO2019105360A1
Authority
WO
WIPO (PCT)
Prior art keywords
switch
data
transit
switches
network
Prior art date
Application number
PCT/CN2018/117821
Other languages
English (en)
French (fr)
Inventor
刘芳
黄永成
吴炜捷
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP18883603.5A (EP3713161B1)
Publication of WO2019105360A1
Priority to US16/886,894 (US20200296043A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/34 Source routing
    • H04L 45/38 Flow based routing
    • H04L 45/44 Distributed routing
    • H04L 45/48 Routing tree calculation
    • H04L 45/64 Routing or path finding of packets in data switching networks using an overlay routing layer
    • H04L 45/66 Layer 2 routing, e.g. in Ethernet based MAN's
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/12 Avoiding congestion; Recovering from congestion
    • H04L 47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L 49/00 Packet switching elements
    • H04L 49/15 Interconnection of switching modules
    • H04L 49/1515 Non-blocking multistage, e.g. Clos
    • H04L 49/1553 Interconnection of ATM switching modules, e.g. ATM switching fabrics
    • H04L 49/1569 Clos switching fabrics

Definitions

  • the present application relates to the field of data center (DC) and traffic scheduling technologies, and in particular, to a data transmission method, a related device, and a network.
  • a data center is a network used to transfer, accelerate, display, compute, and store data information on an Internet infrastructure, and may include computer systems and other supporting devices (such as communication and storage systems), data communication connection devices, environment control equipment, monitoring equipment, and various safety devices. Data centers are widely used in distributed storage, big data analysis, and other applications.
  • the network topology of most data centers is implemented as a multi-level switching network, such as a fat tree or a leaf-spine architecture.
  • the transit devices responsible for relaying data (such as the core switches in a fat-tree network or the spine switches in a leaf-spine network) are highly utilized, and the load of each transit device, that is, the amount of data relayed by each transit device, affects the transmission delay and bandwidth resource utilization of the entire data center network. How to balance the load of each transit device so as to reduce the transmission delay of data packets and improve the utilization of bandwidth resources is an urgent problem to be solved.
  • the present application provides a data transmission method, a related device, and a network, which can implement load balancing of each transit switch in a data center network, improve bandwidth resource utilization, and reduce transmission delay.
  • the present application provides a data transmission method, which is applied to a first controller side. The method may include: determining, by the first controller from the available transit switches of the data center network, the available transit switches that respectively relay m groups of data, where the m groups of data are the data transmitted by the source network nodes connected to the m source switch groups to the destination network node connected to the destination switch group;
  • the data center network includes the plurality of transit switches, the m source switch groups, the destination switch group, the source network nodes, and the destination network node; the available transit switches are those transit switches, among the plurality of transit switches, whose load does not exceed a first threshold; m is a positive integer;
  • one available transit switch is used to relay at least one group of the data, and the difference between the numbers of groups of data relayed by any two available transit switches does not exceed a second threshold;
  • the first controller instructs the destination network node to send routing information to the source network nodes, where the routing information indicates the available transit switches determined for relaying the respective groups of data.
  • the data center network in the present application includes a plurality of transit switches, switch groups, controllers, and network nodes.
  • Each switch group is connected to all the transit switches, and the controller controls the switch group and the network nodes connected to the switch group.
  • the data transmission path is: network node - switch group - transit switch - another switch group - another network node.
  • a network node that sends data is referred to as a source network node
  • a switch group that connects to a source network node is referred to as a source switch group
  • a controller that controls a source switch group and the source network nodes to which it is connected is referred to as a first controller.
  • the network node that receives the data is called the destination network node
  • the switch group that connects the destination network node is called the destination switch group.
  • the data center network in the present application is a fat tree network
  • the transit switch is implemented as a core switch
  • the switch group is implemented as a basic point of delivery (POD)
  • the controller controls the POD and the network nodes connected to the POD.
  • the data center network in the present application is a leaf-spine network
  • the transit switch is implemented as a spine switch
  • the switch group is implemented as a leaf switch
  • the controller controls the leaf switch and the network node connected by the leaf switch.
  • the load balancing of each transit switch can be implemented, bandwidth resource utilization is improved, and transmission delay is reduced.
  • before the first controller determines the available transit switches that respectively relay the m groups of data, it also needs to determine which of all the transit switches of the data center network are currently available transit switches.
  • the data is transmitted multiple times, and the first controller determines the currently available transit switches before each data transmission.
  • the transit switch has an explicit congestion notification (ECN) function.
  • if the load of a transit switch exceeds the first threshold, the transit switch modifies the value of the ECN field of the data packet to a second value. If the load does not exceed the first threshold, the transit switch does not modify the ECN field of the data packet, or modifies the ECN field of the data packet so that the ECN field of the data packet takes a first value.
  • the first value and the second value may be predefined.
  • the first threshold may be pre-stored in the first controller or pre-configured according to the processing capability of the transit switch.
  • the transmission paths of the initial data packets cover all the transit switches.
  • the first controller acquires the initial data packets from all the transit switches. When the value of the ECN field of an initial data packet is the first value, it determines that the transit switch that forwarded the packet is an available transit switch; when the value of the ECN field of an initial data packet is the second value, it determines that the transit switch that forwarded the packet is an unavailable transit switch.
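As an illustration of the availability decision just described, the following Python sketch classifies transit switches from the ECN field values of the initial packets. The ECN encodings, the dictionary-based input, and all names are assumptions made for the sketch, not details from this application.

```python
# Hypothetical sketch: the first controller sorts transit switches into
# available / unavailable sets from the ECN field of one initial data packet
# received via each switch. Field encodings are illustrative assumptions.
ECN_FIRST_VALUE = 0b01   # assumed "no congestion" marking (may also be 0b10)
ECN_SECOND_VALUE = 0b11  # assumed "congestion experienced" marking

def classify_transit_switches(initial_packets):
    """initial_packets: dict mapping transit-switch id -> ECN field value."""
    available, unavailable = set(), set()
    for switch_id, ecn in initial_packets.items():
        if ecn == ECN_SECOND_VALUE:
            unavailable.add(switch_id)  # load exceeded the first threshold
        else:
            available.add(switch_id)    # ECN field still holds a first value
    return available, unavailable

available, unavailable = classify_transit_switches({0: 0b01, 1: 0b11, 2: 0b10, 3: 0b01})
```

Here switches 0, 2, and 3 would be treated as available and switch 1 as unavailable until the preset duration elapses.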
  • the source network nodes determine the paths for transmitting data according to the transit switches determined by the first controller. Taking the second data transmission as an example: in the second data transmission, some of the transit switches of the data center network are used, and the remaining transit switches are not used.
  • the used transit switches are those that the first controller determined, in the initialization phase, for transmitting data in the second data transmission.
  • the first controller may determine the available transit switches among the used transit switches according to the ECN fields of the data packets in the second data transmission.
  • the remaining unused transit switches are the unavailable transit switches determined by the first controller in the initialization phase.
  • if the first controller determines that a transit switch is unavailable, then after a preset duration, the first controller considers that transit switch to be an available transit switch again.
  • when the first controller determines the available transit switches that respectively relay the m groups of data, the number of groups that any one available transit switch is determined to relay does not exceed ⌈mk/v⌉, where k is the number of available transit switches, determined by the first controller, for relaying one group of data, and v is the number of available transit switches of the data center network.
  • in order to ensure that each available transit switch relays at least one group of data, and that the difference between the numbers of groups relayed by any two available transit switches does not exceed the second threshold, the first controller may determine the available transit switches that respectively relay the m groups of data based on packing sequences, as described in detail below.
  • multiple controllers can pre-store the same multiple packing sequence groups, constructed according to the three parameters V, K, and M.
  • V: the number of transit switches of the data center network
  • B: the number of switch groups
  • (1.1) the packing sequence group includes M packing sequences.
  • (1.2) any one of the M packing sequences includes V elements. Among the V elements, when V > K, K elements take the third value; when V ≤ K, all V elements take the third value.
  • (1.3) any one element takes the third value at most ⌈MK/V⌉ times.
  • the third value may be predefined, which is not limited in the present application.
  • the third value can be one.
  • one packing sequence group has the following characteristic: in the M packing sequences of the packing sequence group, any one element takes the third value at least once, and the numbers of times the elements take the third value are balanced.
  • v is the number of available transit switches
  • k is the number of available transit switches that transit a set of data
  • m is the number of source switch groups.
  • one source switch group corresponds to one packing sequence
  • the packing sequence includes v elements
  • the v elements respectively correspond to the v available transit switches in the data center network
  • if an element takes the third value, the available transit switch corresponding to that element relays the data transmitted by the source network nodes connected to the source switch group to the destination network node connected to the destination switch group.
  • the first controller determines the available transit switches of the transfer m groups of data based on the packing sequence, and can implement load balancing of each available transit switch in the data center network, improve bandwidth resource utilization, and reduce transmission delay.
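The load-balancing property that this determination achieves can be sketched in Python. The five-sequence group below is the six-element example packing sequence group discussed elsewhere in this application; the 0-based indexing and the helper names are conventions of the sketch only.

```python
import math

# Sketch: a packing sequence group with m = 5 sequences over v = 6 available
# transit switches, k = 3 ones per sequence. Sequence i says which transit
# switches relay the data of source switch group i.
def assign_transit_switches(packing_group):
    return [{j for j, bit in enumerate(seq) if bit == 1} for seq in packing_group]

group = [
    [1, 1, 0, 1, 0, 0],  # source switch group 0 -> transit switches {0, 1, 3}
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 1],
]
assignment = assign_transit_switches(group)

# Per-switch load: how many of the m groups each transit switch relays.
m, v, k = len(group), 6, 3
loads = [sum(seq[j] for seq in group) for j in range(v)]
bound = math.ceil(m * k / v)  # no switch relays more than ceil(mk/v) groups
```

Every switch relays at least one group, and the per-switch loads (2 or 3 here) differ by at most one, which is the balance the application aims for.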
  • the routing information is carried in the acknowledgement signal.
  • the present application provides a controller, which may include a plurality of functional modules for respectively performing the method provided by the first aspect, or the method provided by any one of the possible embodiments of the first aspect.
  • the application provides a controller for performing the data transmission method described in the first aspect.
  • the controller can include a memory and a processor coupled to the memory, where the memory is configured to store implementation code of the data transmission method described in the first aspect, and the processor is configured to execute the program code stored in the memory, that is, to perform the method provided by the first aspect, or the method provided by any one of the possible embodiments of the first aspect.
  • the application provides a network, the network comprising: a controller, a transit switch, a switch group, and a network node.
  • the controller may be the controller described in the second aspect or the third aspect above.
  • the present application provides a computer readable storage medium having instructions stored thereon that, when run on a computer, cause the computer to perform the data transmission method described in the first aspect above.
  • the present application provides a computer program product comprising instructions that, when executed on a computer, cause the computer to perform the data transmission method described in the first aspect above.
  • the first controller determines, from the available transit switches of the data center network, the available transit switches that respectively relay m groups of data; one available transit switch is used to relay at least one group of data, and the difference between the numbers of groups of data relayed by any two available transit switches does not exceed the second threshold.
  • the application can implement load balancing of each transit switch in the data center network, improve bandwidth resource utilization, and reduce transmission delay.
  • FIG. 1A is a schematic structural diagram of a fat-tree network in the prior art;
  • FIG. 1B is a schematic structural diagram of a leaf-spine network in the prior art;
  • FIG. 2 is a schematic structural diagram of a data center network provided by the present application;
  • FIG. 3A is a schematic structural diagram of a fat-tree network provided by the present application;
  • FIG. 3B is a schematic structural diagram of a leaf-spine network provided by the present application;
  • FIG. 4 is a schematic flowchart of the initialization phase in the data transmission method provided by the present application;
  • FIG. 5 is a schematic flowchart of data transmission after initialization in the data transmission method provided by the present application;
  • FIG. 6 is a schematic structural diagram of a controller provided by the present application;
  • FIG. 7 is a functional block diagram of a controller provided by the present application.
  • data center networks typically employ a hierarchical topology.
  • FIG. 1A is a schematic diagram of a 4-ary fat-tree network structure based on a Clos topology.
  • an n-ary fat-tree network structure, that is, a fat-tree network structure consisting of n data center basic points of delivery (PODs), includes n²/4 core switches, n²/2 aggregation switches, n²/2 edge switches, and n³/4 network nodes.
  • every n/2 aggregation switches and n/2 edge switches form a POD.
  • each edge switch is connected to n/2 different network nodes, and each edge switch is connected to the n/2 aggregation switches above it.
  • every n/2 core switches form a core switch group. All core switches in the t-th core switch group are connected to the t-th aggregation switch in each POD, 1 ≤ t ≤ n/2.
  • each POD is connected to all core switches, and the aggregation switch connected to a given core switch within a POD is unique.
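The fat-tree sizes stated above can be written as a small helper (a sketch; the function name and dictionary keys are ours):

```python
# Sizes of an n-ary fat tree as described above (n even): n PODs, n^2/4 core
# switches, n^2/2 aggregation switches, n^2/2 edge switches, n^3/4 nodes.
def fat_tree_sizes(n):
    assert n % 2 == 0, "n-ary fat trees use n/2 groupings, so n must be even"
    return {
        "pods": n,
        "core_switches": n * n // 4,
        "aggregation_switches": n * n // 2,
        "edge_switches": n * n // 2,
        "network_nodes": n ** 3 // 4,
    }
```

For the 4-ary network of FIG. 1A this gives 4 core switches, 8 aggregation switches, 8 edge switches, and 16 network nodes; for the 6-ary network of FIG. 3A it gives 9 core switches.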
  • FIG. 1B is a schematic diagram of a leaf-spine network topology.
  • the leaf-spine topology includes spine switches and leaf switches, and each leaf switch is connected to all spine switches.
  • the number of network nodes connected to each leaf switch is related to the number of downlink ports of the leaf switch.
  • each of the connected devices mentioned in FIG. 1A and FIG. 1B is connected through a port.
  • An equal-cost multi-path (ECMP) forwarding technique based on a hash algorithm is one of the typical distributed scheduling techniques.
  • referring to the fat-tree network shown in FIG. 1A, there are multiple data transmission paths between two PODs; that is, when a network node connected to one POD transmits data to a network node connected to another POD, there are multiple equivalent candidate paths, and the data is transmitted over the multiple candidate paths according to a hash algorithm.
  • in centralized scheduling, the global information in the data center network is collected by a centralized controller, including all data packet information, the data request information of the destination network nodes, and the state of each switch, so as to make an optimal routing distribution for each data packet and thereby achieve load balancing of each transit device.
  • although the centralized scheduling method can obtain the optimal path selection scheme for each data packet, this scheduling method requires a large amount of information interaction.
  • the process of collecting information by the centralized controller occupies bandwidth resources and causes additional overhead to the data center network.
  • in addition, this scheduling method has high computational complexity and long reaction time.
  • the key to load balancing is to balance the workload of the transit devices that bear the role of relaying data (such as the core switches in a fat-tree network or the spine switches in a leaf-spine network).
  • the present application proposes a data center network, which can balance the workload of each transit device through a controller to achieve load balancing.
  • the data center network of the present application includes: a plurality of transit switches, a plurality of switch groups, a plurality of controllers, and network nodes respectively connected to the plurality of switch groups.
  • Each switch group is connected to all the transit switches, and multiple switch groups are respectively connected to different network nodes, and one switch group is connected to one controller.
  • each of the connected devices mentioned in FIG. 2 above is connected through a port.
  • data center network of the present application may also include other devices, which are not limited in this application.
  • the transit switch refers to the uppermost switch in the data center network for relaying data, which may be the core switch in the fat-tree network shown in FIG. 1A, the spine switch in the leaf-spine network shown in FIG. 1B, or a switch for relaying data in another data center network based on a hierarchical topology.
  • the transit switch has an explicit congestion notification (ECN) function.
  • the transit switch checks the destination address of the data packet.
  • the transit switch determines the egress port used to forward the packet based on the destination address of the packet.
  • the transit switch has multiple egress ports. After determining the destination address of the packet, the transit switch can uniquely determine the egress port used to forward the packet.
  • the transit switch checks the load at the egress port.
  • the load at the egress port may be determined by factors such as the number of data packets waiting to be forwarded at the egress port, and the speed at which the egress port forwards the data packet. The more packets that are waiting to be forwarded, or the slower the forwarding speed, the greater the load on the current outbound port.
  • if the current load of the egress port exceeds the first threshold, the transit switch sets the ECN field in a newly arrived packet to the second value, indicating that the current egress port needs to forward too many data packets and is experiencing congestion. If the current load of the egress port does not exceed the first threshold, the transit switch does not make any change to the ECN field in the data packet, and the ECN field in the data packet takes the first value, indicating that the current egress port can still forward new packets.
  • the first threshold may be preset.
  • the first threshold may be set according to a processing capability of the transit switch.
  • the first value and the second value can be predefined.
  • the network node 0 sends a data packet to the network node 3, and the destination address in the data packet is the address of the network node 3, and the transmission path of the data packet passes through the transit switch 0.
  • transit switch 0 and switch group 2 are connected through outbound port 0.
  • the transit switch 0 checks the number of data packets to be forwarded at egress port 0. If it is greater than 5, the ECN field of a newly arrived data packet is set to 11; if it is less than or equal to 5, the transit switch does not change the ECN field of the newly arrived data packet, and the ECN field of the packet may be 01 or 10.
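The marking rule in this example can be condensed into a few lines of Python (a sketch of the decision only; an actual switch marks the ECN bits of the IP header in hardware):

```python
# Sketch of the egress-port ECN rule above: more than FIRST_THRESHOLD packets
# queued -> mark the newly arrived packet "11"; otherwise leave its ECN field
# (assumed to be "01" or "10") unchanged.
FIRST_THRESHOLD = 5

def mark_ecn(packet_ecn, packets_queued_at_egress_port):
    if packets_queued_at_egress_port > FIRST_THRESHOLD:
        return 0b11    # second value: egress port is experiencing congestion
    return packet_ecn  # first value kept: port can still take new packets
```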
  • the device receiving a data packet from the transit switch can learn the status of the transit switch through the ECN field in the data packet: whether the switch is currently experiencing congestion.
  • the switch group is a group of devices connected to the network node and the transit switch, and may be a POD in the fat tree network shown in FIG. 1A, or a leaf switch in the leaf ridge network shown in FIG. 1B, or may be A group of devices in other hierarchical transport networks used to connect network nodes and transit switches.
  • each switch group is configured with a controller that connects to and controls the switch group and the network nodes corresponding to the switch group.
  • the controller can be used to acquire data packets from each transit switch, determine the status of the transit switch, perform route scheduling, and the like.
  • the network node may be a device having a unique network address, such as a workstation, a server, or a terminal device.
  • in a data center network, when any two network nodes connected to different switch groups communicate, the data needs to pass through the switch groups to which the two nodes are connected and a transit switch.
  • network node 0 and network node 2 are respectively connected to switch group 0 and switch group 1, and network node 0 sends data to network node 2, the data transmission path is: network node 0-switch group 0 - Any one of the transit switches in the data center network - switch group 1 - network node 2.
  • a network node that transmits data is referred to as a source network node
  • a network node that receives data is referred to as a destination network node.
  • the switch group connected to the source network node is called the source switch group
  • the switch group connected to the destination network node is called the destination switch group.
  • the corresponding controller of the destination switch group is referred to as a first controller. It can be understood that the first controller connects and controls the destination switch group and the destination network node.
  • the source switch group includes: switch group 0, switch group 2, switch group 3, and the destination switch group includes switch group 1.
  • the data transmission path is: the source network node - the source switch group connected to the source network node - the transit switch - the destination switch group connected to the destination network node - the destination Network node.
  • the data center network provided by the present application is specifically described below in conjunction with the fat-tree network and the leaf-spine network.
  • FIG. 3A is a schematic structural diagram of a fat tree network provided by the present application
  • FIG. 3A takes a 6-ary fat-tree network as an example.
  • each POD is configured with a POD controller.
  • the POD controller can be used to obtain data packets from each core switch, read the value of the ECN field of each data packet, determine the state of the core switches, and perform route scheduling.
  • FIG. 3B is a schematic structural diagram of a leaf-spine network provided by the present application.
  • each leaf switch is configured with a leaf controller.
  • the leaf controller can be used to acquire data packets from each spine switch, read the value of the ECN field of each data packet, determine the state of the spine switches, perform route scheduling, and the like.
  • the present application provides a data transmission method.
  • the main inventive principle of the present application may include: the first controller learns the status of each transit switch in the data center network and, among the available transit switches, determines the available transit switches for relaying data from different source network nodes, so that the amount of data relayed by each available transit switch is balanced.
  • each transit switch in the data center network can be divided into two types: an available transit switch and an unavailable transit switch.
  • each of the transit switches has an egress port for forwarding data to the destination network node.
  • the available transit switch refers to a transit switch where the load at the egress port does not exceed the first threshold.
  • the unavailable transit switch refers to a transit switch where the load at the egress port exceeds the first threshold.
  • the data that passes through the same source switch group during transmission is called a group of data; that is, the data sent from the source network nodes connected to the same source switch group to the destination network node connected to the destination switch group is called a group of data.
  • the set of data can be generated by one or more source network nodes to which the source switch group is connected.
  • a set of data can be relayed by a transit switch, or can be coordinated by multiple transit switches.
  • for example, a group of data passing through POD0 can be relayed by core switch group 0 alone, or can be jointly relayed by core switch group 0, core switch group 1, and core switch group 2.
  • the plurality of transit switches can relay the same data, or can respectively relay different parts of the group of data.
  • a group of data can be transmitted in the form of a data packet and can be transmitted multiple times. This application does not impose any limitation.
  • one packing sequence group is determined based on three parameters V, K, and M. After the values of V, K, and M are determined, the corresponding packing sequence group is constructed according to the following three conditions:
  • (1.1) the packing sequence group includes M packing sequences.
  • (1.2) any one of the M packing sequences includes V elements; in the i-th packing sequence, K elements take the third value, 1 ≤ i ≤ M.
  • (1.3) any one element takes the third value at most ⌈MK/V⌉ times.
  • the third value may be predefined, which is not limited in the present application.
  • the third value can be one.
  • a packing sequence group has the following characteristic: in the M packing sequences of the packing sequence group, any one element takes the third value at least once, and the numbers of times the elements take the third value are balanced.
  • the packing sequence group includes five packing sequences, and each packing sequence includes six elements.
  • the first element takes "1" 2 times
  • the second element takes "1" 3 times
  • the third element takes "1" 2 times
  • the fourth element takes "1" 3 times
  • the fifth element takes "1" 3 times
  • the sixth element takes "1" 2 times. It can be seen that the numbers of times the elements take "1" are balanced.
  • first, λ1 is determined according to Equation 1; λ1 is the minimum value of λ for which Equation 1 is satisfied, that is, for which a (V, K, λ)-packing containing at least M blocks exists.
  • a (V, K, λ1)-packing is composed of a plurality of blocks; each block is a set, and each block includes elements selected from V given elements.
  • each block includes K elements, and any two different elements appear simultaneously in at most λ1 blocks.
  • the constructed (V, K, λ1)-packing includes at least M blocks.
  • for example, a (6, 3, 2)-packing including 10 blocks can be: {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {1, 4, 5}, {2, 5, 6}, {1, 3, 6}, {2, 3, 4}, {4, 5, 6}, {1, 2, 6}, {1, 3, 5}.
  • the five blocks can be: {1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {1, 4, 5}, {2, 5, 6}.
  • the corresponding packing sequence group is generated.
  • the group of packing sequences corresponding to the five blocks can be as shown in Table 1.
  • for example, the first block is {1, 2, 4}, in which the elements 1, 2, and 4 appear, so the first, second, and fourth elements in the first packing sequence take the third value (1); that is, the first packing sequence is: 110100.
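The block-to-sequence construction just described is mechanical, as the following sketch shows (elements are 1-based, as in the blocks above; the helper name is ours):

```python
# Turn each selected block into one packing sequence of v binary elements:
# the elements named in the block take the third value ("1"), the rest "0".
def blocks_to_sequences(blocks, v):
    sequences = []
    for block in blocks:
        seq = ["0"] * v
        for element in block:  # elements are 1-based, as in the blocks
            seq[element - 1] = "1"
        sequences.append("".join(seq))
    return sequences

blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {1, 4, 5}, {2, 5, 6}]
sequences = blocks_to_sequences(blocks, 6)  # first sequence: "110100"
```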
  • the controller stores multiple packing sequence groups
  • the number of transit switches, the number of switch groups, and the number of controllers are determined.
  • multiple controllers in the data center network pre-store multiple packing sequence groups according to the number of transit switches and the number of switch groups. It can be understood that the packing sequence groups stored in the multiple controllers in the data center network are the same, which is applicable to the scenario where any one controller is the destination controller.
  • the following takes a data center network including A transit switches and B switch groups as an example to describe the multiple packing sequence groups stored in the controller.
  • the controller constructs a packing sequence group according to the three parameters V, K, and M, where 1 ≤ V ≤ A, 1 ≤ K ≤ A, and 1 ≤ M ≤ B. Therefore, A*A*B packing sequence groups can be constructed and stored in the controller.
  • for example, the fat-tree network shown in FIG. 3A includes 9 core switches (transit switches), 6 PODs (switch groups), and 6 controllers.
  • the controllers can construct the packing sequence groups for 1 ≤ V ≤ 9, 1 ≤ K ≤ 9, and 1 ≤ M ≤ 6, and store them.
  • the data transmission method of the present application is described below.
  • When data is transmitted in the data center network, the data is transmitted in multiple rounds in the form of data packets, according to the amount of data. Data from source network nodes connected to the same source switch group may pass through different available transit switches in each round.
  • The data transmission can be divided into two phases: an initialization phase (the first packet transmission phase) and a post-initialization data transmission phase.
  • The embodiment shown in FIG. 4 is described using an example in which the data center network includes a plurality of transit switches, m source switch groups, one destination switch group, multiple source network nodes, and one destination network node, and a total of m groups of data are transmitted.
  • FIG. 4 is a schematic flowchart of an initialization phase in a data transmission method provided by the present application.
  • the initialization phase enables the first controller to know the status of all transit switches in the data center network and determine the available transit switches therein.
  • the method can include the following steps:
  • S101: the source network nodes connected to the m source switch groups transmit initial data packets to the destination network node connected to the destination switch group, and the transmission paths of the initial data packets pass through all the transit switches of the data center network.
  • Specifically, when invoking data of multiple source network nodes in the data center network, the destination network node first sends a data transmission request to the multiple source network nodes storing the data.
  • After receiving the data transmission request sent by the destination network node, the multiple source network nodes first transmit the initial data packets.
  • The initial data packets may be all of the data packets sent by the multiple source network nodes to the destination network node in a normal data transmission round, or may be a small part of all the data packets transmitted by the multiple source network nodes to the destination network node; when there are fewer initial data packets, initialization can be sped up.
  • the transmission path of the initial data packet in step S101 is: source network node-source switch group-intermediate switch-destination switch group-destination network node.
  • the transmission path of the initial data packet passes through all the transit switches of the data center network.
  • each source switch group forwards the received initial data packet to all transit switches when it receives the initial data packet of the source network node to which it is connected.
  • each source switch group simply traverses all the transit switches when forwarding the initial data packet, which is simple and easy, and can ensure that the transmission path of the initial data packet passes through all the transit switches.
  • Alternatively, the transit switches to which each source switch group forwards the initial data packets may be pre-defined, or may be specified by the first controller and carried in the data transmission request sent by the destination network node to the multiple source network nodes.
  • For example, source switch group 0 can forward the received initial data packets to transit switches 0, 1, and 2, respectively; source switch group 1 can forward the received initial data packets to transit switches 3, 4, and 5, respectively; and source switch group 2 can forward the received initial data packets to transit switches 6, 7, and 8, respectively.
  • In this way, each source switch group only needs to forward the initial data packets to a subset of the transit switches, while the transmission paths of the initial data packets still cover all the transit switches, thereby saving bandwidth resources.
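Such a covering assignment can be sketched as follows (an illustrative Python sketch of the partitioning idea; the function name and the contiguous split are our own choices, matching the example above):

```python
def assign_init_switches(num_transit, num_groups):
    """Partition transit switch indices 0..num_transit-1 across the
    source switch groups so that the initial packets jointly traverse
    every transit switch (contiguous split; sizes differ by at most 1)."""
    base, extra = divmod(num_transit, num_groups)
    assignment, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)
        assignment.append(list(range(start, start + size)))
        start += size
    return assignment

# 9 transit switches spread over 3 source switch groups,
# as in the example: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(assign_init_switches(9, 3))
```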
  • The transit switches of the present application have an ECN (explicit congestion notification) function.
  • When receiving a data packet, a transit switch rewrites the ECN field of the data packet according to its current load of forwarding data to the egress port toward the destination network node.
  • S102: the first controller determines the available transit switches of the data center network according to the initial data packets.
  • The first controller is connected to and controls the destination switch group and the destination network node, so when the initial data packets arrive at the destination switch group or the destination network node, the first controller can acquire the initial data packets.
  • Specifically, the first controller can obtain the initial data packets forwarded by each transit switch and, according to the ECN field in each initial data packet, learn the status of all the transit switches in the data center network, thereby determining the available transit switches and the unavailable transit switches.
  • The state of an unavailable transit switch is set to available after a preset duration; for details of this mechanism, refer to the description of the embodiment of FIG. 5 below, which is not repeated here.
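The ECN-based classification can be sketched as follows (a hypothetical sketch: the packet representation, field names, and the placeholder ECN values are our own, not defined by the application):

```python
FIRST_VALUE, SECOND_VALUE = 0, 1  # placeholder ECN codepoints

def classify_transit_switches(init_packets):
    """Each packet records the transit switch that forwarded it and its
    ECN value; a switch whose packet carries the second value
    (congestion-marked) is treated as unavailable."""
    available, unavailable = set(), set()
    for pkt in init_packets:
        if pkt["ecn"] == SECOND_VALUE:
            unavailable.add(pkt["transit_switch"])
        else:
            available.add(pkt["transit_switch"])
    return available - unavailable, unavailable

# 9 transit switches; switches 3, 4, 5 are congested, as in the example.
pkts = [{"transit_switch": s,
         "ecn": SECOND_VALUE if s in (3, 4, 5) else FIRST_VALUE}
        for s in range(9)]
print(classify_transit_switches(pkts))
```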
  • S103: the first controller determines, from the available transit switches of the data center network, the available transit switches that respectively transfer the m groups of data.
  • the m group data is data transmitted by the source network node connected by the m source switch groups to the destination network node connected to the destination switch group.
  • An available transit switch is used to transfer at least one of the sets of data, and the difference in the number of groups in which the data is transferred by any two available transit switches does not exceed a second threshold.
  • Specifically, the first controller may determine the available transit switches that respectively transfer the m groups of data in the second data transmission after initialization, such that one available transit switch relays at least one set of data and the difference between the numbers of data sets relayed by any two available transit switches does not exceed the second threshold.
  • the second threshold may be predefined, for example, the second threshold may be 1 or 2.
  • In this way, in the second data transmission, each available transit switch in the data center network is used, and the difference in the amount of data carried by the available transit switches is kept within a certain range, thereby ensuring load balancing across the available transit switches.
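The balance condition above can be checked mechanically (an illustrative sketch; the dictionary representation of the assignment is our own, and the example assignment is the one obtained from Table 1 with available switches 0, 1, 2, 6, 7, 8):

```python
from collections import Counter

def is_balanced(assignment, available, threshold):
    """assignment maps each data-group index to the set of available
    transit switches that relay it. Returns True when every available
    switch relays at least one group and the per-switch group counts
    differ by at most `threshold` (the second threshold)."""
    counts = Counter(s for switches in assignment.values() for s in switches)
    if set(counts) != set(available):
        return False  # some available switch relays no group
    return max(counts.values()) - min(counts.values()) <= threshold

# Groups 0..4 relayed per the packing sequences of Table 1.
assignment = {0: {0, 1, 6}, 1: {1, 2, 7}, 2: {2, 6, 8},
              3: {0, 6, 7}, 4: {1, 7, 8}}
print(is_balanced(assignment, {0, 1, 2, 6, 7, 8}, 1))
```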
  • the first controller may determine the available transit switches that respectively transfer the m groups of data in the following two cases:
  • Case 1: each group of data is transferred by k available transit switches.
  • In this case, when determining the available transit switches that respectively transfer the m sets of data, the first controller may ensure that the number of data groups relayed by any one available transit switch does not exceed ⌈mk/v⌉, where k is the number, determined by the first controller, of available transit switches that transfer one set of the data, and v is the number of available transit switches.
  • the specific value of k is determined by the first controller according to the transmission rate required by the application that currently triggers the m group data to be transmitted from the multiple source network nodes to the destination network node, for example, when the transmission rate required by the application is higher, The value of k is larger.
  • The k transit switches can all transfer all of the set of data, or can respectively transfer different parts of the set of data.
  • For example, the set of data may be equally divided into k shares, with each of the k transit switches transferring one share, so that the workload is averaged over the k transit switches.
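The even split into k shares might look like the following (illustrative only; a round-robin split is one way to keep share sizes within one packet of each other):

```python
def split_into_shares(packets, k):
    """Distribute one group's packets across k transit switches as
    evenly as possible (share sizes differ by at most one)."""
    shares = [[] for _ in range(k)]
    for i, pkt in enumerate(packets):
        shares[i % k].append(pkt)
    return shares

# 10 packets split across k = 3 transit switches.
print(split_into_shares(list(range(10)), 3))
```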
  • In a specific implementation, the first controller may select a corresponding packing sequence group from the plurality of packing sequence groups stored in advance, and determine, according to the packing sequences in the packing sequence group, the available transit switches that respectively transfer the m groups of data, as described in detail below.
  • First, the first controller determines the values of V, K, and M according to the number of available transit switches, the number of available transit switches that transfer one set of the data, and the number of source switch groups, and then selects a corresponding packing sequence group from the plurality of packing sequence groups stored in advance according to V, K, and M.
  • Here, V is the number v of available transit switches, K is the number k of available transit switches that transfer one set of the data, and M is the number m of the source switch groups.
  • For the plurality of packing sequence groups pre-stored in the first controller, refer to the related description, in concept (4) of this application, of the multiple packing sequence groups stored in the controller.
  • One packing sequence group includes m packing sequences.
  • One source switch group corresponds to one packing sequence.
  • Each of the packing sequences includes v elements, and the v elements respectively correspond to v available transit switches in the data network; when the value of an element is a third value, the available transit switch corresponding to the element is And transiting a transit switch of the data transmitted by the source network node connected to the source switch group to the destination network node connected to the destination switch group.
  • the data center network shown in FIG. 2 includes six source network nodes: network nodes 0, 1, 2, 4, 5, and 6, and the destination network node is network node 3, that is, the data center network includes 5 Source switch group: switch group 0, switch group 1, switch group 3, switch group 4, switch group 5, and destination switch group is switch group 2.
  • the first controller is the controller 2. Assume that there are six available transit switches determined by the first controller: transit switches 0, 1, 2, 6, 7, 8.
  • The selected packing sequence group is as shown in Table 1, where packing sequences 1-5 correspond to the five source switch groups respectively.
  • The first source switch group corresponds to packing sequence 1 (110100), indicating that the data sent by network nodes 0 and 1 to network node 3 is relayed by the first, second, and fourth available transit switches (transit switches 0, 1, and 6).
  • The meaning of the other packing sequences can be deduced by analogy and is not described here.
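Reading off the transit switches selected by a packing sequence can be sketched as follows (illustrative; the function name is our own):

```python
def sequence_to_switches(packing_sequence, available_switches):
    """Return the available transit switches at the positions where the
    packing sequence takes the third value ('1')."""
    return [sw for bit, sw in zip(packing_sequence, available_switches)
            if bit == "1"]

# Available transit switches from the example: 0, 1, 2, 6, 7, 8.
available = [0, 1, 2, 6, 7, 8]
print(sequence_to_switches("110100", available))  # packing sequence 1
```

For packing sequence 1 (110100) this yields transit switches 0, 1, and 6, matching the Table 1 example.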
  • Case 2: each set of data can be transferred by a different number of available transit switches.
  • Specifically, the i-th group of data (that is, the data transmitted from the source network nodes connected to the i-th source switch group to the destination network node connected to the destination switch group) may be relayed by k_i transit switches, 1 ≤ i ≤ m, where the value of k_i is determined by the first controller.
  • For example, data from different source PODs may have different priorities, and data with a higher priority may be relayed by more available transit switches, that is, the value of k_i corresponding to that group of data is larger.
  • In this case, when determining the available transit switches that respectively transfer the m sets of data, the first controller may ensure that the number of data groups relayed by any one available transit switch does not exceed ⌈(k_1 + k_2 + ... + k_m)/v⌉, where v is the number of available transit switches.
  • In a specific implementation, the first controller may select a corresponding packing sequence group from the plurality of packing sequence groups stored in advance, and determine, according to the packing sequences in the packing sequence group, the available transit switches that respectively transfer the m groups of data.
  • The packing sequence group may be constructed according to the values of V, K_1, K_2, K_3, ..., K_M, and M, and stored in the first controller; for the construction process of the packing sequences, refer to the description of packing sequence groups in concept (3) of this application.
  • Here, V is the number v of available transit switches, and K_i is the number k_i of available transit switches that transfer the i-th set of the data.
  • S104: the first controller instructs the destination network node to send routing information to the source network nodes, where the routing information includes the identifiers of the available transit switches that relay the data group transmitted by the corresponding source network node to the destination network node.
  • the first controller instructs the destination network node to separately send the identifiers of the available transit switches that relay each set of data to the corresponding source network node.
  • For example, the first controller may instruct the destination network node to send the identifiers of transit switches 0, 1, and 6 to network nodes 0 and 1.
  • The identifier of an available transit switch may be a MAC address, a switch virtual interface (SVI) address, or any other identifier that can uniquely identify the transit switch in the data center network.
  • The routing information may be carried in an acknowledgement (ACK) sent to each source network node after the destination network node receives the initialization data packets.
  • FIG. 5 is a schematic flowchart of a data transmission phase after initialization.
  • S201: the source network nodes receive the routing information sent by the destination network node, and perform the second data packet transmission according to the routing information.
  • That is, the source network nodes perform the second data packet transmission according to the routing information sent by the destination network node in the initialization phase.
  • the multiple source network nodes of the data center receive the routing information sent by the destination network node, and uniquely determine the path of the current data packet transmission according to the identifier of the available transit switch in the routing information.
  • the network node 0, 1 receives the routing information sent by the network node 3, including the identifiers of the transit switches 0, 1, and 6.
  • the network node 0 can determine three paths for the current data packet transmission, and each path includes an uplink and downlink path. The three paths are:
  • Path 1 uplink path: network node 0 - switch group 0 - transit switch 0; downlink path: transit switch 0 - switch group 2 - network node 3;
  • Path 2 uplink path: network node 0 - switch group 0 - transit switch 1; downlink path: transit switch 1 - switch group 2 - network node 3;
  • Path 3 uplink path: network node 0 - switch group 0 - transit switch 6; downlink path: transit switch 6 - switch group 2 - network node 3.
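The per-switch path construction listed above can be sketched generically (illustrative Python; node and group names are plain strings for readability, not identifiers from the application):

```python
def build_paths(src_node, src_group, transit_switches, dst_group, dst_node):
    """Build one uplink/downlink path pair per transit switch named in
    the routing information."""
    return [
        {"up": [src_node, src_group, t], "down": [t, dst_group, dst_node]}
        for t in transit_switches
    ]

paths = build_paths("network node 0", "switch group 0", [0, 1, 6],
                    "switch group 2", "network node 3")
for p in paths:
    print(" - ".join(map(str, p["up"])), "|", " - ".join(map(str, p["down"])))
```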
  • The paths of the other source network nodes can be determined similarly.
  • The following describes the specific paths of the second data transmission with reference to the fat tree network and the leaf-spine network.
  • the network node 0 can determine three paths for the current data packet transmission, and each path includes an uplink and downlink path, and the three paths are respectively:
  • Path 1 uplink path: network node 0 - edge switch 0 - aggregation switch 0 - core switch 0; downlink path: core switch 0 - aggregation switch 6 - edge switch 7 - network node 3;
  • Path 2 uplink path: network node 0 - edge switch 0 - aggregation switch 0 - core switch 1; downlink path: core switch 1 - aggregation switch 6 - edge switch 7 - network node 3;
  • Path 3 uplink path: network node 0 - edge switch 0 - aggregation switch 2 - core switch 6; downlink path: core switch 6 - aggregation switch 8 - edge switch 7 - network node 3.
  • the network node 0 can determine three paths of the current data packet transmission, and each path includes an uplink and downlink path, and the three paths are respectively:
  • Path 1 uplink path: network node 0 - leaf switch 0 - spine switch 0; downlink path: spine switch 0 - leaf switch 2 - network node 3;
  • Path 2 uplink path: network node 0 - leaf switch 0 - spine switch 1; downlink path: spine switch 1 - leaf switch 2 - network node 3;
  • Path 3 uplink path: network node 0 - leaf switch 0 - spine switch 6; downlink path: spine switch 6 - leaf switch 2 - network node 3.
  • S202: the first controller determines the available transit switches of the data center network.
  • the plurality of source network nodes transmit data via the available transit switch selected by the first controller in the initialization phase.
  • the first controller can acquire the data packet transmitted in the second time, and determine the state of the available transit switch selected by the first controller in the initialization phase according to the data packet transmitted in the second time.
  • In step S202, the first controller can learn the status of these six transit switches, and re-confirm the available transit switches among them for the second data packet transmission.
  • However, since transit switches 3, 4, and 5 do not relay data in the second packet transmission, the first controller cannot learn their state from the packets of the second transmission.
  • By setting an effective duration for an unavailable transit switch, the following situation is avoided: after some transit switches are confirmed to be unavailable, even if their actual state transitions to available in subsequent data transmissions, the first controller cannot learn of the state transition and never uses them to relay data. In this way, the waste of bandwidth resources can be avoided.
  • The preset duration may be determined according to the speed at which the transit switches forward data. Optionally, the faster the transit switches forward data, the shorter the preset duration.
  • In the initialization phase, the first controller learns that transit switches 3, 4, and 5 are unavailable transit switches; therefore, transit switches 3, 4, and 5 are not used in the second data transmission after the initialization phase.
  • As a result, the first controller cannot learn the status of transit switches 3, 4, and 5 from the data packets of the second transmission.
  • However, transit switches 3, 4, and 5, which are unavailable during the initialization phase, may become available during the second data transmission. Therefore, to prevent the first controller from permanently regarding the state of transit switches 3, 4, and 5 as unavailable, in the present application, when the first controller learns in the previous data transmission (that is, in the initialization phase) that the state of transit switches 3, 4, and 5 is unavailable, the first controller sets a preset duration; after the preset duration elapses, the first controller considers the state of transit switches 3, 4, and 5 to be available.
  • For example, if the preset duration has elapsed by the time the first controller determines the available transit switches of the data center network in step S202, the first controller considers the state of transit switches 3, 4, and 5 to be available at that time.
  • That is, the available transit switches determined by the first controller include the available transit switches determined from the ECN field of the data packets, and further include the available transit switches determined by the preset duration.
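The preset-duration mechanism can be sketched with timestamps (an illustrative sketch; the class and method names are our own, and the duration value is arbitrary):

```python
class TransitSwitchTracker:
    """Track unavailable transit switches; after the preset duration
    elapses, a marked switch is again considered available."""

    def __init__(self, preset_duration):
        self.preset_duration = preset_duration
        self.marked_unavailable = {}  # switch id -> time it was marked

    def mark_unavailable(self, switch, now):
        self.marked_unavailable[switch] = now

    def is_available(self, switch, now):
        marked_at = self.marked_unavailable.get(switch)
        if marked_at is None:
            return True  # never marked unavailable
        return now - marked_at >= self.preset_duration

tracker = TransitSwitchTracker(preset_duration=5.0)
for s in (3, 4, 5):          # switches found unavailable at initialization
    tracker.mark_unavailable(s, now=0.0)
print(tracker.is_available(3, now=1.0))  # still within the preset duration
print(tracker.is_available(3, now=6.0))  # preset duration has elapsed
```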
  • S203: the first controller determines, from the available transit switches of the data center network, the available transit switches that respectively transfer the m groups of data.
  • S204: the first controller instructs the destination network node to send routing information to the source network nodes, where the routing information includes the identifiers of the available transit switches that relay the data group transmitted by the corresponding source network node to the destination network node.
  • the routing information in step S204 is used by each source network node to determine the path of the third data transmission.
  • The implementation of steps S203-S204 is similar to that of steps S103-S104 in the embodiment of FIG. 4; refer to the related description, and details are not described here.
  • In summary, in the data transmission method of the present application, the first controller can determine the available transit switches in each data transmission, find the corresponding packing sequence group among the plurality of packing sequence groups stored in advance, and determine, according to the packing sequence group, the available transit switches that respectively relay the m sets of data.
  • On the one hand, the first controller excludes the unavailable transit switches when determining the available transit switches that respectively transfer the m groups of data, so that overloaded transit switches are not used and further loading of those transit switches is avoided, which is equivalent to performing a first load balancing.
  • On the other hand, every available transit switch is used, and the amounts of data undertaken by the available transit switches do not differ much, which is equivalent to achieving a second load balancing across all the available transit switches.
  • Therefore, the data transmission method of the present application has low computational complexity and high efficiency, can prevent some transit switches from being overloaded, implements load balancing across the transit switches in the data center network, improves bandwidth resource utilization, and reduces transmission delay.
  • FIG. 6 is a schematic structural diagram of a controller 10 provided by the present application.
  • The controller 10 may be implemented as a controller in the data center network of FIG. 2, a POD controller in the fat tree network shown in FIG. 3A, or a leaf controller in the leaf-spine network shown in FIG. 3B, and may be the first controller in the above method embodiments.
  • The controller 10 may include a communication interface 103, one or more controller processors 101, a coupler 111, and a memory 105. These components can be connected by a bus or by other means; FIG. 6 uses a bus connection as an example. Among these:
  • Communication interface 103 can be used by controller 10 to communicate with other devices, such as the switch group and network nodes of FIG. 2, the aggregation switches, edge switches, network nodes of FIG. 3A, leaf switches, network nodes, and the like in FIG. 3B.
  • the communication interface 103 can be a wired communication interface (such as an Ethernet interface).
  • Memory 105 is coupled to controller processor 101 for storing various software programs and/or sets of instructions.
  • memory 105 can include high speed random access memory, and can also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid state storage devices.
  • The memory 105 can store an operating system, for example an embedded operating system such as uCOS, VxWorks, or RTLinux.
  • The memory 105 may further pre-store a plurality of packing sequence groups; refer to the related description, in concept (4) of this application, of the multiple packing sequence groups stored in the controller.
  • the memory 105 can be used to store an implementation of the data transmission method provided by one or more embodiments of the present application on the controller 10 side.
  • the data transmission method provided by one or more embodiments of the present application please refer to the method embodiments shown in FIG. 4-5.
  • the controller processor 101 may be a general purpose processor, such as a central processing unit (CPU), and the processor 101 may further include a hardware chip, and the hardware chip may be a combination of one or more of the following: an application specific integrated circuit ( Application specific integrated circuit (ASIC), field programmable gate array (FPGA), complex programmable logic device (CPLD).
  • the processor 101 can process the received data.
  • The processor 101 can also determine the available transit switches in the data center network according to the received data, and determine the transit switches that relay the data sent by each source network node.
  • The controller processor 101 is operable to read and execute computer readable instructions. Specifically, the controller processor 101 can be used to invoke a program stored in the memory 105, for example, the implementation, on the controller 10 side, of the data transmission method provided by one or more embodiments of the present application, and to execute the instructions included in the program.
  • controller 10 shown in FIG. 6 is only one implementation of the embodiment of the present invention. In practical applications, the controller 10 may further include more or fewer components, which are not limited herein.
  • FIG. 7 is a functional block diagram of the controller 20 provided by the present application.
  • The controller 20 may include a determining unit 201 and an indicating unit 202. Among these:
  • the determining unit 201 is configured to determine, from the available transit switches of the data center network, the available transit switches that respectively transfer the m sets of data; the m sets of data are respectively connected to the destination switch group by the source network nodes connected by the m source switch groups The data transmitted by the destination network node; the data center network includes a plurality of transit switches, the m source switch groups, the destination switch group, the source network node, and the destination network node; wherein the available The transit switch is a transit switch in which the load does not exceed the first threshold in the plurality of transit switches; m is a positive integer;
  • one available transit switch is used to transfer at least one group of the data, and the difference between the number of groups in which the data is transferred by any two available transit switches does not exceed a second threshold;
  • The indicating unit 202 is configured to instruct the destination network node to send routing information to the source network nodes, where the routing information includes the identifiers of the available transit switches that relay the data group transmitted by the corresponding source network node to the destination network node.
  • In some embodiments, the controller 20 further includes an obtaining unit 203 configured to acquire at least one data packet, and the determining unit 201 is further configured to: in the case that the value of the congestion indication field in the data packet is the first value, determine that the transit switch that sent the data packet is an available transit switch; and, in the case that the value of the congestion indication field is the second value, determine that the transit switch that sent the data packet becomes an available transit switch after the preset duration.
  • the at least one data packet is from the plurality of transit switches, or the at least one data packet is from an available transit switch in a last data transmission.
  • In some embodiments, any one of the available transit switches determined by the first controller transfers no more than ⌈mk/v⌉ groups of the data, where k is the number of available transit switches that relay one set of the data and v is the number of available transit switches.
  • the determining unit 201 determines, according to the packing sequence, the intermediate switch that respectively transfers the m sets of data. Specifically, the determining unit 201 determines, in the plurality of packing sequence groups stored in advance, the m source switch groups respectively. a packing sequence; wherein a source switch group corresponds to a packing sequence, the packing sequence includes v elements, and the v elements respectively correspond to v available transit switches in the data network; when an element value is The third value, the available transit switch corresponding to the element is a transit switch that transfers data transmitted by the source network node connected to the source switch group to the destination network node connected to the destination switch group;
  • A packing sequence group includes m packing sequences, and in each of the plurality of packing sequence groups, any one element takes the third value at least once and at most ⌈mk/v⌉ times;
  • v is the number of available transit switches and k is the number of available transit switches that relay a set of said data.
  • the routing information is carried in an acknowledgement signal.
  • In some embodiments, the data center network is a fat tree network, or the data center network is a leaf-spine network.
  • the present application further provides a data center network, which may be the network shown in FIG. 2, FIG. 3A or FIG. 3B, and may include: a transit switch, a switch group, a network node, and a controller.
  • the controller may be the first controller in the method embodiment corresponding to FIG. 4 to FIG. 5 respectively.
  • Specifically, the controller may be a controller in the data center network shown in FIG. 2, a POD controller in the fat tree network shown in FIG. 3A, or a leaf controller in the leaf-spine network shown in FIG. 3B.
  • the controller may be the controller shown in FIG. 6 or FIG. 7.
  • the implementation of the present application can implement load balancing of each transit switch in the data center network, improve bandwidth resource utilization, and reduce transmission delay.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions can be from a website site, computer, server or data center Transfer to another website site, computer, server, or data center via wired (eg, coaxial cable, fiber optic, digital subscriber line) or wireless (eg, infrared, wireless, microwave, etc.).
  • the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
  • the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (eg, a solid state hard disk).

Abstract

The present application discloses a data transmission method, related apparatus, and network. The method may include: a first controller determines, from the available transit switches of a data center network, the available transit switches that respectively relay m groups of data, where one available transit switch is used to relay at least one group of the data, and the difference between the numbers of data groups relayed by any two available transit switches does not exceed a second threshold; and the first controller instructs the destination network node to send routing information to the source network nodes, where the routing information includes the identifiers of the available transit switches that relay the data group transmitted by the corresponding source network node to the destination network node. The above solution can achieve load balancing across the transit switches in the data center network, improve bandwidth resource utilization, and reduce transmission delay.

Description

Data Transmission Method, Related Apparatus, and Network

Technical Field

This application relates to the technical field of data centers (data center, DC) and traffic scheduling, and in particular to a data transmission method, related apparatus, and network.
Background

A data center is a network used to deliver, accelerate, present, compute, and store data information over the Internet infrastructure, and may include computer systems and supporting equipment (for example, communication and storage systems), data communication connection devices, environment control devices, monitoring devices, and various security apparatuses. Data centers are widely used in distributed storage, big data analysis, and the like.

At present, the network topologies of most data centers are implemented as multi-stage switching networks, for example fat tree networks and leaf-spine networks (leaf and spine architecture). In a data center network, the transit devices that relay data (for example, the core switches in a fat tree network or the spine switches in a leaf-spine network) are used very heavily, and the load of each transit device, that is, the amount of data it relays, affects the transmission delay, bandwidth resource utilization, and so on of the entire data center network. How to balance the load of the transit devices so as to reduce packet transmission delay and improve bandwidth resource utilization is a problem that urgently needs to be solved.
Summary

The present application provides a data transmission method, related apparatus, and network, which can achieve load balancing across the transit switches in a data center network, improve bandwidth resource utilization, and reduce transmission delay.

According to a first aspect, the present application provides a data transmission method applied on the first controller side. The method may include: the first controller determines, from the available transit switches of a data center network, the available transit switches that respectively relay m groups of data; the m groups of data are the data transmitted by the source network nodes connected to m source switch groups to the destination network node connected to a destination switch group; the data center network includes a plurality of transit switches, the m source switch groups, the destination switch group, the source network nodes, and the destination network node; the available transit switches are the transit switches, among the plurality of transit switches, whose load does not exceed a first threshold; m is a positive integer; one available transit switch is used to relay at least one group of the data, and the difference between the numbers of data groups relayed by any two available transit switches does not exceed a second threshold; and the first controller instructs the destination network node to send routing information to the source network nodes, where the routing information includes the identifiers of the available transit switches that relay the data group transmitted by the corresponding source network node to the destination network node.
Specifically, the data center network in this application includes multiple transit switches, switch groups, controllers, and network nodes. Each switch group is connected to all of the transit switches, and a controller controls a switch group and the network nodes connected to that switch group.

During data transmission, the transmission path of the data is: network node - switch group - transit switch - another switch group - another network node. In this application, the network node that sends data is called the source network node, the switch group connected to the source network node is called the source switch group, the network node that receives data is called the destination network node, the switch group connected to the destination network node is called the destination switch group, and the controller that controls the destination switch group and the destination network node connected to it is called the first controller.

Optionally, the data center network in this application is a fat-tree network, in which the transit switches are implemented as core switches, the switch groups are implemented as basic physical switching units (point of delivery, POD), and a controller controls a POD and the network nodes connected to the POD.

Optionally, the data center network in this application is a leaf-spine network, in which the transit switches are implemented as spine switches, the switch groups are implemented as leaf switches, and a controller controls a leaf switch and the network nodes connected to the leaf switch.

Implementing the method described in the first aspect balances the load relayed by the transit switches, improves bandwidth utilization, and reduces transmission latency.
With reference to the first aspect, in some embodiments, before the first controller determines, from the available transit switches of the data center network, the available transit switches that respectively relay the m groups of data, it also needs to determine which of all the transit switches of the data center network are currently available.

In this application, data is transmitted in multiple rounds, and the first controller determines the currently available transit switches before each round of transmission.

The following describes how the first controller determines the available transit switches. In this application, the transit switches all support explicit congestion notification (explicit congestion notification, ECN). When a packet arrives at a transit switch, if the switch's current load exceeds the first threshold, the switch sets the ECN field of the packet to a second value; if the load does not exceed the first threshold, the switch either leaves the ECN field unchanged or sets it so that it takes a first value. Optionally, the first value and the second value may be predefined. Optionally, the first threshold may be pre-stored or pre-configured by the first controller according to the processing capability of the transit switches.

(1) In the initialization phase, that is, when the source network nodes transmit initial packets to the destination network node, the transmission paths of the initial packets cover all of the transit switches. The first controller obtains the initial packets from all of the transit switches; when the ECN field of an initial packet takes the first value, it determines that the transit switch that sent the packet is an available transit switch; when the ECN field takes the second value, it determines that the transit switch that sent the packet is an unavailable transit switch.

(2) In the data transmission phase after initialization. After initialization, the source network nodes determine the data transmission paths according to the transit switches determined by the first controller. Taking the second round of data transmission as an example, during the second round some of the transit switches of the data center network are used and the rest are not.

The transit switches that are used are those that the first controller determined, in the initialization phase, for sending data in the second round. The first controller can determine which of these used transit switches are available according to the ECN fields of the packets of the second round.

The transit switches that are not used are those that the first controller determined to be unavailable in the initialization phase. Starting from the moment the first controller determines, in the initialization phase, that these transit switches are unavailable, once a preset time period has elapsed, the first controller considers them available again.
With reference to the first aspect, in some embodiments, so that the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, when the first controller determines the available transit switches that respectively relay the m groups of data, the number of data groups relayed by any one available transit switch does not exceed ⌈mk/v⌉, where k is the number of available transit switches, determined by the first controller, that relay one group of the data, and v is the number of available transit switches of the data center.
With reference to the first aspect, in some embodiments, so that one available transit switch relays at least one group of the data and the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, the first controller may determine the available transit switches that respectively relay the m groups of data based on packing sequences, as described in detail below.

First, packing sequences. In the data center network, the multiple controllers may all pre-store the same multiple packing sequence groups according to three parameters V, K, and M, where 1≤V≤A, 1≤K≤A, 1≤M≤B, A is the number of transit switches of the data center network, and B is the number of switch groups.

Specifically, a packing sequence group constructed from given V, K, and M satisfies the following three conditions:

1. A packing sequence group includes M packing sequences.

2. Each of the M packing sequences includes V elements. Among the V elements, when V>K, K elements take a third value; when V≤K, all V elements take the third value.

3. Across the M packing sequences of a group, any one element takes the third value at most ⌈MK/V⌉ times. The third value may be predefined and is not limited in this application; for example, the third value may be 1.

It follows that a packing sequence group has the following property: across its M packing sequences, every element takes the third value at least once, and the numbers of times the elements take the third value are roughly equal.

Second, the process by which the first controller determines, based on packing sequences, the available transit switches that respectively relay the m groups of data.

Specifically, the first controller uses the three parameters V=v, K=k, M=m to find the corresponding packing sequence group among the pre-stored packing sequence groups, where v is the number of available transit switches, k is the number of available transit switches that relay one group of the data, and m is the number of source switch groups.

In that packing sequence group, one source switch group corresponds to one packing sequence; the packing sequence includes v elements, which respectively correspond to the v available transit switches of the data network. When an element takes the third value, the available transit switch corresponding to that element is a transit switch that relays the data transmitted by the source network nodes connected to that source switch group to the destination network node connected to the destination switch group.

In this application, by determining the available transit switches that respectively relay the m groups of data based on packing sequences, the first controller balances the load across the available transit switches of the data center network, improves bandwidth utilization, and reduces transmission latency.
With reference to the first aspect, in some embodiments, the routing information is carried in an acknowledgement signal.
In a second aspect, this application provides a controller. The controller may include multiple functional modules configured to perform the method provided in the first aspect or in any possible implementation of the first aspect.

In a third aspect, this application provides a controller configured to perform the data transmission method described in the first aspect. The controller may include a memory and a processor coupled to the memory, where the memory is configured to store code implementing the data transmission method described in the first aspect, and the processor is configured to execute the program code stored in the memory, that is, to perform the method provided in the first aspect or in any possible implementation of the first aspect.

In a fourth aspect, this application provides a network, including a controller, transit switches, switch groups, and network nodes, where the controller may be the controller described in the second or third aspect.

In a fifth aspect, a computer-readable storage medium is provided, storing instructions that, when run on a computer, cause the computer to perform the data transmission method described in the first aspect.

In a sixth aspect, a computer program product containing instructions is provided that, when run on a computer, causes the computer to perform the data transmission method described in the first aspect.

By implementing this application, the first controller determines, from the available transit switches of the data center network, the available transit switches that respectively relay m groups of data, where one available transit switch relays at least one group of data and the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold. This application balances the load across the transit switches of the data center network, improves bandwidth utilization, and reduces transmission latency.
Brief Description of the Drawings

FIG. 1A is a schematic structural diagram of a fat-tree network in the prior art;

FIG. 1B is a schematic structural diagram of a leaf-spine network in the prior art;

FIG. 2 is a schematic structural diagram of the data center network provided in this application;

FIG. 3A is a schematic structural diagram of the fat-tree network provided in this application;

FIG. 3B is a schematic structural diagram of the leaf-spine network provided in this application;

FIG. 4 is a schematic flowchart of the initialization phase of the data transmission method provided in this application;

FIG. 5 is a schematic flowchart of the post-initialization phase of the data transmission method provided in this application;

FIG. 6 is a schematic structural diagram of the controller provided in this application;

FIG. 7 is a functional block diagram of the controller provided in this application.
Detailed Description

The terms used in this part of the application serve only to explain specific embodiments of this application and are not intended to limit this application.
Currently, data center networks usually adopt hierarchical topologies. The following briefly introduces two common hierarchically structured data center networks: the fat-tree network and the leaf-spine network.

(1) Fat-tree network

See FIG. 1A, a schematic diagram of a Clos-based 4-ary fat-tree network.

An n-ary fat-tree network, that is, a fat-tree network composed of n basic physical switching units (point of delivery, POD), includes n²/4 core switches, n²/2 aggregation switches, n²/2 edge switches, and n³/4 network nodes.

Every n/2 aggregation switches and n/2 edge switches form one POD. Within any POD, each edge switch is connected to a distinct set of n/2 network nodes, and every edge switch is connected to the n/2 aggregation switches in the upper layer.

Every n/2 core switches form one core switch group. All the core switches in the t-th core switch group are connected to the t-th aggregation switch in each POD, 1≤t≤n/2.

As can be seen from FIG. 1A, in an n-ary fat-tree topology every POD is connected to all the core switches, and within a given POD the aggregation switch connected to a particular core switch is unique.

(2) Leaf-spine network

See FIG. 1B, a schematic diagram of a leaf-spine topology. A leaf-spine topology includes spine switches and leaf switches, and every leaf switch is connected to all the spine switches. As shown in FIG. 1B, the number of network nodes connected to each leaf switch depends on the number of downlink ports of the leaf switch.

It can be understood that the devices with connection relationships mentioned in FIG. 1A and FIG. 1B above are all connected through ports.
As can be seen from the above, in a hierarchically structured data center network there may be multiple paths between any two network nodes via different transit devices (such as the core switches in a fat-tree network or the spine switches in a leaf-spine network).

Currently, two methods are commonly used in data center networks to schedule data transmission paths: distributed scheduling and centralized scheduling.

(1) Distributed scheduling

Equal-cost multi-path (equal-cost multi-path, ECMP) forwarding based on a hash algorithm is a typical distributed scheduling technique. With ECMP, referring to the fat-tree network shown in FIG. 1A, there are multiple data transmission paths between two PODs; that is, when a network node connected to one POD transmits data to a network node connected to another POD, there are multiple equal-cost candidate paths, and the data is spread over these candidate paths according to a hash algorithm.

Because a hash algorithm is essentially a random mapping, two or more elephant flows (long-lived, bandwidth-hungry data flows) may select the same path at the same time, causing the flows to collide, bandwidth to be allocated unevenly, and utilization to be low.

(2) Centralized scheduling

In centralized scheduling, a central controller collects global information about the data center network, including all packet information, the data request information of the destination network nodes, and the state of each switch, and computes an optimal route assignment for every packet, thereby balancing the load of the transit devices.

Although centralized scheduling can produce the optimal path selection for every packet, it requires a large amount of information exchange; the central controller's information collection consumes bandwidth and imposes extra overhead on the data center network. Moreover, this scheduling method has high computational complexity and a long reaction time.
It can be seen from the above that, in a data center network, the key to load balancing is to balance the workload of the transit devices that relay data (for example, the core switches in a fat-tree network or the spine switches in a leaf-spine network).

This application proposes a data center network in which controllers balance the workload passing through the transit devices, thereby achieving load balancing.

Referring to FIG. 2, the data center network of this application includes multiple transit switches, multiple switch groups, multiple controllers, and the network nodes respectively connected to the switch groups. Any switch group is connected to all the transit switches, different switch groups are connected to different network nodes, and each switch group is connected to one controller.

It can be understood that the devices with connection relationships mentioned in FIG. 2 above are all connected through ports.

It can be understood that the data center network of this application is not limited to the devices mentioned above; in specific implementations it may also include other devices, and this application imposes no limitation.
The network elements of the data center network are introduced below.

(1) Transit switch

A transit switch is a top-layer switch of the data center network used to relay data. It may be a core switch in the fat-tree network shown in FIG. 1A, a spine switch in the leaf-spine network shown in FIG. 1B, or a data-relaying switch in another hierarchically structured data center network.

In this application, the transit switches support explicit congestion notification (explicit congestion notification, ECN). The flow by which a transit switch implements ECN is briefly described below.

1. When a packet arrives at a transit switch, the switch reads the packet's destination address.

2. Based on the destination address, the switch determines the egress port used to forward the packet.

It can be understood that a transit switch has multiple egress ports. Once the packet's destination address is determined, the switch can uniquely determine the egress port for forwarding the packet.

3. The switch checks the load at that egress port.

The load at an egress port may be determined by factors such as the number of packets waiting to be forwarded at the port and the speed at which the port forwards packets. The more packets waiting to be forwarded, or the slower the forwarding, the higher the current load of the port.

4. If the current load of the egress port exceeds the first threshold, the switch sets the ECN field of the newly arrived packet to the second value, indicating that the port currently has too many packets to forward and is experiencing congestion; if the load does not exceed the first threshold, the switch makes no change to the packet's ECN field, which may take the first value, indicating that the port can still forward new packets.

The first threshold may be preset. Optionally, it may be set according to the processing capability of the transit switch. The first value and the second value may be predefined.

For example, as shown in FIG. 2, suppose network node 0 sends a packet to network node 3; the destination address in the packet is that of network node 3, and the packet's path passes through transit switch 0. Suppose transit switch 0 is connected to switch group 2 through egress port 0. Transit switch 0 checks the number of packets waiting to be forwarded at egress port 0: if it is greater than 5, it sets the ECN field of the newly arrived packet to 11; if it is less than or equal to 5, the switch leaves the ECN field unchanged, and the packet's ECN field may be 01 or 10.

Because the transit switches in this application support ECN, a device that receives a packet from a transit switch can learn the switch's state from the packet's ECN field: whether or not it is currently experiencing congestion.
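The per-packet marking decision in step 4 above can be sketched as follows. This is a minimal illustration only: the queue-length threshold of 5 and the two-bit ECN values follow the example in the text, and the dictionary-based packet model is an assumption for demonstration.

```python
# Minimal sketch of the ECN marking step at a transit switch (illustrative only).
# ECN field values follow the text: "01"/"10" = first value (not congested),
# "11" = second value (congestion experienced).
FIRST_THRESHOLD = 5  # example queue-length threshold from the text

def mark_ecn(packet: dict, queued_packets: int) -> dict:
    """Rewrite the packet's ECN field based on the egress-port load."""
    if queued_packets > FIRST_THRESHOLD:
        packet["ecn"] = "11"  # second value: port is congested
    # otherwise the field is left as-is (first value)
    return packet

assert mark_ecn({"dst": "node3", "ecn": "01"}, queued_packets=3)["ecn"] == "01"
assert mark_ecn({"dst": "node3", "ecn": "01"}, queued_packets=7)["ecn"] == "11"
```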
(2) Switch group

In this application, a switch group is a set of devices connecting network nodes and transit switches. It may be a POD in the fat-tree network shown in FIG. 1A, a leaf switch in the leaf-spine network shown in FIG. 1B, or a set of devices connecting network nodes and transit switches in another hierarchical transmission network.

(3) Controller

In this application, every switch group is provided with a controller, which connects to and controls the switch group and the network nodes corresponding to the switch group.

In this application, a controller may be used to obtain packets from the transit switches, determine the state of the transit switches, perform route scheduling, and so on.

(4) Network node

In this application, a network node may be a device with a unique network address, such as a workstation, a server, or a terminal device. In a data center network, when any two network nodes connected to different switch groups communicate, the traffic passes through their respective switch groups and a transit switch.

For example, as shown in FIG. 2, network node 0 and network node 2 are connected to switch group 0 and switch group 1, respectively. When network node 0 sends data to network node 2, the transmission path is: network node 0 - switch group 0 - any transit switch of the data center network - switch group 1 - network node 2.

(5) Source network node and destination network node

In this application, the network node that sends data is called the source network node, and the network node that receives data is called the destination network node. When multiple network nodes send data to the same network node, there are multiple source network nodes and one destination network node.

Correspondingly, the switch group connected to a source network node is called a source switch group, and the switch group connected to the destination network node is called the destination switch group. In this application, the controller corresponding to the destination switch group is called the first controller; it can be understood that the first controller connects to and controls the destination switch group and the destination network node.

It can be understood that when multiple network nodes send data to the same network node, there are correspondingly multiple source switch groups and one destination switch group. For example, referring to FIG. 2, when network nodes 0, 1, 3, and 4 all send data to network node 2, the source switch groups are switch group 0, switch group 2, and switch group 3, and the destination switch group is switch group 1.

It can be understood that when a source network node sends data to the destination network node, the transmission path is: source network node - the source switch group connected to that source network node - transit switch - the destination switch group connected to the destination network node - destination network node.
The data center network provided in this application is described below with reference to the fat-tree and leaf-spine networks.

Referring to FIG. 3A, FIG. 3A is a schematic structural diagram of the fat-tree network provided in this application, taking a 6-ary fat-tree as an example. Compared with FIG. 1A, in the fat-tree network of FIG. 3A each POD is provided with a POD controller. In this application, a POD controller may be used to obtain packets from the core switches, read the ECN field of each packet to determine the state of the core switches, perform route scheduling, and so on.

Referring to FIG. 3B, FIG. 3B is a schematic structural diagram of the leaf-spine network provided in this application. Compared with FIG. 1B, in the leaf-spine network of FIG. 3B each leaf switch is provided with a leaf controller. In this application, a leaf controller may be used to obtain packets from the spine switches, read the ECN field of each packet to determine the state of the spine switches, perform route scheduling, and so on.
Based on the data center networks of FIG. 2, FIG. 3A, and FIG. 3B above, to balance the load of the transit switches, reduce data transmission latency, and improve bandwidth utilization, this application provides a data transmission method.

The main inventive principle of this application may include: the first controller learns the state of each transit switch in the data center network and, among the available transit switches, determines the available transit switches that respectively relay the data from the different source network nodes, such that the amounts of data relayed by the available transit switches are balanced.
Several concepts involved in this application are described below.

(1) Available transit switch, unavailable transit switch

In this application, from the perspective of the destination network node, the transit switches of the data center network fall into two kinds: available transit switches and unavailable transit switches.

It can be understood that, among the multiple transit switches of the data center network, each transit switch has one egress port used to forward data to the destination network node.

An available transit switch is a transit switch whose load at that egress port does not exceed the first threshold.

An unavailable transit switch is a transit switch whose load at that egress port exceeds the first threshold.

(2) A group of data

In this application, the data that passes through the same source switch group during transmission is called a group of data; that is, the data sent from the source network nodes connected to the same source switch group to the destination network node connected to the destination switch group is called a group of data. It can be understood that a group of data may be generated by one or more source network nodes connected to the source switch group.

Taking the fat-tree network of FIG. 3A as an example, suppose there are 6 source network nodes (network nodes 0, 1, 2, 4, 5, and 6), corresponding to 5 source PODs: POD0, POD1, POD3, POD4, and POD5; and 1 destination network node, network node 3, corresponding to 1 destination POD: POD2. Then, in the fat-tree network of FIG. 3A, there are 5 groups of data in total: the data sent to network node 3 by network nodes 0 and 1, by network node 2, by network node 4, by network node 5, and by network node 6, respectively.

It can be understood that a group of data may be relayed by one transit switch or jointly by multiple transit switches. For example, referring to FIG. 3A, suppose the source POD is POD0 and the destination POD is POD2; then the group of data passing through POD0 may be relayed by core switch group 0 alone, or jointly by core switch group 0, core switch group 1, and core switch group 2. When multiple transit switches jointly relay a group of data, they may each relay the entire group, or each relay a different part of the group.

It can be understood that a group of data may be transmitted in the form of packets, and may be transmitted in multiple rounds; this application imposes no limitation.
(3) Packing sequence group

In this application, a packing sequence group is determined by three parameters V, K, and M. Once the values of V, K, and M are determined, the corresponding packing sequence group is constructed according to the following three conditions:

1. A packing sequence group includes M packing sequences.

2. Each of the M packing sequences includes V elements. In the i-th packing sequence, Ki elements take the third value, 1≤i≤M. In some embodiments of this application, K1=K2=K3=…=KM=K; that is, among the V elements of any one packing sequence, when V>K, K elements take the third value, and when V≤K, all V elements take the third value.

3. Across the M packing sequences of the group, any one element takes the third value at most ⌈MK/V⌉ times.

The third value may be predefined; this application imposes no limitation. For example, the third value may be 1.
From the above three conditions, a packing sequence group has the following property: across the M packing sequences of the group, any one element takes the third value at least once, and the numbers of times the elements take the third value are roughly equal.

Table 1 shows a possible realization of a packing sequence group, with V=6, K1=K2=K3=…=KM=K=3, M=5, and third value 1.

Packing sequence 1: 110100
Packing sequence 2: 011010
Packing sequence 3: 001101
Packing sequence 4: 100110
Packing sequence 5: 010011

Table 1: A packing sequence group

As shown in Table 1, the packing sequence group includes 5 packing sequences, each with 6 elements.

In this packing sequence group, the 1st element takes "1" twice, the 2nd element three times, the 3rd element twice, the 4th element three times, the 5th element three times, and the 6th element twice. It can be seen that the number of times any element takes "1" is balanced.

It can be understood that even when K1, K2, K3, …, KM are not all equal, a corresponding packing sequence group can still be constructed in some cases. In the constructed packing sequence group, the number of times each of the V elements takes the third value is balanced.

In this application, once the values of V, K1, K2, K3, …, KM, and M are determined, the corresponding packing sequence group can be constructed. Taking the case K1=K2=K3=…=KM=K as an example, the method of constructing the packing sequence group from V, K, and M is briefly introduced below.
1. Determine λ1 according to Formula 1; λ1 is the minimum λ satisfying Formula 1:

M ≤ ⌊(V/K)·⌊λ(V−1)/(K−1)⌋⌋    (Formula 1)

2. Construct a (V, K, λ1)-packing.

A (V, K, λ1)-packing consists of multiple blocks; each block is a set whose elements are selected from V given elements.

A (V, K, λ1)-packing satisfies the condition: each block includes K elements, and any two distinct elements appear together in at most λ1 blocks.

Here, after step 1, the constructed (V, K, λ1)-packing includes at least M blocks.

3. From the at least M blocks of the (V, K, λ1)-packing, select M blocks such that any one of the V given elements appears in at most ⌈MK/V⌉ of the selected blocks.

4. When V>K, generate the corresponding packing sequence group from the M selected blocks. Specifically, the i-th block corresponds to the i-th packing sequence of the group; the specific generation rule is described in the example below.

When V≤K, the packing sequence group is generated directly: in each of its M packing sequences, every element takes the third value.

For example, suppose V=6, K=3, M=5, and the V given elements are 1, 2, 3, 4, 5, 6.
First, determine λ1=2 according to Formula 1.

Next, construct a (6, 3, 2)-packing, which includes 10 blocks. The (6, 3, 2)-packing may be: {{1,2,4},{2,3,5},{3,4,6},{1,4,5},{2,5,6},{1,3,6},{2,3,4},{4,5,6},{1,2,6},{1,3,5}}.

Then, select 5 of these 10 blocks such that every element appears in at most ⌈MK/V⌉=⌈5×3/6⌉=3 blocks. The 5 blocks may be: {{1,2,4},{2,3,5},{3,4,6},{1,4,5},{2,5,6}}.

Finally, since V=6, K=3, and V>K, generate the corresponding packing sequence group from the 5 selected blocks. The packing sequence group corresponding to these 5 blocks may be as shown in Table 1. For example, the 1st block is {1,2,4}: elements 1, 2, and 4 appear, so the 1st, 2nd, and 4th elements of the 1st packing sequence take the third value (1); that is, the 1st packing sequence is 110100.

Clearly, in a packing sequence group constructed by the above method, the number of times each element takes the third value is balanced.
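The block-to-sequence conversion and the balance check in the worked example can be reproduced with a short script. This is a sketch under the example's parameters; the five blocks are those selected above.

```python
import math

# The 5 blocks selected from the (6,3,2)-packing in the example (V=6, K=3, M=5).
V, K, M = 6, 3, 5
blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {1, 4, 5}, {2, 5, 6}]

# Step 4: the i-th block yields the i-th packing sequence; element j takes
# the third value (1) exactly when j is a member of the block.
sequences = ["".join("1" if j in b else "0" for j in range(1, V + 1)) for b in blocks]
print(sequences)  # ['110100', '011010', '001101', '100110', '010011'] (Table 1)

# Balance check: every element takes 1 at least once and at most ceil(M*K/V) times.
cap = math.ceil(M * K / V)  # = 3
counts = [sum(int(s[j]) for s in sequences) for j in range(V)]
assert all(1 <= c <= cap for c in counts), counts  # counts = [2, 3, 2, 3, 3, 2]
```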
(4) Controllers store multiple packing sequence groups

In this application, for a given data center network, the numbers of transit switches, switch groups, and controllers are fixed. The multiple controllers of the data center network all pre-store multiple packing sequence groups according to the number of transit switches and the number of switch groups. It can be understood that the packing sequences stored in the multiple controllers are identical, so they apply to a scenario in which any one of the controllers is the destination-side controller.

Taking a data center network with A transit switches and B switch groups as an example, the packing sequence groups stored in a controller are described below.

A controller constructs packing sequence groups from the three parameters V, K, and M, where 1≤V≤A, 1≤K≤A, and 1≤M≤B. Therefore, a controller can construct and store A×A×B packing sequence groups.

For example, the fat-tree network shown in FIG. 3A includes 9 core switches (transit switches), 6 PODs (switch groups), and 6 controllers. The 6 controllers of FIG. 3A can construct and store 9×9×6=486 packing sequence groups according to 1≤V≤9, 1≤K≤9, 1≤M≤6.

For instance, a controller may store the 9 packing sequence groups for 1≤V≤9, K=3, M=5 shown in Table 2, where each column is one packing sequence group; for example, the 5 packing sequences in the column V=6 form one packing sequence group.
Sequence   V=1  V=2  V=3  V=4   V=5    V=6     V=7      V=8       V=9
1          1    11   111  1110  11010  110100  1010100  10010010  100100100
2          1    11   111  0111  01101  011010  1101000  01001001  010010010
3          1    11   111  1011  10110  001101  0011010  10100100  001001001
4          1    11   111  1101  01011  100110  0001101  01010100  100010001
5          1    11   111  1110  10101  010011  0100011  00101010  010001100

Table 2: The 9 packing sequence groups for 1≤V≤9, K=3, M=5

Similar to Table 2 above, the controllers in this application may also construct and store the remaining packing sequence groups for 1≤V≤9, 1≤K≤9, 1≤M≤6.
The data transmission method of this application is introduced below. In this application, data is transmitted through the data center network in multiple rounds of packets according to the data volume. In different rounds, the data from the source network nodes connected to the same source switch group may pass through different available transit switches.

In this application, data transmission is divided into two phases: the initialization phase (the first round of packet transmission) and the post-initialization data transmission phase.
Referring to FIG. 4, the embodiment of FIG. 4 is described taking as an example a data center network that includes multiple transit switches, m source switch groups, 1 destination switch group, multiple source network nodes, and 1 destination network node, with m groups of data transmitted in total.

FIG. 4 is a schematic flowchart of the initialization phase of the data transmission method provided in this application. The initialization phase lets the first controller learn the state of all the transit switches of the data center network and determine which of them are available.

The method may include the following steps:
S101. The source network nodes connected to the m source switch groups transmit initial packets to the destination network node connected to the destination switch group, and the transmission paths of the initial packets pass through all the transit switches of the data center network.

In an optional embodiment, when the destination network node requests data from multiple source network nodes of the data center network, it first sends a data transmission request to the multiple source network nodes storing the data.

Specifically, after receiving the data transmission request sent by the destination network node, the multiple source network nodes first transmit initial packets. The initial packets may be all the packets that the source network nodes send to the destination network node in a normal round of transmission, or only a small fraction of all the data to be transmitted; using fewer initial packets speeds up initialization.

It can be understood that the transmission path of an initial packet in step S101 is: source network node - source switch group - transit switch - destination switch group - destination network node.

In this application, the transmission paths of the initial packets pass through all the transit switches of the data center network. To achieve this, two strategies are possible:

In the first strategy, when each source switch group receives initial packets from its connected source network nodes, it forwards the received initial packets to all the transit switches. With this strategy, each source switch group simply traverses all the transit switches when forwarding initial packets; it is simple to implement and guarantees that the paths of the initial packets cover all the transit switches.

In the second strategy, when the source switch groups receive initial packets from their connected source network nodes, each forwards its initial packets to a different subset of the transit switches. The transit switches to which each source switch group forwards initial packets may be predefined, or specified by the first controller and carried in the data transmission request sent by the destination network node to the source network nodes.

For example, referring to FIG. 2, suppose there are 9 transit switches, 3 source switch groups (switch groups 0, 1, and 2), and 1 destination switch group, switch group 3. In the second strategy, source switch group 0 may forward its received initial packets to transit switches 0, 1, and 2; source switch group 1 to transit switches 3, 4, and 5; and source switch group 2 to transit switches 6, 7, and 8.

With the second strategy, each source switch group only needs to forward initial packets to some of the transit switches, yet the paths of the initial packets still cover all the transit switches, saving bandwidth.

It can be understood that the transit switches of this application support ECN: when a transit switch receives a packet, it rewrites the packet's ECN field according to the current load at the egress port that forwards data to the destination network node; for details, see the description of transit switches in the data center network shown in FIG. 2.
S102. The first controller determines the available transit switches of the data center network according to the initial packets.

Specifically, the first controller connects to and controls the destination switch group and the destination network node; when the initial packets arrive at the destination switch group or the destination network node, the first controller can obtain the initial packets.

Because the initial packets in step S101 traverse all the transit switches, the first controller can obtain an initial packet forwarded by every transit switch, learn the state of all the transit switches of the data center network from the ECN fields of the initial packets, and determine the available and unavailable transit switches. For the method of determining available transit switches from the ECN field of an initial packet, see the description of transit switches in the data center network shown in FIG. 2, which is not repeated here.

In an optional embodiment, after determining the currently unavailable transit switches, the first controller sets their state to become available after a preset time period; the purpose of this setting is described in detail in the embodiment of FIG. 5 below and is not repeated here.
S103. The first controller determines, from the available transit switches of the data center network, the available transit switches that respectively relay the m groups of data, where the m groups of data are the data transmitted by the source network nodes connected to the m source switch groups to the destination network node connected to the destination switch group; one available transit switch relays at least one group of the data, and the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold.

Specifically, after determining the available transit switches of the data center network, the first controller may determine the available transit switches that respectively relay the m groups of data in the second round of transmission after initialization, such that one available transit switch relays at least one group of data and the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold. The second threshold may be predefined; for example, it may be 1 or 2.

In this way, in the second round of data transmission, every available transit switch of the data center network is used, and the difference in the amount of data carried by any two available transit switches stays within a bounded range, ensuring load balance among the available transit switches.
In this application, the first controller may determine the available transit switches that respectively relay the m groups of data in the following two cases:

(1) Every group of data is relayed by k available transit switches.

In an optional embodiment, to guarantee that the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, the first controller, when determining the available transit switches that respectively relay the m groups of data, may ensure that the number of data groups relayed by any one available transit switch does not exceed ⌈mk/v⌉, where k is the number of available transit switches, determined by the first controller, that relay one group of the data. The specific value of k is decided by the first controller according to the transmission rate required by the application that triggers the transmission of the m groups of data from the source network nodes to the destination network node; for example, the higher the required rate, the larger the value of k. It can be understood that the k available transit switches relaying one group of the data may all relay the entire group, or may each relay a different part of the group. Optionally, when the k transit switches each relay a different part of the group, the group of data may be divided evenly into k shares, with each transit switch relaying one share, which evens out the workload of the k switches.
In another optional embodiment, to guarantee that one available transit switch relays at least one group of the data and that the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, the first controller may select the corresponding packing sequence group from the pre-stored packing sequence groups and determine the available transit switches that respectively relay the m groups of data according to the packing sequences in that group, as detailed below:

1. First, the first controller determines the values of V, K, and M from the number of available transit switches, the number of available transit switches relaying one group of the data, and the number of source switch groups, and selects the corresponding packing sequence group from the pre-stored packing sequence groups according to V, K, and M.

Specifically, the value of V is the number of available transit switches v, the value of K is the number of available transit switches relaying one group of the data k, and the value of M is the number of source switch groups m.

For the packing sequence groups pre-stored in the first controller, see the description under concept (4), controllers store multiple packing sequence groups, above.

2. Determine the available transit switches that respectively relay the m groups of data according to the selected packing sequence group.

Specifically, the selected packing sequence group includes m packing sequences, with one source switch group corresponding to one packing sequence. Each packing sequence includes v elements, which respectively correspond to the v available transit switches of the data network. When an element takes the third value, the available transit switch corresponding to that element is a transit switch that relays the data transmitted by the source network nodes connected to that source switch group to the destination network node connected to the destination switch group.

For example, suppose the data center network shown in FIG. 2 includes 6 source network nodes (network nodes 0, 1, 2, 4, 5, and 6) and the destination network node is network node 3; that is, the data center network includes 5 source switch groups (switch groups 0, 1, 3, 4, and 5) and the destination switch group is switch group 2. The first controller is controller 2. Suppose the first controller determines that 6 transit switches are available: transit switches 0, 1, 2, 6, 7, and 8.

Suppose the number of available transit switches relaying one group of the data is 3 and the third value is 1, that is, v=6, k=3, m=5. The first controller can use V=6, K=3, M=5 to find the corresponding packing sequence group among the pre-stored packing sequence groups.

Optionally, that packing sequence group may be as shown in Table 1, where packing sequences 1-5 correspond to the 5 source switch groups, respectively. The 1st source switch group corresponds to packing sequence 1 (110100), meaning that the data sent by network nodes 0 and 1 to network node 3 is relayed by the 1st, 2nd, and 4th available transit switches (transit switches 0, 1, and 6). The meaning of the other packing sequences can be deduced by analogy and is not repeated here.
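The mapping in the example above, from a packing sequence to concrete transit-switch identifiers, can be sketched with a few lines. The switch-ID list is the one assumed in the example.

```python
def switches_for_sequence(seq: str, available_ids: list) -> list:
    """Return the IDs of the available transit switches whose element is '1'."""
    return [sw for bit, sw in zip(seq, available_ids) if bit == "1"]

available = [0, 1, 2, 6, 7, 8]  # the 6 available transit switches in the example
assert switches_for_sequence("110100", available) == [0, 1, 6]  # source group 1
assert switches_for_sequence("011010", available) == [1, 2, 7]  # source group 2
```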
(2) Each group of data may be relayed by a different number of available transit switches.

Specifically, the i-th group of data (that is, the data transmitted by the source network nodes connected to the i-th source switch group to the destination network node connected to the destination switch group) may be relayed by ki available transit switches, 1≤i≤m, where the value of ki is determined by the first controller. In one possible implementation, the data from each source POD may have a different priority; data with higher priority may be relayed by more available transit switches, that is, its ki is larger.
In an optional embodiment, to guarantee that the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, the first controller, when determining the available transit switches that respectively relay the m groups of data, may ensure that the number of data groups relayed by any one available transit switch does not exceed ⌈(k1+k2+…+km)/v⌉.
In another optional embodiment, to guarantee that one available transit switch relays at least one group of the data and that the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, the first controller may select the corresponding packing sequence group from the pre-stored packing sequence groups and determine the available transit switches that respectively relay the m groups of data according to its packing sequences. That packing sequence group may be constructed from the values of V, K1, K2, K3, …, KM, and M and stored in the first controller; for the construction process, see the description of packing sequence groups under concept (3) above. The value of V is the number of available transit switches v, the value of M is the number of source switch groups m, and K1=k1, K2=k2, …, KM=km. It can be understood that the process by which the first controller determines the available transit switches that respectively relay the m groups of data according to the packing sequences can refer to step 2 of case (1) above and is not repeated here.
From the two cases (1) and (2) above, this application applies not only when the data of every source POD is relayed by the same number of available transit switches, but also when the data of each source POD is relayed by different numbers of available transit switches.
S104. The first controller instructs the destination network node to send routing information to the source network nodes, the routing information including the identifiers of the available transit switches used to relay the data group transmitted from a source network node to the destination network node.

Specifically, after determining the available transit switches that respectively relay the m groups of data, the first controller instructs the destination network node to send the identifiers of the available transit switches relaying each group of data to the corresponding source network nodes. For example, continuing the example in step S103 above, the first controller may instruct the destination network node to send the identifiers of transit switches 0, 1, and 6 to network nodes 0 and 1. The identifier of an available transit switch may be a MAC address, a switch virtual interface (switch virtual interface, SVI) address, or any other identifier that uniquely identifies a transit switch in the data center network.

In an optional embodiment, the routing information may be carried in the acknowledgement (acknowledgement, ACK) signal that the destination network node sends to each source network node after receiving the initialization packets.
Through the method embodiment of FIG. 4, the first round of packet transmission, that is, the initialization phase, is completed. Referring to FIG. 5, FIG. 5 is a schematic flowchart of the post-initialization data transmission phase. FIG. 5 is described taking as an example a source network node that receives the routing information sent by the destination network node and performs the second round of packet transmission according to it. The phase may include the following steps:

S201. The source network nodes perform the second round of packet transmission according to the routing information sent by the destination network node in the initialization phase.

Specifically, the multiple source network nodes of the data center all receive the routing information sent by the destination network node and uniquely determine the paths of this round of packet transmission from the identifiers of the available transit switches in the routing information.

Continuing the example of step S103 in the embodiment of FIG. 4: network nodes 0 and 1 receive the routing information sent by network node 3, which includes the identifiers of transit switches 0, 1, and 6. Network node 0 can determine 3 paths for this round of packet transmission, each including an uplink and a downlink:

Path 1. Uplink: network node 0 - switch group 0 - transit switch 0; downlink: transit switch 0 - switch group 2 - network node 3;

Path 2. Uplink: network node 0 - switch group 0 - transit switch 1; downlink: transit switch 1 - switch group 2 - network node 3;

Path 3. Uplink: network node 0 - switch group 0 - transit switch 6; downlink: transit switch 6 - switch group 2 - network node 3.

Similarly, the paths of the other source network nodes in the second round of packet transmission can be deduced by analogy.
The specific paths of the second round of data transmission are described below for the fat-tree and leaf-spine networks, respectively.

Referring to FIG. 3A, when the data center network is a fat-tree network, network node 0 can determine 3 paths for this round of packet transmission, each including an uplink and a downlink:

Path 1. Uplink: network node 0 - edge switch 0 - aggregation switch 0 - core switch 0; downlink: core switch 0 - aggregation switch 6 - edge switch 7 - network node 3;

Path 2. Uplink: network node 0 - edge switch 0 - aggregation switch 0 - core switch 1; downlink: core switch 1 - aggregation switch 6 - edge switch 7 - network node 3;

Path 3. Uplink: network node 0 - edge switch 0 - aggregation switch 2 - core switch 6; downlink: core switch 6 - aggregation switch 8 - edge switch 7 - network node 3.

Referring to FIG. 3B, when the data center network is a leaf-spine network, network node 0 can determine 3 paths for this round of packet transmission, each including an uplink and a downlink:

Path 1. Uplink: network node 0 - leaf switch 0 - spine switch 0; downlink: spine switch 0 - leaf switch 2 - network node 3;

Path 2. Uplink: network node 0 - leaf switch 0 - spine switch 1; downlink: spine switch 1 - leaf switch 2 - network node 3;

Path 3. Uplink: network node 0 - leaf switch 0 - spine switch 6; downlink: spine switch 6 - leaf switch 2 - network node 3.
S202. The first controller determines the available transit switches of the data center network.

Specifically, in the second round of packet transmission, the source network nodes transmit data via the available transit switches selected by the first controller in the initialization phase. It can be understood that the first controller can obtain the packets of the second round and determine, from them, the state of the available transit switches selected in the initialization phase.

Continuing the example of step S103 in the embodiment of FIG. 4, referring to FIG. 2, suppose the first controller determined 6 available transit switches in the initialization phase: transit switches 0, 1, 2, 6, 7, and 8. Then, in step S202, the first controller can learn the state of these 6 transit switches and re-determine which of them are available for the second round of packet transmission.

Clearly, in the second round of packet transmission, the first controller cannot learn from the packets the state of the transit switches that relay no data in that round (transit switches 3, 4, and 5).

In this application, setting a validity period for unavailable transit switches avoids the following situation: after some transit switches are determined to be unavailable, even if their actual state later changes back to available, they could never again be used to relay data because the first controller cannot observe the change. This method avoids wasting bandwidth. The preset time period may be determined by the speed at which the transit switches forward data; optionally, the faster the forwarding, the shorter the preset time period.

Continuing the example above: in the initialization phase, the first controller learned that transit switches 3, 4, and 5 were unavailable, so they were not used in the second round of data transmission after initialization. During the second round, the first controller cannot learn the state of transit switches 3, 4, and 5 from packets.

Transit switches 3, 4, and 5 were unavailable in the initialization phase but may have become available by the second round. Therefore, to prevent the first controller from forever regarding transit switches 3, 4, and 5 as unavailable, in this application, when the first controller found transit switches 3, 4, and 5 unavailable in the previous transmission (that is, in the initialization phase), it set a preset time period; after the preset time period, the first controller regards transit switches 3, 4, and 5 as available.

If, by the time the first controller determines the available transit switches of the data center network in step S202, more than the preset time period has elapsed since it found transit switches 3, 4, and 5 unavailable in the initialization phase, the first controller regards transit switches 3, 4, and 5 as available at that point.
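The controller-side bookkeeping in S202, marking a switch unavailable on an ECN congestion mark and reviving it after the preset time period, can be sketched as follows. This is an illustrative model only; the class and the 2-second timeout are assumptions, not part of the application.

```python
import time

PRESET_PERIOD = 2.0  # assumed timeout in seconds; tune to forwarding speed

class AvailabilityTracker:
    """First-controller view of which transit switches are usable."""

    def __init__(self, switch_ids):
        self.switch_ids = set(switch_ids)
        self.unavailable_since = {}  # switch id -> time it was marked unavailable

    def observe(self, switch_id, ecn):
        """Update state from a packet's ECN field ('11' = congestion experienced)."""
        if ecn == "11":
            self.unavailable_since[switch_id] = time.monotonic()
        else:
            self.unavailable_since.pop(switch_id, None)

    def available(self):
        """Switches never marked unavailable, plus those whose period expired."""
        now = time.monotonic()
        return {
            s for s in self.switch_ids
            if s not in self.unavailable_since
            or now - self.unavailable_since[s] >= PRESET_PERIOD
        }

tracker = AvailabilityTracker(range(9))
tracker.observe(3, "11")  # switch 3 reports congestion: unavailable for now
assert 3 not in tracker.available()
tracker.unavailable_since[3] -= PRESET_PERIOD  # simulate the period elapsing
assert 3 in tracker.available()
```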
From step S202, in the second round of data transmission, the available transit switches determined by the first controller include both those determined from the ECN fields of packets and those determined through the preset time period.

S203. The first controller determines, from the available transit switches of the data center network, the available transit switches that respectively relay the m groups of data.

S204. The first controller instructs the destination network node to send routing information to the source network nodes, the routing information including the identifiers of the available transit switches used to relay the data group transmitted from a source network node to the destination network node.

The routing information in step S204 is used by each source network node to determine the paths of the third round of data transmission.

It can be understood that the implementation of steps S203-S204 is similar to that of steps S103-S104 in the embodiment of FIG. 4; refer to the related description, which is not repeated here.

It can be understood that after the second round of data transmission there are further rounds, until the m groups of data are fully transmitted. The steps of each subsequent round are the same as those of the second round shown in FIG. 5; refer to the related description, which is not repeated here.

Through the method embodiments of FIG. 4 and FIG. 5, on each round of data transmission the first controller can determine the available transit switches, look up the corresponding packing sequence group among the pre-stored packing sequence groups, and determine from it the available transit switches that respectively relay the m groups of data.

Here, because the first controller excludes the unavailable transit switches when determining the available transit switches that respectively relay the m groups of data, heavily loaded transit switches are not used and their load is not further increased; this amounts to a first stage of load balancing.

When the first controller determines the available transit switches that respectively relay the m groups of data, every available transit switch is used and the amounts of data carried by the available transit switches differ little; this amounts to a second stage of load balancing among all the available transit switches.

In summary, the data transmission method of this application has low computational complexity and high efficiency; it prevents individual transit switches from being overloaded, balances the load across the transit switches of the data center network, improves bandwidth utilization, and reduces transmission latency.
The data transmission method of this application has been described in detail above. To facilitate better implementation of the above method, the related apparatus of this application is provided below.

Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the controller 10 provided in this application. The controller 10 may be implemented as a controller in the data center network of FIG. 2, a POD controller in the fat-tree network of FIG. 3A, a leaf controller in the leaf-spine network of FIG. 3B, or a controller in the foregoing method embodiments. As shown in FIG. 6, the controller 10 may include a communication interface 103, one or more controller processors 101, a coupler 111, and a memory 105. These components may be connected by a bus or in other ways; FIG. 6 takes a bus connection as an example. Among them:

The communication interface 103 may be used by the controller 10 to communicate with other devices, for example the switch groups and network nodes in FIG. 2, the aggregation switches, edge switches, and network nodes in FIG. 3A, or the leaf switches and network nodes in FIG. 3B. In specific implementations, the communication interface 103 may be a wired communication interface (for example, an Ethernet interface).

The memory 105 is coupled to the controller processor 101 and is used to store various software programs and/or sets of instructions. In specific implementations, the memory 105 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 105 may store an operating system, for example an embedded operating system such as uCOS, VxWorks, or RTLinux. In this application, the memory 105 may also pre-store multiple packing sequence groups; see the description under concept (4), controllers store multiple packing sequence groups, above.

In some embodiments of this application, the memory 105 may be used to store the implementation program, on the controller 10 side, of the data transmission method provided in one or more embodiments of this application. For the implementation of the data transmission method provided in one or more embodiments of this application, refer to the method embodiments shown in FIG. 4 and FIG. 5.

The controller processor 101 may be a general-purpose processor, for example a central processing unit (central processing unit, CPU). The processor 101 may also include a hardware chip, which may be one or a combination of the following: an application-specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field programmable gate array, FPGA), or a complex programmable logic device (complex programmable logic device, CPLD). The processor 101 may process received data; in this application, the processor 101 may also determine the available transit switches of the data center network from the received data, and determine the transit switches that relay the data sent by each source network node.

In the embodiments of the present invention, the controller processor 101 may be used to read and execute computer-readable instructions. Specifically, the controller processor 101 may be used to invoke a program stored in the memory 105, for example the implementation program, on the controller 10 side, of the data transmission method provided in one or more embodiments of this application, and execute the instructions contained in the program.

It should be noted that the controller 10 shown in FIG. 6 is only one implementation of the embodiments of the present invention; in practical applications, the controller 10 may include more or fewer components, which is not limited here.
Referring to FIG. 7, FIG. 7 is a functional block diagram of the controller 20 provided in this application.

As shown in FIG. 7, the controller 20 may include a determining unit 201 and an instructing unit 202, where:

the determining unit 201 is configured to determine, from the available transit switches of a data center network, the available transit switches that respectively relay m groups of data; the m groups of data are the data transmitted by the source network nodes connected to m source switch groups to the destination network node connected to a destination switch group; the data center network includes multiple transit switches, the m source switch groups, the destination switch group, the source network nodes, and the destination network node; an available transit switch is a transit switch, among the multiple transit switches, whose load does not exceed a first threshold; m is a positive integer;

one available transit switch relays at least one group of the data, and the difference between the numbers of data groups relayed by any two available transit switches does not exceed a second threshold;

the instructing unit 202 is configured to instruct the destination network node to send routing information to the source network nodes, the routing information including the identifiers of the available transit switches used to relay the data group transmitted from a source network node to the destination network node.

In an optional embodiment, the controller 20 further includes an obtaining unit 203, configured to obtain at least one packet; the determining unit 201 is further configured to: when the congestion indication field in the packet takes a first value, determine that the transit switch that sent the packet is an available transit switch; when the congestion indication field in the packet takes a second value, determine that the transit switch that sent the packet becomes an available transit switch after a preset time period.

In an optional embodiment, the at least one packet comes from the multiple transit switches, or the at least one packet comes from the available transit switches of the previous data transmission.

In an optional embodiment, to guarantee that the difference between the numbers of data groups relayed by any two available transit switches does not exceed the second threshold, the number of data groups relayed by any one available transit switch determined by the first controller does not exceed ⌈mk/v⌉, where k is the number of available transit switches relaying one group of the data.

In an optional embodiment, the determining unit 201 determines the transit switches that respectively relay the m groups of data according to packing sequences. Specifically, the determining unit 201 determines, among multiple pre-stored packing sequence groups, the packing sequences respectively corresponding to the m source switch groups, where one source switch group corresponds to one packing sequence; the packing sequence includes v elements, which respectively correspond to the v available transit switches of the data network; when an element takes a third value, the available transit switch corresponding to that element is a transit switch that relays the data transmitted by the source network nodes connected to that source switch group to the destination network node connected to the destination switch group;

and, when v>k, k of the v elements take the third value; when v≤k, all v elements take the third value; one packing sequence group includes m packing sequences, and in each of the multiple packing sequence groups, any one element takes the third value at least once and at most ⌈mk/v⌉ times;

where v is the number of available transit switches and k is the number of available transit switches relaying one group of the data.

In an optional embodiment, the routing information is carried in an acknowledgement signal.

In an optional embodiment, the data center network is a fat-tree network, or the data center network is a leaf-spine network.

It can be understood that, for the specific implementation of the functional units included in the controller 20, refer to FIG. 4, FIG. 5, and the related descriptions above, which are not repeated here.
In addition, this application also provides a data center network, which may be the network shown in FIG. 2, FIG. 3A, or FIG. 3B, and may include transit switches, switch groups, network nodes, and a controller, where the controller may be the first controller in the method embodiments corresponding to FIG. 4 and FIG. 5.

In specific implementations, the controller may be a controller in the data center network shown in FIG. 2, a POD controller in the fat-tree network shown in FIG. 3A, or a leaf controller in the leaf-spine network shown in FIG. 3B.

In specific implementations, the controller may be the controller shown in FIG. 6 or FIG. 7.

In summary, implementing this application balances the load across the transit switches of a data center network, improves bandwidth utilization, and reduces transmission latency.

All or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented fully or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in this application are fully or partially produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer instructions may be transferred from one website, computer, server, or data center to another website, computer, server, or data center over a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) connection. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive), among others.

Claims (16)

  1. A data transmission method, characterized by comprising:
    a first controller determining, from the available transit switches of a data center network, the available transit switches that respectively relay m groups of data; the m groups of data being the data transmitted by the source network nodes connected to m source switch groups to the destination network node connected to a destination switch group; the data center network comprising multiple transit switches, the m source switch groups, the destination switch group, the source network nodes, and the destination network node; wherein an available transit switch is a transit switch, among the multiple transit switches, whose load does not exceed a first threshold; m is a positive integer;
    wherein one available transit switch relays at least one group of the data, and the difference between the numbers of data groups relayed by any two available transit switches does not exceed a second threshold;
    the first controller instructing the destination network node to send routing information to the source network nodes, the routing information comprising the identifiers of the available transit switches used to relay the data group transmitted from a source network node to the destination network node.
  2. The method according to claim 1, characterized in that, before the first controller determines, from the available transit switches of the data center network, the available transit switches that respectively relay the m groups of data, the method further comprises:
    the first controller obtaining at least one packet;
    when the congestion indication field in the packet takes a first value, determining that the transit switch that sent the packet is an available transit switch;
    when the congestion indication field in the packet takes a second value, determining that the transit switch that sent the packet becomes an available transit switch after a preset time period.
  3. The method according to claim 2, characterized in that the at least one packet comes from the multiple transit switches, or the at least one packet comes from the available transit switches of the previous data transmission.
  4. The method according to any one of claims 1-3, characterized in that the difference between the numbers of data groups relayed by any two available transit switches not exceeding the second threshold comprises:
    the number of data groups relayed by any one available transit switch not exceeding ⌈mk/v⌉, where k is the number of available transit switches, determined by the first controller, that relay one group of the data, and v is the number of available transit switches of the data center network.
  5. The method according to any one of claims 1-4, characterized in that the first controller determining, from the available transit switches of the data center network, the available transit switches that respectively relay the m groups of data comprises:
    the first controller determining, among multiple pre-stored packing sequence groups, the packing sequences respectively corresponding to the m source switch groups; wherein one source switch group corresponds to one packing sequence, the packing sequence comprises v elements, and the v elements respectively correspond to the v available transit switches of the data network; when an element takes a third value, the available transit switch corresponding to that element is a transit switch that relays the data transmitted by the source network nodes connected to that source switch group to the destination network node connected to the destination switch group;
    and, when v>k, k of the v elements take the third value; when v≤k, all v elements take the third value; wherein one packing sequence group comprises m packing sequences, and in each of the multiple packing sequence groups, any one element takes the third value at least once and at most ⌈mk/v⌉ times;
    where v is the number of available transit switches of the data center network, and k is the number of available transit switches, determined by the first controller, that relay one group of the data.
  6. The method according to any one of claims 1-5, characterized in that the routing information is carried in an acknowledgement signal.
  7. The method according to any one of claims 1-6, characterized in that:
    the data center network is an n-ary fat-tree network; the number of the multiple transit switches is n²/4, and every n/2 transit switches form one transit switch group;
    among the m source switch groups and the destination switch group, any one switch group comprises n/2 aggregation switches and n/2 edge switches;
    wherein the n/2 aggregation switches are respectively connected to the n/2 edge switches;
    the n/2 aggregation switches are respectively connected to the n/2 transit switches of different transit switch groups;
    and the n/2 edge switches are respectively connected to different network nodes.
  8. The method according to any one of claims 1-6, characterized in that:
    the data network is a leaf-spine network, and among the m source switch groups and the destination switch group, any one switch group comprises one edge switch;
    wherein the edge switch is connected to the multiple transit switches, and different edge switches are respectively connected to different network nodes.
  9. A controller, characterized by comprising a determining unit and an instructing unit, wherein:
    the determining unit is configured to determine, from the available transit switches of a data center network, the available transit switches that respectively relay m groups of data; the m groups of data are the data transmitted by the source network nodes connected to m source switch groups to the destination network node connected to a destination switch group; the data center network comprises multiple transit switches, the m source switch groups, the destination switch group, the source network nodes, and the destination network node; wherein an available transit switch is a transit switch, among the multiple transit switches, whose load does not exceed a first threshold; m is a positive integer;
    wherein one available transit switch relays at least one group of the data, and the difference between the numbers of data groups relayed by any two available transit switches does not exceed a second threshold;
    the instructing unit is configured to instruct the destination network node to send routing information to the source network nodes, the routing information comprising the identifiers of the available transit switches used to relay the data group transmitted from a source network node to the destination network node.
  10. The controller according to claim 9, characterized in that the controller further comprises an obtaining unit configured to obtain at least one packet;
    the determining unit is further configured to: when the congestion indication field in the packet takes a first value, determine that the transit switch that sent the packet is an available transit switch;
    when the congestion indication field in the packet takes a second value, determine that the transit switch that sent the packet becomes an available transit switch after a preset time period.
  11. The controller according to claim 10, characterized in that the at least one packet comes from the multiple transit switches, or the at least one packet comes from the available transit switches of the previous data transmission.
  12. The controller according to any one of claims 9-11, characterized in that the difference between the numbers of data groups relayed by any two available transit switches not exceeding the second threshold comprises:
    the number of data groups relayed by any one available transit switch not exceeding ⌈mk/v⌉, where k is the number of available transit switches that relay one group of the data, and v is the number of available transit switches of the data center network.
  13. The controller according to any one of claims 9-12, characterized in that the determining unit is specifically configured to:
    determine, among multiple pre-stored packing sequence groups, the packing sequences respectively corresponding to the m source switch groups; wherein one source switch group corresponds to one packing sequence, the packing sequence comprises v elements, and the v elements respectively correspond to the v available transit switches of the data network; when an element takes a third value, the available transit switch corresponding to that element is a transit switch that relays the data transmitted by the source network nodes connected to that source switch group to the destination network node connected to the destination switch group;
    and, when v>k, k of the v elements take the third value; when v≤k, all v elements take the third value; wherein one packing sequence group comprises m packing sequences, and in each of the multiple packing sequence groups, any one element takes the third value at least once and at most ⌈mk/v⌉ times;
    where v is the number of available transit switches of the data center network, and k is the number of available transit switches that relay one group of the data.
  14. The controller according to any one of claims 9-13, characterized in that the routing information is carried in an acknowledgement signal.
  15. The controller according to any one of claims 9-14, characterized in that:
    the data center network is an n-ary fat-tree network; the number of the multiple transit switches is n²/4, and every n/2 transit switches form one transit switch group;
    among the m source switch groups and the destination switch group, any one switch group comprises n/2 aggregation switches and n/2 edge switches;
    wherein the n/2 aggregation switches are respectively connected to the n/2 edge switches;
    the n/2 aggregation switches are respectively connected to the n/2 transit switches of different transit switch groups;
    and the n/2 edge switches are respectively connected to different network nodes.
  16. The controller according to any one of claims 9-14, characterized in that:
    the data network is a leaf-spine network, and among the m source switch groups and the destination switch group, any one switch group comprises one edge switch;
    wherein the edge switch is connected to the multiple transit switches, and different edge switches are respectively connected to different network nodes.
PCT/CN2018/117821 2017-11-30 2018-11-28 数据传输方法、相关装置及网络 WO2019105360A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18883603.5A EP3713161B1 (en) 2017-11-30 2018-11-28 Data transmission method, relevant device and network
US16/886,894 US20200296043A1 (en) 2017-11-30 2020-05-29 Data transmission method, related apparatus, and network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711245559.0A CN109861925B (zh) 2017-11-30 2017-11-30 数据传输方法、相关装置及网络
CN201711245559.0 2017-11-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/886,894 Continuation US20200296043A1 (en) 2017-11-30 2020-05-29 Data transmission method, related apparatus, and network

Publications (1)

Publication Number Publication Date
WO2019105360A1 true WO2019105360A1 (zh) 2019-06-06

Family

ID=66665391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117821 WO2019105360A1 (zh) 2017-11-30 2018-11-28 数据传输方法、相关装置及网络

Country Status (4)

Country Link
US (1) US20200296043A1 (zh)
EP (1) EP3713161B1 (zh)
CN (1) CN109861925B (zh)
WO (1) WO2019105360A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210352018A1 (en) * 2019-03-18 2021-11-11 Huawei Technologies Co., Ltd. Traffic Balancing Method and Apparatus

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112511323B (zh) * 2019-09-16 2022-06-14 华为技术有限公司 处理网络拥塞的方法和相关装置
CN112491700B (zh) * 2020-12-14 2023-05-02 成都颜创启新信息技术有限公司 网络路径调整方法、系统、装置、电子设备及存储介质
CN114978980B (zh) * 2022-04-08 2024-01-19 新奥特(北京)视频技术有限公司 Ip信号交叉点调度装置和方法
CN117155851B (zh) * 2023-10-30 2024-02-20 苏州元脑智能科技有限公司 数据包的传输方法及系统、存储介质及电子装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105324964A (zh) * 2013-07-29 2016-02-10 甲骨文国际公司 用于支持在中间件机器环境中的多宿主胖树路由的系统和方法
US20170230260A1 (en) * 2016-02-09 2017-08-10 GlassBox Ltd. System and method for recording web sessions
US20170237624A1 (en) * 2012-05-22 2017-08-17 Xockets, Inc. Offloading of computation for servers using switching plane formed by modules inserted within such servers
CN107113233A (zh) * 2014-10-31 2017-08-29 甲骨文国际公司 用于支持多租户集群环境中的分区感知路由的系统和方法
US20170331728A1 (en) * 2012-12-21 2017-11-16 Dell Products L.P. System and methods for load placement in data centers

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5101451A (en) * 1988-12-29 1992-03-31 At&T Bell Laboratories Real-time network routing
US7260647B2 (en) * 2002-03-28 2007-08-21 International Business Machines Corporation Method of load balancing traffic among routers in a data transmission system
US8737228B2 (en) * 2007-09-27 2014-05-27 International Business Machines Corporation Flow control management in a data center ethernet network over an extended distance
CN101854306B (zh) * 2010-06-07 2011-12-28 西安西电捷通无线网络通信股份有限公司 一种交换路由探寻方法及系统
GB2493240B (en) * 2011-07-29 2016-01-20 Sca Ipla Holdings Inc Mobile communications network, infrastructure equipment and method
WO2015040624A1 (en) * 2013-09-18 2015-03-26 Hewlett-Packard Development Company, L.P. Monitoring network performance characteristics
US10778584B2 (en) * 2013-11-05 2020-09-15 Cisco Technology, Inc. System and method for multi-path load balancing in network fabrics
US20160353325A1 (en) * 2014-02-05 2016-12-01 Nokia Solutions And Networks Oy Load balanced gateway selection in lte communications
US9485115B2 (en) * 2014-04-23 2016-11-01 Cisco Technology, Inc. System and method for enabling conversational learning in a network environment
US10362506B2 (en) * 2014-10-07 2019-07-23 Nec Corporation Communication aggregation system, control device, processing load control method and non-transitory computer readable medium storing program
TWI543566B (zh) * 2015-05-12 2016-07-21 財團法人工業技術研究院 基於軟體定義網路的資料中心網路系統及其封包傳送方法、位址解析方法與路由控制器
CN106911584B (zh) * 2015-12-23 2020-04-14 华为技术有限公司 一种基于叶-脊拓扑结构的流量负载分担方法、装置及系统
US10320681B2 (en) * 2016-04-12 2019-06-11 Nicira, Inc. Virtual tunnel endpoints for congestion-aware load balancing
US10439879B2 (en) * 2016-09-01 2019-10-08 Cisco Technology, Inc. Bandwidth management in a non-blocking network fabric
KR102380619B1 (ko) * 2017-08-11 2022-03-30 삼성전자 주식회사 이동 통신 시스템 망에서 혼잡 제어를 효율적으로 수행하는 방법 및 장치
US10673761B2 (en) * 2017-09-29 2020-06-02 Vmware, Inc. Methods and apparatus to improve packet flow among virtualized servers
JP2019097073A (ja) * 2017-11-24 2019-06-20 富士通株式会社 情報処理装置、情報処理方法及びプログラム

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170237624A1 (en) * 2012-05-22 2017-08-17 Xockets, Inc. Offloading of computation for servers using switching plane formed by modules inserted within such servers
US20170331728A1 (en) * 2012-12-21 2017-11-16 Dell Products L.P. System and methods for load placement in data centers
CN105324964A (zh) * 2013-07-29 2016-02-10 甲骨文国际公司 用于支持在中间件机器环境中的多宿主胖树路由的系统和方法
CN107113233A (zh) * 2014-10-31 2017-08-29 甲骨文国际公司 用于支持多租户集群环境中的分区感知路由的系统和方法
US20170230260A1 (en) * 2016-02-09 2017-08-10 GlassBox Ltd. System and method for recording web sessions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3713161A4 *


Also Published As

Publication number Publication date
CN109861925B (zh) 2021-12-21
EP3713161B1 (en) 2023-04-26
EP3713161A1 (en) 2020-09-23
CN109861925A (zh) 2019-06-07
US20200296043A1 (en) 2020-09-17
EP3713161A4 (en) 2020-12-23

Similar Documents

Publication Publication Date Title
WO2019105360A1 (zh) 数据传输方法、相关装置及网络
US9667524B2 (en) Method to check health of automatically discovered controllers in software defined networks (SDNs)
US9960991B2 (en) Method, apparatus and system for determining service transmission path
CN108243103B (zh) 用于在克洛斯网络中分配路由协议信息的装置、系统和方法
JP6141407B2 (ja) 802.1aqのためのスプリットタイブレーカ
US9654401B2 (en) Systems and methods for multipath load balancing
US9455916B2 (en) Method and system for changing path and controller thereof
CN103944828A (zh) 一种协议报文的传输方法和设备
CN103067291A (zh) 一种上下行链路关联的方法和装置
JP2016531372A (ja) メモリモジュールアクセス方法および装置
US9036629B2 (en) Switch module
CN104350711A (zh) 用于在diameter信令路由器处路由diameter消息的方法、系统及计算机可读介质
US20140233581A1 (en) Switch and switch system
CN104639437A (zh) 堆叠系统中广播报文的转发方法及装置
JP6064989B2 (ja) 制御装置、通信システム、ノード制御方法及びプログラム
CN112769584B (zh) 网络切片共享上联口的方法、装置及存储介质
JPWO2013176262A1 (ja) パケット転送システム、制御装置、パケット転送方法及びプログラム
US20150036508A1 (en) Method and Apparatus For Gateway Selection In Multilevel SPB Network
WO2017164068A1 (ja) トランスポートネットワーク制御装置、通信システム、転送ノードの制御方法及びプログラム
JP5889813B2 (ja) 通信システムおよびプログラム
US20160191299A1 (en) Information processing system and control method for information processing system
WO2018028457A1 (zh) 一种确定路由的方法、装置及通信设备
JP6633502B2 (ja) 通信装置
US9110721B2 (en) Job homing
CN111163005B (zh) 一种信息处理方法、装置、终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18883603

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018883603

Country of ref document: EP

Effective date: 20200615