WO2022037266A1 - Communication method, apparatus, and system in a data center - Google Patents

Communication method, apparatus, and system in a data center

Info

Publication number
WO2022037266A1
Related identifiers: PCT/CN2021/103256, CN2021103256W, WO2022037266A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data flow
routing
routing policy
information
Prior art date
Application number
PCT/CN2021/103256
Other languages
English (en)
French (fr)
Inventor
周轶刚
卢胜文
毛修斌
胡中华
李凤凯
刘永锋
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21857362.4A (EP4184937A4)
Publication of WO2022037266A1
Priority to US18/170,293 (US20230198896A1)

Classifications

    • H04L 45/34 Source routing
    • H04B 10/271 Combination of different networks, e.g. star and ring configuration in the same network or two ring networks interconnected
    • H04L 41/0894 Policy-based network configuration management
    • H04L 45/02 Topology update or discovery
    • H04L 45/245 Link aggregation, e.g. trunking
    • H04L 45/38 Flow based routing
    • H04L 45/42 Centralised routing
    • H04L 45/655 Interaction between route computation entities and forwarding entities, e.g. for route determination or for flow table update
    • H04L 47/12 Avoiding congestion; Recovering from congestion
    • H04Q 11/0062 Network aspects (selecting arrangements for multiplex systems using optical switching)
    • H04Q 2011/0073 Provisions for forwarding or routing, e.g. lookup tables
    • Y02D 30/00 Reducing energy consumption in communication networks

Definitions

  • the present invention generally relates to communication technologies, and in particular, to a communication method, device and system in a data center.
  • DCN Data Center Network
  • Switches in traditional DCN networks process network packets carried as electrical signals; they support packet-level switching and routing, as well as advanced functions such as on-switch packet buffering and traffic congestion control. Because the communication lines in a traditional DCN transmit optical signals while the switching nodes must operate on electrical signals when switching packets, every hop switch on the transmission path must convert optical signals to electrical signals and back to optical signals. This leads to problems such as high energy consumption, high construction cost, high packet E2E transmission delay, and switch port bandwidth limited by electrical signal processing capability.
  • Embodiments of the present application provide a method, device and system for communication in a data center, so as to improve the communication efficiency of the data center or save the energy consumption and cost of the data center.
  • the present application provides a communication method in a data center.
  • the data center includes multiple servers, multiple electrical switches, and at least one optical cross-connect device.
  • the uplink ports of at least two of the multiple electrical switches are interconnected with the at least one optical cross-connect device; the method includes: receiving network topology information delivered by a network topology manager, acquiring a data flow, and configuring a routing policy for the data flow according to the network topology information.
  • the configured routing policy may include a first routing policy instructing to forward the data flow through an optical channel in the at least one optical cross-connect device, a second routing policy instructing to split the data flow into at least two sub-data flows for forwarding, or a third routing policy instructing to forward the data flow through an electrical switch in the data center; it may also include a first combination policy that applies the third routing policy to a first part of the packets of the data flow and the first routing policy to a second part of the packets, or a second combination policy that applies the third routing policy to the first part of the packets and the second routing policy to the second part of the packets.
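  • As an illustration of the five policy options just enumerated, the following Python sketch maps a flow classification onto a policy, or an ordered combination of policies; the names `Policy`, `choose_policy`, and the flow-type labels are hypothetical and not taken from the patent.

```python
from enum import Enum, auto

class Policy(Enum):
    OPTICAL_CHANNEL = auto()  # first routing policy: forward over an OxC optical channel
    SPLIT_SUBFLOWS = auto()   # second routing policy: split into at least two sub-flows
    ELECTRICAL = auto()       # third routing policy: forward via electrical switches

def choose_policy(flow_type: str) -> list:
    """Return the policy, or ordered combination of policies, for a flow.

    flow_type is a hypothetical label produced by the NIC's identification engine.
    """
    if flow_type == "first_type":    # e.g. tidal elephant flow
        # first combination: early packets via electrical switches, later packets bypassed optically
        return [Policy.ELECTRICAL, Policy.OPTICAL_CHANNEL]
    if flow_type == "second_type":   # e.g. ordinary elephant flow
        # second combination: early packets via electrical switches, later packets split
        return [Policy.ELECTRICAL, Policy.SPLIT_SUBFLOWS]
    return [Policy.ELECTRICAL]       # default: third routing policy only
```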
  • the method is executed by at least one server in the plurality of servers, or the method is executed by each server in the plurality of servers.
  • the method is performed by a network card in the server, such as a smart network card with a network processor.
  • the embodiment of the present invention adopts traditional packet switching equipment (such as electrical switches) and optical cross-connect equipment to form a hybrid DCN network, and adopts source routing technology on the servers to realize the routing of data flows in the data center, distributing the work of centralized route computation to the intelligent network cards on the servers for independent computation.
  • This avoids the controller fault-tolerance difficulties caused by centralized path computation in an SDN controller, as well as slow rerouting and routing convergence after a network failure.
  • the third routing policy is configured for the data flow according to the address information carried by the data flow and the network topology information.
  • the type of the data flow is identified according to the message of the data flow, and an updated routing policy is configured for the data flow according to the type of the data flow, and the updated routing policy includes the first a routing strategy or the second routing strategy.
  • the first routing policy is configured for a first type of data flow
  • the second routing policy is configured for a second type of data flow.
  • the embodiment of the present invention schedules classified data flows to electrical switches or optical cross-connect devices for forwarding, so as to flexibly utilize the advantages of electrical switching and optical cross-connection, which can not only reduce the cost of DCN network construction, but also greatly reduce the impact of large data flows, such as elephant flows, on network congestion and network fairness.
  • the first type of elephant flow is directly forwarded by the optical channel, which reduces the large amount of photoelectric conversion caused by the use of electrical switches to forward such large flow, which can improve the forwarding efficiency of the first type of elephant flow and reduce the energy consumption of the data center.
  • the impact of the first type of elephant flow on other traffic in the data center during the time period in which it occurs can be reduced.
  • a split forwarding method is adopted for the second common elephant flow, which can improve the forwarding efficiency of such an elephant flow.
  • the method further includes: selecting a target optical cross-connect device from the at least one optical cross-connect device, and instructing the target optical cross-connect device to establish the optical channel.
  • the optical channel on the traditional OxC device is manually and statically configured, and cannot be flexibly configured according to the dynamic change of the data traffic.
  • the server identifies the type or change of the data traffic, and drives the optical channel controller to dynamically configure the optical cross-connect device to dynamically establish the optical channel.
  • the first type of elephant flow can be scheduled to the optical channel of OxC on demand, and the impact of the first type of elephant flow on other traffic on the network can be reduced.
  • the method further includes: obtaining first sub-routing information, where the first sub-routing information includes information for routing the data flow of the first type to the target optical cross-connect device; subsequent packets of the data flow of the first type are forwarded to the target optical cross-connect device according to this information.
  • the method further includes: obtaining routing information of at least two equal-cost sub-paths according to the third routing policy.
  • the method further includes: dividing the subsequent packets of the data flow of the second type into at least two sub-data flows, and forwarding the two sub-data flows respectively according to the routing information of the at least two equal-cost paths.
  • the method of segmentation can disperse the forwarding of packets, improve the load balance of each path in the network, and thus improve the communication efficiency.
  • the data flow of the first type is a tidal elephant flow
  • the data flow of the second type is a normal elephant flow
  • identifying the feature information carried by the packets in the data flow, and if the feature information of a packet matches preset information, determining that the data flow is a data flow of the first type; or,
  • identifying the number of packets, the amount of data, or the required bandwidth of the data flow within a period of time, and if it is greater than a preset threshold, determining that the data flow is a data flow of the second type.
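  • A minimal sketch of the two identification criteria above, assuming illustrative thresholds and a hypothetical preset feature marker (none of these values come from the patent):

```python
def classify_flow(packets, window_packet_count, window_bytes, required_bw_bps,
                  preset_marker=b"TIDAL", pkt_threshold=10_000,
                  byte_threshold=1_000_000_000, bw_threshold=1_000_000_000):
    """Classify a flow as first type, second type, or other."""
    # Criterion 1: feature information in a packet matches the preset information.
    if any(preset_marker in p for p in packets):
        return "first_type"                     # e.g. tidal elephant flow
    # Criterion 2: packet count, data volume, or required bandwidth over a time
    # window exceeds a preset threshold.
    if (window_packet_count > pkt_threshold
            or window_bytes > byte_threshold
            or required_bw_bps > bw_threshold):
        return "second_type"                    # e.g. ordinary elephant flow
    return "other"
```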
  • network connection information of adjacent switches is acquired through a link discovery protocol, and the acquired network connection information is reported to the topology manager.
  • An agent running the link discovery protocol is deployed on a server (such as an intelligent network card), the server is incorporated into the network, and routes are distributed and calculated at the source end, sharing the load of the switches in the network.
  • the present application provides a data center. The data center includes multiple servers, multiple electrical switches, and at least one optical cross-connect device, and the uplink ports of at least two electrical switches among the multiple electrical switches are interconnected with the at least one optical cross-connect device.
  • At least one server among the multiple servers is used to receive network topology information, obtain a data flow, and configure a routing policy for the data flow according to the network topology information. The routing policy includes any one or a combination of the following routing policies: a first routing policy, which instructs to forward the data flow through an optical channel in the at least one optical cross-connect device; a second routing policy, which instructs to split the data flow into at least two sub-data flows for forwarding; and a third routing policy, which instructs to forward the data flow through an electrical switch in the data center.
  • the multiple servers and the multiple electrical switches form multiple service clusters, wherein the first service cluster includes at least two servers, at least two access switches and at least one aggregation switch, the at least two The first upstream ports of the access switches are interconnected with the at least one optical cross-connect device, and the second upstream ports of the at least two access switches are interconnected with the at least one aggregation switch.
  • the multiple servers and the multiple electrical switches form multiple service clusters, wherein the first service cluster includes at least two access switches and at least one aggregation switch; the first upstream port of the at least one aggregation switch is interconnected with the at least one optical cross-connect device, and the second upstream port of the at least one aggregation switch is interconnected with the backbone switch.
  • the multiple servers and the multiple electrical switches form multiple service clusters, wherein the first service cluster includes at least two servers, at least two access switches and at least one aggregation switch, the at least two The first uplink ports of the access switches are interconnected with the first optical cross-connect device in the at least one optical cross-connect device, and the second uplink ports of the at least two access switches are interconnected with the at least one aggregation switch;
  • the first upstream port of the at least one aggregation switch is interconnected with a second optical cross-connect device in the at least one optical cross-connect device, and the second upstream port of the at least one aggregation switch is interconnected with the backbone switch.
  • the data center further includes a topology manager, and the topology manager is configured to obtain network connection information sent by each device in the data center, obtain network topology information according to the network connection information of each device, and Delivering the network topology information to at least one server among the multiple servers.
  • the at least one server includes a network card, the network card obtains the network topology information delivered by the topology manager, and the network card configures the routing policy according to the network topology information.
  • the present application provides a server, where the server includes a processing unit for executing any one of the methods described in the first aspect.
  • the present application provides a network card, where the network card includes a network processor, and the network processor is configured to execute any one of the methods described in the first aspect.
  • the present application provides a server or a network card, the server includes a processor and a memory, wherein the memory is used to store program codes, and the processor is used to execute the program codes to implement the first aspect. any method.
  • the present application provides a routing processing device, which is applied to the data center described in the second aspect.
  • the routing processing device includes a source routing controller, and the source routing controller is configured to receive network topology information issued by a network topology manager, obtain a data flow, and configure a routing policy for the data flow according to the network topology information. The configured routing policy may include a first routing policy instructing to forward the data flow through an optical channel in the at least one optical cross-connect device, a second routing policy instructing to split the data flow into at least two sub-data flows for forwarding, or a third routing policy instructing to forward the data flow through an electrical switch in the data center; it may also include a first combination policy that applies the third routing policy to a first part of the packets of the data flow and the first routing policy to a second part of the packets, or a second combination policy that applies the third routing policy to the first part of the packets and the second routing policy to the second part of the packets.
  • the route processing device is an intelligent network card in the server.
  • the source routing controller includes a source routing engine and an optical cross-connect control engine; the source routing engine is used to calculate routing policies according to the network topology information, and the optical cross-connect control engine is used to control the optical cross-connect device to establish an optical channel when one needs to be established.
  • the routing processing apparatus further includes an identification engine for identifying the type of data flow, reporting different types of data flow to the source routing engine, and obtaining a routing policy provided by the source routing engine.
  • the present application provides a computer-readable storage medium or a computer program product, where instructions are stored in the computer-readable storage medium, and when the instructions are executed by a processor, the communication method provided in any one of the implementations of the foregoing second aspect is implemented.
  • the computer-readable storage medium includes, but is not limited to, read-only memory, random access memory, flash memory, HDD or SSD.
  • the embodiment of the present invention adopts traditional packet switching equipment (such as electrical switches) and optical cross-connect equipment to form a hybrid DCN network, and adopts source routing technology on the servers to realize the routing of data flows in the data center, distributing the work of centralized route computation to the intelligent network cards on the servers for independent computation.
  • This avoids the controller fault-tolerance difficulties caused by centralized path computation in an SDN controller, as well as slow rerouting and routing convergence after a network failure.
  • the embodiment of the present invention breaks through the networking architecture of the traditional data center and provides a new optical-electrical hybrid routing data center networking architecture, in which optical cross-connect equipment is interconnected with base switches or edge switches in the data center network architecture, and optical channels are used to forward related data flows, improving network transmission efficiency.
  • Figure 1 is a traditional data center network architecture diagram
  • Fig. 2 is a data center network architecture diagram provided by an embodiment of the present invention.
  • FIG. 3 is a system networking diagram provided by an embodiment of the present invention.
  • FIG. 4 is a flowchart of a topology management method provided by an embodiment of the present invention.
  • FIG. 5 is a flow chart of establishing an optical channel and forwarding a message according to an embodiment of the present invention
  • FIG. 6 is a flowchart of a data center communication method according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the composition of a device according to an embodiment of the present invention.
  • Each rack hosts multiple servers connected through a top-of-rack (ToR) switch; the ToR switches, acting as edge access switches, interconnect with aggregation (Leaf) switches to form a PoD (Point of Delivery) cluster (referred to as a service cluster in the embodiments of the present invention). In a PoD cluster, the edge switches and the aggregation switches can be fully cross-interconnected to improve link reliability.
  • Layer-3 backbone switches (such as spine switches) are used to provide external access functions and provide interconnection between PoD clusters, and each backbone switch is connected to at least one aggregation switch.
  • the switches at all levels in the above-mentioned traditional data centers include power-consuming optical-electrical (O-E) and electrical-optical (E-O) transceivers, leading to high energy consumption, high construction cost, high packet E2E transmission delay, and switch port bandwidth limited by electrical signal processing capability.
  • the types of traffic generated in the data center are becoming more and more complex, and the management of the traffic can also improve the efficiency of the data center.
  • OxC Optical Cross-Connect
  • DCI Data Center Interconnect
  • the embodiment of the present invention adopts a traditional packet switching device (such as an electrical switch) and an optical cross-connect device to form a hybrid DCN network, and adopts the source routing technology to realize the routing of the data flow in the data center.
  • the servers in the data center use source routing technology to route the data flows; more specifically, the intelligent network card on a server can dynamically identify and distinguish different types of traffic in the physical network.
  • Flexible use of the advantages of electrical switching and optical cross-connect can not only reduce the cost of DCN network construction, but also greatly reduce the impact of large data flows such as elephant flows on network congestion and network fairness.
  • FIG 2 shows the DCN networking topology provided by the embodiment of the present invention.
  • at the Spine and Leaf switch layers of the fat-tree network, at least one optical cross-connect device OxC is deployed in addition to the traditional switching equipment.
  • the uplink ports of the Leaf and/or ToR switches are interconnected with the optical cross-connect ports of the OxC.
  • an optical cross-connect device OXC1 is also deployed on the Leaf layer in PoD1, and the uplink ports from ToR1 to ToRn in PoD1 are connected to the optical cross-connect ports of OXC1 respectively, and so on.
  • the Leaf layer in PoDn also deploys optical cross-connects.
  • the upstream ports of ToR1 to ToRn in PoDn are connected to the optical cross-connect port of OXCn respectively.
  • the deployment quantity and connection method of the optical cross-connect equipment OXC in the PoD shown in Figure 2 is only one of the achievable examples. A larger number of optical cross-connect equipment OXC can be deployed inside each PoD.
  • the connections between the ToR switches inside each PoD and the optical cross-connect devices OXC inside that PoD can be fully interconnected or can use other partial connection patterns, which are not described in detail in this embodiment.
  • an optical cross-connect device OXC11 is also deployed outside the PoD.
  • the upstream ports of the switches Leaf1 to Leafn in the PoD1 are connected to the optical cross-connect ports of the OXC11 respectively, and so on.
  • the upstream ports of the switches Leaf1 to Leafn in PoDn are respectively connected to the optical cross-connect ports of OXC11.
  • the deployment quantity and connection method of the PoD external optical cross-connect equipment OXC shown in Figure 2 is only one of the achievable examples.
  • the connection mode between the Leaf switch inside the PoD and the optical cross-connect device OXC outside the PoD may be a full interconnection mode, or may be connected in other partial connection modes, which will not be described in detail in this embodiment.
  • the dotted line connection in the figure is only used to illustrate the connection between the electrical switch and the optical cross-connect device and the connection between the electrical switches, and does not mean that there is no connection between the two.
  • the above-mentioned DCN networking can also include a topology manager (not shown in the figure), which can be any one or more servers in the data center, or a topology management component running on any one or more servers.
  • the topology manager may also be a software defined network (Software Defined Network, SDN) controller.
  • SDN Software Defined Network
  • the topology manager is used to collect and manage the topology of the entire network, and send the collected topology information of the entire network to the network card in each server.
  • the topology of the electrical switch can be collected through the Link Layer Discovery Protocol (LLDP) and then reported to the topology manager for synthesis, and the port topology information of the optical cross-connect device OxC can be statically configured in the topology manager.
  • LLDP Link Layer Discovery Protocol
  • the above-mentioned DCN networking also includes an OXC controller (not shown in the figure); the OXC controller may be an independent device or a module in each OXC optical cross-connect device.
  • the OXC controller is used to manage the optical channel.
  • the intelligent network card in the server is improved so that the intelligent network card has the source routing function.
  • the programmable network processor in the intelligent network card can be programmed so that the intelligent network card can realize the source routing function.
  • the source routing function may include functions such as discovery of network connection information, identification of network topology data flow, and data flow routing in the embodiment of the present invention.
  • at least one server can be selected and configured in each PoD of the data center so that the selected server has the source routing function, or the intelligent network card on each server in the data center can be configured, so as to implement distributed source routing.
  • the embodiment of the present invention is described by taking the smart network card on the server as an example. In specific implementation, other devices on the server may also perform the functions performed by the above-mentioned improved smart network card.
  • the SDN controller will centrally calculate the global route after collecting the whole network topology and issue the flow forwarding table to the electrical switch.
  • the embodiment of the present invention uses the intelligent network card on the server to realize source routing control, distributing the work of centralized route computation to the intelligent network cards on the servers for independent computation.
  • This avoids the controller fault-tolerance difficulties caused by centralized path computation in an SDN controller, as well as slow rerouting and routing convergence after a network failure.
  • the smart network card on a server in the data center can obtain the whole-network topology information from the topology manager, and the smart network card can also identify different types of network traffic based on statistics of the traffic already sent for each data flow (such as the number of packets sent per unit time, the amount of data carried by the packets, or the total bandwidth occupied by the packets), adopting different routing strategies: splitting (for ordinary elephant flows) or bypassing (for tidal elephant flows).
  • in an actual data center network, the flows that cause network congestion may account for only 10% of flows, yet carry 90% of the total traffic volume; such flows are called elephant flows (Elephant Flow), and the remaining flows are called mice flows (Mice Flow).
  • the tidal elephant flow and the ordinary elephant flow are used as examples to illustrate the distinction of the elephant flow generated by the server.
  • Other classification standards or classification results may also be used.
  • the source routing control plane on the smart network card in the embodiment of the present invention can dynamically plan the routing policy by which packets reach the destination node according to the data traffic type and/or dynamic network topology information, and the smart network card can generate source routing forwarding label stacks according to different routing policies to steer different types of elephant flows to electrical switches or optical cross-connect devices. For example, tidal elephant flows are forwarded directly over optical channels, avoiding the large amount of optical-electrical conversion caused by forwarding such large flows through electrical switches, which improves the forwarding efficiency of tidal elephant flows and reduces the energy consumption of the data center.
  • It also reduces the impact of tidal elephant flows on other traffic in the data center during the periods in which they occur.
  • the embodiment of the present invention may adopt a split forwarding manner, which may improve the forwarding efficiency of the elephant flow.
  • the optical channel on the traditional OxC device is manually and statically configured, and cannot be flexibly configured according to the dynamic change of data traffic.
  • in the embodiment of the present invention, the type of the data traffic or changes in it are identified by the intelligent network card, which drives the optical channel controller to dynamically configure the OxC device to dynamically establish optical channels, so that tidal elephant flows can be scheduled onto OxC optical channels on demand, reducing their impact on other traffic in the network.
  • the control plane consists of a topology manager and an OxC controller.
  • the data plane consists of the intelligent network cards on the servers, electrical switches such as ToR/Leaf/Spine switches, and the optical cross-connect device OxC.
  • Topology manager 31: responsible for collecting the network connection information of each node from the smart network cards running the Link Layer Discovery Protocol (LLDP) agent and from the electrical switches at all levels, synthesizing it into whole-network topology information, and delivering it through the control network to the servers, preferably to the smart network cards on all servers in the entire network.
  • LLDP Link Layer Discovery Protocol
  • OxC controller 32: responsible for receiving requests to establish an optical channel sent from the control plane of the intelligent network card on the server, and issuing them to the optical cross-connect device 36 to establish an optical channel between an OxC source port and an OxC destination port.
  • Intelligent network card control plane 33: includes a source routing controller, which computes, based on the network topology information delivered by the topology manager, routing policies (forwarding paths) from this server node to the other server nodes in the entire network. The source routing engine 331 computes the primary path through the electrical switches and multiple optional backup paths, and the OxC engine 332 computes the optical channel through the optical cross-connect device OxC, which may also be called the fast forwarding path. After receiving a packet of a data flow, the source routing controller responds to the data forwarding request submitted by the smart NIC data plane, determines the forwarding path of the data flow based on the path selection policy and algorithm, generates a forwarding label stack, and delivers it to the data plane. Specifically, different forwarding paths can be generated according to the type of the data flow.
  • Smart NIC data plane 34: includes an identification engine 341, which identifies, for each data flow (determined by its 5-tuple or 7-tuple), whether it is a tidal elephant flow or an ordinary elephant flow based on the feature information carried by its packets or the statistical characteristics of the traffic already sent, and submits a request to the smart NIC control plane to forward the flow over the ordinary path or the fast forwarding path.
  • After receiving the source routing label stack delivered by the control plane, the data plane of the smart NIC adds the label stack packet by packet according to the label stack information of the corresponding data flow and sends the packets, storing the source routing label information in the source routing label table 342.
  • Electrical switch 35: an electrical switch that supports source-routing label stacks. It parses the label stack in the packet header (the label stack includes the egress port number of each switch on the forwarding path), extracts the egress port information corresponding to this hop, and then forwards the packet directly through that egress port, regardless of whether the next hop is an electrical switch or an optical cross-connect device.
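  • The per-hop behavior described for electrical switch 35 can be sketched as follows in Python; the packet representation (a list of egress-port labels plus a payload) and the `send_on_port` callback are assumptions made for illustration, not the patent's format.

```python
def forward_on_label_stack(label_stack, payload, send_on_port):
    """Pop this hop's label and forward the packet on the indicated egress port.

    label_stack: list of egress port numbers, outermost (this hop's) label first.
    send_on_port: callback taking (port_number, packet) that puts the packet on the
        wire, whether the next hop is an electrical switch or an optical cross-connect.
    """
    this_hop_port, *remaining_stack = label_stack
    send_on_port(this_hop_port, (remaining_stack, payload))

# toy usage: a path whose per-switch egress ports are 3, 7, 1
forward_on_label_stack([3, 7, 1], b"data", lambda port, pkt: print(port, pkt))
```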
  • the flowchart of the method for topology discovery in DCN hybrid networking includes:
  • the topology manager obtains a static topology configuration file, which includes the network connection information between the electrical switches and the optical cross-connect devices OxC in the data center. Specifically, the connection relationships between all electrical switches and all optical cross-connect devices OxC in the data center can be obtained, for example: uplink port of PoD1-ToR1 -> PoD1-OxC1, uplink port of PoDn-ToR1 -> PoDn-OxC1, uplink port of PoD1-Leaf1 -> OxC11, uplink port of PoDn-Leaf1 -> OxC11.
  • the smart network card on the server runs the LLDP agent to obtain the network connection information (such as port serial number, IP address or MAC address, etc.) of the neighbor node.
  • the neighbor node can be an adjacent switch. Specifically, the smart network cards on all the servers in the data center may run LLDP agents to collect the network connection information of neighbor nodes, or the smart network cards on only some servers may do so; only one of them is shown in FIG. 4.
  • the electrical switch runs an LLDP agent, and obtains the network connection information (such as port serial number, IP address or MAC address, etc.) of the neighbor node through the LLDP protocol.
  • the neighbor node can be an adjacent switch or an adjacent server.
  • all electrical switches in the data center may run LLDP agents to collect the network connection information of neighbor nodes, or only some switches may do so (for example, inactive electrical switches may not join the network); only one of them is shown in FIG. 4.
  • the smart network card on the server reports the collected network connection information of the neighbor nodes to the topology manager. Specifically, the smart network card on all the servers in the data center can periodically report the collected neighbor node information. The network connection information is reported to the topology manager. In addition, the smart network card can also report its own network connection information to the topology manager.
  • the electrical switch reports the collected network connection information of the neighbor nodes. Specifically, all the electrical switches in the data center may periodically report the network connection information of the neighbor nodes collected by themselves to the topology manager. In addition, each electrical switch can also report its own network connection information to the topology manager.
  • the topology manager comprehensively obtains network topology information including all intelligent network cards, electrical switches and optical cross-connect devices OxC in the entire network based on the information collected by each node.
  • the topology manager sends the whole-network topology information to at least one smart network card in the data center. Specifically, one smart network card inside each server is used for identifying the data flows generated inside that server and for forwarding them; therefore, the topology manager delivers the whole-network topology information to at least one smart network card inside each server.
  • the electrical switches or smart network cards running the LLDP agent can periodically exchange heartbeats and refresh the network connection information of neighbor nodes, when the network connection information of each node in the data center changes, the network topology information of the entire network also changes. Therefore, the topology manager will also re-deliver the refreshed network topology information periodically or according to the request of the intelligent network card on the server.
  • in an SDN network, the SDN controller collects the whole-network topology information, centrally computes global routes, and delivers flow forwarding tables to each electrical switch.
  • in the embodiment of the present invention, the topology manager delivers the network topology information to each server, distributing the work of centralized route computation to the servers for independent computation, which avoids the controller fault-tolerance difficulties caused by centralized computation in an SDN controller, as well as slow rerouting and routing convergence after a network failure.
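  • A sketch of how the topology manager might merge the statically configured OxC port topology with the neighbor reports collected by the LLDP agents; the data structures and names are assumptions for illustration, not the patent's interfaces.

```python
def synthesize_topology(static_oxc_links, neighbor_reports):
    """Build a whole-network adjacency map from static OxC links and LLDP reports.

    static_oxc_links: iterable of (node_a, port_a, node_b, port_b) tuples configured
        by the operator for the optical cross-connect devices.
    neighbor_reports: dict mapping a reporting node (smart NIC or electrical switch)
        to a list of (local_port, neighbor, neighbor_port) tuples.
    """
    topology = {}

    def add_link(a, pa, b, pb):
        topology.setdefault(a, {})[pa] = (b, pb)
        topology.setdefault(b, {})[pb] = (a, pa)

    for a, pa, b, pb in static_oxc_links:
        add_link(a, pa, b, pb)
    for reporter, links in neighbor_reports.items():
        for local_port, neighbor, neighbor_port in links:
            add_link(reporter, local_port, neighbor, neighbor_port)
    return topology
```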
  • the flow chart of sending network packet traffic in a data center using an optoelectronic hybrid architecture includes:
  • the server generates a packet of the data flow, and the packet arrives at the local smart network card, and the intelligent network card on the server identifies the type of the data flow generated by the server.
  • the smart network card can identify the feature information carried by the packets in the data flow, and if the feature information of a packet matches the preset information, it is determined that the data flow is a tidal elephant flow; the smart network card can also identify the number of packets of the data flow within a first time period, the amount of data contained in those packets, or the bandwidth they require, and if the number of packets, the amount of data, or the required bandwidth within the first time period is greater than a preset threshold, it is determined that the data flow is an ordinary elephant flow.
  • the smart NIC can collect traffic statistics through periodic sampling, for example based on the payload size of the data packets in the data flow: if the packet size is larger than a preset threshold, the flow is identified as an elephant flow; other statistics of the data flow may also be calculated.
  • One of the above methods for identifying traffic can be selected, or any combination can be used to obtain a more accurate identification result.
  • Smart NICs can also have built-in artificial intelligence AI algorithm models, use data packet traffic to train the AI algorithm model, and use the trained AI algorithm model to identify and classify data packet traffic.
  • the identification of data packet traffic by the smart network card can be real-time or non-real-time.
  • After the smart network card recognizes that the data flow is a tidal elephant flow, it determines to forward the tidal elephant flow through an optical channel.
  • the source routing controller on the smart network card selects a target optical cross-connect device from the optical cross-connect devices in the data center according to the network topology, and sends an OxC optical channel establishment request through the control network to the optical cross-connect controller corresponding to the target optical cross-connect device.
  • the intelligent network card may select one target optical cross-connect device OxC according to the network topology information, or may select multiple target optical cross-connect devices OxC. The selection may follow the principle of proximity to the sending end, choosing the optical cross-connect device OxC closest to the sending server or closest to the ToR or Leaf switch connected to the sending server, or the principle of proximity to the destination, choosing the optical cross-connect device OxC closest to the destination server or closest to the ToR or Leaf switch connected to the destination server.
  • the optical cross-connect controller sends a control command to the target optical cross-connect device, instructing the target optical cross-connect device to establish a corresponding optical channel.
  • the target optical cross-connect device establishes a corresponding optical channel.
  • the optical cross-connect controller feeds back the response that the optical channel is successfully established to the source routing controller on the intelligent network card.
  • the source routing controller on the smart network card calculates the first sub-route of the tidal elephant flow bypassed from the electrical switch to the OxC optical channel according to the network topology information, and obtains the information of the first sub-route.
  • the source routing controller on the smart network card generates a new source routing label according to the first sub-routing information and delivers it to the forwarding flow table corresponding to the tidal elephant flow in the data plane of the smart network card.
  • the smart network card labels the subsequent packets of the tidal elephant flow based on the new source routing label and forwards them.
  • the subsequent packets of the tidal elephant flow may be forwarded to the target optical cross-connect device according to the new source routing label stack.
  • the optical channel on the traditional OxC device is manually and statically configured, and cannot be configured flexibly according to the dynamic change of the data traffic.
  • the embodiment of the present invention identifies changes in network traffic through the intelligent network card and drives the optical channel controller to dynamically configure the OxC device to dynamically establish optical channels.
  • Scheduling periodic elephant flows or tidal elephant flows onto the OxC channel on demand reduces the impact of such elephant flows on the rest of the network, improves their forwarding efficiency, and avoids a large number of optical-electrical conversion operations for these data flows, thereby reducing the energy consumption and cost of the data center.
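  • The request/response sequence of FIG. 5 can be summarized, from the smart NIC's point of view, by the sketch below; `select_target_oxc`, `compute_first_sub_route`, and the `oxc_controller.establish_channel` call are hypothetical helpers standing in for the interactions described above.

```python
def bypass_tidal_elephant_flow(flow_id, topology, oxc_controller, flow_table,
                               select_target_oxc, compute_first_sub_route):
    """Steer a tidal elephant flow onto an OxC optical channel (illustrative sketch).

    Returns True if the flow was rerouted, False if it stays on the electrical path.
    """
    # 1. pick a target OxC, e.g. the one closest to the sending server's ToR switch
    oxc, src_port, dst_port = select_target_oxc(topology, flow_id)
    # 2. ask the OxC controller to establish the optical channel between the two ports
    if not oxc_controller.establish_channel(oxc, src_port, dst_port):
        return False
    # 3. compute the first sub-route that carries the flow from the sender into the
    #    OxC and install the new source-routing label stack for subsequent packets
    flow_table[flow_id] = compute_first_sub_route(topology, flow_id, oxc)
    return True
```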
  • FIG. 6 is a flowchart of another method for forwarding data flows in an optical-electrical hybrid data center according to an embodiment of the present invention.
  • FIG. 6 shows one implementation in which the smart network card identifies the type of a data flow and forwards it accordingly, including:
  • DSCP Differentiated Services Code Point
  • the DiffServ architecture stipulates that each transmitted message will be classified into different classes in the network.
  • each switch and router adopts the same transmission service policy for packets containing the same classification information, and adopts different transmission service policies for packets containing different classification information.
  • the classification information of the message can be given by the host, switch, router or other network device on the network.
  • the practice of identifying the content of the message in order to assign category information to the message often needs to consume a large amount of processing resources of the network device.
  • in the embodiment of the present invention, DSCP identification is applied on the server, specifically on the smart network card of the server, which avoids heavily occupying the processing resources of network devices such as switches.
  • the data packets output or received by the server may be assigned different category information based on different application policies or based on differences in the contents of the packets. Differences in the DSCP values of the data packets indicate different types of data packets.
  • Different DSCP values represent different types of messages
  • the DSCP value can have different indication methods
  • the DSCP value of the data message can be included in the IP header, such as The Type Of Service (TOS) in the IP header is used to carry the DSCP value (classification information) of the message, or the message is determined by including User Priority bits in the second layer header of the message.
  • TOS The Type Of Service
  • the DSCP value of the packet may also be determined by extracting the source MAC address, destination MAC address, and Ethertype field of the packet and matching them against the associated access control lists (Access Control Lists, ACLs).
  • the DSCP value of the packet is obtained according to the default CoS value of the packet input port.
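  • For reference, the DSCP value occupies the upper six bits of the IPv4 Type of Service / Differentiated Services byte, so it can be read as in the following sketch (the sample packet bytes are fabricated for the example):

```python
def dscp_from_ipv4(packet: bytes) -> int:
    """Return the DSCP value carried in the second byte of an IPv4 header."""
    tos = packet[1]          # Type of Service / Differentiated Services field
    return tos >> 2          # upper six bits are the DSCP; the lower two are ECN

# TOS byte 0xB8 corresponds to DSCP 46 (Expedited Forwarding)
assert dscp_from_ipv4(bytes([0x45, 0xB8]) + bytes(18)) == 46
```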
  • initial routing information is generated according to the address information carried in the packet, such as destination address information or source address information , the initial routing information includes the initial forwarding path or the initial routing label.
  • the source routing engine notifies the OxC engine, and the OxC engine establishes an optical channel.
  • the source routing engine generates a new routing label for routing the first data flow to the optical channel.
  • the source routing engine generates multiple equal-cost forwarding paths for the common elephant flow, and generates corresponding routing labels.
  • the source routing engine may generate multiple equal-cost forwarding paths according to the initial routing information previously generated for the second data flow.
  • the source routing engine delivers the routing labels of the multiple equal-cost forwarding paths generated for the second data flow to the data plane of the smart network card.
  • the data plane of the smart network card divides the subsequent packets of the second data stream into at least two sub-data streams, and forwards the two sub-data streams respectively according to the routing information of the at least two equal-cost paths.
  • for a third data flow determined to be a general data flow, subsequent packets of the third data flow continue to be forwarded according to the initial routing information of the third data flow.
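  • A minimal sketch of splitting the subsequent packets of a second-type flow over the equal-cost sub-paths (round-robin here for brevity; a real data plane would typically keep one transport connection on one path to limit reordering):

```python
def split_over_equal_cost_paths(packets, equal_cost_label_stacks):
    """Divide packets into sub-flows, one per equal-cost path.

    Returns a list of sub-flows, each a list of (label_stack, packet) pairs.
    """
    n = len(equal_cost_label_stacks)
    sub_flows = [[] for _ in range(n)]
    for i, packet in enumerate(packets):
        path = i % n                             # round-robin assignment
        sub_flows[path].append((equal_cost_label_stacks[path], packet))
    return sub_flows
```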
  • FIG. 7 is a schematic diagram of a device 700 according to an embodiment of the present invention.
  • the device 700 includes a processor 701 , a memory 702 , a communication interface 703 , and a bus 704 .
  • the processor 701 , the memory 702 , and the communication interface 703 communicate through the bus 704 , and can also communicate through other means such as wireless transmission.
  • the memory 702 is used to store program code 7021, and the processor 701 is used to call the program code 7021 stored in the memory 702 to execute the operations of each method described in the embodiments of this application.
  • the processor 701 may perform operations related to the methods in the embodiments of the present invention.
  • the processor 701 may be a CPU, and the processor 701 may also be other general-purpose processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays ( FPGA), GPU, network processor or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • DSPs digital signal processors
  • ASICs application specific integrated circuits
  • FPGA field programmable gate arrays
  • GPU graphics processing unit
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • Memory 702 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically programmable Erase programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • RAM random access memory
  • SRAM static random access memory
  • DRAM dynamic random access memory
  • SDRAM synchronous dynamic random access memory
  • DDR SDRAM double data rate synchronous dynamic random access memory
  • ESDRAM enhanced synchronous dynamic random access memory
  • SLDRAM synchlink dynamic random access memory
  • DR RAM direct rambus random access memory
  • bus 704 may also include a power bus, a control bus, a status signal bus, and the like.
  • the various buses are labeled as bus 704 in the figure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of this application disclose a communication method, apparatus, and system in a data center. The data center includes multiple servers, multiple electrical switches, and at least one optical cross-connect device, where the uplink ports of at least two of the electrical switches are interconnected with the at least one optical cross-connect device. The method includes: receiving network topology information delivered by a topology manager; obtaining a data flow; and configuring a routing policy for the data flow according to the network topology information, where the routing policy includes any one or a combination of the following: a first routing policy that instructs forwarding the data flow through an optical channel in the at least one optical cross-connect device; a second routing policy that instructs splitting the data flow into at least two sub-flows for forwarding; and a third routing policy that instructs forwarding the data flow through an electrical switch in the data center.

Description

Communication method, apparatus, and system in a data center
This application claims priority to Chinese Patent Application No. 202010826601.3, filed on August 17, 2020 and entitled "Communication method, apparatus, and system in a data center", and to Chinese Patent Application No. 202011639115.7, filed on December 31, 2020 and entitled "Communication method, apparatus, and system in a data center", both of which are incorporated herein by reference in their entirety.
Technical Field
The present invention relates generally to communication technologies, and in particular to a communication method, apparatus, and system in a data center.
Background
Traditional data center networks built with electrical switches are generally organized as a fat tree: end-side servers connect to ToR (Top of Rack) switches over electrical cables or optical cables (which require optical-electrical signal conversion), and the ToR switches interconnect with Leaf, Spine, and Core switches at a certain oversubscription ratio, forming a Data Center Network (DCN) that can scale up to several million nodes.
Switches in a traditional DCN process network packets carried as electrical signals; they support packet-level switching and routing as well as advanced functions such as on-switch packet buffering and traffic congestion control. Because the communication lines of a traditional DCN transmit optical signals while the switching nodes must operate on electrical signals when switching packets, every hop switch on the transmission path must convert optical signals to electrical signals and back to optical signals, which leads to high energy consumption, high construction cost, high packet E2E transmission delay, and switch port bandwidth limited by electrical signal processing capability.
Summary
Embodiments of this application provide a communication method, apparatus, and system in a data center, so as to improve the communication efficiency of the data center or reduce its energy consumption and cost.
In a first aspect, this application provides a communication method in a data center. The data center includes multiple servers, multiple electrical switches, and at least one optical cross-connect device, where the uplink ports of at least two of the electrical switches are interconnected with the at least one optical cross-connect device. The method includes: receiving network topology information delivered by a network topology manager, obtaining a data flow, and configuring a routing policy for the data flow according to the network topology information. The configured routing policy may include a first routing policy instructing to forward the data flow through an optical channel in the at least one optical cross-connect device, a second routing policy instructing to split the data flow into at least two sub-data flows for forwarding, or a third routing policy instructing to forward the data flow through an electrical switch in the data center; it may also include a first combination policy that applies the third routing policy to a first part of the packets of the data flow and the first routing policy to a second part of the packets, or a second combination policy that applies the third routing policy to the first part of the packets and the second routing policy to the second part of the packets.
Preferably, the method is performed by at least one of the multiple servers, or by each of the multiple servers.
Preferably, the method is performed by a network card in the server, for example a smart network card with a network processor.
Embodiments of the present invention combine traditional packet switching devices (such as electrical switches) and optical cross-connect devices into a hybrid DCN, and use source routing technology on the servers to route data flows in the data center, distributing the work of centralized route computation to the smart network cards on the servers for independent computation. This avoids the controller fault-tolerance difficulties caused by centralized path computation in an SDN controller, as well as slow rerouting and routing convergence after a network failure.
Optionally, upon receiving the first packet of the data flow, the third routing policy is configured for the data flow according to the address information carried by the data flow and the network topology information.
Optionally, the type of the data flow is identified from its packets, and an updated routing policy is configured for the data flow according to its type, the updated routing policy including the first routing policy or the second routing policy.
Optionally, the first routing policy is configured for a data flow of a first type, and the second routing policy is configured for a data flow of a second type.
Embodiments of the present invention schedule classified data flows to electrical switches or optical cross-connect devices for forwarding, flexibly exploiting the respective advantages of electrical switching and optical cross-connection. This not only reduces DCN construction cost but also greatly reduces the impact of large data flows, such as elephant flows, on network congestion and network fairness. Forwarding elephant flows of the first type directly over an optical channel avoids the large amount of optical-electrical conversion that forwarding such large flows through electrical switches would cause, improving their forwarding efficiency, reducing the energy consumption of the data center, and reducing the impact of such flows on other traffic in the data center during the periods in which they occur. As another example, split forwarding of second-type (ordinary) elephant flows improves the forwarding efficiency of such flows.
Optionally, the method further includes: selecting a target optical cross-connect device from the at least one optical cross-connect device, and instructing the target optical cross-connect device to establish the optical channel. Optical channels on traditional OxC devices are configured manually and statically and cannot be flexibly adapted to dynamic changes in data traffic; in embodiments of the present invention the server identifies the type of the data traffic or changes in it and drives the optical channel controller to dynamically configure the optical cross-connect device to establish optical channels, so that elephant flows of the first type can be scheduled onto OxC optical channels on demand, reducing their impact on other traffic in the network.
Optionally, the method further includes: obtaining first sub-routing information, the first sub-routing information including information for routing the first-type data flow to the target optical cross-connect device; and forwarding subsequent packets of the first-type data flow to the target optical cross-connect device according to the first sub-routing information.
Optionally, the method further includes: obtaining routing information of at least two equal-cost sub-paths according to the third routing policy.
Optionally, the method further includes: splitting subsequent packets of the second-type data flow into at least two sub-data flows, and forwarding the sub-data flows respectively according to the routing information of the at least two equal-cost paths. Splitting ordinary elephant flows disperses packet forwarding, improves load balancing across the paths in the network, and thus improves communication efficiency.
Optionally, the data flow of the first type is a tidal elephant flow, and the data flow of the second type is an ordinary elephant flow.
Optionally, feature information carried by packets of the data flow is identified, and if the feature information of a packet matches preset information, the data flow is determined to be of the first type; or,
the number of packets of the data flow within a first time period, the amount of data contained in those packets, or the bandwidth they require is identified, and if the number of packets, the amount of data, or the required bandwidth within the first time period exceeds a preset threshold, the data flow is determined to be of the second type.
Optionally, network connection information of adjacent switches is obtained through a link discovery protocol, and the obtained network connection information is reported to the topology manager. An agent running the link discovery protocol is deployed on the server (for example, on a smart network card), bringing the server into the network so that routes are distributed and computed at the source end, sharing the load of the switches in the network.
In a second aspect, this application provides a data center. The data center includes multiple servers, multiple electrical switches, and at least one optical cross-connect device, where the uplink ports of at least two of the electrical switches are interconnected with the at least one optical cross-connect device. At least one of the multiple servers is configured to receive network topology information, obtain a data flow, and configure a routing policy for the data flow according to the network topology information, the routing policy including any one or a combination of the following: a first routing policy instructing to forward the data flow through an optical channel in the at least one optical cross-connect device; a second routing policy instructing to split the data flow into at least two sub-data flows for forwarding; and a third routing policy instructing to forward the data flow through an electrical switch in the data center.
Optionally, the multiple servers and the multiple electrical switches form multiple service clusters, where a first service cluster includes at least two servers, at least two access switches, and at least one aggregation switch; first uplink ports of the at least two access switches are interconnected with the at least one optical cross-connect device, and second uplink ports of the at least two access switches are interconnected with the at least one aggregation switch.
Optionally, the multiple servers and the multiple electrical switches form multiple service clusters, where a first service cluster includes at least two access switches and at least one aggregation switch; a first uplink port of the at least one aggregation switch is interconnected with the at least one optical cross-connect device, and a second uplink port of the at least one aggregation switch is interconnected with a backbone switch.
Optionally, the multiple servers and the multiple electrical switches form multiple service clusters, where a first service cluster includes at least two servers, at least two access switches, and at least one aggregation switch; first uplink ports of the at least two access switches are interconnected with a first optical cross-connect device of the at least one optical cross-connect device, and second uplink ports of the at least two access switches are interconnected with the at least one aggregation switch; a first uplink port of the at least one aggregation switch is interconnected with a second optical cross-connect device of the at least one optical cross-connect device, and a second uplink port of the at least one aggregation switch is interconnected with a backbone switch.
Optionally, the data center further includes a topology manager configured to obtain network connection information sent by each device in the data center, obtain network topology information according to the network connection information of each device, and deliver the network topology information to at least one of the multiple servers.
Optionally, the at least one server includes a network card; the network card obtains the network topology information delivered by the topology manager and configures the routing policy according to the network topology information.
In a third aspect, this application provides a server that includes a processing unit configured to perform any of the methods of the first aspect.
In a fourth aspect, this application provides a network card that includes a network processor configured to perform any of the methods of the first aspect.
In a fifth aspect, this application provides a server or a network card that includes a processor and a memory, where the memory is configured to store program code and the processor is configured to execute the program code to implement any of the methods of the first aspect.
In a sixth aspect, this application provides a routing processing apparatus applied to the data center of the second aspect. The routing processing apparatus includes a source routing controller configured to receive network topology information delivered by a network topology manager, obtain a data flow, and configure a routing policy for the data flow according to the network topology information. The configured routing policy may include a first routing policy instructing to forward the data flow through an optical channel in the at least one optical cross-connect device, a second routing policy instructing to split the data flow into at least two sub-data flows for forwarding, or a third routing policy instructing to forward the data flow through an electrical switch in the data center; it may also include a first combination policy that applies the third routing policy to a first part of the packets of the data flow and the first routing policy to a second part of the packets, or a second combination policy that applies the third routing policy to the first part of the packets and the second routing policy to the second part of the packets.
Optionally, the routing processing apparatus is a smart network card in a server.
Optionally, the source routing controller includes a source routing engine and an optical cross-connect control engine; the source routing engine is configured to compute routing policies according to the network topology information, and the optical cross-connect control engine is configured to control the optical cross-connect device to establish an optical channel when one needs to be established.
Optionally, the routing processing apparatus further includes an identification engine configured to identify the type of a data flow, report data flows of different types to the source routing engine, and obtain the routing policy provided by the source routing engine.
In a seventh aspect, this application provides a computer-readable storage medium or a computer program product. The computer-readable storage medium stores instructions that, when executed by a processor, implement the communication method provided in any implementation of the foregoing second aspect. The computer-readable storage medium includes, but is not limited to, read-only memory, random access memory, flash memory, an HDD, or an SSD.
Embodiments of the present invention combine traditional packet switching devices (such as electrical switches) and optical cross-connect devices into a hybrid DCN, and use source routing technology on the servers to route data flows in the data center, distributing the work of centralized route computation to the smart network cards on the servers for independent computation. This avoids the controller fault-tolerance difficulties caused by centralized path computation in an SDN controller, as well as slow rerouting and routing convergence after a network failure.
Embodiments of the present invention break away from the traditional data center networking architecture and provide a new optical-electrical hybrid routing architecture, in which optical cross-connect devices are interconnected with base switches or edge switches in the data center network and related data flows are forwarded over optical channels, improving network transmission efficiency.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the background more clearly, the drawings required by the embodiments of this application or the background are described below.
Figure 1 is an architectural diagram of a traditional data center network;
Figure 2 is an architectural diagram of a data center network provided by an embodiment of the present invention;
Figure 3 is a system networking diagram provided by an embodiment of the present invention;
Figure 4 is a flowchart of a topology management method provided by an embodiment of the present invention;
Figure 5 is a flowchart of optical channel establishment and packet forwarding according to an embodiment of the present invention;
Figure 6 is a flowchart of a data center communication method according to an embodiment of the present invention;
Figure 7 is a schematic diagram of the composition of a device according to an embodiment of the present invention.
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。
如图1所示,传统的数据中心大多采用胖树结构,网络结构采用树形三层结构,网络按照核心层、汇聚层和边缘接入层划分。每一个机架中装载有多个服务器,它们通过机架顶层交换机(ToR交换机)进行连接,ToR交换机作为边缘接入交换机与汇聚层交换机(Leaf交换机)互连,形成一个PoD(Point of Delivery)集群(本发明实施例中称为业务集群),在PoD集群中,为提高链路的可靠性,边缘交换机与汇聚交换机可以交叉全互联。三层的骨干交换机(例如Spine交换机)用于提供对外访问功能,以及提供PoD集群之间的互连,每个骨干交换机与至少一个汇聚交换机相连。上述传统数据中心中各层级的交换机包含能耗型的光-电(O-E)以及电-光(E-O)的收发器,存在着高能耗、建设成本高、报文E2E传输时延高以及交换端口带宽受电信号处理能力约束受限等问题。另外,数据中心中产生的流量的种类也越来越复杂,对流量的管理也可以提高数据中心的效率。
An optical cross-connect (OXC) is a device used at nodes of an optical fiber network; by cross-connecting optical signals, it can manage an optical transport network flexibly and efficiently. Because an OxC requires no optical-electrical conversion while signals pass through it, it offers low cost, essentially zero forwarding delay, and theoretically unlimited link bandwidth. In existing large-scale data center networks it is mainly used for DCI (Data Center Interconnect), that is, interconnecting data centers in different regions.
In the embodiments of the present invention, the DCN is built as a hybrid of conventional packet-switching devices (for example, electrical switches) and optical cross-connect devices, and source routing is used to route the data flows in the data center. Specifically, the servers in the data center route data flows using source routing; more specifically, a smart NIC on a server may dynamically identify and distinguish different types of traffic in the physical network, and the smart NIC uses source routing to schedule the classified data flows to electrical switches or optical cross-connect devices for forwarding, flexibly exploiting the advantages of electrical switching and optical cross-connection. This not only reduces the construction cost of the DCN but also greatly reduces the impact of large data flows, such as elephant flows, on network congestion and network fairness.
FIG. 2 shows the DCN networking topology provided by an embodiment of the present invention. At the Spine and Leaf switch layers of the fat-tree network, at least one optical cross-connect device OxC is deployed in addition to the conventional switching devices, and the uplink ports of the Leaf and/or ToR switches are interconnected with the optical cross-connect ports of the OxC. As shown in FIG. 2, an optical cross-connect device OXC1 is also deployed at the Leaf layer of PoD1, and the uplink ports of ToR1 through ToRn in PoD1 are connected to the optical cross-connect ports of OXC1; by analogy, an optical cross-connect device OXCn is deployed at the Leaf layer of PoDn, and the uplink ports of ToR1 through ToRn in PoDn are connected to the optical cross-connect ports of OXCn. The number and connection manner of the intra-PoD optical cross-connect devices shown in FIG. 2 are only one feasible embodiment: more OXC devices may be deployed inside each PoD, and the ToR switches inside a PoD may be connected to the intra-PoD OXC devices in a fully interconnected manner or in other, partially connected manners, which are not described in detail in this embodiment. In FIG. 2, an optical cross-connect device OXC11 is also deployed outside the PoDs; the uplink ports of switches Leaf1 through Leafn in PoD1 are connected to the optical cross-connect ports of OXC11, and likewise the uplink ports of switches Leaf1 through Leafn in PoDn are connected to the optical cross-connect ports of OXC11. The number and connection manner of the extra-PoD optical cross-connect devices shown in FIG. 2 are again only one feasible embodiment: more OXC devices may be deployed outside the PoDs, and the Leaf switches inside a PoD may be connected to the extra-PoD OXC devices in a fully interconnected manner or in other, partially connected manners, which are not described in detail in this embodiment. The dashed lines in the figure are used only to distinguish the two kinds of connections, namely connections between electrical switches and optical cross-connect devices and connections between electrical switches, and do not indicate that no connection exists between the devices.
Based on the embodiment shown in FIG. 2, for traffic between servers in the same PoD that crosses ToR switches, for example data exchanged between ToR1 and ToRn in PoD1, besides the conventional forwarding path uplink ToR1 -> Leaf1 -> downlink ToRn, the smart NIC can also schedule the traffic onto the forwarding path uplink ToR1 -> OxC1 -> downlink ToRn. For traffic that crosses PoDs, for example data exchanged between ToR1 in PoD1 and ToR1 in PoDn, besides the conventional forwarding path uplink ToR1 -> uplink Leaf1 -> Spine1 -> downlink Leaf1 -> downlink ToR1, the smart NIC can also schedule the traffic onto the forwarding path ToR1 -> uplink Leaf1 -> OxC11 -> downlink Leaf1 -> downlink ToR1.
The above DCN may further include a topology manager (not shown). The topology manager may be any one or more servers in the data center, a topology management component running on any one or more servers, or a software-defined networking (SDN) controller. The topology manager collects and manages the topology of the entire network and sends the collected network-wide topology information to the NIC in each server. The topology of the electrical switches can be collected through the Link Layer Discovery Protocol (LLDP), reported to the topology manager, and consolidated there; the port topology information of the optical cross-connect devices OxC can be statically configured in the topology manager.
The above DCN further includes an OXC controller (not shown). The OXC controller may be an independent device, or it may be a module within each OXC optical cross-connect device. The OXC controller manages the optical channels.
In the embodiments of the present invention, the smart NIC in a server is enhanced so that it has source routing capability. Specifically, the programmable network processor in the smart NIC can be programmed so that the smart NIC can implement source routing functions, which may include the discovery of network connection information, the identification of data flows on the network topology, and the routing of data flows described in the embodiments of the present invention. In an implementation of the present invention, at least one server in each PoD of the data center may be selected and configured to have source routing capability, or the smart NIC on every server in the data center may be configured, thereby implementing distributed source routing. The embodiments of the present invention use the smart NIC on a server as an example; in a specific implementation, another device in the server may also perform the functions performed by the enhanced smart NIC described above.
In an SDN network, after collecting the network-wide topology, the SDN controller computes global routes centrally and delivers flow forwarding tables to the electrical switches. The embodiments of the present invention instead implement source routing control on the smart NICs of the servers, distributing the centralized route computation to the smart NICs for independent computation, which avoids the controller fault-tolerance difficulties of centralized SDN route computation and the slow rerouting and route convergence after network failures.
In the embodiments of the present invention, the smart NIC on a server in the data center can obtain the network-wide topology information from the topology manager. The smart NIC can also identify different types of network traffic based on statistics of the traffic already sent for each data flow (such as the number of packets sent per unit time, the amount of data carried by the packets, or the total bandwidth occupied by the packets) and apply different routing policies: splitting (for common elephant flows) or bypassing (for tidal elephant flows). In a real data center network, the flows that cause congestion may account for only 10% of the flows yet 90% of the total traffic volume; such flows are called elephant flows, and the remaining flows are called mice flows. Internet applications keep multiplying, and different applications, by their nature, require different network capabilities. Among them, video, intelligent AI, and gaming services already account for more than 70% of Internet traffic, a share that will keep growing. Such services also produce traffic peaks periodically or during fixed time periods, for example because people tend to use them in particular time slots. Data flows of this kind, which produce large volumes periodically, are called tidal elephant flows in the embodiments of the present invention, while other large flows that occur aperiodically or at no fixed time are called common elephant flows. This embodiment uses tidal and common elephant flows as examples to illustrate the classification of elephant flows generated by servers; in practice, other classification criteria or results may be used. The source routing control plane on the smart NIC can dynamically plan the routing policy by which packets reach their destination node according to the traffic type and/or the network topology information. The smart NIC can generate source-routing forwarding label stacks according to the different routing policies and steer the different types of elephant flows to electrical switches or optical cross-connect devices. For example, tidal elephant flows are forwarded directly over optical channels, avoiding the large number of optical-electrical conversions that forwarding such heavy traffic through electrical switches would entail; this improves the forwarding efficiency of tidal elephant flows, reduces the energy consumption of the data center, and lessens the impact of tidal elephant flows on other data center traffic during the periods in which they occur. As another example, common elephant flows can be split for forwarding, which improves their forwarding efficiency. Furthermore, optical channels on a conventional OxC device are configured manually and statically and cannot adapt flexibly to dynamic traffic changes; in the embodiments of the present invention, the smart NIC identifies the type of a data flow or a change in traffic and drives the optical channel controller to configure the OxC device to establish optical channels dynamically, so that tidal elephant flows can be scheduled on demand onto OxC optical channels, reducing their impact on other traffic in the network.
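For illustration only, and not as the claimed implementation, the following Python sketch shows the kind of dispatch the source routing control plane performs once a flow has been classified: tidal elephant flows map to the optical-channel (first) policy, common elephant flows to the split (second) policy, and everything else to the electrical-switch (third) policy. All type and function names here are hypothetical.

```python
from enum import Enum

class FlowType(Enum):
    MICE = "mice"                 # ordinary small flow
    TIDAL_ELEPHANT = "tidal"      # periodic, predictable large flow
    COMMON_ELEPHANT = "common"    # aperiodic large flow

class RoutingPolicy(Enum):
    OPTICAL_CHANNEL = 1   # first policy: forward over an OxC optical channel
    SPLIT_ECMP = 2        # second policy: split into sub-flows over equal-cost paths
    ELECTRICAL = 3        # third policy: forward through electrical switches

def choose_policy(flow_type: FlowType) -> RoutingPolicy:
    """Pick a routing policy for an already-classified flow."""
    if flow_type is FlowType.TIDAL_ELEPHANT:
        return RoutingPolicy.OPTICAL_CHANNEL
    if flow_type is FlowType.COMMON_ELEPHANT:
        return RoutingPolicy.SPLIT_ECMP
    return RoutingPolicy.ELECTRICAL

if __name__ == "__main__":
    print(choose_policy(FlowType.TIDAL_ELEPHANT))  # RoutingPolicy.OPTICAL_CHANNEL
```

Combined policies, in which part of a flow follows the third policy and a later part follows the first or second policy, would simply re-run this dispatch when the recognition engine upgrades the flow's classification.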
As shown in FIG. 3, managing and scheduling such an optical-electrical hybrid data center network requires corresponding adaptation on both the data plane and the control plane. The control plane consists of the topology manager and the OxC controller, while the data plane consists of the smart NICs on the servers, the electrical switches such as ToR/Leaf/Spine switches, and the optical cross-connect devices OxC.
Topology manager 31: responsible for collecting the network connection information of each node from the smart NICs running a Link Layer Discovery Protocol (LLDP) agent and from the electrical switches at each level, consolidating it into network-wide topology information, and delivering it over the control network to the servers, preferably to the smart NICs on all servers in the network.
OxC controller 32: responsible for receiving optical channel establishment requests sent from the control plane of a smart NIC on a server and instructing the optical cross-connect device 36 to establish an optical channel between an OxC source port and an OxC destination port.
Smart NIC control plane 33: includes a source routing controller, which computes, based on the network topology information delivered by the topology manager, the routing policies (forwarding paths) from the local server node to the other server nodes in the network. The source routing engine 331 computes the primary path through electrical switches and multiple optional backup paths, and the OxC engine 332 computes the optical channel through the optical cross-connect devices OxC, which may also be called the fast forwarding path. After packets of a data flow are received, the source routing controller responds to the data forwarding request submitted by the smart NIC data plane, determines the forwarding path for the data flow based on the path-selection policy and algorithm, generates a forwarding label stack, and delivers it to the data plane. Specifically, different forwarding paths can be generated according to the type of the data flow.
Smart NIC data plane 34: includes a recognition engine 341, which, based on the feature information carried in the packets already sent for each data flow (identified by its 5-tuple or 7-tuple) or on the flow's statistical characteristics, identifies tidal elephant flows or common elephant flows and submits to the smart NIC control plane a request to forward them over the ordinary path or the fast forwarding path. After receiving the source-routing label stack delivered by the control plane, the smart NIC data plane attaches the corresponding label stack to each packet of the data flow before sending it, and stores the source-routing label information in the source-routing label table 342.
Electrical switch 35: an electrical switch that supports source-routing label stacks parses, packet by packet, the label stack in the packet header (the label stack contains the egress port number of every electrical-switch hop along the forwarding path), extracts the egress port information corresponding to the current hop, and forwards the packet directly through that egress port, without caring whether the next hop is an electrical switch or an optical cross-connect device.
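As a minimal sketch of this label-stack mechanism (the encoding assumed here, one egress-port number per hop, and all names are illustrative assumptions, not the actual packet format), the NIC pushes the stack and each switch pops only the entry for its own hop:

```python
from collections import deque

def build_label_stack(path_egress_ports):
    """NIC side: one egress-port label per hop along the chosen path."""
    return deque(path_egress_ports)

def switch_forward(label_stack):
    """Switch side: pop the label for this hop and forward on that port,
    regardless of whether the next hop is an electrical switch or an OxC."""
    if not label_stack:
        raise ValueError("empty label stack: packet is already at the last hop")
    egress_port = label_stack.popleft()
    return egress_port, label_stack

if __name__ == "__main__":
    # Hypothetical path ToR1 -> Leaf1 -> ToRn expressed as per-hop egress ports.
    stack = build_label_stack([7, 3, 12])
    hop = 1
    while stack:
        port, stack = switch_forward(stack)
        print(f"hop {hop}: forward on port {port}")
        hop += 1
```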
The following table gives example information for several forwarding paths:
[Table provided as image PCTCN2021103256-appb-000001 in the original publication.]
As shown in FIG. 4, the flow of the topology discovery method in the hybrid DCN provided by an embodiment of the present invention includes:
4-1. The topology manager obtains a static topology configuration file, which contains the network connection information between the electrical switches and the optical cross-connect devices OxC in the data center. Specifically, the connection relationships between all electrical switches and all OxC devices in the data center can be obtained, for example: uplink port of PoD1-ToR1 -> PoD1-OxC1, uplink port of PoDn-ToR1 -> PoDn-OxC1, uplink port of PoD1-Leaf1 -> OxC11, uplink port of PoDn-Leaf1 -> OxC11.
4-2. The smart NIC on a server runs an LLDP agent and obtains the network connection information of its neighbor nodes (for example, port numbers, IP addresses, or MAC addresses). A neighbor node may be an adjacent switch. Specifically, the smart NICs on all servers in the data center may run LLDP agents to collect the network connection information of their neighbor nodes, or only the smart NICs on some servers may do so; FIG. 4 shows only one of them.
4-3. The electrical switches run LLDP agents and obtain the network connection information of their neighbor nodes (for example, port numbers, IP addresses, or MAC addresses) through the LLDP protocol. A neighbor node may be an adjacent switch or an adjacent server. Specifically, all electrical switches in the data center may run LLDP agents to collect neighbor information, or only some switches may do so (for example, inactive electrical switches need not join the network); FIG. 4 shows only one of them.
4-4. The smart NIC on a server reports the collected neighbor connection information to the topology manager. Specifically, the smart NICs on all servers in the data center may periodically report the neighbor connection information they have collected to the topology manager. In addition, a smart NIC may also report its own network connection information to the topology manager.
4-5. The electrical switches report the collected neighbor connection information. Specifically, all electrical switches in the data center may periodically report the neighbor connection information they have collected to the topology manager. In addition, each electrical switch may also report its own network connection information to the topology manager.
4-6. Based on the information collected from the nodes, the topology manager derives network topology information covering all smart NICs, electrical switches, and optical cross-connect devices OxC in the network.
4-7. The topology manager delivers the network-wide topology information to at least one smart NIC in the data center. Specifically, one smart NIC inside each server is responsible for identifying and forwarding the data flows of that server; the topology manager therefore delivers the network-wide topology information to at least one smart NIC inside each server.
Because the electrical switches or smart NICs running LLDP agents can exchange heartbeats periodically and refresh the connection information of their neighbors, the network-wide topology information changes whenever the connection information of any node in the data center changes. The topology manager therefore re-delivers refreshed network topology information periodically or upon request from a smart NIC on a server.
In an SDN network, after collecting the network-wide topology information, the SDN controller computes global routes centrally and delivers flow forwarding tables to each electrical switch. In this embodiment, the topology manager instead delivers the network topology information to the servers, distributing the centralized route computation to the servers for independent computation, which avoids the controller fault-tolerance difficulties of centralized SDN route computation and the slow rerouting and route convergence after network failures.
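To make the consolidation of step 4-6 concrete, the Python sketch below merges statically configured switch-to-OxC links with LLDP-reported adjacencies into a single network-wide adjacency list. The data layout and all names are assumptions made for illustration, not the patented format.

```python
def merge_topology(static_oxc_links, lldp_reports):
    """Combine statically configured switch<->OxC links with LLDP-discovered
    server/switch adjacencies into one adjacency list."""
    topology = {}

    def add_link(a, b):
        topology.setdefault(a, set()).add(b)
        topology.setdefault(b, set()).add(a)

    for a, b in static_oxc_links:           # e.g. ("PoD1-ToR1:up1", "PoD1-OxC1:p3")
        add_link(a, b)
    for reporter, neighbors in lldp_reports.items():
        for neighbor in neighbors:           # e.g. {"server1-nic": ["PoD1-ToR1:p5"]}
            add_link(reporter, neighbor)
    return topology

if __name__ == "__main__":
    topo = merge_topology(
        [("PoD1-ToR1:up1", "PoD1-OxC1:p3")],
        {"server1-nic": ["PoD1-ToR1:p5"], "PoD1-ToR1": ["PoD1-Leaf1:p1"]},
    )
    for node, peers in sorted(topo.items()):
        print(node, "->", sorted(peers))
```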
As shown in FIG. 5, the procedure for sending network packet traffic in a data center that uses the optical-electrical hybrid architecture of an embodiment of the present invention includes:
5-1. A server generates packets of a data flow, the packets reach the local smart NIC, and the smart NIC on the server identifies the type of the data flow generated by the server.
The smart NIC may identify the feature information carried in the packets of the data flow and, if it matches preset information, determine that the data flow is a tidal elephant flow. The smart NIC may also identify the number of packets of the data flow within a first time period, the amount of data those packets contain, or the bandwidth they require, and, if the number of packets, data amount, or required bandwidth within the first time period exceeds a preset threshold, determine that the data flow is a common elephant flow. The smart NIC may sample traffic statistics periodically, for example judging by the payload size of the data packets in the flow, where packets larger than a preset threshold are identified as belonging to an elephant flow; alternatively, it may compute the average traffic of the data flow over a time period and, when the computed average exceeds a preset threshold, identify the flow as an elephant flow, and then further determine from preset information whether the elephant flow is a common or a tidal elephant flow. Any one of the above identification methods may be used, or they may be combined arbitrarily to obtain a more accurate result. The smart NIC may also have a built-in artificial intelligence (AI) model, train the AI model with packet traffic, and use the trained model to identify and classify packet traffic. The smart NIC's identification of packet traffic may be performed in real time or not.
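Purely as an illustrative sketch of the match-based and threshold-based parts of this identification (the DSCP values and rate threshold below are made-up configuration values, and all names are hypothetical), a per-flow classifier on the NIC might look like this:

```python
import time

TIDAL_DSCP_VALUES = {40, 46}            # assumed DSCP values configured for tidal elephant flows
ELEPHANT_BYTES_PER_SEC = 10 * 1024**2   # assumed threshold: 10 MB/s averaged over the window

class FlowStats:
    """Running byte counter for one flow, reset at each sampling window."""
    def __init__(self):
        self.window_start = time.monotonic()
        self.bytes_in_window = 0

def classify_packet(stats: FlowStats, dscp: int, payload_len: int, window: float = 1.0):
    """Return 'tidal', 'common', or None (not yet classified / mice flow)."""
    if dscp in TIDAL_DSCP_VALUES:
        return "tidal"
    stats.bytes_in_window += payload_len
    elapsed = time.monotonic() - stats.window_start
    if elapsed >= window:
        rate = stats.bytes_in_window / elapsed
        stats.window_start = time.monotonic()
        stats.bytes_in_window = 0
        if rate > ELEPHANT_BYTES_PER_SEC:
            return "common"
    return None
```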
5-2. After identifying the packets as belonging to a tidal elephant flow, the smart NIC determines that the tidal elephant flow is to be forwarded over an optical channel. The source routing controller on the smart NIC can select a target optical cross-connect device from the optical cross-connect devices in the data center according to the network topology information and send an OxC optical channel establishment request over the control network to the optical cross-connect controller corresponding to the target device.
Specifically, the smart NIC may select one target OxC device, or several, according to the network topology information. The selection principle may be proximity to the sender, choosing the OxC closest to the sending server or closest to the ToR or Leaf switch to which the sending server is connected; or it may be proximity to the destination, choosing the OxC closest to the destination server or closest to the ToR or Leaf switch to which the destination server is connected.
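The proximity-based selection can be pictured as a shortest-hop search over the topology. The Python sketch below is an assumption-level illustration, reusing an adjacency-list topology like the one sketched earlier; it picks the candidate OxC with the fewest hops from a chosen anchor node such as the sender's ToR.

```python
from collections import deque

def pick_target_oxc(topology, candidate_oxcs, anchor):
    """Choose the OxC closest (in hops) to an anchor node, e.g. the sender's ToR."""
    dist = {anchor: 0}
    queue = deque([anchor])
    while queue:                      # breadth-first search from the anchor
        node = queue.popleft()
        for peer in topology.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    reachable = [oxc for oxc in candidate_oxcs if oxc in dist]
    return min(reachable, key=dist.__getitem__) if reachable else None
```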
5-3. The optical cross-connect controller sends a control command to the target optical cross-connect device, instructing it to establish the corresponding optical channel.
5-4. The target optical cross-connect device establishes the corresponding optical channel.
5-5. After establishing the optical channel, the target optical cross-connect device reports back to the optical cross-connect controller.
5-6. The optical cross-connect controller returns a response indicating successful establishment of the optical channel to the source routing controller on the smart NIC.
5-7. The source routing controller on the smart NIC computes, according to the network topology information, a first sub-route that bypasses the tidal elephant flow from the electrical switches onto the OxC optical channel, and obtains the information of the first sub-route.
5-8. The source routing controller on the smart NIC generates a new source-routing label from the first sub-route information and delivers it into the forwarding flow table for the tidal elephant flow in the smart NIC data plane.
5-9. Based on the new source-routing label, the smart NIC labels the subsequent packets of the tidal elephant flow and forwards them.
Specifically, the subsequent packets of the tidal elephant flow can be forwarded to the target optical cross-connect device according to the new source-routing label stack.
Optical channels on a conventional OxC device are configured manually and statically and cannot adapt flexibly to dynamic changes in traffic. In the embodiments of the present invention, the smart NIC identifies changes in network traffic and drives the optical channel controller to configure the OxC device to establish optical channels dynamically, so that periodic or tidal elephant flows can be scheduled on demand onto OxC channels. This reduces the impact of such elephant flows on the rest of the traffic in the network, improves their forwarding efficiency, and, by avoiding optical-electrical conversion for large volumes of data, lowers the energy consumption and the cost of the data center.
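For completeness, the request/acknowledgement exchange of steps 5-2 through 5-6 can be caricatured as below. The OxcController class is only a toy stand-in used to fix the shape of the interaction; all names and fields are hypothetical and do not describe a real controller API.

```python
from dataclasses import dataclass

@dataclass
class LightPathRequest:
    oxc_id: str     # target optical cross-connect device
    src_port: int   # OxC port facing the sender-side ToR/Leaf
    dst_port: int   # OxC port facing the receiver-side ToR/Leaf

class OxcController:
    """Toy stand-in for the OxC controller: records which optical channels are set up."""
    def __init__(self):
        self.light_paths = set()

    def setup(self, req: LightPathRequest) -> bool:
        # A real controller would program the optical cross-connect here (step 5-3)
        # and wait for the device's confirmation (step 5-5) before answering.
        self.light_paths.add((req.oxc_id, req.src_port, req.dst_port))
        return True

if __name__ == "__main__":
    ctrl = OxcController()
    ok = ctrl.setup(LightPathRequest("OxC1", src_port=3, dst_port=9))
    print("optical channel established:", ok)   # corresponds to the response of step 5-6
```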
FIG. 6 is a flowchart of another method for forwarding data flows in a data center that uses the optical-electrical hybrid architecture according to an embodiment of the present invention.
In this embodiment of the present invention, one implementation in which the smart NIC identifies the type of a data flow and forwards it is shown in FIG. 6 and includes:
6-1. Configure, on the smart NIC or inside the server, the range of Differentiated Services Code Point (DSCP) values for tidal elephant flows.
The DiffServ architecture specifies that every transmitted packet is classified into a category in the network. In a network that follows the DiffServ architecture, switches and routers apply the same transmission service policy to packets carrying the same classification information and different policies to packets carrying different classification information. A packet's classification information can be assigned by hosts, switches, routers, or other network devices. Inspecting packet content in order to assign classification information often consumes a large amount of a network device's processing resources. The embodiments of the present invention perform DSCP identification on the server, specifically on the server's smart NIC, which avoids heavy consumption of network resources such as switch processing capacity. Packets output or received by a server can be assigned different classification information based on different application policies or on differences in packet content; different DSCP values thus indicate different types of data packets.
Different DSCP values (referred to in the present invention as packet feature information) represent different types of packets, and the DSCP value can be indicated in different ways. The DSCP value of a data packet can be carried in the IP header, for example in the Type Of Service (TOS) field of the IP header, which carries the packet's DSCP value (classification information); alternatively, the packet's DSCP value can be determined from the User Priority bits contained in the packet's layer-2 header. As further examples, the packet's source MAC address, destination MAC address, and Ethertype field can be extracted and matched against associated Access Control Lists (ACLs) to determine the DSCP value, or the DSCP value can be obtained from the default CoS value of the packet's ingress port.
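As one concrete example of the first option, reading the DSCP from the IPv4 TOS byte: the DSCP is the upper six bits of the second byte of the IPv4 header (this byte layout is standard; the helper function and the sample header below are our own illustration).

```python
import struct

def dscp_from_ipv4_header(header: bytes) -> int:
    """DSCP is the upper six bits of the second byte (the former TOS field)
    of the IPv4 header."""
    if len(header) < 20:
        raise ValueError("truncated IPv4 header")
    tos = header[1]
    return tos >> 2

if __name__ == "__main__":
    # Minimal 20-byte IPv4 header with TOS=0xB8, i.e. DSCP 46 (Expedited Forwarding).
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45, 0xB8, 20, 0, 0, 64, 6, 0,
                      bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    print(dscp_from_ipv4_header(hdr))  # 46
```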
6-2. Configure, on the smart NIC or in memory inside the server, the preset thresholds for a common elephant flow in terms of the number of packets per unit time, the amount of data carried by the packets, or the bandwidth occupied by the packets.
6-3. For each received data flow, when the first data packet of the flow is received, generate initial routing information according to the address information carried in the packet, for example the destination address or the source address; the initial routing information includes an initial forwarding path or an initial routing label.
The two steps above have no fixed order of execution; they may be performed simultaneously or one after the other.
6-4. Continue receiving packets of a first data flow, obtain the DSCP values carried in the received packets, and determine from those DSCP values whether the first data flow is a tidal elephant flow.
6-5. For a first data flow judged to be a tidal elephant flow, notify the source routing engine in the smart NIC.
6-6. The source routing engine notifies the OxC engine, and the OxC engine establishes an optical channel.
6-7. The source routing engine generates a new routing label that routes the first data flow onto the optical channel.
6-8. Continue receiving a second data flow, and sample periodically to obtain the number of data packets of the second data flow per unit time or the packet traffic (the amount of data carried by the packets or the bandwidth they occupy).
6-9. For a second data flow judged to be a common elephant flow, notify the source routing engine in the smart NIC.
6-10. The source routing engine generates multiple equal-cost forwarding paths for the common elephant flow and generates the corresponding routing labels. The source routing engine can generate the equal-cost forwarding paths based on the initial routing information previously generated for the second data flow.
6-11. The source routing engine delivers the routing labels of the multiple equal-cost forwarding paths generated for the second data flow to the data plane of the smart NIC.
6-12. The data plane of the smart NIC splits the subsequent packets of the second data flow into at least two sub-data-flows and forwards the sub-data-flows separately according to the routing information of the at least two equal-cost paths (illustrated in the sketch following this flow).
6-13. For a third data flow judged to be an ordinary data flow, continue forwarding its subsequent packets according to the initial routing information of the third data flow.
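Step 6-12 above can be illustrated with the following Python sketch, which splits subsequent packets of a common elephant flow into sub-flows, one per equal-cost label stack. The round-robin assignment and the label stacks shown are our simplifying assumptions, not the claimed splitting rule.

```python
def split_into_subflows(packets, path_label_stacks):
    """Split subsequent packets of a common elephant flow into sub-flows,
    one sub-flow per equal-cost path (round-robin assignment for simplicity)."""
    n = len(path_label_stacks)
    subflows = [[] for _ in range(n)]
    for seq, payload in enumerate(packets):
        subflows[seq % n].append((path_label_stacks[seq % n], payload))
    return subflows

if __name__ == "__main__":
    paths = [[7, 3, 12], [7, 4, 12]]          # two hypothetical equal-cost label stacks
    pkts = [f"chunk-{i}".encode() for i in range(6)]
    for i, sub in enumerate(split_into_subflows(pkts, paths)):
        print(f"sub-flow {i} over path {paths[i]}: {len(sub)} packets")
```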
FIG. 7 is a schematic diagram of a device 700 according to an embodiment of the present invention. As shown in the figure, the device 700 includes a processor 701, a memory 702, a communication interface 703, and a bus 704. The processor 701, the memory 702, and the communication interface 703 communicate through the bus 704; they may also communicate by other means such as wireless transmission. The memory 702 is configured to store program code 7021, and the processor 701 is configured to call the program code 7021 stored in the memory 702 to perform the operations of the methods described in the embodiments of this application.
The processor 701 may perform the operations related to the methods in the embodiments of the present invention.
It should be understood that, in this embodiment of the present invention, the processor 701 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a GPU, a network processor, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 702 may be a volatile memory or a nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
In addition to a data bus, the bus 704 may further include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as bus 704 in the figure.
Finally, it should be understood that the foregoing embodiments are merely illustrative and the technical solutions of this application are not limited thereto. Although this application has been described in detail with reference to the foregoing preferred embodiments, a person skilled in the art may make various modifications, changes, or replacements without departing from the scope of the claims appended to this application.

Claims (32)

  1. A communication method in a data center, wherein the data center comprises a plurality of servers, a plurality of electrical switches, and at least one optical cross-connect device, and uplink ports of at least two electrical switches of the plurality of electrical switches are interconnected with the at least one optical cross-connect device; and the method comprises:
    receiving network topology information delivered by a topology manager;
    obtaining a data flow; and
    configuring a routing policy for the data flow according to the network topology information, wherein the routing policy comprises any one or a combination of the following routing policies:
    a first routing policy, wherein the first routing policy indicates that the data flow is forwarded through an optical channel in the at least one optical cross-connect device;
    a second routing policy, wherein the second routing policy indicates that the data flow is split into at least two sub-data-flows for forwarding; and
    a third routing policy, wherein the third routing policy indicates that the data flow is forwarded through an electrical switch in the data center.
  2. The method according to claim 1, wherein configuring the routing policy for the data flow according to the network topology information comprises:
    when a first packet of the data flow is received, configuring the third routing policy for the data flow according to address information carried in the data flow and the network topology information.
  3. The method according to claim 2, wherein configuring the routing policy for the data flow according to the network topology information further comprises:
    identifying a type of the data flow according to packets of the data flow, and configuring an updated routing policy for the data flow according to the type of the data flow, wherein the updated routing policy comprises the first routing policy or the second routing policy.
  4. The method according to claim 3, wherein configuring the updated routing policy for the data flow according to the type of the data flow comprises:
    configuring the first routing policy for a data flow of a first type.
  5. The method according to claim 4, wherein the method further comprises:
    selecting a target optical cross-connect device from the at least one optical cross-connect device, and instructing the target optical cross-connect device to establish the optical channel.
  6. The method according to claim 5, wherein the method further comprises:
    obtaining first sub-route information, wherein the first sub-route information comprises information for routing the data flow of the first type to the target optical cross-connect device; and
    forwarding subsequent packets of the data flow of the first type to the target optical cross-connect device according to the first sub-route information.
  7. The method according to claim 3, wherein configuring the updated routing policy for the data flow according to the type of the data flow comprises:
    configuring the second routing policy for a data flow of a second type.
  8. The method according to claim 7, wherein the method further comprises:
    obtaining routing information of at least two equal-cost sub-paths according to the third routing policy.
  9. The method according to claim 8, wherein the method further comprises:
    splitting subsequent packets of the data flow of the second type into at least two sub-data-flows, and separately forwarding the sub-data-flows according to the routing information of the at least two equal-cost paths.
  10. The method according to claim 4 or 7, wherein the data flow of the first type is a tidal elephant flow, and the data flow of the second type is a common elephant flow.
  11. The method according to claim 10, wherein identifying the type of the data flow comprises:
    identifying feature information carried in packets of the data flow, and if the feature information of the packets matches preset information, determining that the data flow is a data flow of the first type; or
    identifying a number of packets of the data flow within a first time period, an amount of data contained in the packets within the first time period, or a bandwidth required by the packets within the first time period, and if the number of packets, the amount of data, or the required bandwidth within the first time period is greater than a preset threshold, determining that the data flow is a data flow of the second type.
  12. The method according to any one of claims 1 to 11, wherein the method further comprises:
    obtaining network connection information of adjacent switches through a link discovery protocol, and reporting the obtained network connection information to the topology manager.
  13. The method according to any one of claims 1 to 12, wherein the method is performed by at least one server of the plurality of servers.
  14. The method according to claim 13, wherein the method is performed by a network interface card in the at least one server.
  15. A data center, wherein the data center comprises a plurality of servers, a plurality of electrical switches, and at least one optical cross-connect device, and uplink ports of at least two electrical switches of the plurality of electrical switches are interconnected with the at least one optical cross-connect device;
    at least one server of the plurality of servers is configured to receive network topology information, obtain a data flow, and configure a routing policy for the data flow according to the network topology information, wherein the routing policy comprises any one or a combination of the following routing policies:
    a first routing policy, wherein the first routing policy indicates that the data flow is forwarded through an optical channel in the at least one optical cross-connect device;
    a second routing policy, wherein the second routing policy indicates that the data flow is split into at least two sub-data-flows for forwarding; and
    a third routing policy, wherein the third routing policy indicates that the data flow is forwarded through an electrical switch in the data center.
  16. The data center according to claim 15, wherein the plurality of servers and the plurality of electrical switches form a plurality of service clusters, wherein a first service cluster comprises at least two servers, at least two access switches, and at least one aggregation switch; first uplink ports of the at least two access switches are interconnected with the at least one optical cross-connect device, and second uplink ports of the at least two access switches are interconnected with the at least one aggregation switch.
  17. The data center according to claim 15, wherein the plurality of servers and the plurality of electrical switches form a plurality of service clusters, wherein a first service cluster comprises at least two access switches and at least one aggregation switch; a first uplink port of the at least one aggregation switch is interconnected with the at least one optical cross-connect device, and a second uplink port of the at least one aggregation switch is interconnected with a backbone switch.
  18. The data center according to claim 15, wherein the plurality of servers and the plurality of electrical switches form a plurality of service clusters, wherein a first service cluster comprises at least two servers, at least two access switches, and at least one aggregation switch; first uplink ports of the at least two access switches are interconnected with a first optical cross-connect device of the at least one optical cross-connect device, and second uplink ports of the at least two access switches are interconnected with the at least one aggregation switch; and a first uplink port of the at least one aggregation switch is interconnected with a second optical cross-connect device of the at least one optical cross-connect device, and a second uplink port of the at least one aggregation switch is interconnected with a backbone switch.
  19. The data center according to any one of claims 15 to 18, wherein the data center further comprises a topology manager, and the topology manager is configured to obtain network connection information sent by devices in the data center, obtain network topology information according to the network connection information of the devices, and deliver the network topology information to at least one server of the plurality of servers.
  20. The data center according to claim 19, wherein the at least one server comprises a network interface card, the network interface card obtains the network topology information delivered by the topology manager, and the network interface card configures the routing policy according to the network topology information.
  21. The data center according to claim 20, wherein the network interface card is further configured to obtain network connection information of adjacent switches through a link discovery protocol and report the obtained network connection information to the topology manager.
  22. The data center according to claim 20 or 21, wherein when a first packet of the data flow is received, the network interface card configures the third routing policy for the data flow according to address information carried in the data flow and the network topology information.
  23. The data center according to claim 22, wherein the network interface card identifies a type of the data flow according to packets of the data flow and configures an updated routing policy for the data flow according to the type of the data flow, wherein the updated routing policy comprises the first routing policy or the second routing policy.
  24. The data center according to claim 23, wherein the network interface card configures the first routing policy for a data flow of a first type, or configures the second routing policy for a data flow of a second type.
  25. The data center according to claim 24, wherein the network interface card is further configured to select a target optical cross-connect device from the at least one optical cross-connect device and instruct the target optical cross-connect device to establish the optical channel.
  26. The data center according to claim 25, wherein the network interface card is further configured to obtain first sub-route information, wherein the first sub-route information comprises information for routing the data flow of the first type to the target optical cross-connect device, and to forward subsequent packets of the data flow of the first type to the target optical cross-connect device according to the first sub-route information.
  27. The data center according to claim 24, wherein the network interface card is further configured to obtain routing information of at least two equal-cost sub-paths according to the third routing policy.
  28. The data center according to claim 27, wherein the network interface card is further configured to split subsequent packets of the data flow of the second type into at least two sub-data-flows and separately forward the sub-data-flows according to the routing information of the at least two equal-cost paths.
  29. A server, wherein the server comprises a processing unit configured to perform the method according to any one of claims 1 to 14.
  30. A network interface card, wherein the network interface card comprises a network processor, and the network processor is configured to perform the method according to any one of claims 1 to 14.
  31. A server, wherein the server comprises a processor and a memory, wherein the memory is configured to store program code, and the processor is configured to execute the program code to implement the method according to any one of claims 1 to 14.
  32. A network interface card, wherein the network interface card comprises a network processor and a memory, wherein the memory is configured to store program code, and the network processor is configured to execute the program code to implement the method according to any one of claims 1 to 14.
PCT/CN2021/103256 2020-08-17 2021-06-29 Communication method, apparatus, and system in data center WO2022037266A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21857362.4A EP4184937A4 (en) 2020-08-17 2021-06-29 METHOD, DEVICE AND SYSTEM FOR COMMUNICATION IN A DATA CENTER
US18/170,293 US20230198896A1 (en) 2020-08-17 2023-02-16 Method for communication in data center, apparatus, and system

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010826601.3 2020-08-17
CN202010826601 2020-08-17
CN202011639115.7 2020-12-31
CN202011639115.7A CN114079625A (zh) 2020-08-17 2020-12-31 Communication method, apparatus, and system in data center

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/170,293 Continuation US20230198896A1 (en) 2020-08-17 2023-02-16 Method for communication in data center, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2022037266A1 true WO2022037266A1 (zh) 2022-02-24

Family

ID=80282854

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103256 WO2022037266A1 (zh) 2020-08-17 2021-06-29 Communication method, apparatus, and system in data center

Country Status (4)

Country Link
US (1) US20230198896A1 (zh)
EP (1) EP4184937A4 (zh)
CN (1) CN114079625A (zh)
WO (1) WO2022037266A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065632B * 2022-03-31 2023-11-17 重庆金美通信有限责任公司 Lightweight tree-network data forwarding method
CN115567446A * 2022-07-07 2023-01-03 华为技术有限公司 Packet forwarding method and apparatus, computing device, and offload card
CN115314262B * 2022-07-20 2024-04-23 杭州熠芯科技有限公司 Design method for a trusted network interface card and networking method thereof


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200236449A1 (en) * 2013-02-27 2020-07-23 Juniper Networks, Inc. Data center architecture utilizing optical switches
US20140270762A1 (en) * 2013-03-15 2014-09-18 Plexxi Inc System and method for data center optical connection
CN103441942A * 2013-08-26 2013-12-11 重庆大学 Software-defined data center network system and data communication method
US20170019168A1 (en) * 2014-03-10 2017-01-19 Aeponyx Inc. Methods and systems relating to optical networks
CN105516830A * 2014-10-16 2016-04-20 中国电信股份有限公司 Wavelength-switching-based data center optical network communication method and system
CN107196878A * 2016-03-14 2017-09-22 华为技术有限公司 Optical-electrical hybrid network, system determination method, and access switch
CN106209294A * 2016-07-01 2016-12-07 西安电子科技大学 Highly scalable all-optical interconnection network system for a data center and communication method
CN106941633A * 2017-02-20 2017-07-11 武汉邮电科学研究院 SDN-based all-optical switching data center network control system and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4184937A4

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11671350B1 (en) * 2022-08-15 2023-06-06 Red Hat, Inc. Data request servicing using multiple paths of smart network interface cards

Also Published As

Publication number Publication date
CN114079625A (zh) 2022-02-22
EP4184937A1 (en) 2023-05-24
EP4184937A4 (en) 2024-01-24
US20230198896A1 (en) 2023-06-22

Similar Documents

Publication Publication Date Title
WO2022037266A1 (zh) Communication method, apparatus, and system in data center
EP2911348B1 (en) Control device discovery in networks having separate control and forwarding devices
US7760738B1 (en) Admission control for services
CN103746931B (zh) Method for a computer network, network device, and service card
CN109714275B (zh) SDN controller for access service transmission and control method thereof
US7046665B1 (en) Provisional IP-aware virtual paths over networks
EP3949293A1 (en) Slice-based routing
US9667570B2 (en) Fabric extra traffic
US8040901B1 (en) Packet queueing within ring networks
CN106341346A (zh) QoS-guaranteed routing algorithm in an SDN-based data center network
US6487177B1 (en) Method and system for enhancing communications efficiency in data communications networks
CN100356757C (zh) Quality of service control method for an optical Internet
CN108833279A (zh) Multi-constraint QoS routing method based on service classification in a software-defined network
CN112350949B (zh) Flow-scheduling-based rerouting congestion control method and system in a software-defined network
WO2008145043A1 (en) Traffic distribution and bandwidth management for link aggregation
CN103067291A (zh) Method and apparatus for associating uplink and downlink
Paliwal et al. Effective resource management in SDN enabled data center network based on traffic demand
CN101127723B (zh) Quality of service guarantee method for a multi-protocol label switching layer-3 virtual private network
Rahman et al. Performance analysis and the study of the behavior of MPLS protocols
Han et al. Future data center networking: From low latency to deterministic latency
Barakabitze et al. Multipath protections and dynamic link recovery in softwarized 5G networks using segment routing
CN103441930B (zh) MPLS TE packet forwarding and management method and apparatus
CN1294723C (zh) Method for mitigating and regulating mobile IP burst traffic
WO2015135284A1 (zh) Data flow forwarding control method and system, and computer storage medium
Wadekar Enhanced ethernet for data center: Reliable, channelized and robust

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021857362

Country of ref document: EP

Effective date: 20230217

NENP Non-entry into the national phase

Ref country code: DE