WO2019042303A1 - Packet forwarding - Google Patents

Packet forwarding

Info

Publication number
WO2019042303A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
route
network segment
spine
node
Prior art date
Application number
PCT/CN2018/102840
Other languages
English (en)
French (fr)
Inventor
李�昊
Original Assignee
新华三技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 新华三技术有限公司
Priority to US16/643,479 (patent US11165693B2)
Publication of WO2019042303A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
    • H04L45/033 Topology update or discovery by updating distance vector protocols
    • H04L45/22 Alternate routing
    • H04L45/44 Distributed routing
    • H04L45/48 Routing tree calculation
    • H04L45/50 Routing or path finding of packets in data switching networks using label swapping, e.g. multi-protocol label switch [MPLS]
    • H04L45/72 Routing based on the source address
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L49/00 Packet switching elements
    • H04L49/15 Interconnection of switching modules
    • H04L49/1515 Non-blocking multistage, e.g. Clos
    • H04L49/25 Routing or path finding in a switch fabric

Definitions

  • Data centers have widely adopted the "Spine node + Leaf node" networking mode, in which the Leaf nodes are responsible for connecting hosts.
  • In one approach, a data center adopts a distributed gateway deployment, that is, each Leaf node is a distributed gateway, so that a Leaf node provides Layer 2 access and also acts as an IP (Internet Protocol) gateway that forwards traffic across network segments.
  • Under the distributed gateway deployment, each Leaf node needs to advertise the MAC (Medium Access Control) address and IP address of every locally online host (either a physical machine or a virtual machine) as routes to the other Leaf nodes. On receipt, the other Leaf nodes save them as routing table entries (in memory) and deliver them to the forwarding plane of the node, that is, the hardware chip.
  • As data centers grow, a single data center may contain tens of thousands of servers, and each server can be virtualized into multiple virtual machines. For example, if a data center has 20,000 servers and each server runs 10 virtual machines, the data center contains 200,000 hosts, which places high demands on the routing table entry capacity and hardware table entry capacity of the distributed gateways. For cost reasons, distributed gateways are usually not particularly high-end devices. Their routing entry and hardware entry capacities are limited and may not be sufficient to carry all routes within a large data center.
  • FIG. 1 is a schematic diagram of a Spine node + Leaf node network provided by the present application;
  • FIG. 2 is a flowchart of the method provided by the present application;
  • FIG. 3 is a flowchart of the interaction between a Spine node and a Leaf node provided by the present application;
  • FIG. 4 is a diagram showing an example of packet forwarding before and after a host route is delivered;
  • FIG. 5 is a structural block diagram of a device provided by the present application.
  • Although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
  • Current data center deployments increasingly favor the distributed gateway mode, in which each Leaf node is a distributed gateway, and the distributed gateways establish VXLAN (Virtual Extensible LAN) tunnels with each other through the standard EVPN (Ethernet Virtual Private Network) protocol to synchronize tenant routes.
  • As data centers grow and virtualization technology develops, a data center may contain tens of thousands of servers, each of which can be virtualized into multiple virtual machines. As a result, the number of tenant routes in the data center may far exceed the routing table entry capacity (memory) and hardware entry capacity (hardware chip) that a distributed gateway can support.
  • One existing solution addresses the problem that the gateway's hardware entry capacity cannot keep up with the data center size: after receiving a route, the control plane of each distributed gateway does not immediately deliver it to its own forwarding plane. Only when traffic arrives is the route delivered to the forwarding plane, based on the destination MAC address or destination IP address of the traffic.
  • This solution only solves the mismatch between the hardware entry capacity of the distributed gateway and the number of routes. Because the distributed gateway still has to store the complete set of routes, the mismatch between its routing entry capacity and the number of routes remains unsolved.
  • The present application provides a packet forwarding scheme that resolves this dilemma by using the capabilities of the centrally located Spine nodes in a data center.
  • For ease of description, the scheme provided by the present application is described below based on Spine node + Leaf node networking.
  • FIG. 1 shows a typical Spine node + Leaf node network. The core nodes in the network are of two types. The first type is the Leaf nodes 121, 122, and 123, which connect hosts. The second type is the Spine nodes 111 and 112, which interconnect the Leaf nodes 121, 122, and 123.
  • When the Leaf nodes 121, 122, and 123 are IP gateways, BGP (Border Gateway Protocol) connections are usually not established between the gateways, because a large number of gateways would lead to too many connections. Instead, the Spine nodes 111 and 112 are used as RRs (Route Reflectors), and all the Leaf nodes 121, 122, and 123 establish BGP connections with the Spine nodes 111 and 112 (that is, become their BGP neighbors).
  • In this way, each of the Leaf nodes 121, 122, and 123 can advertise the routes of its locally online hosts to the Spine nodes 111 and 112, and the Spine nodes 111 and 112 advertise those routes to the other Leaf nodes.
  • In practice, the Spine nodes 111 and 112 are generally high-end devices, so their routing entry and hardware entry capacities can generally support all routes in the data center.
  • Routes include network segment routes and host routes. A network segment route is a route to a network segment; in FIG. 1, the routes with destination IP addresses 10.1.0.0/16, 10.2.0.0/16, and 10.3.0.0/16 are network segment routes. A host route is a route to one specific host; in FIG. 1, the routes with destination IP addresses 10.2.0.11/32 and 10.3.0.56/32 are host routes.
  • In a scenario where the Spine nodes 111 and 112 act as RRs, the Leaf nodes 121, 122, and 123 advertise the host routes of their locally online hosts, together with the corresponding network segment routes, to the Spine nodes 111 and 112, which then advertise the host routes and network segment routes together to the other Leaf nodes.
  • In the present application, to save routing entries and hardware entries on the Leaf nodes without affecting normal packet forwarding, a special route advertisement policy is deployed on the Spine nodes 111 and 112: initially, the Spine nodes 111 and 112 advertise only network segment routes to the Leaf nodes 121, 122, and 123, and host routes are advertised on demand.
  • On each of the Leaf nodes 121, 122, and 123, for a network segment that does not exist on the node itself, the next hop of the corresponding route points to the Spine nodes 111 and 112. Therefore, when multiple Spine nodes 111 and 112 act as RRs, equal-cost routes are formed for load sharing.
  • For example, in FIG. 1, the network segment routes and host routes of the Leaf node 122 and the Leaf node 123 are all advertised, but initially only the network segment routes 10.2.0.0/16 and 10.3.0.0/16 are advertised to the Leaf node 121, where they form equal-cost routes. That is, the next hops of the equal-cost route from the Leaf node 121 to the network segment 10.2.0.0/16 are the Spine node 111 and the Spine node 112, and the next hops of the equal-cost route to the network segment 10.3.0.0/16 are also the Spine node 111 and the Spine node 112.
  • Referring to FIG. 2, in one embodiment, a data center includes Spine nodes and Leaf nodes, any Leaf node establishes a BGP neighbor relationship with each Spine node, and every Leaf node is an IP gateway (that is, the data center adopts a distributed gateway architecture). Any Leaf node then performs the following steps during operation:
  • Step 201: Advertise network segment routes and host routes to the Spine node.
  • Step 202: Learn the network segment routes advertised by the Spine node.
  • Step 203: When a first packet hits a network segment route learned by the Leaf node, send the first packet to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the hit host route.
  • As steps 201 to 203 show, the Spine node in the present application learns both network segment routes and host routes, while the Leaf node initially learns only network segment routes from the Spine node. The network segment route is enough to get a packet to the Spine node, which then sends the packet on according to the host route. The Leaf node can therefore forward packets even without learning host routes from the Spine node.
  • Because the number of network segment routes in a data center is far smaller than the number of host routes, the routing entries and hardware entries on a Leaf node can accommodate the small number of network segment routes, which solves the problem of the limited routing entry capacity and hardware entry capacity of the Leaf node.
  • FIG. 3 shows a flowchart of the interaction between the Spine node and any Leaf node, which may include the following steps:
  • Step 301: The Leaf node advertises network segment routes and host routes to the Spine node.
  • The next hop of the network segment routes and host routes that the Leaf node advertises to the Spine node is the IP address of the Leaf node itself.
  • Step 302: The Leaf node learns the network segment routes of other Leaf nodes advertised by the Spine node.
  • When advertising a network segment route, the Spine node changes the next hop of the route to itself, so the next hop of a network segment route that the Leaf node learns from the Spine node is the IP address of the Spine node.
  • Step 303: On the Leaf node, after the control plane learns a network segment route advertised by the Spine node, it delivers the route to the forwarding plane.
  • Step 304: When the first packet hits the network segment route, the forwarding plane of the Leaf node encapsulates the first packet and sends the encapsulated first packet to the Spine node corresponding to the next hop of the hit network segment route.
  • In one implementation, when the next hop of the hit network segment route corresponds to multiple Spine nodes, the forwarding plane may send the encapsulated first packet to the one Spine node that satisfies a preset load sharing policy.
  • The first packet encapsulated in step 304 may have two layers of IP headers. The inner source IP address is the host IP address of the packet sender, and the inner destination IP address is the host IP address of the packet receiver. The outer source IP address is the IP address of the source Leaf node (that is, the Leaf node to which the packet sender is connected), and the outer destination IP address is the IP address of the Spine node.
  • Step 305: The Spine node decapsulates the received packet to obtain the first packet and looks up the host route that matches the destination IP address of the first packet (that is, the host IP address of the packet receiver). If a match is found, the Spine node encapsulates the first packet and sends the encapsulated first packet to the Leaf node corresponding to the next hop of the matching host route; if no match is found, it discards the first packet.
  • The first packet encapsulated in step 305 also has two layers of IP headers. The inner source and destination IP addresses remain unchanged, that is, they are still the host IP addresses of the packet sender and the packet receiver. The outer source IP address is the IP address of the Spine node, and the outer destination IP address is the IP address of the destination Leaf node (that is, the Leaf node to which the packet receiver is connected).
  • Steps 301 to 305 implement hierarchical forwarding of the packet. The complete set of host routes resides on the Spine node. The source Leaf node sends the packet to the Spine node, and the Spine node then forwards the packet to the destination Leaf node to which the packet receiver is connected.
  • In this process the packet traverses two VXLAN tunnels: the VXLAN tunnel between the source Leaf node and the Spine node, and the VXLAN tunnel between the Spine node and the destination Leaf node.
  • Compared with sending the packet directly through the VXLAN tunnel between the source Leaf node and the destination Leaf node, traversing two VXLAN tunnels uses the same physical path. However, in some scenarios, for reasons such as statistics and management, it may be desirable that the packet logically does not detour via the Spine node but is instead transmitted directly through the VXLAN tunnel between the Leaf nodes.
  • To achieve this, while forwarding the packet according to the local network segment route, the forwarding plane of the Leaf node also needs to trigger the control plane to request from the Spine node the host route that matches the packet to be forwarded.
  • One feasible way is as follows:
  • The control plane of the Leaf node may add an action attribute to a learned network segment route and deliver the network segment route with the action attribute to the forwarding plane.
  • The action attribute instructs the forwarding plane, when a packet hits the network segment route, to notify the control plane to request from the Spine node the host route matching the packet.
  • Accordingly, the forwarding plane may perform the following step while performing step 304:
  • Step 306: When the first packet hits the network segment route, the forwarding plane on the Leaf node also notifies the control plane to initiate a route request to the Spine node.
  • The manner in which the forwarding plane notifies the control plane to initiate a route request may include, but is not limited to, the following two:
  • Manner 1: Add a special tag to the first packet that hit the network segment route, and send the tagged first packet to the control plane.
  • The special tag indicates that the control plane does not need to forward the first packet.
  • The first packet can be sent to the control plane through a dedicated packet delivery interface.
  • The special tag is added because the forwarding plane has already forwarded the first packet, so the tag avoids sending the first packet twice. In addition, the control plane has limited processing capability, and making it responsible for forwarding packets could cause problems such as packet loss.
  • Manner 2: Extract information such as the destination IP address from the first packet that hit the network segment route, and send the extracted information to the control plane through an interface other than the packet delivery interface.
  • The extracted destination IP address tells the control plane which specific host route to request from the Spine node.
  • Step 307: After receiving the notification from the forwarding plane, the control plane of the Leaf node requests from the Spine node the host route matching the first packet.
  • The manner in which the control plane of the Leaf node requests the host route from the Spine node includes, but is not limited to, the following two:
  • Manner 1: The control plane initiates a route request to the Spine node over TCP (Transmission Control Protocol) or UDP (User Datagram Protocol).
  • In this manner, a module independent of the routing module may be added to the Leaf node and the Spine node specifically to implement the route request interaction between them. This approach is implemented through a proprietary protocol and is independent of existing routing protocols.
  • Manner 2: The control plane initiates a route request to the Spine node through an extension of the routing protocol.
  • This manner requires extending the routing protocol, after which the route request interaction is implemented through the existing routing modules on the Leaf node and the Spine node.
  • Step 308: The Spine node advertises the host route matching the first packet to the Leaf node in response to the request of the Leaf node.
  • The Spine node does not modify the next hop of the host route when advertising it, so the next hop of the host route is still the IP address of the original Leaf node.
  • When the Spine node has no host route that matches the first packet, it can send a route request suppression command to the Leaf node to prevent the Leaf node from repeatedly initiating the same route request. The command indicates that the Spine node has no host route matching the first packet. After receiving the command, the Leaf node does not request the host route matching the first packet from the Spine node again within a preset duration. Note that the route request suppression command does not prevent the Leaf node from requesting host routes that match other packets, where "other packets" means packets whose destination IP address differs from that of the first packet.
  • Step 309: On the Leaf node, after the control plane learns the host route that the Spine node advertised for the first packet, it delivers the host route to the forwarding plane.
  • Step 310: When a second packet hits the host route matching the first packet, the forwarding plane of the Leaf node encapsulates the second packet and sends the encapsulated second packet to the Leaf node corresponding to the next hop of the hit host route.
  • The second packet encapsulated in step 310 has two layers of IP headers. The inner source IP address is the host IP address of the packet sender, and the inner destination IP address is the host IP address of the packet receiver. The outer source IP address is the IP address of the source Leaf node, and the outer destination IP address is the IP address of the destination Leaf node.
  • In this way, the packet logically travels directly from one Leaf node to another Leaf node without logically detouring via the Spine node.
  • Physically, the packet still passes through the Spine node, but the Spine node can simply forward the second packet on receipt, without the process of encapsulating the packet again.
  • With the above method, a Leaf node does not need to learn all the routes of the other Leaf nodes, which reduces the routing entry capacity and hardware entry capacity it requires.
  • The Leaf node can obtain the corresponding host route from the Spine node on demand, so that packets are forwarded along the ideal path.
  • Moreover, since the Leaf node stores only the host routes for which there is traffic to forward, host routes do not occupy too many routing entries and hardware entries on the Leaf node.
  • An aging time may be set on the control plane. If no packet hits a host route within the aging time, the control plane of the Leaf node ages out the host route and notifies the forwarding plane to do the same. The forwarding plane ages out the host route after receiving the notification from the control plane, so the routing entries and hardware entries consumed by the host route are released in time. Network segment routes do not need to be aged.
  • In the example of FIG. 4, the Spine node 111 and the Spine node 112 are route reflectors, and the Leaf nodes 121 to 123 are IP gateways.
  • The Leaf node 122 has a network segment route with destination IP address 10.2.0.0/16 and a host route with destination address 10.2.0.11/32.
  • The Leaf node 123 has a network segment route with destination IP address 10.3.0.0/16 and a host route with destination address 10.3.0.56/32.
  • The network segment routes and host routes on the Leaf node 122 and the Leaf node 123 are advertised to the Spine node 111 and the Spine node 112. Initially, however, the Leaf node 121 learns only the network segment routes from the Spine node 111 and the Spine node 112. These network segment routes form equal-cost routes on the Leaf node 121, whose next hops are the Spine node 111 and the Spine node 112.
  • On the Leaf node 121, the control plane adds the action attribute to the received network segment routes with destination IP addresses 10.2.0.0/16 and 10.3.0.0/16 and then delivers them to the forwarding plane.
  • The action attribute causes the forwarding plane, while forwarding a packet according to the network segment route, to notify the control plane to request the host route matching that packet from the Spine node 111 or the Spine node 112.
  • When the Leaf node 121 receives a packet with destination IP address 10.3.0.56, the packet can only match the network segment route with destination IP address 10.3.0.0/16. Following the configured load sharing policy, the forwarding plane sends the packet through the VXLAN tunnel to one Spine node, for example the Spine node 111.
  • On the Spine node 111, the destination gateway for the packet is the Leaf node 123, so the Spine node 111 sends the packet through the VXLAN tunnel to the Leaf node 123.
  • The packet is thus forwarded through the VXLAN tunnels Leaf node 121 → Spine node 111 and Spine node 111 → Leaf node 123.
  • At the same time, the forwarding plane notifies the control plane to request from the Spine node 111 the host route with destination IP address 10.3.0.56.
  • The Spine node 111 returns the host route requested by the Leaf node 121 to the Leaf node 121.
  • After receiving the host route with destination IP address 10.3.0.56/32, the control plane of the Leaf node 121 delivers the route to the forwarding plane. From then on, the Leaf node 121 sends packets with destination IP address 10.3.0.56 through the VXLAN tunnel Leaf node 121 → Leaf node 123 (shown by dotted line 3 in FIG. 4).
  • Before the host route is delivered, such packets need to be sent through the VXLAN tunnels Leaf node 121 → Spine node 111 and Spine node 111 → Leaf node 123 (shown by dotted line 1 and dotted line 2 in FIG. 4).
  • The present application provides a Leaf node device, which establishes a BGP neighbor relationship with a Spine node device.
  • The hardware environment of the Leaf node device usually includes at least a processor 501, such as a CPU (Central Processing Unit), and a hardware chip 502.
  • The Leaf node device may further include other hardware, such as a memory, which interacts with the processor 501 and the hardware chip 502 to implement the operations provided by the present application. The interaction between the other hardware and the processor 501 and the hardware chip 502 may use existing technical solutions, as long as the functions of the processor 501 and the hardware chip 502 described here can be implemented.
  • The processor 501 can be used to implement the control plane function of the Leaf node device, and the hardware chip 502 can be used to implement the forwarding plane function of the Leaf node device, as follows:
  • The processor 501 is configured to advertise network segment routes and host routes to the Spine node, learn the network segment routes advertised by the Spine node, and deliver the learned network segment routes to the hardware chip 502.
  • The hardware chip 502 is configured to: when a first packet hits a network segment route, send the first packet to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the hit host route.
  • The hardware chip 502 is configured to, when the next hop of the hit network segment route corresponds to multiple Spine nodes, send the first packet to the one Spine node that satisfies a preset load sharing policy.
  • The hardware chip 502 is further configured to, while sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, notify the processor 501 to request from the Spine node a host route that matches the first packet.
  • The processor 501 is further configured to: after receiving the notification from the hardware chip 502, request the host route that matches the first packet from the Spine node, learn the host route matching the first packet that the Spine node advertises, and deliver that host route to the hardware chip 502.
  • The hardware chip 502 is further configured to: when a second packet hits the host route that matches the first packet, send the second packet to the Leaf node corresponding to the next hop of the host route.
  • The processor 501 is further configured to, when delivering a learned network segment route to the hardware chip 502, add an action attribute to the network segment route and deliver the network segment route with the action attribute to the hardware chip 502.
  • The hardware chip 502 is further configured to notify the processor 501, according to the action attribute included in the network segment route, to request from the Spine node the host route matching the first packet.
  • The processor 501 is further configured to: if no packet hits the host route matching the first packet within a preset duration, age out the host route and notify the hardware chip 502 to age out the host route.
  • The hardware chip 502 is further configured to age out the host route after receiving the notification from the processor 501.
  • The processor 501 is further configured to receive a route request suppression command sent by the Spine node, where the command indicates that the Spine node has no host route matching the first packet, and to refrain from requesting the host route matching the first packet from the Spine node within a preset duration.
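
The on-demand host-route behavior of the control plane described above (request on a segment-route hit, suppression after a miss, and aging of idle host routes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the class `LeafControlPlane`, its method names, the callback `request_host_route`, and the default durations are all hypothetical, and time is passed in explicitly so the logic is easy to follow.

```python
class LeafControlPlane:
    """Illustrative model of a Leaf control plane (all names are hypothetical)."""

    def __init__(self, request_host_route, suppress_secs=30.0, age_secs=300.0):
        # request_host_route: callable(dst_ip) -> next-hop Leaf, or None if the
        # Spine has no matching host route (stands in for the route request).
        self.request_host_route = request_host_route
        self.suppress_secs = suppress_secs  # assumed suppression duration
        self.age_secs = age_secs            # assumed aging time
        self.host_routes = {}               # dst_ip -> [next_hop_leaf, last_hit]
        self.suppressed_until = {}          # dst_ip -> time suppression ends

    def on_segment_hit(self, dst_ip, now):
        """Forwarding plane reports a segment-route hit; maybe learn a host route."""
        if self.suppressed_until.get(dst_ip, float("-inf")) > now:
            return False  # still suppressed: do not ask the Spine again
        next_hop = self.request_host_route(dst_ip)
        if next_hop is None:
            # Models the route request suppression command for this destination only.
            self.suppressed_until[dst_ip] = now + self.suppress_secs
            return False
        self.host_routes[dst_ip] = [next_hop, now]
        return True

    def on_host_hit(self, dst_ip, now):
        """Refresh the last-hit time when a packet hits a learned host route."""
        if dst_ip in self.host_routes:
            self.host_routes[dst_ip][1] = now

    def age(self, now):
        """Remove host routes that were not hit within the aging time."""
        expired = [ip for ip, (_, last) in self.host_routes.items()
                   if now - last >= self.age_secs]
        for ip in expired:
            del self.host_routes[ip]
        return expired
```

Suppression is keyed by destination IP address, so a miss for one host does not block requests for other hosts, which matches the note above about other packets. Network segment routes are deliberately not tracked here, since they are never aged.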

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

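Network segment routes and host routes are advertised to a Spine node; network segment routes advertised by the Spine node are learned; when a first packet hits a network segment route, the first packet is sent to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the hit host route.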

Description

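Packet forwarding
Cross-reference to related applications
This patent application claims priority to Chinese patent application No. 201710756807.1, filed on August 29, 2017 and entitled "Packet forwarding method and Leaf node device", the entire contents of which are incorporated herein by reference.
Background
Data centers have now widely adopted the "Spine node + Leaf node" networking mode, in which the Leaf nodes are responsible for connecting hosts.
In one approach, a data center adopts a distributed gateway deployment, that is, each Leaf node is a distributed gateway, so that a Leaf node provides Layer 2 access and also acts as an IP (Internet Protocol) gateway that forwards traffic across network segments. Under the distributed gateway deployment, each Leaf node needs to advertise the MAC (Medium Access Control) address and IP address of every locally online host (either a physical machine or a virtual machine) as routes to the other Leaf nodes. On receipt, the other Leaf nodes save them as routing table entries (in memory) and deliver them to the forwarding plane of the node, that is, the hardware chip.
However, as data centers grow, a single data center may contain tens of thousands of servers, and each server can be virtualized into multiple virtual machines. For example, if a data center has 20,000 servers and each server runs 10 virtual machines, the data center contains 200,000 hosts, which places high demands on the routing table entry capacity and hardware table entry capacity of the distributed gateways. For cost reasons, distributed gateways are usually not particularly high-end devices; their routing entry and hardware entry capacities are limited and may not be sufficient to carry all routes within a large data center.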
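Brief description of the drawings
FIG. 1 is a schematic diagram of a Spine node + Leaf node network provided by the present application;
FIG. 2 is a flowchart of the method provided by the present application;
FIG. 3 is a flowchart of the interaction between a Spine node and a Leaf node provided by the present application;
FIG. 4 is a diagram showing an example of packet forwarding before and after a host route is delivered, provided by the present application;
FIG. 5 is a structural block diagram of a device provided by the present application.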
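Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
Current data center deployments increasingly favor the distributed gateway mode, in which each Leaf node is a distributed gateway, and the distributed gateways establish VXLAN (Virtual Extensible LAN) tunnels with each other through the standard EVPN (Ethernet Virtual Private Network) protocol to synchronize tenant routes.
However, as data centers grow and virtualization technology develops, a data center may contain tens of thousands of servers, each of which can be virtualized into multiple virtual machines. As a result, the number of tenant routes in the data center may far exceed the routing table entry capacity (memory) and hardware entry capacity (hardware chip) that a distributed gateway can support.
One existing solution addresses the problem that the gateway's hardware entry capacity cannot keep up with the data center size: after receiving a route, the control plane of each distributed gateway does not immediately deliver it to its own forwarding plane; only when traffic arrives is the route delivered to the forwarding plane, based on the destination MAC address or destination IP address of the traffic. However, this solution only solves the mismatch between the hardware entry capacity of the distributed gateway and the number of routes. Because the distributed gateway still has to store the complete set of routes, the mismatch between its routing entry capacity and the number of routes remains unsolved.
The present application provides a packet forwarding scheme that resolves this dilemma by using the capabilities of the centrally located Spine nodes in a data center. For ease of description, the scheme provided by the present application is described below based on Spine node + Leaf node networking.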
参考图1,为一种典型的Spine节点+Leaf节点组网,该组网中的核心节点包括两种,第一种为Leaf节点121、122、123,用于接入主机;第二种为Spine节点111、112,负责连接Leaf节点121、122、123。
在Leaf节点121、122、123为IP网关的场景下,为了避免网关太多导致连接太多,通常不会采用网关之间建立BGP(Border Gateway Protocol,边界网关协议)连接的方式,而是使用Spine节点111、112作为RR(Route Reflector,路由反射器),所有Leaf节点121、122、123都和Spine节点111、112建立BGP连接(建立BGP邻居),这样,各Leaf节点121、122、123可以将本地上线的主机的路由发布给Spine节点111、112,由Spine节点111、112将路由发布给其它Leaf节点。实际应用中,Spine节点111、112一般为高端设备,因此Spine节点111、112的路由表项规格和硬件表项规格一般可以支持数据中心内的所有路由。
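Routes include network segment routes and host routes. A network segment route is a route to a network segment; in FIG. 1, the routes whose destination IP addresses are 10.1.0.0/16, 10.2.0.0/16 and 10.3.0.0/16 are network segment routes. A host route is a route to one specific host; in FIG. 1, the routes whose destination IP addresses are 10.2.0.11/32 and 10.3.0.56/32 are host routes.
When Spine nodes 111 and 112 act as RRs, Leaf nodes 121, 122 and 123 advertise both the host routes of their locally online hosts and the corresponding network segment routes to Spine nodes 111 and 112. Ordinarily, Spine nodes 111 and 112 would then advertise these host routes and network segment routes together to the other Leaf nodes.
In the present application, to save routing table entries and hardware table entries on the Leaf nodes without affecting normal packet forwarding, a special route advertisement policy is deployed on Spine nodes 111 and 112: initially they advertise only network segment routes to Leaf nodes 121, 122 and 123, and they advertise host routes on demand. In addition, on each of Leaf nodes 121, 122 and 123, the next hop of the route for any network segment that does not exist on that node points to Spine nodes 111 and 112. Consequently, when multiple Spine nodes 111 and 112 act as RRs, equal-cost routes are formed and used for load sharing.
For example, in FIG. 1, Leaf node 122 and Leaf node 123 advertise both their network segment routes and their host routes, but initially only the network segment routes 10.2.0.0/16 and 10.3.0.0/16 are advertised to Leaf node 121, where they form equal-cost routes. That is, on Leaf node 121 the next hops of the equal-cost route to segment 10.2.0.0/16 are Spine node 111 and Spine node 112, and the next hops of the equal-cost route to segment 10.3.0.0/16 are likewise Spine node 111 and Spine node 112.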
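The advertisement policy and the resulting equal-cost routes on Leaf node 121 can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the `Route` class, the function names and the node labels such as `spine111` are hypothetical, and the sketch assumes that a /32 prefix identifies a host route and that a Spine node does not reflect a route back to the Leaf node that originated it.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "10.2.0.0/16" or "10.2.0.11/32"
    next_hop: str    # label of the node the route points at (hypothetical labels)


def is_host_route(route: Route) -> bool:
    """Assumption: a /32 IPv4 prefix is a host route, anything shorter a segment route."""
    return ipaddress.ip_network(route.prefix).prefixlen == 32


def initial_advertisement(spine: str, learned: list, to_leaf: str) -> list:
    """Routes a Spine node initially advertises to `to_leaf`.

    Only segment routes originated by other Leaf nodes are advertised,
    and the next hop is rewritten to the Spine node itself.
    """
    return [
        Route(r.prefix, spine)
        for r in learned
        if not is_host_route(r) and r.next_hop != to_leaf
    ]


def build_ecmp_table(advertisements: list) -> dict:
    """Group identical prefixes so each prefix maps to all of its next hops."""
    table = {}
    for r in advertisements:
        table.setdefault(r.prefix, []).append(r.next_hop)
    return table


# Routes that both Spine nodes learned from Leaf nodes 122 and 123 (FIG. 1).
learned = [
    Route("10.2.0.0/16", "leaf122"), Route("10.2.0.11/32", "leaf122"),
    Route("10.3.0.0/16", "leaf123"), Route("10.3.0.56/32", "leaf123"),
]
received = []
for spine in ("spine111", "spine112"):
    received += initial_advertisement(spine, learned, to_leaf="leaf121")
leaf121_table = build_ecmp_table(received)
```

In this sketch `leaf121_table` holds only the two segment prefixes, each with both Spine nodes as next hops; the two host routes stay on the Spine nodes until requested.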
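Based on the above, the method provided by the present application is described below.
Referring to FIG. 2, in one embodiment the data center includes Spine nodes and Leaf nodes, every Leaf node establishes a BGP neighbor relationship with each Spine node, and every Leaf node is an IP gateway (that is, the data center uses a distributed gateway architecture). Each Leaf node performs the following steps during operation:
Step 201: Advertise network segment routes and host routes to the Spine node.
Step 202: Learn the network segment routes advertised by the Spine node.
Step 203: When a first packet hits a network segment route learned by the Leaf node, send the first packet to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the host route that the packet hits.
As steps 201 to 203 show, the Spine node learns both network segment routes and host routes, whereas the Leaf node initially learns only network segment routes from the Spine node. Using a network segment route, the Leaf node can send a packet to the Spine node, and the Spine node then forwards the packet according to the host route. The Leaf node can therefore forward packets even without learning host routes from the Spine node. Because the data center contains far fewer network segment routes than host routes, the routing table and hardware table of a Leaf node can hold this small number of network segment routes, which resolves the limits on the Leaf node's routing table capacity and hardware table capacity.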
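Referring to FIG. 3, the implementation of the present application is described below through the interaction between a Spine node and a Leaf node. FIG. 3 is a flowchart of the interaction between a Spine node and any Leaf node, which may include the following steps:
Step 301: The Leaf node advertises network segment routes and host routes to the Spine node.
Here, the next hop of the network segment routes and host routes that the Leaf node advertises to the Spine node is the Leaf node's own IP address.
Step 302: The Leaf node learns the network segment routes of other Leaf nodes advertised by the Spine node.
When advertising a network segment route, the Spine node changes the next hop of the route to itself, so the next hop of a network segment route that the Leaf node learns from the Spine node is the IP address of the Spine node.
Step 303: On the Leaf node, after the control plane learns a network segment route advertised by the Spine node, it delivers the route to the forwarding plane.
Step 304: When a first packet hits the network segment route, the forwarding plane of the Leaf node encapsulates the first packet and sends the encapsulated first packet to the Spine node corresponding to the next hop of the hit network segment route.
In one implementation, when the next hop of the hit network segment route corresponds to multiple Spine nodes, the forwarding plane may send the encapsulated first packet to one Spine node that satisfies a preset load-sharing policy.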
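The text does not say what the preset load-sharing policy is. As one possible illustration, the sketch below assumes a per-flow hash over the inner source and destination IP addresses, so that packets of the same flow always go to the same Spine node. The function name `pick_spine`, the node labels and the sender address 10.1.0.5 are hypothetical.

```python
import zlib


def pick_spine(next_hops: list, src_ip: str, dst_ip: str) -> str:
    """Pick one equal-cost next hop by hashing the flow key (assumed policy)."""
    if not next_hops:
        raise ValueError("route has no next hop")
    key = f"{src_ip}->{dst_ip}".encode()
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return next_hops[zlib.crc32(key) % len(next_hops)]


spines = ["spine111", "spine112"]
first = pick_spine(spines, "10.1.0.5", "10.3.0.56")
second = pick_spine(spines, "10.1.0.5", "10.3.0.56")
```

Both calls return the same Spine node because the flow key is identical; other policies, such as round robin, would equally satisfy the wording of the text.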
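Here, the encapsulated first packet in step 304 may carry two IP headers. The inner source IP address is the host IP address of the packet sender, and the inner destination IP address is the host IP address of the packet receiver. The outer source IP address is the IP address of the source Leaf node (the Leaf node to which the packet sender is connected), and the outer destination IP address is the IP address of the Spine node.
Step 305: The Spine node decapsulates the received packet to obtain the first packet and looks up a host route that matches the destination IP address of the first packet (that is, the host IP address of the packet receiver). If a matching host route is found, the Spine node encapsulates the first packet and sends the encapsulated first packet to the Leaf node corresponding to the next hop of the matching host route; if none is found, the Spine node discards the first packet.
Here, the encapsulated first packet in step 305 also carries two IP headers. The inner source and destination IP addresses are unchanged and remain the host IP addresses of the packet sender and the packet receiver. The difference is that the outer source IP address is now the IP address of the Spine node, and the outer destination IP address is the IP address of the destination Leaf node (the Leaf node to which the packet receiver is connected).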
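The two encapsulations of steps 304 and 305 can be sketched as follows. This is a simplified model and not a VXLAN implementation: it tracks only the inner and outer IP addresses, the class and function names are hypothetical, and the node addresses in 192.0.2.0/24 and the sender address 10.1.0.5 are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InnerPacket:
    src_ip: str            # host IP address of the packet sender
    dst_ip: str            # host IP address of the packet receiver
    payload: bytes = b""


@dataclass(frozen=True)
class TunnelPacket:
    outer_src: str         # IP address of the tunnel entry node
    outer_dst: str         # IP address of the tunnel exit node
    inner: InnerPacket


def encapsulate(inner: InnerPacket, tunnel_src: str, tunnel_dst: str) -> TunnelPacket:
    """Add the outer IP header; the inner header is left untouched."""
    return TunnelPacket(tunnel_src, tunnel_dst, inner)


def spine_relay(pkt: TunnelPacket, spine_ip: str, host_routes: dict) -> Optional[TunnelPacket]:
    """Step 305: decapsulate, look up the host route, re-encapsulate or drop (None)."""
    inner = pkt.inner                                # decapsulation
    dst_leaf = host_routes.get(inner.dst_ip)         # exact-match host route lookup
    if dst_leaf is None:
        return None                                  # no matching host route: discard
    return encapsulate(inner, spine_ip, dst_leaf)


# Hypothetical tunnel endpoint addresses.
LEAF121, SPINE111, LEAF123 = "192.0.2.121", "192.0.2.111", "192.0.2.123"

hop1 = encapsulate(InnerPacket("10.1.0.5", "10.3.0.56"), LEAF121, SPINE111)   # step 304
hop2 = spine_relay(hop1, SPINE111, {"10.3.0.56": LEAF123})                    # step 305
```

After the relay, `hop2` carries the same inner packet as `hop1`, while its outer header runs from the Spine node to the destination Leaf node.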
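Steps 301 to 305 thus implement hierarchical packet forwarding. Because the source Leaf node to which the packet sender is connected stores only network segment routes, while the Spine node holds the complete set of host routes and network segment routes, the source Leaf node first sends a packet that hits a network segment route to the Spine node, and the Spine node then forwards the packet to the destination Leaf node to which the packet receiver is connected.
Although steps 301 to 305 save routing table entries and hardware table entries on the Leaf node, the packet has to traverse two VXLAN tunnels during forwarding: the tunnel between the source Leaf node and the Spine node, and the tunnel between the Spine node and the destination Leaf node. Traversing two VXLAN tunnels follows the same physical path as traversing a single VXLAN tunnel directly between the source and destination Leaf nodes. In some scenarios, however, for reasons such as statistics collection and management, it is still desirable that the packet logically bypass the Spine node and travel directly through the VXLAN tunnel between the Leaf nodes.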
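For the packet to logically bypass the Spine node, the forwarding plane of the Leaf node, while forwarding the packet according to the local network segment route, also needs to trigger the control plane to request from the Spine node the host route that matches the packet being forwarded. One feasible way to achieve this is as follows: in step 303, the control plane of the Leaf node adds an action attribute to the learned network segment route and delivers the route with the action attribute to the forwarding plane. The action attribute instructs the forwarding plane, whenever a packet hits this network segment route, to notify the control plane to request from the Spine node the host route that matches the packet.
With this additional processing by the control plane in step 303, after the first packet hits the network segment route carrying the action attribute, the forwarding plane may perform the following step while performing step 304:
Step 306: When the first packet hits the network segment route, the forwarding plane of the Leaf node also notifies the control plane to initiate a route request to the Spine node.
As one embodiment, the ways in which the forwarding plane notifies the control plane to initiate a route request include, but are not limited to, the following two:
Mode 1: Add a special mark to the first packet that hit the network segment route, and send the marked first packet up to the control plane. The special mark tells the control plane that it does not need to forward this first packet. The first packet may be sent up to the control plane through a dedicated packet upload interface.
In mode 1, the special mark is added to the first packet for two reasons. First, the forwarding plane has already forwarded the first packet, and the mark prevents it from being sent a second time. Second, the processing capacity of the control plane is limited, and making the control plane forward packets could cause problems such as packet loss.
Mode 2: Extract information such as the destination IP address from the first packet that hit the network segment route, and send the extracted information up to the control plane through an interface other than the packet upload interface. The extracted destination IP address tells the control plane which host route to request from the Spine node.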
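The role of the action attribute can be sketched in a few lines of Python. The sketch follows mode 2, in which only the destination IP address is passed up. The names `SegmentRoute`, `request_host_route` and `handle_hit` are hypothetical, and a `deque` stands in for whatever interface the hardware chip really uses to reach the control plane.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SegmentRoute:
    prefix: str
    next_hop: str
    request_host_route: bool = False   # the "action attribute" set by the control plane


def handle_hit(route, dst_ip, transmit, notifications):
    """Forward a packet that hit `route`, then notify the control plane if asked to."""
    transmit(dst_ip, route.next_hop)      # step 304: forward via the segment route
    if route.request_host_route:          # step 306: only when the action attribute is set
        notifications.append(dst_ip)      # mode 2: pass up the destination IP, not the packet


sent = []                  # records (destination IP, next hop) pairs instead of real I/O
notifications = deque()    # stands in for the interface to the control plane
route = SegmentRoute("10.3.0.0/16", "spine111", request_host_route=True)
handle_hit(route, "10.3.0.56", lambda ip, hop: sent.append((ip, hop)), notifications)
```

Forwarding happens regardless of the attribute, so the first packet is never delayed by the route request; the notification is a side effect of the hit.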
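Step 307: After receiving the notification from the forwarding plane, the control plane of the Leaf node requests from the Spine node the host route that matches the first packet.
As one embodiment, the ways in which the control plane of the Leaf node requests a host route from the Spine node include, but are not limited to, the following two:
Mode 1: The control plane may initiate the route request to the Spine node over TCP (Transmission Control Protocol) or UDP (User Datagram Protocol).
Specifically, in mode 1, a module independent of the routing module may be added to both the Leaf node and the Spine node, dedicated to the route request exchange between them. This mode is implemented with a private protocol and is independent of existing routing protocols.
Mode 2: The control plane may initiate the route request to the Spine node through an extended routing protocol.
Because commonly used routing protocols such as BGP generally do not support one device requesting an individual route from another device, mode 2 requires the routing protocol to be extended; the route request exchange is then carried out by the existing routing modules on the Leaf node and the Spine node.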
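For mode 1 of step 307, the text does not define the private protocol. Purely as an illustration, the sketch below invents a 5-byte request message (a 1-byte type code followed by the 4-byte IPv4 address whose host route is requested) that could be carried in a UDP datagram or a TCP stream. The message layout, the type code and the function names are all assumptions.

```python
import socket
import struct

MSG_ROUTE_REQUEST = 1    # hypothetical message type code
_FORMAT = "!B4s"         # 1-byte type + 4-byte IPv4 address, network byte order


def encode_route_request(dst_ip: str) -> bytes:
    """Build the request payload for the host route to `dst_ip`."""
    return struct.pack(_FORMAT, MSG_ROUTE_REQUEST, socket.inet_aton(dst_ip))


def decode_route_request(data: bytes) -> str:
    """Parse a request payload and return the requested destination IP address."""
    msg_type, raw = struct.unpack(_FORMAT, data)
    if msg_type != MSG_ROUTE_REQUEST:
        raise ValueError(f"unexpected message type {msg_type}")
    return socket.inet_ntoa(raw)


payload = encode_route_request("10.3.0.56")
requested = decode_route_request(payload)
```

A real deployment would also need a reply message, retransmission when UDP is used, and some form of authentication, none of which the text specifies.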
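Step 308: In response to the Leaf node's request, the Spine node advertises the host route that matches the first packet to the Leaf node.
Here, the Spine node does not modify the next hop when advertising the host route; the next hop of the host route remains the IP address of the originating Leaf node.
As one implementation, when the Spine node has no local host route matching the first packet, the Spine node may send a route request suppression instruction to the Leaf node to indicate that no host route matching the first packet exists on the Spine node. This prevents the Leaf node from repeatedly sending the same route request. For a preset duration after receiving the instruction, the Leaf node no longer requests the host route matching the first packet from the Spine node. Note that the route request suppression instruction does not affect requests for host routes matching other packets, where “other packets” means packets whose destination IP address differs from that of the first packet.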
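The suppression behaviour on the Leaf node can be sketched as a per-destination hold-down timer. The class name `RequestSuppressor`, its methods and the 30-second duration are hypothetical; the text only states that the duration is preset. The clock is injectable so that the behaviour can be exercised without waiting.

```python
import time


class RequestSuppressor:
    """Tracks destinations for which route requests are temporarily suppressed."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self._suppressed_until = {}    # destination IP -> time when requests may resume

    def on_suppress_instruction(self, dst_ip):
        """Called when the Spine node reports that no host route exists for dst_ip."""
        self._suppressed_until[dst_ip] = self.clock() + self.hold_seconds

    def may_request(self, dst_ip):
        """Return True if a route request for dst_ip may be sent now."""
        deadline = self._suppressed_until.get(dst_ip)
        if deadline is None:
            return True
        if self.clock() >= deadline:
            del self._suppressed_until[dst_ip]    # hold-down expired
            return True
        return False


now = [0.0]                                        # fake clock for the example
suppressor = RequestSuppressor(30, clock=lambda: now[0])
suppressor.on_suppress_instruction("10.3.0.99")
blocked = suppressor.may_request("10.3.0.99")      # suppressed destination
other = suppressor.may_request("10.3.0.56")        # a different destination is unaffected
```

Keying the timer on the destination IP address is what keeps requests for other destinations unaffected, as the text requires.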
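Step 309: On the Leaf node, after the control plane learns the host route matching the first packet advertised by the Spine node, it delivers the host route to the forwarding plane.
Step 310: When a second packet hits the host route matching the first packet, the forwarding plane of the Leaf node encapsulates the second packet and sends the encapsulated second packet to the Leaf node corresponding to the next hop of the hit host route.
When both a host route and a network segment route are present in the forwarding plane, a packet preferentially hits the host route under the longest-prefix-match rule. The encapsulated second packet in step 310 carries two IP headers: the inner source IP address is the host IP address of the packet sender, the inner destination IP address is the host IP address of the packet receiver, the outer source IP address is the IP address of the source Leaf node, and the outer destination IP address is the IP address of the destination Leaf node.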
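Longest-prefix matching is what makes the installed host route take over from the network segment route. A minimal sketch using the standard `ipaddress` module follows; the linear scan is for clarity only (a hardware chip would use a dedicated lookup structure), and the function name and node labels are hypothetical.

```python
import ipaddress


def longest_prefix_match(table: dict, dst_ip: str):
    """Return (prefix, next_hop) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else (str(best[0]), best[1])


# Forwarding table of Leaf node 121 after the host route has been installed.
table = {
    "10.3.0.0/16": "spine111",     # segment route, next hop is a Spine node
    "10.3.0.56/32": "leaf123",     # on-demand host route, next hop is the destination Leaf
}
via_host_route = longest_prefix_match(table, "10.3.0.56")
via_segment_route = longest_prefix_match(table, "10.3.0.57")
```

The packet to 10.3.0.56 matches both entries and takes the /32 host route, while a neighbouring address such as 10.3.0.57 still falls back to the segment route and the Spine node.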
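Steps 306 to 310 thus allow a packet to travel logically from one Leaf node directly to another without logically detouring through the Spine node. On the physical path the packet still passes through the Spine node, but unlike in step 305, the Spine node simply forwards the encapsulated second packet on receipt, without decapsulating it, looking up a host route and re-encapsulating it.
With steps 301 to 310, when there is no traffic a Leaf node does not need to learn the host routes of other Leaf nodes, which reduces the size of its routing table and hardware table. When there is traffic, the Leaf node obtains the corresponding host route from the Spine node on demand, so that packets follow the desired path. Moreover, because the Leaf node stores only host routes for which there is traffic to forward, host routes do not occupy an excessive number of routing table entries and hardware table entries on the Leaf node.
As one embodiment, to further reduce the routing table entries occupied by host routes on the Leaf node, after the control plane of the Leaf node has learned a host route advertised by the Spine node and delivered it to the forwarding plane, if no packet hits the host route within a set duration, the control plane may age out the host route and notify the forwarding plane to age it out as well. The forwarding plane ages out the host route after receiving the notification. The routing table entry and hardware table entry consumed by the host route are thereby released promptly. Network segment routes do not need to be aged out.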
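Host route aging can be sketched as an idle timer per route. The sketch assumes that every hit restarts the timer, which is one reading of “no packet hits the host route within a set duration”. The class `HostRouteCache`, its methods and the 60-second value are hypothetical, and a single object stands in for both the control-plane and forwarding-plane copies of the route.

```python
import time


class HostRouteCache:
    """On-demand host routes on a Leaf node, aged out when idle."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self._routes = {}    # prefix -> [next_hop, time of install or last hit]

    def install(self, prefix, next_hop):
        self._routes[prefix] = [next_hop, self.clock()]

    def hit(self, prefix):
        """Return the next hop and restart the idle timer, or None if absent."""
        entry = self._routes.get(prefix)
        if entry is None:
            return None
        entry[1] = self.clock()
        return entry[0]

    def age_out(self):
        """Remove routes idle for at least max_idle_seconds; return their prefixes."""
        now = self.clock()
        stale = [p for p, (_, seen) in self._routes.items()
                 if now - seen >= self.max_idle_seconds]
        for prefix in stale:
            del self._routes[prefix]
        return stale


now = [0.0]                                   # fake clock for the example
cache = HostRouteCache(60, clock=lambda: now[0])
cache.install("10.3.0.56/32", "leaf123")
cache.install("10.2.0.11/32", "leaf122")
now[0] = 50.0
cache.hit("10.3.0.56/32")                     # traffic keeps this route alive
now[0] = 70.0
removed = cache.age_out()                     # only the idle route is removed
```

Network segment routes are deliberately absent from this cache, since the text states that they are not aged out.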
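To make the solution clearer to those skilled in the art, the implementation of the present application is described below with reference to the Spine node + Leaf node network shown in FIG. 1.
In FIG. 1, Spine node 111 and Spine node 112 are route reflectors, and Leaf nodes 121 to 123 are IP gateways. Leaf node 122 has a network segment route with destination IP address 10.2.0.0/16 and a host route with destination address 10.2.0.11/32. Leaf node 123 has a network segment route with destination IP address 10.3.0.0/16 and a host route with destination address 10.3.0.56/32.
(1) The network segment routes and host routes on Leaf node 122 and Leaf node 123 are all advertised to Spine node 111 and Spine node 112, but initially Leaf node 121 learns only the network segment routes from Spine node 111 and Spine node 112. The network segment routes learned by Leaf node 121 form equal-cost routes on Leaf node 121, whose next hops are Spine node 111 and Spine node 112.
(2) On Leaf node 121, the control plane adds an action attribute to the received network segment routes with destination IP addresses 10.2.0.0/16 and 10.3.0.0/16 and then delivers them to the forwarding plane. The action attribute causes the forwarding plane, while forwarding a packet according to a network segment route, to also notify the control plane to request from Spine node 111 or Spine node 112 the host route matching the packet being forwarded.
(3) When Leaf node 121 receives a packet whose destination IP address is 10.3.0.56, the packet can match only the network segment route with destination IP address 10.3.0.0/16. According to the load-sharing rule in use, the forwarding plane sends the packet through a VXLAN tunnel to one of the Spine nodes, for example Spine node 111.
(4) Because Spine node 111 holds the complete set of host routes and network segment routes, its lookup finds that the destination gateway is Leaf node 123, and Spine node 111 sends the packet to Leaf node 123 through the VXLAN tunnel to Leaf node 123.
At this point, the packet is forwarded through the VXLAN tunnels Leaf node 121→Spine node 111 and Spine node 111→Leaf node 123.
(5) When Leaf node 121 receives a packet with destination IP address 10.3.0.56 for the first time, its forwarding plane also notifies the control plane to request the host route with destination IP address 10.3.0.56 from Spine node 111.
(6) Spine node 111 returns the host route requested by Leaf node 121 to Leaf node 121.
(7) After the control plane of Leaf node 121 receives the host route with destination IP address 10.3.0.56/32, it delivers the route to the forwarding plane. Leaf node 121 sends subsequent packets with destination IP address 10.3.0.56 through the VXLAN tunnel Leaf node 121→Leaf node 123.
As shown in FIG. 4, after Leaf node 121 obtains the host route 10.3.0.56/32, packets are sent through the VXLAN tunnel Leaf node 121→Leaf node 123 (dashed line 3 in FIG. 4). Before Leaf node 121 obtains this host route, packets have to be sent through the VXLAN tunnels Leaf node 121→Spine node 111 and Spine node 111→Leaf node 123 (dashed lines 1 and 2 in FIG. 4).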
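The method provided by the present application has been described above. The apparatus provided by the present application is described below.
The present application provides a Leaf node device that establishes a BGP neighbor relationship with a Spine node device. Referring to FIG. 5, the hardware environment of the Leaf node device usually includes at least a processor 501, such as a CPU (Central Processing Unit), and a hardware chip 502. The Leaf node device may of course include other hardware, such as memory, that interacts with the processor 501 and the hardware chip 502 to carry out the operations described above. This interaction may use existing techniques, provided that the functions of the processor 501 and the hardware chip 502 can be realized.
The processor 501 may implement the control plane functions of the Leaf node device, and the hardware chip 502 may implement the forwarding plane functions of the Leaf node device, as follows:
The processor 501 is configured to advertise network segment routes and host routes to the Spine node, learn the network segment routes advertised by the Spine node, and deliver the learned network segment routes to the hardware chip 502.
The hardware chip 502 is configured to, when a first packet hits a network segment route, send the first packet to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the hit host route.
In one implementation, the hardware chip 502 is configured to, when the next hop of the hit network segment route corresponds to multiple Spine nodes, send the first packet to one Spine node that satisfies a preset load-sharing policy.
In one implementation, the hardware chip 502, when sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, is further configured to notify the processor 501 to request from the Spine node the host route matching the first packet.
The processor 501 is further configured to, after receiving the notification from the hardware chip 502, request from the Spine node the host route matching the first packet and learn the host route matching the first packet advertised by the Spine node; and is further configured to deliver the learned host route matching the first packet to the hardware chip 502.
The hardware chip 502 is further configured to, when a second packet hits the host route matching the first packet, send the second packet to the Leaf node corresponding to the next hop of the host route.
In one implementation, the processor 501, when delivering the learned network segment route to the hardware chip, is further configured to add an action attribute to the learned network segment route and deliver the network segment route containing the action attribute to the hardware chip 502.
The hardware chip 502, when sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, is further configured to notify the processor 501, according to the action attribute contained in the network segment route, to request from the Spine node the host route matching the first packet.
In one implementation, the processor 501 is further configured to, if no packet hits the host route matching the first packet within a preset duration after the host route is learned, age out the host route and notify the hardware chip 502 to age out the host route.
The hardware chip 502 is further configured to age out the host route after receiving the notification from the processor 501.
In one implementation, after requesting from the Spine node the host route matching the first packet, the processor 501 is further configured to receive a route request suppression instruction sent by the Spine node, the route request suppression instruction indicating that no host route matching the first packet exists; and to refrain, for a preset duration, from requesting from the Spine node the host route matching the first packet.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.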

Claims (12)

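  1. A packet forwarding method, applied to any Leaf node in a data center that includes Spine nodes and Leaf nodes, wherein the Leaf node establishes a Border Gateway Protocol (BGP) neighbor relationship with each Spine node, the method comprising:
    advertising network segment routes and host routes to the Spine node;
    learning the network segment routes advertised by the Spine node;
    when a first packet hits a network segment route, sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the hit host route.
  2. The method according to claim 1, wherein sending the first packet to the Spine node corresponding to the next hop of the hit network segment route comprises:
    when the next hop of the hit network segment route corresponds to multiple Spine nodes, sending the first packet to one Spine node that satisfies a preset load-sharing policy.
  3. The method according to claim 1, wherein, when the first packet is sent to the Spine node corresponding to the next hop of the hit network segment route, the method further comprises:
    requesting from the Spine node a host route matching the first packet;
    learning the host route matching the first packet advertised by the Spine node;
    when a second packet hits the host route matching the first packet, sending the second packet to the Leaf node corresponding to the next hop of the host route.
  4. The method according to claim 3, wherein requesting from the Spine node the host route matching the first packet comprises:
    when sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, requesting from the Spine node the host route matching the first packet according to an action attribute contained in the network segment route.
  5. The method according to claim 3, wherein
    if no packet hits the host route matching the first packet within a preset duration, the host route is aged out.
  6. The method according to claim 3, wherein, after requesting from the Spine node the host route matching the first packet, the method further comprises:
    receiving a route request suppression instruction sent by the Spine node, the route request suppression instruction indicating that no host route matching the first packet exists;
    refraining, for a preset duration, from requesting from the Spine node the host route matching the first packet.
  7. A Leaf node device, comprising a processor, namely a central processing unit (CPU), and a hardware chip, wherein
    the processor is configured to advertise network segment routes and host routes to a Spine node, learn the network segment routes advertised by the Spine node, and deliver the learned network segment routes to the hardware chip;
    the hardware chip is configured to, when a first packet hits a network segment route, send the first packet to the Spine node corresponding to the next hop of the hit network segment route, so that the Spine node sends the first packet to the Leaf node corresponding to the next hop of the hit host route.
  8. The Leaf node device according to claim 7, wherein
    the hardware chip is configured to, when the next hop of the hit network segment route corresponds to multiple Spine nodes, send the first packet to one Spine node that satisfies a preset load-sharing policy.
  9. The Leaf node device according to claim 7, wherein
    the hardware chip, when sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, is further configured to notify the processor to request from the Spine node a host route matching the first packet;
    the processor is further configured to, after receiving the notification from the hardware chip, request from the Spine node the host route matching the first packet and learn the host route matching the first packet advertised by the Spine node; and is further configured to deliver the learned host route matching the first packet to the hardware chip;
    the hardware chip is further configured to, when a second packet hits the host route matching the first packet, send the second packet to the Leaf node corresponding to the next hop of the host route.
  10. The Leaf node device according to claim 9, wherein
    the processor, when delivering the learned network segment route to the hardware chip, is configured to add an action attribute to the learned network segment route and deliver the network segment route containing the action attribute to the hardware chip;
    the hardware chip, when sending the first packet to the Spine node corresponding to the next hop of the hit network segment route, is further configured to notify the processor, according to the action attribute contained in the network segment route, to request from the Spine node the host route matching the first packet.
  11. The Leaf node device according to claim 9, wherein
    the processor is further configured to, if no packet hits the host route matching the first packet within a preset duration, age out the host route and notify the hardware chip to age out the host route;
    the hardware chip is further configured to age out the host route after receiving the notification from the processor.
  12. The Leaf node device according to claim 9, wherein
    after requesting from the Spine node the host route matching the first packet, the processor is further configured to receive a route request suppression instruction sent by the Spine node, the route request suppression instruction indicating that no host route matching the first packet exists; and to refrain, for a preset duration, from requesting from the Spine node the host route matching the first packet.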
PCT/CN2018/102840 2017-08-29 2018-08-29 Packet forwarding WO2019042303A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/643,479 US11165693B2 (en) 2017-08-29 2018-08-29 Packet forwarding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710756807.1A CN108632145B (zh) 2017-08-29 2017-08-29 Packet forwarding method and leaf node device
CN201710756807.1 2017-08-29

Publications (1)

Publication Number Publication Date
WO2019042303A1 true WO2019042303A1 (zh) 2019-03-07

Family

ID=63705757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102840 WO2019042303A1 (zh) 2017-08-29 2018-08-29 Packet forwarding

Country Status (3)

Country Link
US (1) US11165693B2 (zh)
CN (1) CN108632145B (zh)
WO (1) WO2019042303A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11252192B1 (en) * 2018-09-28 2022-02-15 Palo Alto Networks, Inc. Dynamic security scaling
US20190109789A1 (en) * 2018-12-06 2019-04-11 Intel Corporation Infrastructure and components to provide a reduced latency network with checkpoints
CN110535744B (zh) * 2019-08-29 2021-12-24 新华三信息安全技术有限公司 Packet processing method and apparatus, and Leaf device
CN110768901B (zh) * 2019-10-24 2022-02-25 新华三技术有限公司 Route advertisement method, route selection method, related apparatus and system
EP3965379A1 (en) * 2020-09-04 2022-03-09 Huawei Technologies Co., Ltd. Data transmission method, apparatus, and network device
CN113037647A (zh) * 2021-03-17 2021-06-25 杭州迪普科技股份有限公司 Packet processing method, apparatus, device and computer-readable storage medium
CN113595893B (zh) * 2021-07-20 2024-05-14 锐捷网络股份有限公司 Route receiving system, route receiving method, apparatus, device and medium
CN114978980B (zh) * 2022-04-08 2024-01-19 新奥特(北京)视频技术有限公司 IP signal crosspoint scheduling apparatus and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739520A (zh) * 2012-05-31 2012-10-17 华为技术有限公司 Lookup method and apparatus
CN103780490A (zh) * 2012-10-17 2014-05-07 中兴通讯股份有限公司 Method and apparatus for updating a route lookup tree
US20150295862A1 (en) * 2014-04-11 2015-10-15 Cisco Technology, Inc. Hierarchical programming of dual-stack switches in a network environment
CN105721312A (zh) * 2016-01-14 2016-06-29 盛科网络(苏州)有限公司 Chip implementation method and apparatus for route separation in a network stacking device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10187290B2 (en) * 2016-03-24 2019-01-22 Juniper Networks, Inc. Method, system, and apparatus for preventing tromboning in inter-subnet traffic within data center architectures
US10158564B2 (en) * 2016-11-17 2018-12-18 Cisco Technology, Inc. Border leaf traffic convergence in a software defined network
US10785701B2 (en) * 2018-06-26 2020-09-22 Ciscot Technology, Inc. Hybrid control plane entity for fat tree route disaggregation

Also Published As

Publication number Publication date
CN108632145B (zh) 2020-01-03
US11165693B2 (en) 2021-11-02
CN108632145A (zh) 2018-10-09
US20200195551A1 (en) 2020-06-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18851774

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18851774

Country of ref document: EP

Kind code of ref document: A1