WO2022105932A1 - PFC backpressure message and processing method thereof - Google Patents

PFC backpressure message and processing method thereof

Info

Publication number
WO2022105932A1
Authority
WO
WIPO (PCT)
Prior art keywords
pfc
backpressure
message
packet
hop
Application number
PCT/CN2021/132527
Other languages
English (en)
French (fr)
Inventor
成伟
王俊杰
Original Assignee
苏州盛科通信股份有限公司
Priority date
2020-11-23
Filing date
2021-11-23
Publication date
2022-05-27
Application filed by 苏州盛科通信股份有限公司
Publication of WO2022105932A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/54 Organization of routing tables
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5007 Internet protocol [IP] addresses

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

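Embodiments of the present invention disclose a PFC backpressure message and a processing method thereof. The PFC backpressure message is a UDP-based PFC backpressure message comprising a message header, and the message header comprises a MAC header, an IP header and a UDP header. By extending the PFC backpressure message, the embodiments apply traffic backpressure from the congested node directly to the source server of the data flow. This solves the PFC deadlock problem of conventional hop-by-hop backpressure, simplifies deployment of the PFC scheme, and consumes no chip ACL entry resources.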

Description

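PFC Backpressure Message and Processing Method Thereof
This application claims priority to Chinese patent application No. 202011324328.0, entitled "PFC backpressure message and processing method thereof", filed with the Chinese Patent Office on November 23, 2020. The entire contents of that application are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to the technical field of flow control, and in particular to a PFC backpressure message and a processing method thereof.
Background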
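PFC (Priority-based Flow Control) provides priority-based flow control hop by hop. PFC allows eight priority channels to be created on one Ethernet link. Any one priority channel can be stopped and resumed independently, while traffic on the other priority channels continues to be forwarded normally.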
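When a network device forwards packets, each packet enters the queue mapped to its priority for scheduling and forwarding. If packets of a given priority arrive faster than they can be received, the receiver runs short of available data buffer space. When the buffer used by a queue exceeds the PFC threshold, the device sends a PFC backpressure notification message upstream, telling the upstream device to stop sending. When the buffer used by the queue falls below the PFC threshold again, the device sends a PFC backpressure release message upstream, telling the upstream device to resume sending. In this way the rate of the source server's data flow is ultimately controlled, so that data packets are neither lost nor retransmitted.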
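The threshold behaviour described above can be sketched as a per-priority ingress queue. The queue emits an XOFF toward the upstream device when its occupancy rises above the PFC threshold, and an XON when occupancy falls back below it. This is a minimal sketch under simplifying assumptions. It uses a single threshold, as the text does, although real devices commonly use separate XOFF and XON levels. It also counts occupancy in bytes, and a `sent` list stands in for frames actually transmitted upstream. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IngressQueue:
    """Hypothetical model of one priority queue's PFC trigger logic."""
    priority: int            # 0..7, the PFC priority this queue serves
    threshold: int           # PFC threshold in bytes (illustrative value)
    used: int = 0            # bytes currently buffered
    xoff_sent: bool = False  # True while the upstream device is paused
    sent: List[Tuple[str, int]] = field(default_factory=list)  # frames "sent" upstream

    def _update(self) -> None:
        # Emit XOFF once when occupancy crosses above the threshold ...
        if not self.xoff_sent and self.used > self.threshold:
            self.xoff_sent = True
            self.sent.append(("XOFF", self.priority))
        # ... and XON once when it drops back below it.
        elif self.xoff_sent and self.used < self.threshold:
            self.xoff_sent = False
            self.sent.append(("XON", self.priority))

    def enqueue(self, nbytes: int) -> None:
        self.used += nbytes
        self._update()

    def dequeue(self, nbytes: int) -> None:
        self.used = max(0, self.used - nbytes)
        self._update()


q = IngressQueue(priority=3, threshold=1000)
q.enqueue(600)   # 600 bytes: below threshold, nothing sent
q.enqueue(600)   # 1200 bytes: above threshold, XOFF sent upstream
q.dequeue(300)   # 900 bytes: back below threshold, XON sent upstream
print(q.sent)    # [('XOFF', 3), ('XON', 3)]
```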
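Although PFC helps regulate network traffic, it also introduces potential risks. As shown in FIG. 1, a large number of PFC Pause (priority-based flow control backpressure) frames in the network is very likely to induce a PFC deadlock. When packets of a PFC-enabled priority form a loop, the packets in the data buffers cannot be forwarded, and the devices repeatedly send and receive PFC frames. The buffer resources of the device interfaces therefore remain occupied and cannot be released, and the devices enter a PFC deadlock state. If two or more queues become permanently blocked, each one waits for resources that are occupied and blocked by the others, which ultimately creates a systemic risk for the network.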
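The circular wait described above can be modelled as a directed "waits-on" graph between per-priority queues, in which a PFC deadlock corresponds to a cycle. The sketch below is an illustrative simplification and is not part of the patent. It assumes each paused queue waits on exactly one downstream queue. The queue names and the function name are hypothetical.

```python
from typing import Dict, List, Optional


def find_pause_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a 'paused queue -> queue it waits on' graph, or None.

    Each key is a queue that cannot drain because the queue it maps to has
    paused it via PFC. A cycle means no queue in it can ever drain: a deadlock.
    """
    for start in waits_on:
        position: Dict[str, int] = {}   # queue -> index in the current walk
        path: List[str] = []
        node = start
        # Follow the single outgoing "waits on" edge until we leave the graph
        # or revisit a queue from this same walk.
        while node in waits_on and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waits_on[node]
        if node in position:
            return path[position[node]:]
    return None


# Three switches whose priority-3 queues pause one another in a ring.
ring = {"SW1.p3": "SW2.p3", "SW2.p3": "SW3.p3", "SW3.p3": "SW1.p3"}
print(find_pause_cycle(ring))                  # ['SW1.p3', 'SW2.p3', 'SW3.p3']
print(find_pause_cycle({"SW1.p3": "SW2.p3"}))  # None
```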
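One existing solution to the PFC deadlock problem is the PFC Watch Dog (priority-based flow control watchdog) function of the network chip, which periodically monitors Pause (backpressure) frames. The function works as follows:
1. It checks whether the network device has received a large number of Pause frames over a period of time.
2. If that number exceeds the PFC deadlock warning threshold, it deems that a PFC deadlock has occurred.
3. It reports the detection result through DMA (Direct Memory Access).
4. The system software of the network device then issues configuration that either disables PFC or ignores PFC XOFF frames. A PFC XOFF frame stops traffic transmission, and a PFC XON frame resumes it.
This allows service data flows blocked by the deadlock to be forwarded normally through the network again.

This scheme has three drawbacks. First, PFC deadlock detection depends on chip support, and most existing conventional chips do not provide it. Second, even among chips that support PFC Watch Dog, most network devices and chips can detect PFC deadlock on only two priorities rather than all eight. Third, setting the PFC deadlock threshold is partly a matter of experience, and an unsuitable threshold leads to misjudging whether a PFC deadlock has actually occurred.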
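The watchdog check described above can be reduced to one comparison per detection period. The sketch below also reproduces the two limitations the text raises: coverage limited to a subset of priorities, and sensitivity to an empirically chosen threshold. The function name, the counts and the thresholds are hypothetical. Real chips implement this check in hardware and report results via DMA.

```python
from typing import Iterable, List, Sequence


def watchdog_poll(pause_counts: Sequence[int], warn_threshold: int,
                  monitored: Iterable[int]) -> List[int]:
    """One detection period of a PFC-watchdog-style check.

    pause_counts[p] is the number of Pause frames received for priority p in
    the last period. Only priorities in `monitored` are checked, mirroring
    chips that support deadlock detection on a subset of the 8 priorities.
    Returns the priorities reported as deadlocked.
    """
    return [p for p in sorted(monitored) if pause_counts[p] > warn_threshold]


counts = [0, 0, 0, 5000, 0, 0, 5000, 0]                  # Pause storms on priorities 3 and 6
print(watchdog_poll(counts, 1000, monitored=(3, 4)))     # [3]: priority 6 is missed
print(watchdog_poll(counts, 1000, monitored=range(8)))   # [3, 6]
print(watchdog_poll(counts, 6000, monitored=range(8)))   # []: threshold set too high
```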
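Another solution starts by analyzing the network topology in advance to determine whether PFC deadlock is possible. ACL (Access Control List) entries are then pre-installed on the network devices where a deadlock may occur. These entries match the service data flows that may cause a PFC deadlock, and the priority of a matched flow is modified according to the ACL result. That is, when the flow triggers a PFC deadlock, the ACL switches it from its current priority to a different one. The flow then no longer responds to PFC Pause frames of the original priority, which resolves the deadlock. However, this scheme consumes ACL entries. In addition, network topologies differ from scenario to scenario, so every deployment requires a different ACL configuration, which poses significant management and operations challenges in practice.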
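The ACL-based workaround amounts to a match-and-rewrite step in front of the priority queues. The following is a minimal sketch with hypothetical names and documentation-range addresses. For simplicity it remaps every matching packet unconditionally, rather than only once a deadlock has been triggered. Real ACLs can match many more header fields than the three used here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AclEntry:
    """Hypothetical pre-installed entry: match a flow, rewrite its priority."""
    src_ip: str
    dst_ip: str
    match_priority: int
    new_priority: int


def classify_priority(acl: List[AclEntry], src_ip: str, dst_ip: str, priority: int) -> int:
    """Return the priority the packet is queued with after ACL processing."""
    for entry in acl:  # first match wins
        if (entry.src_ip, entry.dst_ip, entry.match_priority) == (src_ip, dst_ip, priority):
            return entry.new_priority
    return priority


# One entry per deadlock-prone flow; the table must be redesigned per topology.
acl = [AclEntry("192.0.2.10", "198.51.100.20", match_priority=3, new_priority=5)]
print(classify_priority(acl, "192.0.2.10", "198.51.100.20", 3))  # 5: moved off priority 3
print(classify_priority(acl, "192.0.2.11", "198.51.100.20", 3))  # 3: unmatched flow unchanged
```

Each protected flow costs one entry, which illustrates the resource and per-topology configuration burden the text describes.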
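Summary of the Invention
The purpose of the embodiments of the present invention is to overcome the defects of the prior art by providing a UDP-based PFC backpressure message and a processing method thereof.
To achieve this purpose, the embodiments of the present invention propose the following technical solution: a PFC backpressure message, wherein the PFC backpressure message is a UDP-based PFC backpressure message comprising a message header, and the message header comprises a MAC header, an IP header and a UDP header.
Optionally, the PFC backpressure message further comprises a data field, and the data field comprises a MAC control opcode, a backpressure enable vector and a backpressure timer.
The embodiments of the present invention also disclose another technical solution, a method for processing the above PFC backpressure message. The method comprises:
S100: when the ingress buffer of the current-hop device exceeds a preset PFC threshold, the current-hop device is triggered to send the PFC backpressure message to its upstream next-hop device;
S200: when a next-hop device receives the PFC backpressure message, it looks up its routing and forwarding entries. According to the lookup result, it either forwards the PFC backpressure message on to the next hop or processes the PFC backpressure message directly;
S300: each next-hop device performs step S200 until the PFC backpressure message is forwarded to the source server, and the source server processes the PFC backpressure message.
Optionally, in S200, the routing and forwarding entry is looked up according to the destination IP address in the header of the PFC backpressure message.
Optionally, if the lookup result is that the destination IP address of the PFC backpressure message does not match the IP address of the next-hop device, the PFC backpressure message is forwarded on to the next hop.
Optionally, in the case of a mismatch, the PFC backpressure message is forwarded according to its destination IP address. It is sent on to the next hop from the corresponding egress port of the next-hop device, as indicated by the routing entry lookup result.
Optionally, if the lookup result is that the destination IP address of the PFC backpressure message matches the IP address of the next-hop device, the current next-hop device is the source server and processes the PFC backpressure message directly.
Optionally, the source server stops or resumes sending the data flow of the corresponding priority according to the priority and the backpressure timer of the PFC backpressure message.
Optionally, each time the PFC backpressure message is forwarded one hop, the TTL field in the PFC backpressure message is decremented by one.
Optionally, when the TTL field in the PFC backpressure message is 0, the PFC backpressure message is discarded.
The beneficial effects of the embodiments of the present invention are as follows:
1. The embodiments extend the conventional MAC-based PFC backpressure frame into a UDP-based PFC backpressure frame. Using route forwarding, they apply traffic backpressure from the congested node directly to the source server of the data flow. This solves the PFC deadlock problem of conventional hop-by-hop backpressure, simplifies deployment of the PFC scheme, and avoids consuming chip ACL entry resources.
2. The conventional MAC-based PFC Pause frame applies backpressure point to point, hop by hop. In contrast, the UDP-based PFC Pause frame of the embodiments uses route forwarding to apply backpressure directly to the source server of the data flow. Intermediate devices along the path only need to forward based on the IPDA, and the source server itself stops or resumes sending traffic according to the received PFC Pause message. This solves the deadlock problem caused by conventional hop-by-hop PFC backpressure, and it also provides an end-to-end flow control mechanism.
3. The TTL field in the IP header of the PFC Pause frame gives the UDP-based PFC Pause message loop protection. Once the TTL field in the header of a Layer 3 PFC Pause frame has been decremented to 0, the message is discarded, which fully resolves the problem of PFC deadlocks that cannot be broken.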
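Brief Description of the Drawings
FIG. 1 is a schematic diagram of the principle of an existing PFC deadlock;
FIG. 2 is a schematic diagram of the format of the UDP-based PFC Pause message according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of the method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the processing of a PFC Pause message according to an optional embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.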
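As shown in FIG. 2, the PFC backpressure message disclosed in the embodiments extends the MAC (Media Access Control)-based PFC backpressure message into a UDP (User Datagram Protocol)-based PFC backpressure message. FIG. 2 shows an optional format. The PFC backpressure message comprises a message header and a data field. In the MAC-based format, the header is a MAC header only. In the embodiments, the header comprises a MAC header, an IP (Internet Protocol) header and a UDP header. In other words, the embodiments replace the MAC header of the original PFC backpressure message with MAC header + IP header + UDP header, while retaining the data field of the original MAC-based PFC backpressure message. The data field includes the MAC control opcode, the backpressure enable vector, the backpressure timer, and so on.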
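As shown in FIG. 2, the fields of the PFC backpressure message in the embodiments are defined as follows:
MAC header: includes the destination MAC address field and the source MAC address field;
IP header: includes the source IP address field, the destination IP address field and the Time To Live (TTL) field. The source and destination IP address fields identify the source host that sends the IP packet and the destination host that receives it;
UDP header: includes the source port number field and the destination port number field, which identify the source end and the destination end respectively;
MAC control opcode (Control opcode): the PFC Pause frame is also a kind of MAC control frame, and its opcode is 0x0101;
Backpressure enable vector: the Class-Enable vector, in which E[0] to E[7] correspond to the different priorities subject to backpressure;
Backpressure timer: Time. Its value is decremented by 1 each time the message passes through a router. When it reaches 0, the backpressure is cancelled and the peer resumes sending packets.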
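The following minimal sketch shows how the fields listed above could be serialized, under several assumptions:
- IPv4 with no IP options and no VLAN tag.
- A data field that follows the IEEE 802.1Qbb layout: a 2-byte opcode, then a 2-byte class-enable vector with E[i] in bit i of the low byte, then eight 2-byte Time values, one per priority.
- A placeholder UDP port. The patent says only that a specific UDP port is reserved and does not give its number, so `HYPOTHETICAL_PFC_UDP_PORT` stands in for it.
- A source IP of the congested switch and a destination IP of the congested flow's source server. How the switch learns that address is not specified in the text.

The function names are hypothetical. The UDP checksum is left at 0, which IPv4 permits.

```python
import ipaddress
import struct
from typing import Sequence

PFC_OPCODE = 0x0101                 # MAC control opcode for PFC (from the field list)
HYPOTHETICAL_PFC_UDP_PORT = 50000   # placeholder: the patent does not give the port number


def ipv4_checksum(header: bytes) -> int:
    """Standard one's-complement checksum over 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_pfc_over_udp(dst_mac: bytes, src_mac: bytes, src_ip: str, dst_ip: str,
                       class_enable: int, times: Sequence[int], ttl: int = 64,
                       udp_port: int = HYPOTHETICAL_PFC_UDP_PORT) -> bytes:
    """Serialize MAC header + IPv4 header + UDP header + PFC data field."""
    if len(times) != 8:
        raise ValueError("expected one Time value per priority (8)")
    # Data field retained from the MAC-based PFC frame.
    data = struct.pack("!HH8H", PFC_OPCODE, class_enable & 0xFF, *times)
    udp_len = 8 + len(data)
    # UDP checksum 0 means "not computed", which is allowed over IPv4.
    udp = struct.pack("!HHHH", udp_port, udp_port, udp_len, 0)
    # IPv4 header with checksum zeroed, then patched in at byte offset 10.
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + udp_len,   # version/IHL, DSCP/ECN, total length
                     0, 0,                    # identification, flags/fragment offset
                     ttl, 17, 0,              # TTL, protocol = UDP, checksum placeholder
                     ipaddress.IPv4Address(src_ip).packed,
                     ipaddress.IPv4Address(dst_ip).packed)
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)   # EtherType IPv4
    return eth + ip + udp + data


# SW2 asks the source server to pause priority 3 (addresses are illustrative).
frame = build_pfc_over_udp(dst_mac=bytes.fromhex("020000000001"),
                           src_mac=bytes.fromhex("020000000002"),
                           src_ip="10.0.12.2", dst_ip="192.0.2.10",
                           class_enable=1 << 3,
                           times=[0, 0, 0, 0xFFFF, 0, 0, 0, 0])
print(len(frame))   # 62 = 14 (MAC) + 20 (IP) + 8 (UDP) + 20 (data field)
```

Because the destination MAC is the next-hop router's address, each router along the path rewrites it during normal IP forwarding. Only the IP destination identifies the source server.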
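The embodiments retain the attributes of the original MAC-based PFC backpressure message, such as the MAC control opcode, the backpressure enable vector and the backpressure timer. They replace the original MAC header with IP routing and UDP headers and reserve a specific UDP port number. In other words, the original Layer 2 MAC forwarding mechanism is upgraded to a Layer 3 IP forwarding mechanism. IP routing is a conventional Layer 3 forwarding function that network devices and chips already support thoroughly, so it readily accommodates this extension of the PFC Pause message. Because it reuses the conventional route forwarding mechanism, the UDP-based PFC Pause message fully inherits the multiple PFC priorities of the conventional MAC-based PFC Pause frame. Preferably, it also uses the TTL field of the IP header for loop protection. When the TTL field in the header of a Layer 3 PFC Pause message has been decremented to 0, the message is discarded, which fully resolves the problem of PFC deadlocks that cannot be broken. Under normal circumstances, the TTL field has not reached 0 by the time the PFC Pause message arrives at the source server. However, if packets of a PFC-enabled priority form a loop, the devices enter a PFC deadlock state and the TTL field of the PFC Pause message may be decremented to 0. The message is then discarded, which naturally resolves the problem of an unbreakable PFC deadlock.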
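The move from Layer 2 to Layer 3 described above changes how a receiving device recognises the message. The rough sketch below makes several assumptions: untagged Ethernet, IPv4, and a placeholder value for the reserved UDP port, because the patent does not state one. A classic PFC frame (EtherType 0x8808, opcode 0x0101) is acted on at the link where it arrives. The UDP-based message, by contrast, is ordinary IPv4/UDP traffic that the device routes by destination address. The function name and the labels it returns are hypothetical.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808
IPV4_ETHERTYPE = 0x0800
PFC_OPCODE = 0x0101
HYPOTHETICAL_PFC_UDP_PORT = 50000   # placeholder for the reserved port


def classify_frame(frame: bytes) -> str:
    """Decide how a received frame is handled (untagged Ethernet assumed)."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype == MAC_CONTROL_ETHERTYPE and struct.unpack_from("!H", frame, 14)[0] == PFC_OPCODE:
        return "mac_pfc"   # classic PFC: acted on at this link, never forwarded
    if ethertype == IPV4_ETHERTYPE and frame[23] == 17:   # IPv4 protocol field = UDP
        ihl = (frame[14] & 0x0F) * 4                      # IP header length in bytes
        dst_port = struct.unpack_from("!H", frame, 14 + ihl + 2)[0]
        if dst_port == HYPOTHETICAL_PFC_UDP_PORT:
            return "udp_pfc"   # routed PFC: looked up by destination IP like any packet
    return "other"


# Minimal hand-built frames: a classic PFC frame and a UDP-based one.
mac_pfc = bytes.fromhex("0180c2000001" "020000000002" "8808" "0101") + bytes(18)
ip_hdr = bytes([0x45]) + bytes(8) + bytes([17]) + bytes(10)   # 20-byte IPv4 header, protocol 17
udp_pfc = bytes(12) + b"\x08\x00" + ip_hdr + struct.pack("!HHHH", 50000, 50000, 8, 0)
print(classify_frame(mac_pfc), classify_frame(udp_pfc))   # mac_pfc udp_pfc
```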
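As shown in FIG. 3, the method disclosed in the embodiments for processing the above UDP-based PFC backpressure message comprises the following steps:
S100: when the ingress buffer of the current-hop device exceeds the preset PFC threshold, the current-hop device is triggered to send a PFC backpressure message to its upstream next-hop device.
S200: when a next-hop device receives the PFC backpressure message, it looks up its routing and forwarding entries. According to the lookup result, it either forwards the PFC backpressure message on to the next hop or processes it directly.
S300: each next-hop device performs step S200 until the PFC backpressure message is forwarded to the source server, and the source server processes the PFC backpressure message.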
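Steps S100 to S300 can be traced with a toy network model. In this sketch every device compares the message's destination IP with its own address. On a match, the device processes the message. Otherwise it decrements the TTL and forwards the message, discarding it when the TTL reaches 0, as the optional TTL handling describes. The routing table is a hypothetical exact-match dictionary rather than real longest-prefix routing. The device names and addresses are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Device:
    name: str
    ip: str
    next_hop: Dict[str, str] = field(default_factory=dict)  # dest IP -> neighbour name (hypothetical)


def send_pause(devices: Dict[str, Device], upstream: str, dst_ip: str, ttl: int = 64) -> List[str]:
    """S100: the congested device hands the message to `upstream`.
    S200/S300: every receiver processes it if the IPDA is its own address,
    otherwise decrements TTL and forwards it; TTL 0 means discard."""
    trace: List[str] = []
    current = upstream
    while True:
        dev = devices[current]
        trace.append(dev.name)
        if dev.ip == dst_ip:
            trace.append("processed")   # reached the source server
            return trace
        ttl -= 1
        if ttl <= 0:
            trace.append("discarded")   # loop protection via TTL
            return trace
        current = dev.next_hop[dst_ip]


SRC = "192.0.2.10"
net = {
    "SW1": Device("SW1", "10.0.12.1", {SRC: "Server"}),
    "Server": Device("Server", SRC),
}
print(send_pause(net, "SW1", SRC))            # ['SW1', 'Server', 'processed']

# A mis-routed loop between SW1 and SW3: TTL eventually discards the message.
loop = {
    "SW1": Device("SW1", "10.0.12.1", {SRC: "SW3"}),
    "SW3": Device("SW3", "10.0.13.3", {SRC: "SW1"}),
}
print(send_pause(loop, "SW1", SRC, ttl=3))    # ['SW1', 'SW3', 'SW1', 'discarded']
```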
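The processing flow of the PFC backpressure message is described in detail below using an optional embodiment. As shown in FIG. 4, the network includes a source server, a destination server, and three switches between them: switch 1 (SW1), switch 2 (SW2) and switch 3 (SW3).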
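First, the source server starts sending a data flow to SW1. Assume the flow has priority 3 and a bandwidth of 10G (10 Gbit/s). When SW1 receives the flow from the source server, it forwards the flow to the next hop, SW2, according to its forwarding entry lookup. When SW2 receives the flow, it forwards it to the next hop, SW3, in the same way. The link between SW2 and SW3 has a bandwidth of only 1G while the flow is 10G, which exceeds SW2's egress forwarding capacity. SW2's ingress buffer therefore exceeds the preset PFC threshold, and SW2 is triggered to generate the UDP-based PFC Pause message described above. SW2 assembles the message and forwards it to its upstream next hop, SW1.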
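The FIG. 4 scenario reduces to simple arithmetic. SW2 receives 10 Gbit/s but can drain only 1 Gbit/s toward SW3, so its ingress buffer grows at about 9 Gbit/s. The patent gives no threshold value. The 200 KB used below is a hypothetical figure, chosen only to show the time scale on which the PFC Pause message is triggered. The calculation also ignores in-flight data and shared-buffer effects.

```python
def time_to_threshold_us(ingress_gbps: float, egress_gbps: float, threshold_bytes: int) -> float:
    """Microseconds until a buffer fed at ingress_gbps and drained at egress_gbps
    accumulates threshold_bytes (ignores in-flight data and shared-buffer effects)."""
    excess_bps = (ingress_gbps - egress_gbps) * 1e9
    if excess_bps <= 0:
        return float("inf")   # buffer never grows
    return threshold_bytes * 8 / excess_bps * 1e6


# FIG. 4 rates, with a purely illustrative 200 KB PFC threshold on SW2.
print(round(time_to_threshold_us(10, 1, 200_000), 1))   # 177.8
```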
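Optionally, when SW1 receives the UDP-based PFC Pause message from SW2, it looks up its routing and forwarding entries using the IPDA (Internet Protocol Destination Address) field in the message header. It then determines whether its own IP address matches the IPDA field:
- If the addresses match, the device is the source server and must process the PFC Pause message.
- If they do not match, the device is not the source server. It routes the PFC Pause message according to its destination IP address and forwards it to the next hop from the egress port indicated by the routing and forwarding lookup result.

In this embodiment SW1 is not the source server, so its IP address does not match the IPDA field. SW1 therefore forwards the PFC Pause message to the source server according to the routing and forwarding lookup result.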
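The decision that SW1 makes can be sketched with Python's standard `ipaddress` module. The device first checks whether the IPDA is one of its own addresses. If it is not, the device performs a longest-prefix-match lookup to pick the egress port. The routing-table format, the port names and the addresses are hypothetical. Dropping the message when no route matches is also an assumption, because the patent does not cover that case.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]   # (prefix, egress port): hypothetical table format


def handle_pause(local_ips: List[str], routes: List[Route], ipda: str) -> Tuple[str, Optional[str]]:
    """Return ("process", None) if this device is the addressee, ("forward", port)
    for the longest-prefix match, or ("drop", None) if no route exists."""
    dst = ipaddress.ip_address(ipda)
    if any(dst == ipaddress.ip_address(ip) for ip in local_ips):
        return ("process", None)
    best: Optional[Tuple[int, str]] = None   # (prefix length, egress port)
    for prefix, port in routes:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    return ("forward", best[1]) if best else ("drop", None)


sw1_routes = [("0.0.0.0/0", "eth9"), ("192.0.2.0/24", "eth1"), ("192.0.2.10/32", "eth0")]
print(handle_pause(["10.0.12.1"], sw1_routes, "192.0.2.10"))   # ('forward', 'eth0')
print(handle_pause(["192.0.2.10"], [], "192.0.2.10"))          # ('process', None)
```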
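When the source server receives the UDP-based PFC Pause message forwarded by SW1, it finds that the message's IPDA field matches its own IP address, so it processes the message. One optional way to process it is to stop sending the data flow of the corresponding priority, according to the priority and the Time of the PFC Pause message.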
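The source server's handling can be sketched as per-priority pause deadlines. This sketch interprets Time as a pause duration applied at the server. The patent does not define the unit of Time, so the sketch borrows the IEEE 802.1Qbb convention: one Time value per priority, counted in quanta of 512 bit times. It also treats a Time of 0 as cancelling the backpressure. These are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence

PAUSE_QUANTUM_BITS = 512   # IEEE 802.1Qbb quantum; the patent does not define Time's unit


def apply_pause(paused_until: List[float], now: float, class_enable: int,
                times: Sequence[int], link_bps: float) -> None:
    """Update per-priority pause deadlines from one received PFC Pause message."""
    quantum_s = PAUSE_QUANTUM_BITS / link_bps
    for prio in range(8):
        if (class_enable >> prio) & 1:   # E[prio] set: the message applies to this priority
            # Time 0 cancels backpressure (deadline = now); otherwise pause for Time quanta.
            paused_until[prio] = now + times[prio] * quantum_s


def may_send(paused_until: Sequence[float], prio: int, now: float) -> bool:
    """True if traffic of priority `prio` may be sent at time `now`."""
    return now >= paused_until[prio]


paused = [0.0] * 8
apply_pause(paused, 0.0, 1 << 3, [0, 0, 0, 1000, 0, 0, 0, 0], 10e9)   # ~51.2 us pause on priority 3
print(may_send(paused, 3, 20e-6), may_send(paused, 2, 20e-6))        # False True
apply_pause(paused, 30e-6, 1 << 3, [0] * 8, 10e9)                    # Time 0: resume priority 3
print(may_send(paused, 3, 30e-6))                                    # True
```

Only the priority flagged in the class-enable vector is paused. Traffic on the other seven priorities keeps flowing, which matches PFC's per-priority semantics.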
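Of course, in alternative embodiments there may be more than one intermediate device between the source server and the device that sends the PFC Pause message, not just SW1. Each intermediate device processes the PFC Pause message in the same way as SW1, so the details are not repeated here.
In addition, the TTL field in the header of the PFC Pause message is decremented by one for each hop it is forwarded. If the TTL field reaches 0, the PFC Pause message is discarded, which resolves the PFC deadlock.
The embodiments differ from the conventional MAC-based PFC backpressure mechanism as follows. The conventional MAC-based PFC backpressure message applies backpressure point to point, hop by hop. The UDP-based PFC backpressure message of the embodiments instead uses route forwarding to apply backpressure directly to the source server of the data flow. The intermediate devices between the source server and the faulty device only need to forward based on the destination IP address (IPDA), and the source server itself stops or resumes sending traffic according to the received PFC Pause message. This solves the PFC deadlock problem that hop-by-hop backpressure causes in practical PFC deployments, and it also provides an end-to-end flow control mechanism.
By extending the MAC-based PFC backpressure message into a UDP-based one, the embodiments solve two problems of existing PFC deadlock schemes. First, they avoid misjudging whether a PFC deadlock has occurred. Second, they remove the impact that ACL-based priority switching has on the data flow itself, and they eliminate the difficulty of deploying ACL-based deadlock solutions across different network topologies. In summary, the embodiments simplify PFC deployment and, by extending the PFC message, effectively solve the existing PFC deadlock problem. They avoid both the inaccurate chip threshold detection associated with PFC deadlock and the consumption of chip ACL resources by conventional schemes, which makes network deployment easier. In addition, the embodiments cover all eight PFC priorities at once without consuming ACL resources, and in practical deployment they are unaffected by differences in network topology across scenarios.
The technical content and technical features of the embodiments of the present invention are disclosed above. Those skilled in the art may nevertheless make various substitutions and modifications based on the teachings and disclosure of the embodiments without departing from their spirit. Therefore, the scope of protection of the embodiments is not limited to what the embodiments disclose. It includes the various substitutions and modifications that do not depart from the embodiments, as covered by the claims of this patent application.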

Claims (10)

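  1. A PFC backpressure message, wherein the PFC backpressure message is a UDP-based PFC backpressure message comprising a message header, and the message header comprises a MAC header, an IP header and a UDP header.
  2. The PFC backpressure message according to claim 1, wherein the PFC backpressure message further comprises a data field, and the data field comprises a MAC control opcode, a backpressure enable vector and a backpressure timer.
  3. A method for processing the PFC backpressure message according to claim 1 or 2, wherein the method comprises:
    S100: when the ingress buffer of a current-hop device exceeds a preset PFC threshold, the current-hop device is triggered to send the PFC backpressure message to its upstream next-hop device;
    S200: when a next-hop device receives the PFC backpressure message, it looks up its routing and forwarding entries and, according to the lookup result, either forwards the PFC backpressure message on to the next hop or processes the PFC backpressure message directly;
    S300: each next-hop device performs step S200 until the PFC backpressure message is forwarded to a source server, and the source server processes the PFC backpressure message.
  4. The method for processing a PFC backpressure message according to claim 3, wherein in S200 the routing and forwarding entry is looked up according to the destination IP address in the header of the PFC backpressure message.
  5. The method for processing a PFC backpressure message according to claim 3 or 4, wherein, if the lookup result is that the destination IP address of the PFC backpressure message does not match the IP address of the next-hop device, the PFC backpressure message is forwarded on to the next hop.
  6. The method for processing a PFC backpressure message according to claim 5, wherein, in the case of a mismatch, the PFC backpressure message is forwarded according to its destination IP address and is sent on to the next hop from the corresponding egress port of the next-hop device, as indicated by the routing entry lookup result.
  7. The method for processing a PFC backpressure message according to claim 3 or 4, wherein, if the lookup result is that the destination IP address of the PFC backpressure message matches the IP address of the next-hop device, the current next-hop device is the source server and processes the PFC backpressure message directly.
  8. The method for processing a PFC backpressure message according to claim 3, wherein the source server stops or resumes sending the data flow of the corresponding priority according to the priority and the backpressure timer of the PFC backpressure message.
  9. The method for processing a PFC backpressure message according to claim 3, wherein each time the PFC backpressure message is forwarded one hop, the TTL field in the PFC backpressure message is decremented by one.
  10. The method for processing a PFC backpressure message according to claim 9, wherein, when the TTL field in the PFC backpressure message is 0, the PFC backpressure message is discarded.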

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011324328.0A CN112565087A (zh) 2020-11-23 2020-11-23 PFC backpressure message and processing method thereof
CN202011324328.0 2020-11-23

Publications (1)

Publication Number Publication Date
WO2022105932A1 true WO2022105932A1 (zh) 2022-05-27

Family

ID=75044921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/132527 WO2022105932A1 (zh) 2020-11-23 2021-11-23 PFC backpressure message and processing method thereof

Country Status (2)

Country Link
CN (1) CN112565087A (zh)
WO (1) WO2022105932A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116527205B (zh) * 2023-06-30 2023-09-05 芯耀辉科技有限公司 Data transmission method, apparatus and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
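CN112565087A (zh) * 2020-11-23 2021-03-26 盛科网络(苏州)有限公司 PFC backpressure message and processing method thereof
CN114157609B (zh) * 2021-11-30 2024-02-23 迈普通信技术股份有限公司 PFC deadlock detection method and apparatus
CN115883466B (zh) * 2023-03-03 2023-06-16 苏州浪潮智能科技有限公司 Switch control method and apparatus, storage medium and electronic apparatus
CN115941599B (zh) * 2023-03-10 2023-05-16 珠海星云智联科技有限公司 Flow control method, device and medium for preventing PFC deadlock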

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103746927A (zh) * 2013-12-27 2014-04-23 杭州华为数字技术有限公司 Priority-based flow control (PFC) method, sending device and receiving device
US20150009823A1 (en) * 2013-07-02 2015-01-08 Ilango Ganga Credit flow control for ethernet
CN107493238A (zh) * 2016-06-13 2017-12-19 华为技术有限公司 Network congestion control method, device and system
US20190280978A1 (en) * 2018-03-06 2019-09-12 International Business Machines Corporation Flow management in networks
CN111526095A (zh) * 2019-02-02 2020-08-11 华为技术有限公司 Flow control method and apparatus
CN112565087A (zh) * 2020-11-23 2021-03-26 盛科网络(苏州)有限公司 PFC backpressure message and processing method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1728366B1 (en) * 2004-03-05 2007-12-05 Xyratex Technology Limited A method for congestion management of a network, a signalling protocol, a switch, an end station and a network
CN102025617B (zh) * 2010-11-26 2015-04-01 中兴通讯股份有限公司 Ethernet congestion control method and apparatus

Also Published As

Publication number Publication date
CN112565087A (zh) 2021-03-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21894078

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.10.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21894078

Country of ref document: EP

Kind code of ref document: A1