CN115987831B - PFC deadlock function test method, device and medium of network device - Google Patents

Publication number
CN115987831B
CN115987831B CN202211632925.9A CN202211632925A CN115987831B CN 115987831 B CN115987831 B CN 115987831B CN 202211632925 A CN202211632925 A CN 202211632925A CN 115987831 B CN115987831 B CN 115987831B
Authority
CN
China
Legal status
Active
Application number
CN202211632925.9A
Other languages
Chinese (zh)
Other versions
CN115987831A (en
Inventor
吴莹
Current Assignee
Inspur Cisco Networking Technology Co Ltd
Original Assignee
Inspur Cisco Networking Technology Co Ltd
Application filed by Inspur Cisco Networking Technology Co Ltd
Priority to CN202211632925.9A
Publication of CN115987831A
Application granted
Publication of CN115987831B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of this specification disclose a PFC deadlock function test method, device and medium for a network device, relating to the field of communication technology. The method comprises: building a test environment comprising a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, where the Spine devices are connected to the Leaf devices and the Leaf devices are connected to the server devices; configuring the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements; controlling a designated server device and its peer server device to send bidirectional traffic to each other, and operating a designated Leaf device to configure a QoS rate-limiting function on its port toward the uplink Spine device, thereby triggering a PFC deadlock scenario in the test environment; configuring the PFC deadlock detection function on the Spine devices and the Leaf devices, and obtaining traffic reception data of the designated server device and its peer server device; and determining, based on the traffic reception data, whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, so as to test the PFC deadlock detection function.

Description

PFC deadlock function test method, device and medium of network device
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a method, an apparatus, and a medium for testing PFC deadlock functions of a network device.
Background
As the digital transformation of enterprises accelerates worldwide, the mission of data centers is shifting from rapid service delivery to efficient data processing. To process data more efficiently, popular applications such as high-performance computing, distributed storage and artificial intelligence require a data center network with zero packet loss, low latency and high throughput. Traditional network communication based on the TCP/IP protocol stack cannot meet these requirements, because key steps such as data copying consume substantial resources and add too much latency.
Remote Direct Memory Access (RDMA) uses dedicated hardware and network technology so that server network cards can read memory directly, achieving high bandwidth, low latency and low resource consumption. Applications such as high-performance computing, distributed storage and artificial intelligence currently adopt the RoCEv2 protocol to reduce CPU processing and latency and to improve application performance. However, because RDMA was originally designed to run over lossless InfiniBand networks, the RoCEv2 protocol lacks a complete packet loss protection mechanism and is extremely sensitive to packet loss. Moreover, these distributed high-performance applications follow a many-to-one Incast traffic model. On an Ethernet switch, Incast traffic easily causes transient burst congestion, or even packet loss, in the switch's internal queue buffers, which increases application latency, reduces throughput and degrades the performance of distributed applications.
An actual data center network involves a large number of devices such as servers, storage devices, virtual machines, network switches and routers, so the real environment is very complex and traffic forwarding paths are intricate. PFC (Priority-based Flow Control) deadlock detection and recovery is an important feature for building a lossless data center network, and an important criterion for testing whether a network switch is suitable for such a network. PFC deadlock refers to a network state in which, when multiple switches become congested at the same time because of a loop or a similar cause, the buffer consumption of each port exceeds the threshold and the switches wait for one another to release resources, so that data flows on all the switches are permanently blocked. Existing test architectures have complex networking and topology, require many servers and network devices, and are complicated to configure, so they cannot meet the need to test the PFC deadlock function.
Disclosure of Invention
One or more embodiments of the present disclosure provide a method, an apparatus, and a medium for testing PFC deadlock functions of a network device, which are configured to solve the following technical problems: the existing architecture is complex in networking and topology, a large number of servers and network equipment are needed, the configuration is complex, and the testing requirement of PFC deadlock functions cannot be met.
One or more embodiments of the present disclosure adopt the following technical solutions:
One or more embodiments of this specification provide a PFC deadlock function test method for a network device, the method comprising: building a test environment comprising a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, where the Spine devices are connected to the Leaf devices and the Leaf devices are connected to the server devices; configuring the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements; when the traffic paths between the server devices meet the preset requirements, controlling a designated server device and its peer server device to send bidirectional traffic to each other, and operating a designated Leaf device to configure a QoS rate-limiting function on its port toward the uplink Spine device, so as to trigger a PFC deadlock scenario in the test environment; in the PFC deadlock scenario, configuring the PFC deadlock detection function on the Spine devices and the Leaf devices, and obtaining traffic reception data of the designated server device and its peer server device; and determining, based on the traffic reception data, whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, so as to test the PFC deadlock detection function.
Further, building the test environment specifically comprises: interconnecting, through designated links, the first Spine device with the first, second and third Leaf devices, and the second Spine device with the second, third and fourth Leaf devices; and interconnecting, through preset links, the first Leaf device with the first server device, the second Leaf device with the second server device, the third Leaf device with the third server device, and the fourth Leaf device with the fourth server device, where the first and fourth server devices are peers of each other, and the second and third server devices are peers of each other.
Further, configuring the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements specifically comprises: configuring IP addresses for the first server device and the fourth server device and sending bound traffic, configuring static routes on the Spine devices and the Leaf devices, and planning the traffic forwarding path between the first server device and the fourth server device so that it meets a first preset requirement; and configuring IP addresses for the second server device and the third server device and sending bound traffic, configuring static routes on the Spine devices and the Leaf devices, and planning the traffic forwarding path between the second server device and the third server device so that it meets a second preset requirement.
Further, planning the traffic forwarding path between the first server device and the fourth server device to meet the first preset requirement specifically comprises: when the first server device forwards traffic to the fourth server device, the traffic forwarding path corresponding to the first preset requirement is first server device → first Leaf device → first Spine device → third Leaf device → second Spine device → fourth Leaf device → fourth server device; when the fourth server device forwards traffic to the first server device, the path is fourth server device → fourth Leaf device → second Spine device → second Leaf device → first Spine device → first Leaf device → first server device. Planning the traffic forwarding path between the second server device and the third server device to meet the second preset requirement specifically comprises: when the second server device forwards traffic to the third server device, the traffic forwarding path corresponding to the second preset requirement is second server device → second Leaf device → first Spine device → third Leaf device → third server device; when the third server device forwards traffic to the second server device, the path is third server device → third Leaf device → second Spine device → second Leaf device → second server device.
Further, determining, based on the traffic reception data, whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect specifically comprises: if none of the server devices can receive traffic, determining that the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect.
Further, after determining, based on the traffic reception data, whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, the method further comprises: configuring the PFC deadlock detection and recovery function on the Spine devices and the Leaf devices, and obtaining traffic parameters of the server devices; and determining, according to the traffic parameters of the server devices, whether the PFC deadlock detection and recovery function of the Spine devices and the Leaf devices has taken effect.
Further, determining, according to the traffic parameters of the server devices, whether the PFC deadlock detection and recovery function of the Spine devices and the Leaf devices has taken effect specifically comprises: determining the real-time traffic sending rate and the traffic forwarding state in the traffic parameters of the server devices, where the traffic forwarding state is either a stable forwarding state or an interrupted state; and when the real-time traffic sending rate is lower than the initial traffic sending rate and the traffic forwarding state is the stable forwarding state, determining that the PFC deadlock detection and recovery function of the Spine devices and the Leaf devices has taken effect, where the initial traffic sending rate is the traffic sending rate after the PFC deadlock scenario has been triggered in the test environment.
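The recovery criterion above can be written as a small predicate. The following is a minimal Python sketch, not the patented implementation: the names `TrafficParams`, `STABLE`, `INTERRUPTED` and `recovery_effective`, and the choice of Gbit/s as the unit, are assumptions introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the two forwarding states named in the text.
STABLE = "stable"
INTERRUPTED = "interrupted"


@dataclass
class TrafficParams:
    """Traffic parameters read from one server device (illustrative only)."""
    realtime_rate_gbps: float   # real-time traffic sending rate
    forwarding_state: str       # STABLE or INTERRUPTED


def recovery_effective(params: TrafficParams, initial_rate_gbps: float) -> bool:
    """Return True when both conditions stated in the text hold:
    the real-time rate is below the initial rate, and forwarding is stable."""
    return (params.forwarding_state == STABLE
            and params.realtime_rate_gbps < initial_rate_gbps)


# Example values are made up; they are not measurements from the source.
ok = recovery_effective(TrafficParams(8.0, STABLE), initial_rate_gbps=10.0)
blocked = recovery_effective(TrafficParams(0.0, INTERRUPTED), initial_rate_gbps=10.0)
```

In a test script, a predicate like this would be evaluated for each server device after the recovery function has been configured.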
Further, after determining whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, the method further comprises: once the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, operating the designated Leaf device to disconnect its port toward the uplink Spine device and cancel the QoS rate-limiting function, so that traffic is forwarded normally again.
One or more embodiments of this specification provide a PFC deadlock function test device for a network device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, so that the at least one processor is able to:
build a test environment comprising a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, where the Spine devices are connected to the Leaf devices and the Leaf devices are connected to the server devices; configure the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements; when the traffic paths between the server devices meet the preset requirements, control a designated server device and its peer server device to send bidirectional traffic to each other, and operate a designated Leaf device to configure a QoS rate-limiting function on its port toward the uplink Spine device, so as to trigger a PFC deadlock scenario in the test environment; in the PFC deadlock scenario, configure the PFC deadlock detection function on the Spine devices and the Leaf devices, and obtain traffic reception data of the designated server device and its peer server device; and determine, based on the traffic reception data, whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, so as to test the PFC deadlock detection function.
One or more embodiments of the present specification provide a non-volatile computer storage medium storing computer-executable instructions configured to:
build a test environment comprising a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, where the Spine devices are connected to the Leaf devices and the Leaf devices are connected to the server devices; configure the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements; when the traffic paths between the server devices meet the preset requirements, control a designated server device and its peer server device to send bidirectional traffic to each other, and operate a designated Leaf device to configure a QoS rate-limiting function on its port toward the uplink Spine device, so as to trigger a PFC deadlock scenario in the test environment; in the PFC deadlock scenario, configure the PFC deadlock detection function on the Spine devices and the Leaf devices, and obtain traffic reception data of the designated server device and its peer server device; and determine, based on the traffic reception data, whether the PFC deadlock detection function of the Spine devices and the Leaf devices has taken effect, so as to test the PFC deadlock detection function.
At least one of the above technical solutions adopted by the embodiments of this specification can achieve the following beneficial effects. Testing a real lossless data center network involves a large number of devices, such as servers, storage devices and network devices, and building the topology is very complex. By simplifying the test topology, the PFC deadlock detection and automatic recovery functions of network devices can be simulated and tested in a laboratory environment. This verifies that the functions take effect normally when the network is congested, that packets are not lost under congestion, and that the stability and reliability of the intelligent lossless network are guaranteed. In addition, the test topology and procedure simulate the situation in which a large number of PFC backpressure signals in an intelligent lossless network are most likely to induce a network deadlock, and verify that the PFC deadlock detection and automatic recovery technology responds quickly and recovers automatically, so that network traffic does not deadlock and no packets are lost under congestion. This in turn verifies the accuracy and efficiency of the corresponding network devices in the intelligent lossless network.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments of this specification, and a person skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
Fig. 1 is a schematic flowchart of a PFC deadlock function test method for a network device according to an embodiment of this specification;
Fig. 2 is a schematic topology diagram of a test environment according to an embodiment of this specification;
Fig. 3 is a schematic diagram of enabling PFC deadlock detection according to an embodiment of this specification;
Fig. 4 is a schematic diagram of determining a PFC deadlock according to an embodiment of this specification;
Fig. 5 is a schematic diagram of entering the PFC deadlock recovery process according to an embodiment of this specification;
Fig. 6 is a schematic diagram of disabling PFC deadlock detection according to an embodiment of this specification;
Fig. 7 is a schematic structural diagram of a PFC deadlock function test device for a network device according to an embodiment of this specification.
Detailed Description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort shall fall within the scope of protection of this specification.
An embodiment of this specification provides a PFC deadlock function test method for a network device. Note that the execution body in the embodiments of this specification may be a server or any device with data processing capability. Fig. 1 is a schematic flowchart of the PFC deadlock function test method for a network device according to an embodiment of this specification. As shown in Fig. 1, the method mainly includes the following steps:
Step S101: building a test environment.
The test environment comprises a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, wherein the Spine devices are connected with the Leaf devices, and the Leaf devices are connected with the server devices.
Building the test environment specifically comprises: interconnecting, through designated links, the first Spine device with the first, second and third Leaf devices, and the second Spine device with the second, third and fourth Leaf devices; and interconnecting, through preset links, the first Leaf device with the first server device, the second Leaf device with the second server device, the third Leaf device with the third server device, and the fourth Leaf device with the fourth server device, where the first and fourth server devices are peers of each other, and the second and third server devices are peers of each other.
In one embodiment of this specification, a test environment is built. Fig. 2 is a schematic topology diagram of the test environment according to an embodiment of this specification. As shown in Fig. 2, the test environment includes two Spine devices, Spine1 and Spine2; four Leaf devices, Leaf1A, Leaf1B, Leaf2A and Leaf2B; and four server devices, Server1, Server2, Server3 and Server4. Spine1 is interconnected with Leaf1A, Leaf1B and Leaf2A through 100GE links, and Spine2 is interconnected with Leaf1B, Leaf2A and Leaf2B through 100GE links. In addition, the four Leaf devices are interconnected with the four server devices through 25GE links, and all the servers use 25G network cards. That is, Leaf1A is interconnected with Server1, Leaf1B with Server2, Leaf2A with Server3, and Leaf2B with Server4, each through a 25GE link.
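The topology just described is small enough to write down as data, which makes it easy to sanity-check link counts before cabling a lab. The following Python sketch is only an illustration: the device names and link speeds come from the description above, while the data layout and the helper names `build_links` and `uplinks` are assumptions.

```python
# Spine-to-Leaf and Leaf-to-server adjacency as described for Fig. 2.
SPINE_LEAF_LINKS = {
    "Spine1": ["Leaf1A", "Leaf1B", "Leaf2A"],
    "Spine2": ["Leaf1B", "Leaf2A", "Leaf2B"],
}
LEAF_SERVER_LINKS = {
    "Leaf1A": "Server1",
    "Leaf1B": "Server2",
    "Leaf2A": "Server3",
    "Leaf2B": "Server4",
}


def build_links():
    """Return a dict mapping each undirected link to its speed label."""
    links = {}
    for spine, leaves in SPINE_LEAF_LINKS.items():
        for leaf in leaves:
            links[frozenset((spine, leaf))] = "100GE"
    for leaf, server in LEAF_SERVER_LINKS.items():
        links[frozenset((leaf, server))] = "25GE"
    return links


def uplinks(links, leaf):
    """List the Spine devices that a given Leaf device connects to."""
    return sorted(
        peer
        for link in links
        if leaf in link
        for peer in link
        if peer.startswith("Spine")
    )


links = build_links()
```

Leaf1B and Leaf2A are the only Leaf devices with two uplinks, which is what allows traffic to cross from one Spine device to the other in the planned paths.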
Step S102: configuring the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements.
Configuring the Spine devices, the Leaf devices and the server devices so that the traffic paths between the server devices meet preset requirements specifically comprises: configuring IP addresses for the first server device and the fourth server device and sending bound traffic, configuring static routes on the Spine devices and the Leaf devices, and planning the traffic forwarding path between the first server device and the fourth server device so that it meets a first preset requirement; and configuring IP addresses for the second server device and the third server device and sending bound traffic, configuring static routes on the Spine devices and the Leaf devices, and planning the traffic forwarding path between the second server device and the third server device so that it meets a second preset requirement.
Planning the traffic forwarding path between the first server device and the fourth server device to meet the first preset requirement specifically comprises: when the first server device forwards traffic to the fourth server device, the traffic forwarding path corresponding to the first preset requirement is first server device → first Leaf device → first Spine device → third Leaf device → second Spine device → fourth Leaf device → fourth server device; when the fourth server device forwards traffic to the first server device, the path is fourth server device → fourth Leaf device → second Spine device → second Leaf device → first Spine device → first Leaf device → first server device. Planning the traffic forwarding path between the second server device and the third server device to meet the second preset requirement specifically comprises: when the second server device forwards traffic to the third server device, the traffic forwarding path corresponding to the second preset requirement is second server device → second Leaf device → first Spine device → third Leaf device → third server device; when the third server device forwards traffic to the second server device, the path is third server device → third Leaf device → second Spine device → second Leaf device → second server device.
In one embodiment of this specification, IP addresses are configured for the network cards of Server1 and Server4, and the corresponding static routes are configured on the network devices Spine1, Spine2, Leaf1A, Leaf1B, Leaf2A and Leaf2B, so that Server1 and Server4 can each receive the bound traffic sent by the other. The traffic path from Server1 to Server4 is planned as:
Server1→Leaf1A→Spine1→Leaf2A→Spine2→Leaf2B→Server4
The traffic path from Server4 to Server1 is planned as:
Server4→Leaf2B→Spine2→Leaf1B→Spine1→Leaf1A→Server1
In addition, IP addresses are configured for the network cards of Server2 and Server3, and the corresponding static routes are configured on the network devices Spine1, Spine2, Leaf1A, Leaf1B, Leaf2A and Leaf2B, so that Server2 and Server3 can each receive the bound traffic sent by the other. The traffic path from Server2 to Server3 is planned as:
Server2→Leaf1B→Spine1→Leaf2A→Server3
The traffic path from Server3 to Server2 is planned as:
Server3→Leaf2A→Spine2→Leaf1B→Server2
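One way to see which static routes the four planned paths imply is to turn each path into per-device next-hop entries keyed by destination server. The Python sketch below does this under the assumption that routes are selected by destination only; the helper name `next_hop_tables` and the table layout are illustrative, and no vendor CLI syntax is implied.

```python
# The four planned paths, copied from the text above.
PATHS = [
    ["Server1", "Leaf1A", "Spine1", "Leaf2A", "Spine2", "Leaf2B", "Server4"],
    ["Server4", "Leaf2B", "Spine2", "Leaf1B", "Spine1", "Leaf1A", "Server1"],
    ["Server2", "Leaf1B", "Spine1", "Leaf2A", "Server3"],
    ["Server3", "Leaf2A", "Spine2", "Leaf1B", "Server2"],
]


def next_hop_tables(paths):
    """Derive {device: {destination server: next hop}} from the paths.

    Raises ValueError if two paths would need different next hops for the
    same destination on the same device, since a destination-based static
    route could not express that.
    """
    tables = {}
    for path in paths:
        destination = path[-1]
        # path[1:-1] are the switches on the path; path[2:] are their next hops.
        for device, nxt in zip(path[1:-1], path[2:]):
            table = tables.setdefault(device, {})
            if table.get(destination, nxt) != nxt:
                raise ValueError(f"conflicting next hops on {device} for {destination}")
            table[destination] = nxt
    return tables


tables = next_hop_tables(PATHS)
```

For these four paths no conflict arises, so each path can be pinned with one destination-based entry per device along it.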
In one embodiment of this specification, after the test environment is built, it is necessary to verify that the devices in the test environment can forward traffic. The network cards of Server1, Server2, Server3 and Server4 continuously send test traffic at the same time. If a check on Server1, Server2, Server3 and Server4 shows that each server receives the traffic sent by its peer server with no traffic loss, the verification succeeds. Note that Server1 and Server4 are peers of each other, as are Server2 and Server3.
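The baseline check described above amounts to comparing, for every server, what it received with what its peer sent. The following Python sketch assumes packet counters have already been collected from each network card; how they are collected is not specified in the source, and the names `PEERS` and `baseline_ok` are hypothetical.

```python
# Peer relationships as stated in the text.
PEERS = {
    "Server1": "Server4",
    "Server4": "Server1",
    "Server2": "Server3",
    "Server3": "Server2",
}


def baseline_ok(sent, received):
    """Return {server: bool} for the pre-test forwarding check.

    sent[s]     -- packets that server s transmitted to its peer
    received[s] -- packets that server s received from its peer
    A server passes when its peer sent something and nothing was lost.
    """
    result = {}
    for server, peer in PEERS.items():
        peer_sent = sent.get(peer, 0)
        result[server] = peer_sent > 0 and received.get(server, 0) == peer_sent
    return result


# Made-up counters: Server3 is missing 5 packets that Server2 sent.
sent = {"Server1": 1000, "Server2": 1000, "Server3": 1000, "Server4": 1000}
received = {"Server1": 1000, "Server2": 1000, "Server3": 995, "Server4": 1000}
status = baseline_ok(sent, received)
```

The verification succeeds only when every entry in the result is true.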
Step S103: when the traffic paths between the server devices meet the preset requirements, controlling a designated server device and its peer server device to send bidirectional traffic to each other, and operating a designated Leaf device to configure a QoS rate-limiting function on its port toward the uplink Spine device, so as to trigger a PFC deadlock scenario in the test environment.
In one embodiment of the present disclosure, Server1 and Server4, and Server2 and Server3, send bidirectional traffic to each other, and under normal conditions, that is, when the network topology is stable, no packet loss occurs. At this point, the network device Leaf1B is operated and a QoS rate-limiting function is configured on its port that uplinks to Spine1. This operation reduces the available bandwidth of the whole network, so that a scenario of heavy traffic congestion can be constructed precisely in the test network.
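Why a rate limit on this one port creates congestion can be illustrated with simple arithmetic. The sketch below assumes the limit acts on traffic leaving Leaf1B toward Spine1 and that excess traffic accumulates in a queue until a PFC XOFF threshold is reached; the function names and all numeric values used with them are hypothetical, and real switch buffering is more involved.

```python
# Switch-level hops of the four planned flows (servers omitted).
FLOWS = {
    "Server1->Server4": ["Leaf1A", "Spine1", "Leaf2A", "Spine2", "Leaf2B"],
    "Server4->Server1": ["Leaf2B", "Spine2", "Leaf1B", "Spine1", "Leaf1A"],
    "Server2->Server3": ["Leaf1B", "Spine1", "Leaf2A"],
    "Server3->Server2": ["Leaf2A", "Spine2", "Leaf1B"],
}


def flows_on_link(src, dst):
    """Return the names of the flows whose path uses the directed hop src -> dst."""
    return sorted(
        name for name, hops in FLOWS.items()
        if (src, dst) in zip(hops, hops[1:])
    )


def seconds_to_xoff(offered_bps, limit_bps, xoff_threshold_bytes):
    """Return the time for the backlog to reach the XOFF threshold.

    Returns None when the offered load does not exceed the rate limit,
    because no backlog builds up in that case.
    """
    excess_bps = offered_bps - limit_bps
    if excess_bps <= 0:
        return None
    return xoff_threshold_bytes * 8 / excess_bps
```

`flows_on_link("Leaf1B", "Spine1")` shows that two of the four flows cross the limited port, so the offered load there is the sum of two senders' rates. Once the limit is set below that sum, `seconds_to_xoff` gives a rough estimate of how quickly backpressure starts.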
It should be noted that PFC deadlock refers to a network state in which, when congestion occurs among several switches at the same time because of a loop or a similar cause, the buffer consumption of each switch's port exceeds the threshold and the switches wait for one another to release resources, so that the data flows on all the switches are blocked permanently. Under normal conditions, a PFC pause applies only to one or a few priority queues and does not interrupt the whole interface; each queue can be paused or resumed independently without affecting traffic on the other queues, so that multiple kinds of traffic genuinely share the link. However, when a link or device fails, a transient loop may appear in the network while routes re-converge, which produces a cyclic buffer dependency. When several switches reach the PFC threshold, they send PFC backpressure frames to their peers at the same time, and all switches in the topology end up in a flow-stopped state.
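A cyclic buffer dependency can be found mechanically from the forwarding paths. The sketch below is an illustration rather than part of the disclosed method: it treats each directed link as a node, adds a dependency from a link to the next link on the same path (packets queued after the first link wait for buffer space on the second), and searches for a cycle with a depth-first traversal. The path list reflects the four planned flows of this test topology with the servers omitted; the function names are hypothetical.

```python
# Switch-level hops of the four planned flows (servers omitted).
PATHS = [
    ["Leaf1A", "Spine1", "Leaf2A", "Spine2", "Leaf2B"],   # Server1 -> Server4
    ["Leaf2B", "Spine2", "Leaf1B", "Spine1", "Leaf1A"],   # Server4 -> Server1
    ["Leaf1B", "Spine1", "Leaf2A"],                       # Server2 -> Server3
    ["Leaf2A", "Spine2", "Leaf1B"],                       # Server3 -> Server2
]


def link_dependencies(paths):
    """Map each directed link to the set of links its queued packets wait on."""
    deps = {}
    for path in paths:
        links = list(zip(path, path[1:]))
        for cur, nxt in zip(links, links[1:]):
            deps.setdefault(cur, set()).add(nxt)
            deps.setdefault(nxt, set())
    return deps


def find_cycle(deps):
    """Return one dependency cycle as a list of links, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in deps}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(deps[node]):
            if color[nxt] == GREY:        # back edge: cycle closes at nxt
                return stack[stack.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(deps):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For the planned paths, the search reports a cycle through the links Spine1→Leaf2A, Leaf2A→Spine2, Spine2→Leaf1B and Leaf1B→Spine1. A cycle of this kind is only a precondition: deadlock additionally requires the buffers along it to fill, which is what the rate limit on Leaf1B provokes.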
PFC deadlock detection monitors for PFC deadlock throughout the whole process; when a device keeps receiving PFC backpressure frames during the deadlock detection period, it does not respond to them.
First, deadlock detection. Fig. 3 is a schematic diagram of starting PFC deadlock detection provided in an embodiment of the present disclosure. As shown in fig. 3, after a port of Device2 receives a PFC backpressure frame sent by Device1, the internal scheduler stops sending the queue traffic of the corresponding priority, starts a timer, and begins monitoring the PFC backpressure frames received by that queue according to the configured deadlock detection period and precision.
Next, deadlock determination. Fig. 4 is a schematic diagram of determining PFC deadlock provided in an embodiment of the present disclosure. As shown in fig. 4, if Device2 remains in the PFC-XOFF (that is, flow-controlled) state throughout the configured PFC deadlock detection time, Device1 is considered to have a PFC deadlock, and Device2 must carry out the PFC deadlock recovery flow.
Then, deadlock recovery. As shown in fig. 5, during deadlock recovery Device2 ignores the PFC backpressure frames received on the port, and the internal scheduler either resumes sending the queue traffic of the corresponding priority or, optionally, discards it; after the recovery period, Device2 restores the normal PFC flow-control mechanism. If Device2 again determines that a deadlock has occurred in the next deadlock detection period, a new round of the deadlock recovery flow is carried out.
Finally, deadlock control with manual recovery. Fig. 6 is a schematic diagram of closing PFC deadlock detection provided in an embodiment of the present disclosure. As shown in fig. 6, if the deadlock recovery flow above has no effect and Device1 still repeatedly exhibits PFC deadlock, the user can configure how many deadlocks within a period of time force Device2 into the deadlock control flow. For example, once PFC deadlock has been triggered a certain number of times within a set period, deadlock is considered to occur frequently in the network and to pose a great risk; Device2 then enters the deadlock control flow, the device automatically turns off the PFC function, and the function must be restored manually.
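The detection, determination, recovery and control stages described above can be modeled as a small state machine. The following Python sketch is an interpretation under stated assumptions, not a vendor implementation: time advances in discrete ticks, `xoff` says whether the queue was paused for the whole tick, and the class name, parameter names and tick-based timing are all hypothetical.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    DETECTING = "detecting"        # queue paused, detection timer running
    RECOVERING = "recovering"      # PFC frames ignored for the recovery period
    PFC_DISABLED = "pfc_disabled"  # deadlock control: PFC off until manual restore


class PfcDeadlockWatchdog:
    def __init__(self, detect_ticks, recovery_ticks, max_deadlocks, window_ticks):
        self.detect_ticks = detect_ticks      # paused ticks that count as a deadlock
        self.recovery_ticks = recovery_ticks  # length of one recovery period
        self.max_deadlocks = max_deadlocks    # deadlocks per window before PFC is disabled
        self.window_ticks = window_ticks      # length of the counting window
        self.state = State.NORMAL
        self.now = 0
        self.paused_ticks = 0
        self.recovery_left = 0
        self.deadlock_times = []

    def tick(self, xoff):
        """Advance one tick and return the resulting state."""
        self.now += 1
        if self.state is State.PFC_DISABLED:
            return self.state                 # only manual_restore() leaves this state
        if self.state is State.RECOVERING:
            self.recovery_left -= 1           # backpressure frames are ignored here
            if self.recovery_left <= 0:
                self.state = State.NORMAL
            return self.state
        if not xoff:
            self.state = State.NORMAL
            self.paused_ticks = 0
            return self.state
        self.state = State.DETECTING
        self.paused_ticks += 1
        if self.paused_ticks >= self.detect_ticks:
            self._on_deadlock()
        return self.state

    def _on_deadlock(self):
        self.paused_ticks = 0
        # Keep only the deadlocks that fall inside the counting window.
        self.deadlock_times = [
            t for t in self.deadlock_times if self.now - t < self.window_ticks
        ]
        self.deadlock_times.append(self.now)
        if len(self.deadlock_times) >= self.max_deadlocks:
            self.state = State.PFC_DISABLED
        else:
            self.state = State.RECOVERING
            self.recovery_left = self.recovery_ticks

    def manual_restore(self):
        """Operator action that re-enables PFC after deadlock control."""
        self.state = State.NORMAL
        self.paused_ticks = 0
        self.deadlock_times.clear()
```

With `detect_ticks=3`, `recovery_ticks=2` and `max_deadlocks=2`, a queue that stays paused goes through one automatic recovery and is then forced into the `PFC_DISABLED` state on the second deadlock, mirroring the escalation from automatic recovery to manual recovery.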
Step S104: in the PFC deadlock scenario, configure the PFC deadlock detection function on the Spine devices and the Leaf devices, and obtain the traffic reception data of the designated server device and the opposite-end server device.
In one embodiment of the present disclosure, in the PFC deadlock scenario, the PFC deadlock detection function is configured on the Spine devices and the Leaf devices, that is, on Spine1, Spine2, Leaf1A, Leaf1B, Leaf2A and Leaf2B, and the automatic recovery function is not enabled.
Step S105: based on the traffic reception data, determine whether the PFC deadlock detection functions of the Spine devices and the Leaf devices are effective, so as to test the PFC deadlock detection function.
Determining, based on the traffic reception data, whether the PFC deadlock detection functions of the Spine devices and the Leaf devices are effective specifically includes: if none of the server devices can receive traffic, determining that the PFC deadlock detection functions of the Spine devices and the Leaf devices are effective.
In one embodiment of the present disclosure, since all network devices in the test topology start the PFC deadlock detection function, traffic congestion is exacerbated, and PFC deadlock is triggered steadily across the entire network topology. Because the PFC deadlock automatic recovery function is not started, once the PFC deadlock occurs, all network devices do not forward traffic any more, and the whole network is interrupted.
Therefore, Server1, Server2, Server3 and Server4 no longer receive any traffic, which shows that a PFC deadlock has occurred in the network and cannot be recovered automatically, resulting in a total interruption of traffic.
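The pass criterion for the detection test reduces to a check that every server has stopped receiving. A minimal sketch, assuming receive rates are sampled per server in bits per second (the function name and the input format are hypothetical):

```python
def deadlock_detection_effective(rx_rates_bps):
    """Return True if no server is receiving traffic any more.

    rx_rates_bps maps a server name to its current receive rate.
    An empty mapping is treated as "no data", not as a pass.
    """
    return len(rx_rates_bps) > 0 and all(
        rate == 0 for rate in rx_rates_bps.values()
    )
```

In practice a rate sample should be taken over an interval longer than the deadlock detection period, so that a short pause is not mistaken for a total interruption.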
After determining whether the PFC deadlock detection functions of the Spine devices and the Leaf devices are effective, the method further includes: once the PFC deadlock detection functions of the Spine devices and the Leaf devices are effective, disconnecting, by operating the designated Leaf device, the port of the designated Leaf device that uplinks to the Spine device, and cancelling the QoS rate-limiting function, so that traffic is forwarded normally again.
In one embodiment of the present disclosure, traffic is totally interrupted because a PFC deadlock has occurred in the network and cannot be recovered automatically. At this point, the network device Leaf1B can be operated to disconnect its port that uplinks to Spine1 and to cancel the QoS rate-limiting configuration, which eliminates the PFC deadlock of the whole network and restores normal forwarding.
After determining whether the PFC deadlock detection functions of the Spine device and the Leaf device are in effect based on the traffic reception data, the method further includes: configuring PFC deadlock detection recovery functions for the Spine device and the Leaf device, and acquiring flow parameters of the server device; and judging whether the PFC deadlock detection recovery function of the Spine device and the Leaf device is effective or not according to the flow parameters of the server device.
Judging whether the PFC deadlock detection recovery function of the Spine device and the Leaf device is effective or not according to the flow parameters of the server device, and specifically comprising: determining a real-time traffic sending rate and a traffic forwarding state in traffic parameters of the server device, wherein the traffic forwarding state comprises a stable forwarding state and a cut-off state; when the real-time traffic sending rate is smaller than the initial traffic sending rate and the traffic forwarding state is a stable forwarding state, judging that the PFC deadlock detection recovery function of the Spine equipment and the Leaf equipment is effective, wherein the initial traffic sending rate is the corresponding traffic sending rate after triggering the PFC deadlock scene in the test environment.
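The recovery criterion combines a rate comparison with a forwarding-state check. The sketch below assumes both values have already been measured for each server; `ForwardingState`, `recovery_effective` and `all_servers_recovered` are illustrative names only.

```python
from enum import Enum


class ForwardingState(Enum):
    STABLE = "stable"
    CUT_OFF = "cut_off"


def recovery_effective(real_time_rate, initial_rate, state):
    """Return True if the sender slowed down and traffic is still forwarded stably.

    initial_rate is the sending rate right after the PFC deadlock scenario
    was triggered; both rates must use the same unit.
    """
    return real_time_rate < initial_rate and state is ForwardingState.STABLE


def all_servers_recovered(samples):
    """samples maps a server name to (real_time_rate, initial_rate, state)."""
    return len(samples) > 0 and all(
        recovery_effective(*sample) for sample in samples.values()
    )
```

A reduced rate alone is not enough, since a cut-off flow also has a lower rate than before; the stable-forwarding condition is what separates successful recovery from a persisting deadlock.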
In one embodiment of the present disclosure, the PFC deadlock detection and recovery function is enabled on all network devices, the port on Leaf1B that uplinks to Spine1 is restored, and it is confirmed that Server1, Server2, Server3 and Server4 receive traffic normally and that network forwarding has resumed. The network device Leaf1B is then operated again to configure the QoS rate-limiting function on its port that uplinks to Spine1; this once more reduces the available bandwidth of the whole network and precisely constructs a scenario of heavy traffic congestion in the test network.
Because the automatic recovery function of PFC deadlock detection is enabled on all network devices, the network devices can recover automatically after a PFC deadlock occurs when traffic congestion intensifies, and they notify the traffic senders Server1, Server2, Server3 and Server4 to reduce their sending rates by sending pause frames, so that congestion no longer occurs in the network and the PFC deadlock of the whole network is eliminated automatically.
As a result, the sending rates of the network cards of Server1, Server2, Server3 and Server4 drop automatically, and traffic is ultimately forwarded stably at the configured QoS rate limit without any interruption, which shows that the PFC deadlock detection and automatic recovery functions of the network devices take effect normally.
In a real data center lossless network test, a large number of devices such as servers, storage devices and network devices are involved, and building the topology is very complex. By simplifying the test topology, the PFC deadlock detection and automatic recovery functions of the network devices are simulated and tested in a laboratory environment, verifying that these functions take effect normally when congestion occurs in the network, that packet loss is prevented during congestion, and that the stability and reliability of the intelligent lossless network are ensured. In addition, with this test topology and procedure, a large number of PFC backpressure signals in the intelligent lossless network are simulated; when network deadlock is most likely to be induced, the PFC deadlock detection and automatic recovery technology responds quickly and recovers automatically, ensuring that network traffic does not deadlock and that no packets are lost under congestion, thereby verifying the accuracy and efficiency of the corresponding network devices in the intelligent lossless network.
The embodiment of the present disclosure further provides a PFC deadlock function test device of a network device, as shown in fig. 7, where the device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to:
building a test environment, wherein the test environment comprises a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, the Spine devices are connected with the Leaf devices, and the Leaf devices are connected with the server devices; performing equipment configuration on the Spine equipment, the Leaf equipment and the server equipment so that a flow path between the server equipment meets preset requirements; when the flow path between the server devices meets the preset requirement, controlling the appointed server device and the opposite-end server device to mutually send bidirectional flow, configuring a Qos speed limiting function for an uplink Spine device port of the appointed Leaf device through operating the appointed Leaf device, and triggering a PFC deadlock scene in the test environment; under the PFC deadlock scene, configuring PFC deadlock detection functions for the Spine equipment and the Leaf equipment, and acquiring flow receiving data of the appointed server equipment and the opposite-end server equipment; based on the flow receiving data, judging whether the PFC deadlock detection function of the Spine device and the Leaf device is effective or not so as to test the PFC deadlock detection function.
The present specification embodiments also provide a non-volatile computer storage medium storing computer-executable instructions configured to:
building a test environment, wherein the test environment comprises a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, the Spine devices are connected with the Leaf devices, and the Leaf devices are connected with the server devices; performing equipment configuration on the Spine equipment, the Leaf equipment and the server equipment so that a flow path between the server equipment meets preset requirements; when the flow path between the server devices meets the preset requirement, controlling the appointed server device and the opposite-end server device to mutually send bidirectional flow, configuring a Qos speed limiting function for an uplink Spine device port of the appointed Leaf device through operating the appointed Leaf device, and triggering a PFC deadlock scene in the test environment; under the PFC deadlock scene, configuring PFC deadlock detection functions for the Spine equipment and the Leaf equipment, and acquiring flow receiving data of the appointed server equipment and the opposite-end server equipment; based on the flow receiving data, judging whether the PFC deadlock detection function of the Spine device and the Leaf device is effective or not so as to test the PFC deadlock detection function.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The devices and media provided in the embodiments of the present disclosure are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely one or more embodiments of the present description and is not intended to limit the present description. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present description, is intended to be included within the scope of the claims of the present description.

Claims (5)

1. A method for testing PFC deadlock functions of a network device, the method comprising:
building a test environment, wherein the test environment comprises a plurality of Spine devices, a plurality of Leaf devices and a plurality of server devices, the Spine devices are connected with the Leaf devices, and the Leaf devices are connected with the server devices;
performing equipment configuration on the Spine equipment, the Leaf equipment and the server equipment so that a flow path between the server equipment meets preset requirements;
when the flow path between the server devices meets preset requirements, controlling the appointed server devices and the opposite-end server devices to mutually send bidirectional flow, configuring Qos speed limiting functions for ports of uplink devices of the appointed Leaf devices through operating the appointed Leaf devices, and triggering PFC deadlock scenes in the test environment;
Under the PFC deadlock scene, configuring PFC deadlock detection functions for the Spine equipment and the Leaf equipment, and acquiring flow receiving data of the appointed server equipment and the opposite-end server equipment;
based on the flow receiving data, judging whether PFC deadlock detection functions of the Spine equipment and the Leaf equipment are effective or not so as to test the PFC deadlock detection functions;
based on the flow receiving data, judging whether the PFC deadlock detection functions of the Spine device and the Leaf device are effective or not, and specifically comprising:
if each server device cannot receive the flow, judging that PFC deadlock detection functions of the Spine device and the Leaf device are effective;
building a test environment, which specifically comprises the following steps:
through a designated link, respectively interconnecting the first Spine device with the first Leaf device, the second Leaf device and the third Leaf device, and respectively interconnecting the second Spine device with the second Leaf device, the third Leaf device and the fourth Leaf device;
through a preset link, the first Leaf device and the first server device are interconnected, the second Leaf device and the second server device are interconnected, the third Leaf device and the third server device are interconnected, the fourth Leaf device and the fourth server device are interconnected, wherein the first server device and the fourth server device are opposite-end devices, and the second server device and the third server device are opposite-end devices;
performing device configuration on the Spine device, the Leaf device and the server device to enable a flow path between the server devices to meet preset requirements, wherein the method specifically comprises the following steps:
Configuring IP addresses for the first server device and the fourth server device, sending binding flow, configuring static routes on the Spine device and the Leaf device, and planning a flow forwarding path between the first server device and the fourth server device to meet a first preset requirement;
configuring IP addresses for the second server device and the third server device, sending binding flow, configuring static routes on the Spine device and the Leaf device, and planning a flow forwarding path between the second server device and the third server device to meet a second preset requirement;
planning a traffic forwarding path between the first server device and the fourth server device to meet a first preset requirement specifically includes:
when the traffic forwarding direction is that the first server device forwards traffic to the fourth server device, the traffic forwarding path corresponding to the first preset requirement is that the first server device, the first Leaf device, the first Spine device, the third Leaf device, the second Spine device, the fourth Leaf device and the fourth server device;
When the traffic forwarding direction is that the fourth server device forwards traffic to the first server device, the traffic forwarding path corresponding to the first preset requirement is that the fourth server device, the fourth Leaf device, the second Spine device, the second Leaf device, the first Spine device, the first Leaf device and the first server device;
planning that a traffic forwarding path between the second server device and the third server device meets a second preset requirement specifically includes:
when the traffic forwarding direction is that the second server device forwards traffic to the third server device, the traffic forwarding path corresponding to the second preset requirement is that the second server device, the second Leaf device, the first Spine device, the third Leaf device and the third server device;
and when the traffic forwarding direction is that the third server equipment forwards traffic to the second server equipment, the traffic forwarding path corresponding to the second preset requirement is that the third server equipment, the third Leaf equipment, the second Spine equipment, the second Leaf equipment and the second server equipment.
2. The method for testing PFC deadlock functionality of a network device of claim 1, wherein after determining whether PFC deadlock detection functionality of the Spine device and the Leaf device is in effect based on the traffic received data, the method further comprises:
configuring PFC deadlock detection recovery functions for the Spine equipment and the Leaf equipment, and acquiring flow parameters of the server equipment;
judging whether PFC deadlock detection recovery functions of the Spine equipment and the Leaf equipment are effective or not according to flow parameters of the server equipment;
judging whether the PFC deadlock detection recovery function of the Spine device and the Leaf device is effective or not according to the flow parameters of the server device, wherein the method specifically comprises the following steps:
determining a real-time traffic sending rate and a traffic forwarding state in traffic parameters of the server equipment, wherein the traffic forwarding state comprises a stable forwarding state and a cut-off state;
and when the real-time traffic sending rate is smaller than the initial traffic sending rate and the traffic forwarding state is a stable forwarding state, judging that the PFC deadlock detection recovery function of the Spine equipment and the Leaf equipment is effective, wherein the initial traffic sending rate is the corresponding traffic sending rate after triggering the PFC deadlock scene in the test environment.
3. The method for testing PFC deadlock functionality of a network device of claim 1, wherein after determining whether PFC deadlock detection functionality of the Spine device and the Leaf device is in effect, the method further comprises:
after the PFC deadlock detection function of the Spine equipment and the Leaf equipment is effective, the uplink Spine equipment port of the appointed Leaf equipment is disconnected through operating the appointed Leaf equipment, and the Qos speed limiting function is canceled to realize normal forwarding of flow.
4. A PFC deadlock function test device for a network device, the device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
5. A non-transitory computer storage medium storing computer executable instructions arranged to perform the method of any one of claims 1-3.
CN202211632925.9A 2022-12-19 2022-12-19 PFC deadlock function test method, device and medium of network device Active CN115987831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211632925.9A CN115987831B (en) 2022-12-19 2022-12-19 PFC deadlock function test method, device and medium of network device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211632925.9A CN115987831B (en) 2022-12-19 2022-12-19 PFC deadlock function test method, device and medium of network device

Publications (2)

Publication Number Publication Date
CN115987831A CN115987831A (en) 2023-04-18
CN115987831B true CN115987831B (en) 2024-04-12

Family

ID=85965930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211632925.9A Active CN115987831B (en) 2022-12-19 2022-12-19 PFC deadlock function test method, device and medium of network device

Country Status (1)

Country Link
CN (1) CN115987831B (en)

Also Published As

Publication number Publication date
CN115987831A (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant