WO2016029749A1

WO2016029749A1 - Communication failure detection method, device and system

Info

Publication number: WO2016029749A1
Application number: PCT/CN2015/084002
Authority: WO
Inventors: 张小东; 田彦峰; 孙名逊
Original assignee: 华为技术有限公司
Priority date: 2014-08-26
Filing date: 2015-07-14
Publication date: 2016-03-03
Also published as: CN104219107A; CN104219107B

Abstract

The present invention relates to the field of communications. Embodiments of the present invention provide a communication failure detection method, device and system that solve the prior-art problem that a LAG cannot detect a faulty port that is operating abnormally, thereby avoiding the risk of transmitting data through the faulty port. The solution comprises: a detection device acquires detection results of N ports in X servers, the detection results comprising error packet data and packet loss data of the other ports, determined by each port according to the detection messages it receives from the other ports; the detection device determines a state of a first port according to the error packet data and the packet loss data of the other ports determined by each port, the state of the first port indicating whether the first port is faulty; and the detection device generates a fault notification for the first port according to the state of the first port.

Description

Method, device and system for detecting communication fault

Technical field

The present invention relates to the field of communications, and in particular, to a method, device and system for detecting a communication failure.

Background

In the network construction technology, network port aggregation and switch stacking are used to improve network plane reliability. However, after port aggregation and switch stacking in various types of servers, each port in the server may be unavailable due to some failures, which may result in a communication path between the ports being unavailable.

In the prior art, the LAG (Link Aggregation Group) in the server can periodically detect the status of its own ports. When a port is unavailable, the server removes the unavailable port from the LAG in accordance with the Link Aggregation Control Protocol (LACP), thereby switching the communication path. As shown in Figure 1, when port 1 of server 1 is unavailable and ports 2, 3, and 4 are running normally, port 1 is removed from the LAG, and the LAG automatically selects ports 2, 3, and 4 to forward data packets.

However, when a port sends and receives data packets, it may be in a "sub-health" state due to certain faults (for convenience of description, the present invention uniformly refers to a port in the "sub-health" state as a faulty port). In this state, the port can still exchange data packets with other ports (that is, the port is still available), but it may drop packets when sending them, or corrupt the contents of the packets. Because the port still appears available to other ports, the LAG cannot detect the port's abnormal behavior when sending and receiving data packets, nor can it switch the communication path associated with the port. As a result, data transmitted through the faulty ("sub-health") port continues to be damaged, which increases the risk of data transmission.

Summary of the invention

The embodiment of the invention provides a method, a device and a system for detecting a communication failure, which solves the problem that the LAG cannot detect a faulty port in which abnormal operation occurs in the prior art, and avoids the risk of transmitting data using the faulty port.

In order to achieve the above object, embodiments of the present invention adopt the following technical solutions:

In a first aspect, an embodiment of the present invention provides a method for detecting a communication failure, including:

The detecting device obtains the detection results of the N ports in the X servers, where the detection result includes the error packet data and the packet loss data of the other ports determined by each port according to the received probe messages sent by the other ports, N>2, X>2;

The detecting device determines the status of the first port according to the error packet data and the packet loss data of the other ports determined by each port, where the status of the first port is used to indicate whether the first port is faulty, and the first port is one of the N ports;

The detecting device generates a failure notification of the first port according to the state of the first port.

In a first possible implementation manner of the first aspect, the detecting device determines, according to the error packet data and the packet loss data of the other port determined by each port, whether the first port is faulty, including:

The detecting device calculates, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;

The detecting device determines whether the first port is faulty according to a packet loss rate of the detection messages sent between the N ports.

With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, that the detecting device calculates, according to the detection result, the packet loss rate of the detection messages sent by the N ports to each other includes:

The detecting device converts the error packet data in the detection result into relative packet loss data according to the first preset function;

The detecting device calculates, by using the second preset function, the packet loss rate of the detection messages sent among the N ports according to the relative packet loss data and the packet loss data in the detection result.
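The two-step calculation above can be sketched as follows. The source does not define either preset function, so everything here is an assumption for illustration: the first preset function is taken to weight each error packet as a fraction of a lost packet, the second is taken to divide the combined loss by the number of probe messages sent, and the names `ERROR_WEIGHT`, `relative_loss` and `loss_rate` are hypothetical.

```python
# Hypothetical sketch: the patent text does not specify either preset function.
ERROR_WEIGHT = 0.5  # assumed: one error packet counts as half a lost packet


def relative_loss(error_packets: int, weight: float = ERROR_WEIGHT) -> float:
    """Assumed 'first preset function': map error packets to relative packet loss."""
    return error_packets * weight


def loss_rate(sent: int, lost: int, error_packets: int) -> float:
    """Assumed 'second preset function': combined loss over probes sent, capped at 1."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    combined = lost + relative_loss(error_packets)
    return min(combined / sent, 1.0)


# Probe messages from one port to another in one detection period (made-up counts).
rate = loss_rate(sent=100, lost=4, error_packets=2)
print(rate)
```

A real deployment would pick the weighting and normalization to match how much damage an error packet causes relative to a dropped one; the sketch only shows where the two functions sit in the flow.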

With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, that the detecting device determines whether the first port is faulty according to the packet loss rate of the detection messages sent between the N ports includes:

Among the N ports, if the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent among the at least N/2 ports is less than a second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
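The decision rule above can be sketched as a small function. This is a simplified illustration, not the patented implementation: the threshold values, the `rates` dictionary layout and the function name are hypothetical, and the sketch checks the full set of ports that report high loss toward the target rather than searching for a qualifying subset of them.

```python
from itertools import combinations


def is_faulty(target, rates, ports, first_preset=0.2, second_preset=0.05):
    """Decide whether `target` is faulty.

    rates[(src, dst)] is the packet loss rate of probes sent from src to dst.
    The threshold defaults are assumed values, not taken from the source.
    """
    others = [p for p in ports if p != target]
    # Ports whose probes to the target are lost above the first preset value.
    accusers = [p for p in others if rates[(p, target)] > first_preset]
    if len(accusers) < len(ports) / 2:
        return False
    # Those ports must exchange probes cleanly among themselves; otherwise the
    # loss may come from the accusers or the network, not from the target.
    return all(
        rates[(a, b)] < second_preset and rates[(b, a)] < second_preset
        for a, b in combinations(accusers, 2)
    )


ports = ["p1", "p2", "p3", "p4"]
rates = {(a, b): 0.01 for a in ports for b in ports if a != b}
for src in ("p1", "p2", "p3"):
    rates[(src, "p4")] = 0.5  # probes toward p4 are mostly lost

print(is_faulty("p4", rates, ports), is_faulty("p1", rates, ports))
```

Requiring the reporting ports to be healthy with respect to each other keeps a single bad observer from condemning a good port.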

With reference to the foregoing first aspect, or any one of the first to the third possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, the fault notification includes a first failure notification, where the first failure notification is used to indicate that the first port is faulty,

The generating a failure notification of the first port includes:

The detecting device generates the first failure notification of the first port, so that after the server acquires the first failure notification, the first port is removed from the link aggregation group LAG.

With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the fault notification includes a second fault notification, where the second fault notification is used to indicate that all the N ports in the X servers are faulty,

The generating a failure notification of the first port includes:

The detecting device generates the second failure notification of the first port, so that after the server acquires the second failure notification, the server invokes a DRS (Distributed Resource Scheduler) to perform virtual machine hot migration on the virtual machines running in the server.

With reference to the foregoing first aspect, or any one of the first to the fifth possible implementation manners of the first aspect, in a sixth possible implementation manner of the first aspect, the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.

In a second aspect, an embodiment of the present invention provides a method for detecting a communication failure, including:

The server receives the probe message from the N-1 ports in the other server through the first port, where the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2;

The server generates a detection result according to the probe message, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port;

The server acquires a fault notification sent by the detecting device according to the detection result, where the fault notification is used to indicate whether the first port is faulty.

In a first possible implementation manner of the second aspect, the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server,

After the server obtains the fault notification sent by the detecting device according to the detection result, the method further includes:

If the first port is a physical port in the server, and the first port is faulty, the server removes the first port from the link aggregation group LAG according to the failure notification;

If the first port is a virtual port in the virtual machine running in the server, and the first port is faulty, the server performs virtual machine hot migration on the virtual machine corresponding to the first port according to the failure notification.

In a second possible implementation manner of the second aspect, after the server obtains the fault notification sent by the detecting device according to the detection result, the method further includes:

If the first port is not faulty, the server queries whether the first port is in the LAG;

If the first port is not in the LAG, the server adds the first port to the LAG to perform data transmission and reception through the first port.
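The server-side reactions described in the first and second possible implementation manners of the second aspect can be sketched together. The LAG is modeled as a plain set of port names and the returned action strings are placeholders; a real server would call its bonding or hypervisor interfaces, which the source does not name. Leaving a healthy virtual port untouched is an assumption of this sketch.

```python
def handle_notification(lag, port, faulty, is_virtual):
    """Return (updated LAG member set, action name) for one fault notification.

    Hypothetical helper: the set models LAG membership, the strings stand in
    for the real removal, re-addition, or VM hot migration operations.
    """
    members = set(lag)
    if is_virtual:
        # Faulty virtual port: migrate the VM that owns it (placeholder action).
        return members, "migrate_vm" if faulty else "no_change"
    if faulty:
        members.discard(port)
        return members, "removed_from_lag"
    if port not in members:
        # A recovered physical port rejoins the LAG.
        members.add(port)
        return members, "added_to_lag"
    return members, "no_change"


lag = {"eth0", "eth1"}
lag, action = handle_notification(lag, "eth0", faulty=True, is_virtual=False)
print(sorted(lag), action)
```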

In a third possible implementation manner of the second aspect, the server generates the detection result according to the detection message, including:

The server calculates packet loss data of the N-1 ports to the first port according to the number of the probe messages received in the preset time;

The server analyzes, for each probe message received in the preset time, whether the probe message is an error packet, to collect the error packet data of the N-1 ports to the first port;

The server generates the detection result according to the packet loss data and the error packet data.
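The three steps above can be sketched as one function. The source does not say how an error packet is recognized or how many probes each peer sends per period, so this sketch assumes a CRC-32 carried with each probe and a known expected count; the tuple layout and all names are hypothetical.

```python
import zlib
from collections import Counter


def build_detection_result(received, peers, expected_per_peer):
    """Summarize probes received on the first port during one preset period.

    received: list of (src, payload, crc) tuples (assumed layout).
    expected_per_peer: probes each peer is assumed to send per period.
    """
    intact = Counter()
    errors = Counter()
    for src, payload, crc in received:
        if zlib.crc32(payload) == crc:
            intact[src] += 1
        else:
            errors[src] += 1  # arrived but damaged: counted as an error packet
    return {
        peer: {
            "lost": max(expected_per_peer - intact[peer] - errors[peer], 0),
            "errors": errors[peer],
        }
        for peer in peers
    }


good = (b"probe", zlib.crc32(b"probe"))
bad = (b"probe", zlib.crc32(b"probe") ^ 1)
received = [("aa", *good)] * 3 + [("bb", *good), ("bb", *bad)]
print(build_detection_result(received, ["aa", "bb"], expected_per_peer=3))
```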

In a fourth possible implementation manner of the second aspect, the method further includes:

Obtaining, by the server, a MAC (Media Access Control) address of the N-1 ports, respectively;

The server constructs the probe message according to the MAC address;

The server sends the probe message to the N-1 ports by using the first port according to the MAC address of the N-1 ports.
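As an illustration of constructing a probe message from MAC addresses, the sketch below lays out a minimal Ethernet-style frame. The source does not define the probe format: the sequence-number payload, the trailing CRC-32, and the use of EtherType 0x88B5 (one of the IEEE local experimental values) are assumptions, and a real frame would also be padded to the Ethernet minimum size and sent through a raw socket.

```python
import struct
import zlib

ETHERTYPE_EXPERIMENTAL = 0x88B5  # assumed choice for illustration only


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_probe(src_mac: str, dst_mac: str, seq: int) -> bytes:
    """Build a hypothetical probe: dst MAC, src MAC, EtherType, seq, CRC-32."""
    header = (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", ETHERTYPE_EXPERIMENTAL)
    )
    body = header + struct.pack("!I", seq)
    # The receiver can recompute this checksum to spot corrupted probes.
    return body + struct.pack("!I", zlib.crc32(body))


peers = ["02:00:00:00:00:02", "02:00:00:00:00:03"]
frames = [build_probe("02:00:00:00:00:01", mac, seq=7) for mac in peers]
print(len(frames), len(frames[0]))
```

Carrying a sequence number lets the receiver count gaps as lost probes, while the checksum separates damaged probes from missing ones.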

In a third aspect, an embodiment of the present invention provides a detecting apparatus, including:

an obtaining unit, configured to obtain the detection results of the N ports in the X servers, where the detection result includes the error packet data and the packet loss data of the other ports determined by each port according to the received probe messages sent by the other ports, N>2, X>2;

a determining unit, configured to determine a state of the first port according to the error packet data and the packet loss data of the other ports determined by each port in the obtaining unit, where the state of the first port is used to indicate whether the first port is faulty, and the first port is one of the N ports;

a processing unit, configured to generate a failure notification of the first port according to the state of the first port in the determining unit.

In a first possible implementation manner of the third aspect, the determining unit includes a calculating subunit, where

The calculating subunit is configured to calculate, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;

The determining unit is specifically configured to determine whether the first port is faulty according to a packet loss rate of the detection messages sent between the N ports.

In conjunction with the first possible implementation of the third aspect, in a second possible implementation of the third aspect,

The calculating subunit is specifically configured to convert the error packet data in the detection result into relative packet loss data according to a first preset function, and to calculate, by using a second preset function, the packet loss rate of the detection messages sent by the N ports to each other according to the relative packet loss data and the packet loss data in the detection result.

With reference to the first possible implementation manner of the third aspect, in a third possible implementation manner of the third aspect,

The determining unit is specifically configured to: if the packet loss rate of the detection messages sent by at least N/2 ports of the N ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent among the at least N/2 ports is less than a second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.

With reference to the foregoing third aspect, or any one of the first to the third possible implementation manners of the third aspect, in a fourth possible implementation manner of the third aspect,

The processing unit is specifically configured to generate the first fault notification of the first port, so that after the server acquires the first fault notification, the first port is removed from the LAG;

The fault notification includes a first fault notification, where the first fault notification is used to indicate that the first port is faulty.

In conjunction with the fourth possible implementation of the third aspect, in a fifth possible implementation manner of the third aspect,

The processing unit is specifically configured to generate the second failure notification of the first port, so that after the server acquires the second failure notification, the server invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machines running in the server;

The fault notification includes a second fault notification, where the second fault notification is used to indicate that all the N ports in the X servers are faulty.

With reference to the foregoing third aspect, or any one of the first to the fifth possible implementation manners of the third aspect, in a sixth possible implementation manner of the third aspect, the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.

In a fourth aspect, an embodiment of the present invention provides a server, including:

a receiving unit, configured to receive, by using the first port, a probe message from the N-1 ports in the other server, where the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2;

a processing unit, configured to generate a detection result according to the detection message, where the detection result includes packet loss data and error packet data of the N-1 ports sending the probe message to the first port;

And an obtaining unit, configured to acquire, according to the detection result, a fault notification sent by the detecting device, where the fault notification is used to indicate whether the first port is faulty.

In a first possible implementation manner of the fourth aspect, the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server, and the server further includes a removing unit and a migrating unit,

The removing unit is configured to: if the first port is a physical port in the server, and the first port is faulty, remove the first port from the LAG according to the fault notification in the obtaining unit;

The migrating unit is configured to: if the first port is a virtual port in a virtual machine running in the server, and the first port is faulty, perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification in the obtaining unit.

In a second possible implementation of the fourth aspect,

The processing unit is further configured to: if the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG to perform data transmission and reception through the first port.

In a third possible implementation of the fourth aspect,

The processing unit is specifically configured to: calculate the packet loss data of the N-1 ports to the first port according to the number of probe messages received by the receiving unit in the preset time; analyze, for each probe message received by the receiving unit in the preset time, whether the probe message is an error packet, to collect the error packet data of the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.

In a fourth possible implementation manner of the fourth aspect, the server further includes a sending unit,

The obtaining unit is further configured to separately obtain the media access control (MAC) addresses of the N-1 ports;

The processing unit is further configured to construct the probe message according to a MAC address in the acquiring unit;

The sending unit is configured to send, by using the first port, a probe message in the processing unit to the N-1 ports according to a MAC address of the N-1 ports in the acquiring unit.

In a fifth aspect, an embodiment of the present invention provides a communication failure detection system, where the detection system includes the detecting device according to the third aspect or any one of the first to sixth possible implementation manners of the third aspect, together with the server according to the fourth aspect or any one of the first to fourth possible implementation manners of the fourth aspect.

An embodiment of the present invention provides a method, an apparatus, and a system for detecting a communication failure. The detecting device obtains the detection results of N ports in the servers, where the detection results are generated by the servers according to the detection messages respectively received by the N ports. Because the detection result includes the error packet data and the packet loss data of the other ports determined by each port according to the detection messages sent by the other ports, the detecting device can determine, according to the error packet data and the packet loss data of the other ports determined by each port, whether one of the N ports is a faulty port. This detects ports in the "sub-health" state, which would otherwise degrade data transmission through the port, and thereby improves the reliability of data transmission.

Brief description of the drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art description will be briefly described below.

FIG. 1 is a schematic structural diagram of a detection system for communication failure in the prior art;

FIG. 2 is a schematic structural diagram of a communication fault detection system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of hardware of a detecting device according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of hardware of a server according to an embodiment of the present invention;

FIG. 5 is a first flowchart of a method for detecting a communication fault according to an embodiment of the present invention;

FIG. 6 is a second flowchart of a method for detecting a communication fault according to an embodiment of the present invention;

FIG. 7 is a third flowchart of a method for detecting a communication fault according to an embodiment of the present invention;

FIG. 8 is a first schematic structural diagram of a detecting device according to an embodiment of the present invention;

FIG. 9 is a second schematic structural diagram of a detecting device according to an embodiment of the present invention;

FIG. 10 is a first schematic structural diagram of a server according to an embodiment of the present invention;

FIG. 11 is a second schematic structural diagram of a server according to an embodiment of the present invention;

FIG. 12 is a third schematic structural diagram of a server according to an embodiment of the present invention.

Detailed description

In the following description, for purposes of illustration rather than limitation, specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the invention.

The terms "system" and "network" are used interchangeably herein. In order to facilitate the understanding of a method, device and system for detecting communication failures provided by the embodiments of the present invention, some concepts related to the present invention are first introduced.

Port aggregation, also known as Ethernet channel, is mainly used for connections between switches or servers. With port aggregation, the switch combines a group of physical ports into one logical channel (such as ports 1, 2, 3, and 4 shown in Figure 1), that is, a channel-group, and treats this logical channel as a single port. As long as not all ports in the group are down, communication between the two switches can continue. In this way, port aggregation allows multiple switches to transmit data simultaneously over parallel connections through multiple ports, which provides higher bandwidth, greater throughput, and recoverability, and increases system reliability.

Switch stacking refers to combining more than one switch to work together to provide as many ports as possible in a limited space. After multiple switches are stacked, they have sufficient system bandwidth and increase system reliability.

LACP is a protocol for dynamically aggregating and de-aggregating links. After LACP is enabled on a port, the port advertises its system priority, system MAC address, port priority, and port number to the peer by sending LACPDUs. After receiving this information, the peer compares it with the information saved for its other ports to select the ports that can be aggregated, so that the two parties can agree on whether a port joins or leaves a dynamic aggregation group.

Link aggregation refers to binding multiple physical ports into one logical port to balance inbound and outbound traffic across the member ports. The switch determines, based on the port load balancing policy configured by the user, from which member port a packet is sent to the peer switch. When the switch or the server detects that the link of one of the member ports is faulty, it stops sending data on that port and recalculates, according to the load sharing policy, the port on which packets are sent among the remaining links. After the faulty port recovers, the sending port is recalculated again. Link aggregation is therefore an important technology for increasing link bandwidth and achieving link transmission flexibility and redundancy.
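To make the recalculation step concrete, here is a minimal sketch of one possible load sharing policy: hashing the source and destination MAC addresses onto the currently active member ports. The source does not specify any particular policy, so the hash choice and the function name are assumptions.

```python
import zlib


def select_member_port(active_ports, src_mac, dst_mac):
    """Pick the member port for a flow by hashing its MAC pair (assumed policy)."""
    if not active_ports:
        raise ValueError("no active member ports in the aggregation group")
    ordered = sorted(active_ports)
    key = f"{src_mac}->{dst_mac}".encode()
    # The same flow always maps to the same port while membership is unchanged.
    return ordered[zlib.crc32(key) % len(ordered)]


members = ["port1", "port2", "port3", "port4"]
flow = ("02:00:00:00:00:01", "02:00:00:00:00:02")
chosen = select_member_port(members, *flow)

# When the chosen port fails, the flow is re-hashed over the remaining ports.
remaining = [p for p in members if p != chosen]
fallback = select_member_port(remaining, *flow)
print(chosen, fallback)
```

Sorting the members before hashing keeps the mapping stable regardless of the order in which ports were added.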

In addition, the server involved in the present invention may be any of various types of servers, such as a blade server. At least one virtual machine may run in the server, and the virtual machine includes a virtual port. The switch involved in the present invention is a network device for forwarding electrical signals that meets at least Layer 2 switching requirements, that is, it can identify the MAC address information in a data packet, forward the packet according to the MAC address, and record the MAC addresses and their corresponding ports in an internal address table.

Specifically, after port aggregation and switch stacking in various types of servers, a port in the server may enter a "sub-health" state. In this state, the port can still exchange data packets with other ports (that is, the port is still available), but it may behave abnormally, for example by dropping packets when sending them or corrupting the contents of the packets. In the prior art, the LAG cannot detect a port in the "sub-health" state, so data transmitted through the "sub-health" port continues to be damaged. Therefore, the embodiments of the present invention provide a method, a device, and a system for detecting a communication failure, which solve the prior-art problem that the LAG cannot detect the "sub-health" state that may occur in a port, and improve the reliability of data transmission.

Embodiment 1

An embodiment of the present invention provides a communication failure detection system, as shown in FIG. 2, including X servers 01 after link aggregation, Y switches 02 after switch stacking, and a detecting device 03, wherein

The server 01 includes at least one port, the switch 02 includes at least one port, and the server 01 and the switch 02 are connected through corresponding ports.

The server 01 runs at least one virtual machine, and the virtual machine includes a virtual port.

The detecting device 03 may be deployed on any one of the X servers 01, or may be separately deployed in the detecting system of the communication failure independently of the X servers 01.

On the one hand, in the embodiment of the present invention, the detecting device 03 obtains the detection results of the N ports in the X servers 01, where the detection result includes the error packet data and the packet loss data of the other ports determined by each port according to the received probe messages sent by the other ports, N>2, X>2; the detecting device 03 determines whether a first port is faulty according to the error packet data and the packet loss data of the other ports determined by each port, where the first port is one of the N ports.

Further, that the detecting device 03 determines whether the first port is faulty according to the error packet data and the packet loss data of the other ports determined by each port may specifically include the following steps: the detecting device 03 calculates, according to the detection result, the packet loss rate of the detection messages sent by the N ports to each other; and the detecting device 03 determines whether the first port is faulty according to the packet loss rate of the detection messages sent by the N ports to each other.

Further, that the detecting device 03 calculates, according to the detection result, the packet loss rate of the detection messages sent by the N ports to each other may specifically include the following steps: the detecting device 03 converts the error packet data in the detection result into relative packet loss data according to the first preset function; and the detecting device 03 calculates, by using the second preset function, the packet loss rate of the detection messages sent among the N ports according to the relative packet loss data and the packet loss data in the detection result.

Further, that the detecting device 03 determines whether the first port is faulty according to the packet loss rate of the detection messages sent by the N ports to each other may specifically include the following steps: among the N ports, if the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than the first preset value, and the packet loss rate of the detection messages sent among the at least N/2 ports is less than the second preset value, the detecting device 03 determines that the first port is faulty; otherwise, the detecting device 03 determines that the first port is not faulty.

Further, after the detecting device 03 determines whether the first port is faulty according to the packet loss data and the error packet data of the detection messages sent among the N ports, the method may further include the following step: if the first port is faulty, the detecting device 03 sends a first failure notification to the server 01, so that the server 01 removes the first port from the LAG according to the first failure notification.

Further, after the detecting device 03 determines whether the first port is faulty according to the packet loss data and the error packet data of the detection messages sent among the N ports, the method may further include the following steps:

If all the ports in the server 01 are faulty, the detecting device 03 invokes the DRS to perform virtual machine hot migration on the virtual machines running in the server 01, or

If all the ports in the server 01 are faulty, the detecting device 03 sends a second failure notification to the server 01, so that the server 01 invokes the DRS according to the second failure notification to perform virtual machine hot migration on the virtual machines running in the server 01.
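The choice between the first and second failure notifications can be sketched as follows. The dictionary format of a notification and the function name are hypothetical; the source only states what each notification indicates and what the server does in response.

```python
def build_notifications(port_states):
    """Map per-port fault states of one server to failure notifications.

    port_states: {port_name: True if faulty, False otherwise}.
    The notification dictionaries are an assumed, illustrative format.
    """
    faulty = sorted(p for p, bad in port_states.items() if bad)
    if port_states and len(faulty) == len(port_states):
        # Every port is faulty: removing ports from the LAG cannot help,
        # so the server is asked to hot-migrate its virtual machines.
        return [{"type": "second", "action": "migrate_vms"}]
    # Otherwise each faulty port gets its own first failure notification.
    return [
        {"type": "first", "port": p, "action": "remove_from_lag"} for p in faulty
    ]


print(build_notifications({"eth0": True, "eth1": False}))
print(build_notifications({"eth0": True, "eth1": True}))
```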

On the other hand, in the embodiment of the present invention, the server 01 receives, through the first port, the probe messages from the other N-1 ports in the other servers 01, where the probe messages are used to determine the error packet data and the packet loss data of the N-1 ports, N>2; the server 01 generates a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port; and the server 01 acquires a fault notification sent by the detecting device 03 according to the detection result, where the fault notification is used to indicate whether the first port is faulty.

Further, the first port is a physical port in the server 01, or is a virtual port in a virtual machine running in the server 01. After the server 01 acquires the fault notification sent by the detecting device 03 according to the detection result, the method may further include the following steps:

If the first port is a physical port in the server 01, and the first port is faulty, the server 01 removes the first port from the LAG according to the fault notification;

If the first port is a virtual port in the virtual machine running in the server 01, and the first port is faulty, the server 01 notifies the virtual machine corresponding to the first port according to the fault notification. Perform virtual machine hot migration.

Further, after the server 01 acquires the fault notification sent by the detecting device 03 according to the detection result, the method may further include the following steps: if the first port is not faulty, the server 01 queries whether the first port is in the In the LAG, if the first port is not in the LAG, the server 01 adds the first port to the LAG to perform data transmission and reception through the first port.

Further, the server 01 generates the detection result according to the detection message, and may specifically include the following steps: the server 01 calculates the N-1 ports according to the number of the detection messages received within the preset time. The packet loss data of the first port; the server 01 analyzes whether the probe message is a wrong packet according to the probe message received in the preset time, to count the N-1 ports to the first The error packet data of one port; the server 01 generates the detection result according to the packet loss data and the error packet data.

Further, the method for detecting a communication failure may further include the following steps: the server 01 acquires the media access control (MAC) addresses of the N-1 ports respectively; the server 01 constructs the probe message according to the MAC addresses; and the server 01 sends the probe message to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.

It should be noted that the N ports are physical ports in the server 01 or virtual ports in the virtual machines running in the server 01. In this way, the method, device and system for detecting a communication failure provided by the embodiments of the present invention can be applied to an IAAS (Infrastructure-as-a-Service) scenario and can also be applied to a PAAS (Platform-as-a-Service) scenario, so as to implement automatic switching of the communication plane in a cloud scenario. The method for detecting communication faults in the IAAS and PAAS scenarios is described in detail in subsequent embodiments, so it is not described here.

In addition, IAAS and PAAS are different forms of service in cloud computing. Cloud computing is a model for adding, using and delivering Internet-based services, and generally involves providing dynamic, easily scalable and often virtualized resources over the Internet. Cloud computing can include the following levels of service: Infrastructure as a Service (IAAS), Platform as a Service (PAAS) and Software as a Service (SAAS). IAAS means that consumers can obtain services from a complete computer infrastructure through the Internet, for example hardware server rental; PAAS means that a software development platform is provided as a service, for example personalized customization of software.

The method for detecting a communication fault provided by the embodiment of the present invention can be applied to the IAAS scenario, that is, fully interconnected communication fault detection is performed on the physical ports of the servers in the IAAS, and path switching is performed for a port with a communication fault. The method can also be applied to the PAAS scenario, that is, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the servers in the PAAS and is combined with the detection results of the physical ports in the IAAS scenario, so as to achieve automatic path switching for ports with communication failures.

In the prior art, the server can only detect through the LAG whether each of its own ports is available, that is, whether the port can transmit data, and cannot detect the "sub-health" situation in which a port still transmits data while faulty (for example, a large amount of packet loss occurs when data packets are sent, or the contents of the packets are altered). As a result, the data transmitted through the "sub-health" port continues to be damaged, and the reliability of data transmission is lowered. The detection method provided by the present invention can detect a port in the "sub-health" state and then remove that port from the LAG in time, thereby improving the reliability of data transmission.

An embodiment of the present invention provides a communication failure detection system, in which each server receives, through a first port, probe messages from the other N-1 ports and generates a detection result according to the probe messages; after the detecting device acquires the detection results of the N ports in the X servers, it determines whether the first port is faulty according to the detection results. In this solution, the detection results are generated by each server according to the probe messages received by the N ports. Because each detection result includes the error packet data and the packet loss data of the other ports, as determined by each port according to the probe messages it received from those ports, the detecting device can determine from this data whether any one of the N ports is a faulty port. A port in the "sub-health" state, which impairs data transmission, can thus be detected. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that is operating abnormally, and avoids the risk of using the faulty port to transfer data.

Embodiment 2

FIG. 3 is a schematic diagram showing the hardware of the detecting device provided by the embodiment of the present invention:

The detecting device may be a server or a blade, and the detecting device may be deployed in a server that reports the detection result in the detecting system of the communication fault, or may introduce a new server as a detecting device in the detecting system of the communication fault, specifically:

As shown in FIG. 3, the detecting device includes a processor 11, a transceiver module 12, and a memory 13, wherein

The processor 11 is the control center of the detecting device. It performs the various functions of the detecting device and processes data by running or executing software programs and/or modules stored in the memory 13 and calling data stored in the memory 13.

The transceiver module 12 can be used for receiving and transmitting signals during the process of transmitting and receiving information. In particular, the transceiver module can communicate with the network and other devices through wireless communication. The wireless communication can use any communication standard or protocol. In the present invention, the transceiver module can perform data transmission and reception based on the LACP protocol or the ARP (Address Resolution Protocol).

The memory 13 can be used to store software programs and modules, and the processor executes various functional applications and data processing of the detecting device by running software programs and modules stored in the memory.

In the embodiment of the present invention, the transceiver module 12 acquires the detection results of the N ports in the X servers respectively, where the detection results include the error packet data and the packet loss data of the other ports, as determined by each port according to the received probe messages sent by the other ports, N>2, X>2; the processor 11 determines, according to the error packet data and the packet loss data of the other ports determined by each port, whether the first port is faulty, where the first port is one of the N ports.

Further, the determination by the processor 11 of whether the first port is faulty according to the packet loss data and the error packet data of the detection messages sent between the N ports may specifically include the following steps: the processor 11 calculates, according to the detection results, the packet loss rates of the detection messages sent between the N ports and saves them to the memory 13; and the processor 11 determines whether the first port is faulty according to the packet loss rates of the detection messages sent between the N ports.

Further, the calculation by the processor 11 of the packet loss rates of the detection messages sent between the N ports according to the detection results may specifically include the following steps: the processor 11 converts the error packet data in the detection results into relative packet loss data according to a first preset function; and the processor 11 calculates, according to a second preset function, the packet loss rates of the detection messages sent between the N ports from the relative packet loss data and the packet loss data in the detection results.

Further, the determination by the processor 11 of whether the first port is faulty according to the packet loss rates of the detection messages sent between the N ports may specifically include the following steps: if, among the N ports, the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than a second preset value, the processor 11 determines that the first port is faulty; otherwise, the processor 11 determines that the first port is not faulty.

Further, after the processor 11 determines whether the first port is faulty according to the packet loss data and the error packet data of the detection messages sent between the N ports, the following step may also be included: if the processor 11 determines that the first port is faulty, the processor 11 sends, through the transceiver module 12, a first failure notification to the server corresponding to the first port, so that the server removes the first port from the LAG according to the first failure notification.

Further, after the processor 11 determines whether the first port is faulty according to the packet loss data and the error packet data of the detection messages sent between the N ports, the following step may also be included: if the processor 11 determines that every port in the X servers is faulty, the processor 11 calls the DRS in the memory 13 to perform virtual machine hot migration on the virtual machines running in the X servers; or

If the processor 11 determines that all the ports in the X servers are faulty, the processor 11 sends a second failure notification to the X servers through the transceiver module 12, so that the X servers invoke the DRS according to the second failure notification to perform virtual machine hot migration on the virtual machines running in the X servers.

Further, the N ports are physical ports in the server or virtual ports in a virtual machine running in the server. Specifically, in the IAAS, fully interconnected communication fault detection is performed on the physical ports of the servers, and path switching is performed for a port with a communication failure. In the PAAS, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the servers and is combined with the detection results of the physical ports in the IAAS scenario, so as to implement automatic path switching for a port with a communication failure.

FIG. 4 is a schematic diagram of hardware of a server provided by an embodiment of the present invention:

The server can be any of various types of servers (such as a blade server), specifically:

As shown in FIG. 4, the server includes a processor 21, a transceiver module 22, and a memory 23, where

The processor 21 is the control center of the server. It performs the various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 23 and calling data stored in the memory 23.

The transceiver module 22 can be used for receiving and transmitting signals during the process of transmitting and receiving information. In particular, the transceiver module can communicate with the network and other devices through wireless communication. The wireless communication can use any communication standard or protocol. In the present invention, the transceiver module can perform data transmission and reception based on the LACP protocol or the ARP protocol.

The memory 23 can be used to store software programs and modules, and the processor executes various functional applications and data processing of the server by running software programs and modules stored in the memory.

In the embodiment of the present invention, the transceiver module 22 receives probe messages from the N-1 ports in the other servers through the first port, where the probe messages are used to determine the error packet data and the packet loss data of the N-1 ports. The processor 21 generates a detection result according to the probe messages and sends the detection result to the transceiver module 22, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port. The transceiver module 22 acquires the fault notification sent by the detecting device according to the detection result and sends the fault notification to the processor 21, where the fault notification is used to indicate whether the first port is faulty.

Further, the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server.

Further, after the transceiver module 22 acquires the fault notification sent by the detecting device according to the detection result and sends it to the processor 21, the method may further include the following steps: if the first port is a physical port in the server, and the first port is faulty, the processor 21 removes the first port from the LAG in the memory 23 according to the fault notification;

If the first port is a virtual port in a virtual machine running in the server, and the first port is faulty, the processor 21 notifies, according to the fault notification, the virtual machine corresponding to the first port to perform virtual machine hot migration.

Further, after the transceiver module 22 acquires the fault notification sent by the detecting device according to the detection result and sends it to the processor 21, the method may further include the following steps: if the first port is not faulty, the processor 21 queries whether the first port is in the LAG in the memory 23; if the first port is not in the LAG, the processor 21 adds the first port to the LAG and saves the updated LAG to the memory 23, so that the transceiver module 22 performs data transmission and reception through the first port.

Further, the generation of the detection result by the processor 21 according to the probe messages and the sending of the detection result to the transceiver module 22 may specifically include the following steps: the processor 21 calculates the packet loss data of the N-1 ports to the first port according to the number of probe messages received within a preset time and saves it to the memory 23; the processor 21 analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data of the N-1 ports to the first port, and saves it to the memory 23; and the processor 21 generates the detection result according to the packet loss data and the error packet data in the memory 23.

Further, the method for detecting a communication failure may further include the following steps: the processor 21 acquires the MAC addresses of the N-1 ports respectively; the processor 21 constructs the probe message according to the MAC addresses; and the transceiver module 22 sends the probe message to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.

It can be seen that, in the prior art, the server can only detect through the LAG whether each of its own ports is available, that is, whether the port can transmit data, and cannot detect the "sub-health" situation in which a port still transmits data while faulty (for example, a large amount of packet loss occurs when data packets are sent, or the contents of the packets are altered). As a result, the data transmitted through the "sub-health" port continues to be damaged, and the reliability of data transmission is lowered. The detection method provided by the present invention can detect a port in the "sub-health" state and then remove that port from the LAG in time, thereby improving the reliability of data transmission.

An embodiment of the present invention provides a device for detecting a communication failure, in which each server receives, through a first port, probe messages from the other N-1 ports and generates a detection result according to the probe messages; after the detecting device acquires the detection results of the N ports in the X servers, it determines whether the first port is faulty according to the detection results. In this solution, the detection results are generated by each server according to the probe messages received by the N ports. Because each detection result includes the error packet data and the packet loss data of the other ports, as determined by each port according to the probe messages it received from those ports, the detecting device can determine from this data whether any one of the N ports is a faulty port. A port in the "sub-health" state, which impairs data transmission, can thus be detected. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that is operating abnormally, and avoids the risk of using the faulty port to transfer data.

Embodiment 3

An embodiment of the present invention provides a method for detecting a communication failure, as shown in FIG. 5, including:

101. The detecting device acquires the detection results of the N ports in the X servers respectively, where the detection results include the error packet data and the packet loss data of the other ports, as determined by each port according to the received detection messages sent by the other ports.

Here N>2 and X>2, and the N ports are the ports of the servers in the communication fault detection system after port aggregation (for example, ports 1, 2, 3 and 4 of server 1 in FIG. 2).

The detection result is generated by each server according to the received detection messages and reported to the detecting device. Specifically, the detection result includes the packet loss data and the error packet data of the detection messages sent between the N ports. As shown in Table 1, the detection result that the server sends to the detecting device for port 1 includes the error packet data and the packet loss data between port 1 and the remaining N-1 ports in the communication failure detection system; this data reflects the communication quality of the communication paths between port 1 and the other N-1 ports.

Table 1

	Error packet data	Packet loss data
Port 1 to port 2	5	3
Port 1 to port 3	0	0
Port 1 to port 4	3	5
Port 1 to port 5	1	0

Correspondingly, after the detecting device acquires the detection results of all N ports, it knows the communication quality of all communication paths in the current communication failure detection system, so that it can identify the faulty port according to the communication quality of all communication paths.

It should be noted that the calculation method of the error packet data and the packet loss data will be elaborated in the subsequent embodiments, and therefore will not be described herein.

102. The detecting device determines, according to the error packet data and the packet loss data of the other ports determined by each port, the state of the first port, where the state of the first port is used to indicate whether the first port is faulty.

After obtaining the detection result of the N ports in the X servers, the detecting device may determine whether the first port is faulty according to the detection result, and the first port is one of the N ports.

Optionally, after acquiring the detection results of the N ports in the X servers, the detecting device may calculate, according to the detection results, the packet loss rates of the detection messages sent between the N ports, and then determine whether the first port is faulty according to these packet loss rates.

Exemplarily, the detecting device may convert the error packet data in the detection result into relative packet loss data according to a first preset function, and then calculate, according to a second preset function, the packet loss rates of the detection messages sent between the N ports from the relative packet loss data and the packet loss data in the detection result. The result, shown in Table 2, reflects the packet loss rate between each pair of ports when detection messages are sent and received between them. For example, the packet loss rate from port 1 to port 3 is 0.2%. The data in Table 2 are percentages.

Table 2

	Port 1	Port 2	Port 3	Port 4
Port 1	N/A	1	0.2	0
Port 2	0	N/A	0.3	0.3
Port 3	0.1	1	N/A	0.2
Port 4	0.1	0.9	0	N/A
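The two-stage conversion that produces packet loss rates can be sketched in Python as follows. The source does not disclose the first or second preset function, so the weighting in `relative_loss`, the percentage formula in `loss_rate` and the expected count of 100 probes per period are illustrative assumptions only; the input values are those of Table 1.

```python
from typing import Dict, Tuple

# Assumed number of probes expected per path in one period (not fixed by the source).
EXPECTED_PROBES = 100


def relative_loss(error_packets: int, weight: float = 1.0) -> float:
    """Hypothetical first preset function: count each error packet as `weight` lost packets."""
    return error_packets * weight


def loss_rate(error_packets: int, lost_packets: int,
              expected: int = EXPECTED_PROBES) -> float:
    """Hypothetical second preset function: combined loss as a percentage of expected probes."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    combined = lost_packets + relative_loss(error_packets)
    return min(100.0, 100.0 * combined / expected)


# Detection result for port 1 (Table 1): path -> (error packets, lost packets).
table1: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("port1", "port2"): (5, 3),
    ("port1", "port3"): (0, 0),
    ("port1", "port4"): (3, 5),
    ("port1", "port5"): (1, 0),
}

rates = {path: loss_rate(err, lost) for path, (err, lost) in table1.items()}
```

Under these assumptions an error packet is treated exactly like a lost packet; a deployment could choose a different weight if corrupted packets are considered more or less harmful than missing ones.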

Further, after the detecting device calculates the packet loss rates of the detection messages sent between the N ports, it determines whether the first port is faulty. Exemplarily, the detecting device performs statistics according to Table 2: if the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than the first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than the second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
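The decision rule can be illustrated with a small Python sketch over the values of Table 2 (rows read as senders, columns as receivers). The two preset values are not given in the source, so both are set to 0.5 here purely for illustration. As a simplification, the sketch also requires every port with high loss towards the target to communicate well with every other such port, rather than searching for a subset of at least N/2 ports.

```python
from typing import Dict, Tuple

Rates = Dict[Tuple[int, int], float]  # (sender, receiver) -> packet loss rate in percent

# Packet loss rates from Table 2.
TABLE2: Rates = {
    (1, 2): 1.0, (1, 3): 0.2, (1, 4): 0.0,
    (2, 1): 0.0, (2, 3): 0.3, (2, 4): 0.3,
    (3, 1): 0.1, (3, 2): 1.0, (3, 4): 0.2,
    (4, 1): 0.1, (4, 2): 0.9, (4, 3): 0.0,
}


def is_faulty(target: int, rates: Rates,
              first_preset: float, second_preset: float) -> bool:
    """Apply the at-least-N/2 rule to one candidate port."""
    ports = sorted({p for pair in rates for p in pair})
    # Ports whose probes towards the target suffer a high loss rate.
    suspects = [s for s in ports
                if s != target and rates.get((s, target), 0.0) > first_preset]
    if len(suspects) < len(ports) / 2:
        return False
    # Those ports must exchange probes well among themselves; otherwise
    # the loss cannot be attributed to the target port.
    return all(rates.get((a, b), 0.0) < second_preset
               for a in suspects for b in suspects if a != b)


# Assumed thresholds: first preset value 0.5, second preset value 0.5.
faulty_ports = [p for p in (1, 2, 3, 4) if is_faulty(p, TABLE2, 0.5, 0.5)]
```

With these assumed thresholds only port 2 is flagged: ports 1, 3 and 4 all see high loss towards port 2 while exchanging probes among themselves with low loss.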

Optionally, thresholds for the packet loss data and the error packet data may be preset in the detecting device. When the packet loss data and the error packet data of the detection messages sent between a port and the other ports reach the preset thresholds, the detecting device determines that the port is a faulty port, since using this port for data transmission and reception would affect the reliability of the data.

Optionally, the detecting device may further calculate, according to the packet loss data and the error packet data of the detection messages sent between the N ports, the proportion of packet loss data and error packet data between each port and the other ports, and obtain the port with relatively little packet loss and few error packets. When all N ports are faulty, the port with relatively little packet loss and few error packets is selected to send and receive data, to ensure the normal operation of the server.

At this point, the detecting device determines whether the first port is faulty according to the detection result.

103. The detecting device generates a fault notification of the first port according to the state of the first port.

After the detecting device determines, according to the packet loss data and the error packet data of the detection messages sent between the N ports, whether the first port is faulty, if the first port is faulty, the detecting device may generate a failure notification for the first port. Further, the detecting device may send a first failure notification to the server, so that the server removes the first port from the LAG according to the first failure notification, that is, stops sending data on this port and reselects the sending port among the remaining links according to the load sharing policy. After the faulty port recovers, the sending port is reselected again, so that automatic switching of the communication paths between the N ports can be implemented.

Further, if the detecting device determines according to the detection results that every port in the server is faulty, the detecting device may invoke the DRS to perform virtual machine hot migration on the virtual machines running in the server, or the detecting device may send a second failure notification to the server, so that the server invokes the DRS according to the second failure notification to perform virtual machine hot migration on the virtual machines running in the server. The virtual machines on the server with the faulty ports are thereby migrated to other servers that have no faulty port, ensuring that data transmission is not impaired when these virtual machines perform service interaction.
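The choice between the first failure notification (remove individual ports from the LAG) and the second failure notification (hot-migrate the virtual machines) can be sketched as below. The function name, the action strings and the input layout are hypothetical and serve only to illustrate the branching described above.

```python
from typing import Dict


def plan_notifications(port_states: Dict[str, Dict[str, bool]]) -> Dict[str, str]:
    """Map each server to an action, given per-port fault flags (True means faulty)."""
    plan: Dict[str, str] = {}
    for server, ports in port_states.items():
        faulty = sorted(p for p, bad in ports.items() if bad)
        if ports and len(faulty) == len(ports):
            # Every port is faulty: second failure notification, DRS hot migration.
            plan[server] = "migrate_vms"
        elif faulty:
            # Some ports are faulty: first failure notification, remove them from the LAG.
            plan[server] = "remove_from_lag:" + ",".join(faulty)
        else:
            plan[server] = "none"
    return plan


plan = plan_notifications({
    "server1": {"p1": True, "p2": True},
    "server2": {"p3": True, "p4": False},
    "server3": {"p5": False},
})
```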

At this point, it can be seen that the method for detecting a communication failure provided by the present invention can effectively detect a port in the "sub-health" state, that is, a port that can still transmit data but whose packet loss rate during transmission is very high, so that data passing through the port continues to be damaged. After a port in the "sub-health" state is detected, the port is removed from the LAG in time, or virtual machine hot migration is performed on the virtual machines running in the server, so as to achieve automatic switching of the communication paths between the N ports and ensure that data transmission is not damaged.

It should be noted that the N ports are physical ports in the server or virtual ports in a virtual machine running in the server. Specifically, in the IAAS, fully interconnected communication fault detection is performed on the physical ports of the servers, and path switching is performed for a port with a communication failure. In the PAAS, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the servers and is combined with the detection results of the physical ports in the IAAS scenario, so as to implement automatic path switching for a port with a communication failure.

An embodiment of the present invention provides a method for detecting a communication failure, as shown in FIG. 6, including:

201. The server receives the probe message from the N-1 ports in the other server through the first port.

The probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2.

The server may periodically receive the probe messages from the N-1 ports through the first port. For example, the first port receives the probe messages from the N-1 ports within one minute, and according to the original communication protocol in the server, the number of probe messages that the first port receives from each port in a fixed period is predetermined. The predetermined number reflects the ability of the port to send and receive data. For example, port 1 should receive 60 probe messages sent by port 3 within one minute. The probe messages may be used to reflect the QoS (Quality of Service) from the N-1 ports to the first port, where QoS refers to a set of quality requirements on the collective behavior of one or more objects. Since the paths between the first port and the other N-1 ports may be faulty, each port periodically sends a specified number of probe messages, from which the server can determine the error packet data and the packet loss data of the N-1 ports; this data reflects the QoS from the N-1 ports to the first port.

Exemplarily, Table 3 shows the number of probe messages from the N-1 ports received by the first port in one minute, where the predetermined number of probe messages sent by the N-1 ports that the first port receives in one minute is 100. It can be seen that the number of probe messages from the N-1 ports received by the first port reflects the communication capability between the first port and the N-1 ports.

Table 3

202. The server generates a detection result according to the detection message, where the detection result includes packet loss data and error packet data that the N-1 ports send the probe message to the first port.

After the server receives the probe message from the N-1 ports through the first port, the server may generate a probe result according to the probe message.

Specifically, the server may calculate the packet loss data of the N-1 ports to the first port according to the number of probe messages received within the preset time; the server then analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data of the N-1 ports to the first port; finally, the server generates the detection result according to the packet loss data and the error packet data.

Exemplarily, as shown in Table 4, on the basis of Table 3, the server generates the detection result of the first port with respect to the N-1 ports according to the probe messages and reports it to the detecting device, so that the detecting device determines whether the first port is faulty according to the detection results of the N ports. The error packet data of the first port in Table 4 is calculated according to the CRC (Cyclic Redundancy Check) of each probe message received by the first port.

Table 4

At this point, the server receives the probe message from the N-1 ports through the first port, and generates a probe result according to the probe message, so that the detecting device determines the faulty port according to the detection result of the respective ports.
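Step 202 can be illustrated with a minimal Python sketch that counts lost and error packets for one sending port over one period. The source does not specify the probe layout or the CRC variant, so the 4-byte CRC-32 trailer and the helper names below are assumptions; the expected count of 60 probes per minute follows the earlier example for port 1 and port 3.

```python
import zlib
from typing import Iterable, Tuple


def make_probe(payload: bytes) -> bytes:
    """Append a CRC-32 trailer to a payload (assumed probe layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def probe_is_intact(frame: bytes) -> bool:
    """Check the assumed CRC-32 trailer of a received probe."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


def summarize(received: Iterable[bytes], expected: int) -> Tuple[int, int]:
    """Return (error_packets, lost_packets) for one sender in one period."""
    frames = list(received)
    errors = sum(1 for f in frames if not probe_is_intact(f))
    lost = max(0, expected - len(frames))
    return errors, lost


# 58 of the expected 60 probes arrive, and 2 of those arrive corrupted.
good = [make_probe(i.to_bytes(2, "big")) for i in range(56)]
bad = [bytes([f[0] ^ 0xFF]) + f[1:] for f in (make_probe(b"\x00\x38"), make_probe(b"\x00\x39"))]
errors, lost = summarize(good + bad, expected=60)
```

In this example the server would report 2 error packets and 2 lost packets for the sending port.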

Further, while the server receives the probe messages from the N-1 ports through the first port and generates the detection result according to the probe messages, the server may also periodically send probe messages to the other N-1 ports, so that the other N-1 ports similarly report their detection results to the detecting device according to the probe messages.

First, the server obtains the MAC addresses of the other N-1 ports respectively. The MAC address, also called the hardware address, is used to define the location of a network device and identifies each station on the network.

Specifically, the server can obtain the MAC addresses of the ports in other servers according to the ARP protocol or the LACP protocol.

Second, the server constructs probe messages based on the MAC addresses of the other N-1 ports.

The probe message may be a Layer 2 packet. In the OSI model, Layer 3 (the network layer) is responsible for IP addresses and Layer 2 (the data link layer) is responsible for MAC addresses, so each network location has a dedicated MAC address. The first port in the server identifies the MAC address information in the Layer 2 data packet, forwards the packet according to the MAC address, and records the MAC address and the corresponding port in an internal address table.

Finally, the server sends the probe message to the N-1 ports by using the first port according to the MAC address of the N-1 ports.

In this way, the server periodically sends the probe message to the other N-1 ports, so that the other N-1 ports also report their own detection results to the detection device according to the probe message.
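As a rough illustration of constructing a Layer 2 probe from MAC addresses, the Python sketch below assembles a frame in memory without sending it. The source does not define the probe format: the EtherType 0x88B5 (an IEEE 802 value reserved for local experimental use), the 4-byte sequence number payload and the CRC-32 trailer are all assumptions, and a real NIC would additionally pad the frame and append its own frame check sequence.

```python
import struct
import zlib

# Assumed EtherType for probes; the source names none.
PROBE_ETHERTYPE = 0x88B5


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError("invalid MAC address: " + mac)
    return bytes(int(p, 16) for p in parts)


def build_probe(dst_mac: str, src_mac: str, seq: int) -> bytes:
    """Build destination MAC + source MAC + EtherType + sequence number + CRC-32."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", PROBE_ETHERTYPE)
    body = header + struct.pack("!I", seq)
    return body + struct.pack("!I", zlib.crc32(body))


# The first port (source) probes each of the other N-1 ports by MAC address.
first_port = "02:00:00:00:00:01"
peers = ["02:00:00:00:00:02", "02:00:00:00:00:03", "02:00:00:00:00:04"]
probes = [build_probe(peer, first_port, seq=7) for peer in peers]
```

The sequence number lets a receiver tell individual probes apart within a period, and the trailer gives it something to verify when counting error packets.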

203. The server acquires a fault notification sent by the detecting device according to the detection result, where the fault notification is used to indicate whether the first port is faulty.

After the server generates the detection result according to the detection message, the detection device determines whether the first port is faulty according to the detection result of each port, and the server can obtain the failure notification sent by the detection device according to the detection result.

Specifically, if the first port is a physical port in the server, and the first port is faulty, the server may remove the first port from the LAG according to the fault notification.

If the first port is a virtual port in a virtual machine running in the server, and the first port is faulty, the server may perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification.

If the first port is not faulty, the server queries whether the first port is in the LAG, that is, determines whether the first port was previously removed from the LAG because of a fault. If the first port is not in the LAG, that is, the first port has been removed from the LAG, the server may re-add the first port to the LAG so that data is sent and received through the first port.
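Illustratively, the three cases above (faulty physical port, faulty virtual port, and recovered port) may be sketched as follows. The class `Server`, its fields, and the returned action strings are hypothetical names introduced only for this sketch; the hot migration itself is represented by recording the virtual machine ID.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    lag: set = field(default_factory=set)         # ports currently in the LAG
    migrated: list = field(default_factory=list)  # VMs handed over for hot migration

    def on_fault_notification(self, port, faulty, is_virtual=False, vm_id=None):
        """Apply one fault notification and return the action taken."""
        if faulty and is_virtual:
            # Faulty virtual port: hot-migrate the corresponding virtual machine.
            self.migrated.append(vm_id)
            return "migrate_vm"
        if faulty:
            # Faulty physical port: remove it from the LAG.
            self.lag.discard(port)
            return "removed_from_lag"
        if port not in self.lag:
            # Port is healthy again but was removed earlier: re-add it.
            self.lag.add(port)
            return "re_added_to_lag"
        return "no_change"
```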

It should be noted that, after the detecting device determines according to the detection result that the first port is faulty, the removal of the first port from the LAG may be performed by the detecting device; alternatively, the detecting device may send a message informing the server that the first port is faulty, and the server itself removes the first port from the LAG. The present invention does not limit this.

In addition, the N ports are physical ports in the server, or virtual ports in a virtual machine running in the server. Specifically, in the IAAS, fully interconnected communication fault detection is performed on the physical ports of the server, and ports with communication faults are switched. In the PAAS, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the server and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is achieved for ports with communication faults.

At this point, the server receives and sends probe messages through each port to form a fully interconnected path detection system, and generates detection results that reflect the quality of service between the ports. The detecting device analyzes the detection results reported by each port, identifies any port in the "sub-health" state, and removes that port from the LAG in time. This prevents the server from using a port in the "sub-health" state for data transmission and reception, and thus prevents data from being continuously damaged.

In the prior art, the server can only detect through the LAG whether each of its own ports is available, that is, whether the port can transmit data. It cannot detect abnormal transmission on a faulty port (for example, packet loss when sending data packets, or tampering with the contents of a packet), so data transmitted through a "sub-health" port continues to be damaged, which reduces the reliability of data transmission. The method for detecting a communication failure provided by the present invention can detect a port in the “sub-health” state and then remove that port from the LAG in time, thereby improving the reliability of data transmission.

An embodiment of the present invention provides a method for detecting a communication failure. The server receives probe messages from the N-1 ports in the other servers through a first port and generates a detection result according to the probe messages; after the detecting device acquires the detection results of the N ports in the X servers, it determines according to the detection results whether the first port is faulty. In this solution, the detection results are generated by each server according to the probe messages received by the N ports, and they include the error packet data and the packet loss data of the other ports that each port determines from the probe messages it receives from those ports. The detecting device can therefore determine, according to this error packet data and packet loss data, whether any of the N ports is faulty, that is, whether a port in the "sub-health" state is impairing data transmission. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that is operating abnormally, and avoids the risk of transmitting data through the faulty port.

Embodiment 4

An embodiment of the present invention provides a method for detecting a communication failure, as shown in FIG. 7, including:

301. The server receives the probe message from the N-1 ports in the other server through the first port.

The probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2. The probe message may be a Layer 2 data packet whose length may vary and whose content may be random.

Since the number of probe messages that a port sends to each of the other ports within a fixed period is predetermined, and this predetermined number reflects the capability of the port to send and receive data, the server can use the probe messages that the first port periodically receives from the other ports, in other servers and in its own, to determine the error packet data and the packet loss data of the N-1 ports. For example, port 1 should receive 60 probe messages from port 2 every minute; if port 1 actually receives only 50 probe messages from port 2 per minute, packet loss has occurred on port 1 or port 2.

In addition, while the server receives probe messages from the N-1 ports through the first port and generates the detection result according to them, the server may also periodically send probe messages to the N-1 ports in the other servers, so that the other N-1 ports likewise generate detection results according to the probe messages and report them to the detecting device.

Specifically, the server can obtain the MAC addresses of the ports in other servers according to the ARP protocol or the LACP protocol. The server then constructs the probe message according to the MAC addresses of the other N-1 ports. Finally, the server sends the probe message to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.

302. The server generates a detection result according to the probe message, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port.

Exemplarily, packet loss data = number of probe messages that should be received during the period - number of probe messages actually received during the period.

When calculating the error packet data, the server first calculates the CRC value of each received probe message; if the calculated CRC value does not match the CRC value carried in the received probe message, the received probe message is recorded as an error packet. CRC (cyclic redundancy check) is the most commonly used error-checking code in the field of data communication; its characteristic is that the lengths of the information field and the check field can be chosen arbitrarily. The sender performs a polynomial calculation on the data and attaches the result to the frame, and the receiving device performs the same calculation to verify the correctness and integrity of the transmitted data.
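Illustratively, generating the detection result for one sending port over one period may be sketched as follows. The sketch assumes that each probe message carries a CRC-32 value in its last four bytes; this layout and the names `is_error_packet` and `make_detection_result` are assumptions made for illustration only.

```python
import struct
import zlib


def is_error_packet(frame: bytes) -> bool:
    """Return True if the CRC-32 carried in the last 4 bytes does not match."""
    if len(frame) < 4:
        return True
    carried = struct.unpack("!I", frame[-4:])[0]
    return zlib.crc32(frame[:-4]) != carried


def make_detection_result(received_frames, expected_count: int) -> dict:
    """Count lost and corrupted probe messages from one sender in one period."""
    return {
        # packet loss data = expected count - actually received count
        "packet_loss": max(expected_count - len(received_frames), 0),
        # error packet data = number of received frames whose CRC check fails
        "error_packets": sum(1 for f in received_frames if is_error_packet(f)),
    }
```

With 60 probe messages expected per period and 50 received, as in the example of step 301, the packet loss data is 10.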

303. The detecting device acquires a detection result of the N ports in each server.

The detecting device may be configured with a path detection system that periodically receives the detection results of the N ports in the servers and analyzes the faulty port according to those detection results. The detection results obtained by the detecting device include the packet loss data and the error packet data of the probe messages sent between the N ports.

Specifically, each port in the server repeats steps 301 and 302 until the path detection system of the detecting device has acquired the detection results of all N ports, as shown in Table 5. The path detection system then knows the communication quality of all communication paths in the current communication failure detection system, so that the detecting device can identify the faulty port according to that communication quality.

Table 5

Port	Packet loss data	Error packet data
First port	AA	BB
......	......	......
Nth port	CC	DD
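Illustratively, the collection of detection results by the path detection system may be sketched as follows. The report format, a tuple of (receiving port, sending port, packet loss data, error packet data), and the function name `collect_results` are assumptions made for illustration; the sketch also lists the port pairs for which no result has arrived yet.

```python
def collect_results(reports, ports):
    """Gather per-path results and list the (sender, receiver) pairs still missing.

    reports: iterable of (receiver, sender, packet_loss, error_packets)
    ports:   all N ports taking part in the detection
    """
    table = {}
    for receiver, sender, loss, errors in reports:
        table[(sender, receiver)] = {"packet_loss": loss, "error_packets": errors}
    missing = [
        (s, r) for s in ports for r in ports
        if s != r and (s, r) not in table
    ]
    return table, missing
```

The detection is complete for the period once `missing` is empty, that is, once every port has reported on every other port.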

304. The detecting device calculates, according to the detection result, a packet loss rate of sending detection messages between the N ports.

After obtaining the detection result of the N ports in the server, the detecting device may determine whether the first port is faulty according to the detection result, and the first port is one of the N ports.

Specifically, first, the detecting device may convert the error packet data in the detection result into relative packet loss data according to the first preset function.

Exemplarily, the first preset function F1 is: relative packet loss data = error packet data * 5, that is, the error packet data is converted into relative packet loss data at a ratio of 1:5. Assuming that the error packet data of port 1 to port 2 is 2, the relative packet loss data of port 1 to port 2 = error packet data * 5 = 2 * 5 = 10.

Then, according to the relative packet loss data and the packet loss data in the detection result, the packet loss rate of the detection messages sent between the N ports is calculated according to the second preset function. To evaluate the path communication quality between the N ports accurately, the packet loss rate may further be expressed as a relative packet loss rate. If the packet loss rate between all ports of the server is high, judging by the absolute packet loss rate could lead the detecting device to determine that all ports are faulty. Therefore, the detecting device determines whether the first port is faulty according to the relative packet loss rate between the N ports.

Exemplarily, the second preset function F2 is: packet loss rate = (relative packet loss data + packet loss data) / number of probe messages that should be received. Assume that the relative packet loss data of port 1 to port 2 is 10, the packet loss data is 3, and the number of probe messages that should be received in the period is 100. Then the packet loss rate of port 1 to port 2 = (relative packet loss data + packet loss data) / number of probe messages that should be received = (10 + 3) / 100 = 0.13.
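Illustratively, the two preset functions of the above examples may be sketched as follows. The weight of 5 is taken from the example of F1; the function names and the treatment of the weight as a configurable parameter are assumptions made for illustration.

```python
ERROR_WEIGHT = 5  # ratio 1:5 from the F1 example; assumed to be configurable


def relative_packet_loss(error_packets: int, weight: int = ERROR_WEIGHT) -> int:
    """F1: convert error packet data into relative packet loss data."""
    return error_packets * weight


def packet_loss_rate(error_packets: int, packet_loss: int, expected: int) -> float:
    """F2: (relative packet loss data + packet loss data) / expected count."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (relative_packet_loss(error_packets) + packet_loss) / expected
```

For the values of the example (error packet data 2, packet loss data 3, 100 probe messages expected), `packet_loss_rate(2, 3, 100)` evaluates (10 + 3) / 100.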

Further, if the packet loss rate of port 1 to port 2 is 0.13, the packet loss rate of port 1 to port 3 is 0.15, and the packet loss rate of port 1 to port 4 is 0.05, then the minimum packet loss rate (0.05) is taken as the reference to calculate the relative packet loss rates of port 1 to ports 2, 3, and 4: the relative packet loss rate of port 1 to port 2 is 0.08, that of port 1 to port 3 is 0.1, and that of port 1 to port 4 is 0.

So far, the detecting device calculates the relative packet loss rate of the detection messages sent between the N ports according to the detection result.
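Illustratively, the conversion from absolute to relative packet loss rates, using the minimum rate as the reference, may be sketched as follows. The function name and the dictionary representation are assumptions made for illustration.

```python
def relative_loss_rates(rates: dict) -> dict:
    """Subtract the smallest rate so that the best path becomes the zero reference.

    rates: {peer_port: absolute packet loss rate} as seen from one port
    """
    if not rates:
        return {}
    baseline = min(rates.values())
    return {peer: rate - baseline for peer, rate in rates.items()}
```

Because the rates are floating-point values, results should be compared with a tolerance rather than for exact equality.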

305. The detecting device determines, according to a packet loss rate of sending detection messages between the N ports, whether the first port is faulty.

The first port may be any one of the N ports. If, among the N ports, the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than the first preset value, and the packet loss rate of the detection messages sent between those at least N/2 ports is less than the second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.

Exemplarily, taking Table 6 as an example, whether port 1 is faulty is determined according to the relative packet loss rates of the detection messages sent between the four ports. The data in Table 6 are percentages.

Table 6

	Port 1	Port 2	Port 3	Port 4
Port 1	None	1.2	2.2	2.5
Port 2	3	None	0.03	0.03
Port 3	4	0.08	None	0.02
Port 4	2.3	0.9	0	None

Specifically, according to the statistics in Table 6, among the four ports, the relative packet loss rate of the detection messages sent by ports 2, 3, and 4 to port 1 is greater than the first preset value (1%), and the relative packet loss rate of the detection messages sent between ports 2, 3, and 4 is less than the second preset value (0.2%). Therefore, the detecting device determines that port 1 is faulty.
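Illustratively, the determination rule of step 305 may be sketched as follows, using the values of Table 6 with rows read as sending ports and columns as receiving ports. Reading "at least N/2" as the ceiling of N/2, searching for a qualifying group by brute force, and the name `is_port_faulty` are assumptions made for this sketch only.

```python
import math
from itertools import combinations


def is_port_faulty(rates, ports, target, first_threshold, second_threshold):
    """rates[(sender, receiver)] is a relative packet loss rate in percent."""
    need = math.ceil(len(ports) / 2)  # assumed reading of "at least N/2"
    # Ports whose probe messages towards the target are lost too often.
    suspects = [p for p in ports if p != target and rates[(p, target)] > first_threshold]
    # A larger qualifying group always contains a qualifying group of size `need`.
    for group in combinations(suspects, need):
        if all(rates[(a, b)] < second_threshold
               for a in group for b in group if a != b):
            return True
    return False


# Values of Table 6, keyed as (sender, receiver).
TABLE_6 = {
    (1, 2): 1.2, (1, 3): 2.2, (1, 4): 2.5,
    (2, 1): 3, (2, 3): 0.03, (2, 4): 0.03,
    (3, 1): 4, (3, 2): 0.08, (3, 4): 0.02,
    (4, 1): 2.3, (4, 2): 0.9, (4, 3): 0,
}
```

Under these assumptions, `is_port_faulty(TABLE_6, [1, 2, 3, 4], 1, 1.0, 0.2)` reports port 1 as faulty, because ports 2 and 3 both lose more than 1% towards port 1 while losing less than 0.2% towards each other.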

The detecting device may determine whether each of the N ports is faulty according to the foregoing method, that is, detect whether any port of the server is in the "sub-health" state and is impairing data transmission.

306. If the first port is faulty, the detecting device generates a first fault notification.

The first failure notification is used to instruct the server to remove the first port from the LAG.

Specifically, if the first port is faulty, the first port is a port in the “sub-health” state and impairs data transmission. The detecting device may therefore generate a first fault notification and send it to the server, so that the server removes the first port from the LAG according to the first fault notification, that is, stops sending data on that port and recalculates the port used for data transmission among the remaining links according to the load sharing policy. When the faulty port recovers, the port used for data transmission is recalculated again, so that automatic switching of the communication paths between the N ports is achieved.
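Illustratively, recalculating the sending port after a port is removed from or restored to the LAG may be sketched as follows. The load sharing policy is not specified in this embodiment; the sketch assumes a simple hash of a flow key over the active ports, and the class `Lag` and its methods are hypothetical.

```python
import zlib


class Lag:
    """Minimal link aggregation group with hash-based load sharing (assumed policy)."""

    def __init__(self, ports):
        self.active = sorted(ports)

    def remove(self, port):
        """Stop sending on a faulty port."""
        if port in self.active:
            self.active.remove(port)

    def add(self, port):
        """Restore a recovered port."""
        if port not in self.active:
            self.active = sorted(self.active + [port])

    def select_port(self, flow_key: str):
        """Pick the sending port for a flow among the remaining links."""
        if not self.active:
            raise RuntimeError("no active port in LAG")
        return self.active[zlib.crc32(flow_key.encode()) % len(self.active)]
```

Because the selection depends only on the current list of active ports, removing or restoring a port automatically changes which port each flow is sent on.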

307. If each port in the server is faulty, the detecting device invokes DRS to perform virtual machine hot migration on the virtual machine running in the server.

Specifically, if the detecting device determines according to the detection result that every port in the server is faulty, the detecting device may invoke the DRS to perform virtual machine hot migration on the virtual machines running in the server; alternatively, the detecting device may send a second fault notification to the server, so that the server invokes the DRS according to the second fault notification to perform the hot migration. The virtual machines on the server with the faulty ports are thereby migrated to other servers that have no faulty ports, which ensures that data transmission is not impaired when those virtual machines perform service interaction.

Virtual machine hot migration (VM Live Migration, also known as dynamic migration or live migration) is a form of virtual machine save/restore: the complete running state of the virtual machine is saved and can be quickly restored on the original hardware platform or even on a different hardware platform. After restoration, the virtual machine continues to run smoothly, and the user does not notice any difference.

308. If each port in the server is faulty, the detecting device generates a second fault notification.

The second fault notification is used to instruct the server to invoke the DRS to perform virtual machine hot migration on the virtual machine running in the server.

309. If the first port is not faulty, and the first port is not in the LAG, the server adds the first port to the LAG, so that data is sent and received through the first port.

Obviously, steps 306 to 309 are four possible situations after step 305, so steps 306 to 309 are in a parallel relationship, and the embodiment of the present invention does not limit the logical relationship between steps 306 to 309.
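Illustratively, the selection among steps 306 to 309 for one server may be sketched as follows. The function name `plan_actions` and the action labels are hypothetical; the sketch only decides which step applies and does not send any notification.

```python
def plan_actions(port_faulty: dict, lag_members: set) -> list:
    """Decide the follow-up actions for one server.

    port_faulty: {port: True if the port was determined to be faulty}
    lag_members: ports currently in the LAG
    """
    if port_faulty and all(port_faulty.values()):
        # Steps 307/308: every port is faulty, so hot migration via DRS is requested.
        return [("second_fault_notification", None)]
    actions = []
    for port, faulty in port_faulty.items():
        if faulty:
            # Step 306: ask the server to remove the port from the LAG.
            actions.append(("first_fault_notification", port))
        elif port not in lag_members:
            # Step 309: a healthy port outside the LAG is added back.
            actions.append(("add_to_lag", port))
    return actions
```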

In addition, the N ports are physical ports in the server, or virtual ports in a virtual machine running in the server. Specifically, in the IAAS, fully interconnected communication fault detection is performed on the physical ports of the server, and ports with communication faults are switched. In the PAAS, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the server and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is implemented for ports with communication faults.

Optionally, the following provides a method for detecting a communication failure in the PAAS:

In the PAAS, at least one virtual machine is running in each server, and the virtual machine has a virtual port. The method for detecting a communication failure provided by the present invention is used to detect whether the virtual port is faulty.

A virtual machine is a complete computer system that is simulated by software, has complete hardware system functions, and runs in a completely isolated environment.

Specifically, the method for detecting a communication failure in the PAAS may include the following steps:

401. The virtual machine receives, by using the first virtual port, a virtual probe message from the M-1 virtual ports, where the virtual probe message is used to determine the error packet data and the packet loss data of the M-1 ports, where M>2.

The method for receiving the virtual probe message from the M-1 virtual ports may refer to step 301.

402. The virtual machine generates a virtual probe result according to the virtual probe message, where the virtual probe result includes the packet loss data and the error packet data of the virtual probe messages sent by the M-1 virtual ports to the first virtual port.

The method for generating a virtual probe result according to the virtual probe message may refer to step 302.

403. The virtual machine acquires the detection result from the M virtual ports.

A virtual path detection system may be deployed in the virtual machine. It periodically receives the virtual probe results of the M virtual ports generated according to steps 401 and 402, and analyzes the faulty virtual port based on the virtual probe results of the M virtual ports.

404. The virtual path detection system determines, according to the virtual probe result, whether the first virtual port is faulty, where the first virtual port is one of the M virtual ports.

Specifically, the virtual path detection system may calculate, according to the virtual probe result, the packet loss rate of the virtual probe messages sent between the M virtual ports; for the method of calculating the packet loss rate, refer to step 304. The virtual path detection system then determines whether the first virtual port is faulty according to the packet loss rate of the virtual probe messages sent between the M virtual ports; for the method of determining whether the first virtual port is faulty, refer to step 305.

405. If the first virtual port is faulty, the virtual path detection system generates a virtual fault information to report to the VNFM, so that the VNFM sends the virtual fault information to the detecting device in the IAAS.

VNFM (Virtual Network Function Manager) is the management software for virtual machines in NFV (Network Function Virtualization). It can be used for the initial deployment and life-cycle management of application network elements, for elastic management, for collecting key alarms from the virtualization layer and the hardware layer, and for reporting KPIs (Key Performance Indicators), and it plays an important role in scheduling and allocating virtual resources.

Specifically, if the virtual path detection system determines that the first virtual port is faulty, it generates virtual fault information, which may carry the ID of the first virtual port, the ID of the virtual machine corresponding to the first virtual port, and the ID of the server hosting that virtual machine. The virtual path detection system reports the virtual fault information to the VNFM, which forwards it to the detecting device in the IAAS.
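Illustratively, the virtual fault information carrying the three IDs may be sketched as the following record. The field names and the JSON encoding are assumptions made for illustration; the embodiment does not specify the message format exchanged with the VNFM.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VirtualFaultInfo:
    virtual_port_id: str  # ID of the faulty first virtual port
    vm_id: str            # ID of the virtual machine owning that port
    server_id: str        # ID of the server hosting that virtual machine

    def to_json(self) -> str:
        """Serialize for reporting to the VNFM (assumed encoding)."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "VirtualFaultInfo":
        """Rebuild the record on the receiving side."""
        return cls(**json.loads(text))
```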

406. The detecting device in the IAAS performs communication path switching according to the virtual fault information.

Specifically, according to the ID of the server in the virtual fault information, the detecting device in the IAAS queries whether the physical port on the server hosting the virtual machine corresponding to the first virtual port is faulty. If the physical port on the server is not faulty, the detecting device performs virtual machine hot migration on the virtual machine indicated by the ID of the virtual machine corresponding to the first virtual port.

So far, an embodiment of the present invention provides a method for detecting whether a virtual port is faulty in the PAAS. Combined with the detection result of the detecting device in the IAAS, communication path switching is performed in time for the faulty virtual port, which implements path switching in a cloud scenario in which IAAS and PAAS are effectively combined.

It can be seen that the server receives and sends probe messages through the virtual ports or the physical ports to form a fully interconnected path detection system in the IAAS and PAAS scenarios, and generates detection results that reflect the quality of service between the ports. The detecting device analyzes the detection results reported by each port, identifies any port in the “sub-health” state, and removes that port from the LAG in time, which prevents the server from using a port in the “sub-health” state for data transmission and reception and thus prevents data from being continuously damaged.
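Illustratively, the handling of the virtual fault information by the detecting device in the IAAS may be sketched as follows. The dictionary keys, the function name `handle_virtual_fault`, and the returned action tuple are hypothetical; the case in which the physical port is itself faulty is not detailed in this step, so the sketch returns no action for it.

```python
def handle_virtual_fault(info: dict, faulty_physical_ports: dict):
    """Decide what to do with one piece of virtual fault information.

    info: {'vm_id': ..., 'server_id': ...} taken from the virtual fault information
    faulty_physical_ports: {server_id: set of physical ports known to be faulty}
    """
    if faulty_physical_ports.get(info["server_id"]):
        # Not detailed in this step; the physical-port procedure is assumed to apply.
        return None
    # Physical ports are healthy, so the fault lies with the virtual port:
    # hot-migrate the virtual machine that owns it.
    return ("hot_migrate", info["vm_id"])
```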

Embodiment 5

An embodiment of the present invention provides a detecting device, as shown in FIG. 8, including:

The obtaining unit 31 is configured to respectively obtain the detection results of the N ports in the X servers, where the detection results include the error packet data and the packet loss data of the other ports determined by each port according to the received probe messages sent by the other ports, N>2;

a determining unit 32, configured to determine a state of the first port according to the error packet data and the packet loss data of the other ports determined by each port in the obtaining unit 31, where the state of the first port is used to indicate whether the first port is faulty, and the first port is one of the N ports;

The processing unit 33 is configured to generate a fault notification of the first port according to the state of the first port determined by the determining unit 32.

Further, as shown in FIG. 9, the determining unit 32 includes a calculating subunit 321, wherein

The calculating sub-unit 321 is configured to separately calculate, according to the detection result, the packet loss rate of the detection messages sent between the N ports;

The determining unit 32 is configured to determine whether the first port is faulty according to the packet loss rate of the detection messages sent between the N ports as calculated by the calculating sub-unit 321.

Further, the calculating sub-unit 321 is specifically configured to convert the error packet data in the detection result into relative packet loss data according to a first preset function, and to calculate, according to a second preset function, the packet loss rate of the detection messages sent between the N ports from the relative packet loss data and the packet loss data in the detection result.

Further, the determining unit 32 is specifically configured to: if the packet loss rate of the detection messages sent by at least N/2 ports of the N ports to the first port is greater than the first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than the second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.

Further, the processing unit 33 is specifically configured to generate the first fault notification of the first port, so that the server removes the first port from the LAG after acquiring the first fault notification;

Further, the processing unit 33 is specifically configured to generate the second fault notification of the first port, so that the server, after acquiring the second fault notification, invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machines running in the server;

Further, the N ports are physical ports in the X servers, or are virtual ports in virtual machines running in the X servers.

An embodiment of the present invention provides a server, as shown in FIG. 10, including:

The receiving unit 41 is configured to receive, through the first port, probe messages from the N-1 ports in the other servers, where the probe messages are used to determine the error packet data and the packet loss data of the N-1 ports, N>2;

The processing unit 42 is configured to generate a detection result according to the probe messages received by the receiving unit 41, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port;

The obtaining unit 43 is configured to acquire, according to the detection result of the processing unit 42 , a fault notification sent by the detecting device, where the fault notification is used to indicate whether the first port is faulty.

Further, the first port is a physical port in the server, or a virtual port in a virtual machine running in the server. As shown in FIG. 11, the server further includes a removing unit 44 and a migrating unit 45.

The removing unit 44 is configured to: if the first port is a physical port in the server and the first port is faulty, remove the first port from the LAG according to the fault notification acquired by the obtaining unit 43;

The migrating unit 45 is configured to: if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification acquired by the obtaining unit 43.

Further, the processing unit 42 is further configured to: if the fault notification acquired by the obtaining unit 43 indicates that the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG so that data is sent and received through the first port.

Further, the processing unit 42 is specifically configured to: calculate the packet loss data of the N-1 ports to the first port according to the number of probe messages received by the receiving unit 41 within a preset time; analyze whether each probe message received by the receiving unit 41 within the preset time is an error packet, so as to count the error packet data of the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.

Further, as shown in FIG. 12, the server further includes a sending unit 46.

The obtaining unit 43 is further configured to separately obtain media access control MAC addresses of the N-1 ports;

The processing unit 42 is further configured to construct the probe message according to the MAC address in the acquiring unit 43;

The sending unit 46 is configured to send the probe message in the processing unit 42 to the N-1 ports by using the first port according to the MAC address of the N-1 ports in the acquiring unit 43.

In the prior art, the server can only detect through the LAG whether each of its own ports is available, that is, whether the port can transmit data. It cannot detect the "sub-health" situation in which a faulty port transmits data abnormally (for example, heavy packet loss when sending data packets, or tampering with the contents of a data packet), so data transmitted through the "sub-health" port continues to be damaged, which reduces the reliability of data transmission. The method for detecting a communication failure provided by the present invention can detect a port in the “sub-health” state and then remove that port from the LAG in time, thereby improving the reliability of data transmission.

It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division of functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the specific working processes of the system, device, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. The division of the modules or units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present invention. The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the invention shall be determined by the scope of the appended claims.

Claims

A method for detecting a communication failure, characterized in that it comprises:

The detecting device obtains the detection results of the N ports in the X servers, where the detection results include the error packet data and the packet loss data of the other ports, determined by each port according to the received probe messages sent by the other ports, where N>2 and X>2;

The detecting device determines the state of the first port according to the error packet data and the packet loss data of the other ports determined by each port, where the state of the first port is used to indicate whether the first port is faulty;

The detecting device generates a failure notification of the first port according to the state of the first port.
The method according to claim 1, wherein the detecting device determines the state of the first port according to the error packet data and the packet loss data of the other port determined by each port, including:

The detecting device calculates, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;

The detecting device determines whether the first port is faulty according to a packet loss rate of the detection messages sent between the N ports.
The method according to claim 2, wherein the detecting device separately calculates a packet loss rate of the detection messages sent by the N ports according to the detection result, including:

The detecting device converts the error packet data in the detection result into relative packet loss data according to a first preset function;

The detecting device calculates, according to a second preset function, a packet loss rate of the detection messages sent between the N ports, based on the relative packet loss data and the packet loss data in the detection result.
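As an illustration of the two-step calculation in the preceding claim, the following Python sketch assumes a linear first preset function (each error packet counts as a fixed fraction of a lost packet) and a second preset function that divides the combined loss by the number of probes sent. The claims do not specify either function; the names `relative_loss` and `loss_rate`, the weight 0.5, and the sample counts are hypothetical.

```python
def relative_loss(error_packets: int, weight: float = 0.5) -> float:
    """Hypothetical first preset function: treat each error packet as a partial loss."""
    return weight * error_packets


def loss_rate(lost_packets: int, error_packets: int, sent_packets: int,
              weight: float = 0.5) -> float:
    """Hypothetical second preset function: combined loss divided by probes sent."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    combined = lost_packets + relative_loss(error_packets, weight)
    # Cap at 1.0 so that heavily corrupted links never exceed a 100% loss rate.
    return min(combined / sent_packets, 1.0)


# Assumed sample: one port sent 100 probes; the receiver reports 10 lost and 4 corrupted.
rate = loss_rate(lost_packets=10, error_packets=4, sent_packets=100)
```

Any monotonic mapping could stand in for the linear weight; the point is only that error packets and lost packets end up on a single scale before the fault decision is made.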
The method according to claim 2, wherein the detecting device determines whether the first port is faulty according to a packet loss rate of the detection messages sent by the N ports, including:

If the packet loss rate of the detection messages sent by at least N/2 ports of the N ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than a second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
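The decision rule in the preceding claim can be sketched in Python as follows. This is a simplified reading, not the claimed implementation: it takes every sender whose loss rate toward the first port exceeds the first preset value as a witness, and then requires all of those witnesses to communicate well with one another, rather than searching for some subset of at least N/2 ports. The loss-matrix layout, the default of 0.0 for missing entries, and the sample thresholds are assumptions.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# (sender port, receiver port) -> packet loss rate of probes on that path
LossMatrix = Dict[Tuple[str, str], float]


def is_faulty(port: str, ports: List[str], loss: LossMatrix,
              first_preset: float, second_preset: float) -> bool:
    # Senders whose probes toward `port` show a loss rate above the first preset value.
    witnesses = [p for p in ports
                 if p != port and loss.get((p, port), 0.0) > first_preset]
    if len(witnesses) < len(ports) / 2:
        return False
    # The witnesses must reach each other with a loss rate below the second preset
    # value; otherwise the problem may lie with the witnesses, not with `port`.
    return all(loss.get((a, b), 0.0) < second_preset
               for a, b in permutations(witnesses, 2))


ports = ["p1", "p2", "p3", "p4"]
loss = {("p2", "p1"): 0.9, ("p3", "p1"): 0.8, ("p4", "p1"): 0.0,
        ("p2", "p3"): 0.01, ("p3", "p2"): 0.01}
p1_faulty = is_faulty("p1", ports, loss, first_preset=0.5, second_preset=0.1)
p2_faulty = is_faulty("p2", ports, loss, first_preset=0.5, second_preset=0.1)
```

The second condition is what distinguishes a faulty receiver from a set of faulty senders: high loss toward one port only implicates that port when the reporting ports are themselves healthy.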
The method according to any one of claims 1 to 4, wherein the failure notification includes a first failure notification, the first failure notification is used to indicate that the first port is faulty,

The generating a failure notification of the first port includes:

The detecting device generates the first failure notification of the first port, so that after the server acquires the first failure notification, the first port is removed from the link aggregation group LAG.
The method according to claim 5, wherein the failure notification includes a second failure notification, and the second failure notification is used to indicate that all of the N ports in the X servers are faulty.

The generating a failure notification of the first port includes:

The detecting device generates the second failure notification of the first port, so that after the server acquires the second failure notification, the server invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on a virtual machine running in the server.
The method according to any one of claims 1 to 6, wherein the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
A method for detecting a communication failure, characterized in that it comprises:

The server receives, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine the error packet data and the packet loss data of the N-1 ports, where N>2;

The server generates a detection result according to the detection message, where the detection result includes packet loss data and error packet data of the N-1 ports sending the probe message to the first port;

The server acquires a fault notification sent by the detecting device according to the detection result, where the fault notification is used to indicate whether the first port is faulty.
The method according to claim 8, wherein the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server.

After the server obtains the fault notification sent by the detecting device according to the detection result, the method further includes:

If the first port is a physical port in the server and the first port is faulty, the server removes the first port from the link aggregation group LAG according to the fault notification;

If the first port is a virtual port in a virtual machine running in the server and the first port is faulty, the server performs virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification.
The method according to claim 8, wherein after the server obtains the failure notification sent by the detection device according to the detection result, the method further includes:

If the first port is not faulty, the server queries whether the first port is in the LAG;

If the first port is not in the LAG, the server adds the first port to the LAG to perform data transmission and reception through the first port.
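The server-side handling of a fault notification described in the two preceding claims might look like the Python sketch below. The `Server` class, its fields, and the port names are hypothetical; the migration step is represented only by recording the port, since the claims do not define how hot migration is invoked, and re-adding a healthy port is applied to physical ports only as an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Server:
    lag: Set[str] = field(default_factory=set)          # ports currently in the LAG
    migrated: List[str] = field(default_factory=list)   # virtual ports whose VM was migrated

    def handle_notification(self, port: str, faulty: bool,
                            is_virtual: bool = False) -> None:
        if faulty and is_virtual:
            # Stand-in for hot migration of the virtual machine that owns the port.
            self.migrated.append(port)
        elif faulty:
            # Faulty physical port: stop using it for data transmission.
            self.lag.discard(port)
        elif not is_virtual and port not in self.lag:
            # Healthy physical port that was removed earlier: put it back.
            self.lag.add(port)


server = Server(lag={"eth0", "eth1"})
server.handle_notification("eth1", faulty=True)
lag_after_fault = set(server.lag)
server.handle_notification("eth1", faulty=False)
server.handle_notification("vnic0", faulty=True, is_virtual=True)
```

Because a recovered port is re-added on the next healthy notification, a transient fault removes a port from the LAG only for as long as the detecting device keeps reporting it.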
The method according to claim 8, wherein the server generates the detection result according to the detection message, including:

The server calculates packet loss data of the N-1 ports to the first port according to the number of the probe messages received in the preset time;

The server analyzes, according to the probe messages received within the preset time, whether each probe message is an error packet, so as to collect the error packet data of the N-1 ports to the first port;

The server generates the detection result according to the packet loss data and the error packet data.
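A minimal Python sketch of how a server could build the detection result in the preceding claim is shown below. It assumes that each sender emits a known number of probes per preset time window and that each probe carries a CRC32 of its payload; neither detail is stated in the claims, and the tuple layout and the function name `build_detection_result` are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

# (sender port, payload, CRC32 value the sender attached to the probe)
Probe = Tuple[str, bytes, int]


def build_detection_result(received: List[Probe], senders: List[str],
                           expected_per_sender: int) -> Dict[str, Dict[str, int]]:
    counts = {s: {"received": 0, "errors": 0} for s in senders}
    for sender, payload, claimed_crc in received:
        if sender not in counts:
            continue  # ignore probes from ports outside the monitored set
        counts[sender]["received"] += 1
        if zlib.crc32(payload) != claimed_crc:
            counts[sender]["errors"] += 1  # probe arrived, but corrupted
    # Packet loss data: probes expected in the window minus probes that arrived.
    return {s: {"lost": max(expected_per_sender - c["received"], 0),
                "errors": c["errors"]}
            for s, c in counts.items()}


payload = b"probe"
good = zlib.crc32(payload)
received = [("p2", payload, good), ("p2", payload, good ^ 1), ("p3", payload, good)]
result = build_detection_result(received, ["p2", "p3"], expected_per_sender=3)
```

In this sketch a corrupted probe counts as received and is reported separately as an error packet, which keeps the two quantities distinct for whatever conversion the detecting device applies later.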
The method of claim 8 further comprising:

The server respectively obtains media access control MAC addresses of the N-1 ports;

The server constructs the probe message according to the MAC address;

The server sends the probe message to the N-1 ports by using the first port according to the MAC address of the N-1 ports.
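Constructing a probe message from MAC addresses, as in the preceding claim, can be illustrated with the Python sketch below. The frame layout (destination MAC, source MAC, EtherType, a 4-byte sequence number, and a trailing CRC32) is an assumption, as is the use of 0x88B5, an EtherType reserved by IEEE 802 for local experimental use; the claims define none of these. Padding to the Ethernet minimum frame size and the actual transmission are omitted.

```python
import struct
import zlib

# Assumed EtherType for probe frames (IEEE 802 local experimental value).
ETHERTYPE_PROBE = 0x88B5


def mac_to_bytes(mac: str) -> bytes:
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def build_probe(src_mac: str, dst_mac: str, seq: int) -> bytes:
    header = (mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
              + struct.pack("!H", ETHERTYPE_PROBE))
    body = header + struct.pack("!I", seq)  # the sequence number lets the receiver count gaps
    # A trailing CRC32 lets the receiver classify a damaged probe as an error packet.
    return body + struct.pack("!I", zlib.crc32(body))


frame = build_probe("02:00:00:00:00:01", "02:00:00:00:00:02", seq=7)
```

One frame would be built per destination MAC address, so a server probing N-1 peer ports calls `build_probe` N-1 times per round with the same source address.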
A detecting device, comprising:

an acquiring unit, configured to obtain the detection results of the N ports in the X servers respectively, where the detection results include the error packet data and the packet loss data of the other ports, determined by each port according to the received probe messages sent by the other ports, where N>2 and X>2;

a determining unit, configured to determine a state of the first port according to the error packet data and the packet loss data of the other ports determined by each port in the acquiring unit, where the state of the first port is used to indicate whether the first port is faulty, and the first port is one of the N ports;

and a processing unit, configured to generate a failure notification of the first port according to the state of the first port in the determining unit.
The detecting device according to claim 13, wherein the determining unit comprises a calculating subunit, wherein

The calculating subunit is configured to calculate, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;

The determining unit is specifically configured to determine, according to the packet loss rate of the detection messages sent between the N ports calculated by the calculating subunit, whether the first port is faulty.
The detecting device according to claim 14, wherein

The calculating subunit is specifically configured to convert the error packet data in the detection result into relative packet loss data according to a first preset function, and to calculate, according to a second preset function, a packet loss rate of the detection messages sent by the N ports to each other, based on the relative packet loss data and the packet loss data in the detection result.
The detecting device according to claim 14, wherein

The determining unit is specifically configured to: if the packet loss rate of the detection messages sent by at least N/2 ports of the N ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than a second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.
A detecting device according to any one of claims 13 to 16, wherein

The processing unit is specifically configured to generate the first fault notification of the first port, so that after the server obtains the first fault notification, the first port is removed from the link aggregation group LAG;

The fault notification includes a first fault notification, where the first fault notification is used to indicate that the first port is faulty.
The detecting device according to claim 17, wherein

The processing unit is specifically configured to generate the second failure notification of the first port, so that after the server acquires the second failure notification, the server invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machine running in the server;

The fault notification includes a second fault notification, where the second fault notification is used to indicate that all the N ports in the X servers are faulty.
The detecting device according to any one of claims 13 to 18, wherein the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
A server, comprising:

a receiving unit, configured to receive, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine the error packet data and the packet loss data of the N-1 ports, where N>2;

a processing unit, configured to generate a detection result according to the detection message of the receiving unit, where the detection result includes packet loss data and error packet data of the N-1 ports sending the probe message to the first port;

and an acquiring unit, configured to acquire a fault notification sent by the detecting device according to the detection result of the processing unit, where the fault notification is used to indicate whether the first port is faulty.
The server according to claim 20, wherein the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server, and wherein the server further includes a removing unit and a migrating unit,

The removing unit is configured to: if the first port is a physical port in the server and the first port is faulty, remove the first port from the link aggregation group LAG according to the fault notification in the acquiring unit;

The migrating unit is configured to: if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification in the acquiring unit.
A server according to claim 20, wherein

The processing unit is further configured to: if the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG so that data is sent and received through the first port.
A server according to claim 20, wherein

The processing unit is specifically configured to: calculate the packet loss data of the N-1 ports to the first port according to the number of probe messages received by the receiving unit within a preset time; analyze, according to the probe messages received by the receiving unit within the preset time, whether each probe message is an error packet, so as to collect the error packet data of the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.
The server according to claim 20, wherein said server further comprises a transmitting unit,

The acquiring unit is further configured to separately obtain media access control MAC addresses of the N-1 ports;

The processing unit is further configured to construct the probe message according to a MAC address in the acquiring unit;

The sending unit is configured to send, by using the first port, a probe message in the processing unit to the N-1 ports according to a MAC address of the N-1 ports in the acquiring unit.
A detection system for communication failure, characterized in that the detection system comprises the detecting device according to any one of claims 13 to 19, and the server according to any one of claims 20 to 24, connected to the detecting device.