WO2016029749A1 - Communication failure detection method, device and system - Google Patents

Communication failure detection method, device and system

Info

Publication number
WO2016029749A1
Authority
WO
WIPO (PCT)
Prior art keywords
port
ports
server
data
packet loss
Application number
PCT/CN2015/084002
Other languages
French (fr)
Chinese (zh)
Inventor
张小东
田彦峰
孙名逊
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2016029749A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/40 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities

Definitions

  • The present invention relates to the field of communications, and in particular, to a method, device and system for detecting a communication failure.
  • Network port aggregation and switch stacking are used to improve network plane reliability.
  • A port in a server may become unavailable because of a failure, which makes the communication paths through that port unavailable.
  • The Link Aggregation Group (LAG) in the server can periodically detect the status of its own ports.
  • When a port is found to be unavailable, the server, in accordance with the Link Aggregation Control Protocol (LACP), removes the unavailable port from the LAG to switch the communication path.
  • For example, if port 1 of server 1 is unavailable while ports 2, 3, and 4 are running normally, port 1 is removed from the LAG, and the LAG automatically selects ports 2, 3, and 4 to forward data packets.
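
The port-removal behavior in the example above can be sketched as follows. This is only an illustration: the `Lag` class and its `refresh` method are hypothetical names, not part of LACP or of any real library, and port status is simply passed in as a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Lag:
    """Hypothetical model of a link aggregation group (not a real LACP API)."""
    members: Set[int] = field(default_factory=set)

    def refresh(self, port_up: Dict[int, bool]) -> List[int]:
        """Drop members reported as unavailable; return the ports left for forwarding."""
        self.members = {p for p in self.members if port_up.get(p, False)}
        return sorted(self.members)


lag = Lag(members={1, 2, 3, 4})
# Port 1 is unavailable; ports 2, 3 and 4 are running normally.
active = lag.refresh({1: False, 2: True, 3: True, 4: True})
print(active)
```

Note that this check only sees whether a port is up or down, which is exactly why a port that is up but silently dropping packets goes unnoticed.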
  • However, when a port sends and receives data packets, it may be in a "sub-health" state because of some fault (for convenience of description, the present invention uniformly refers to a port in the "sub-health" state as a faulty port).
  • In this state the port can still exchange data packets with other ports (that is, the port is still available), but it may drop packets when sending, tamper with packet contents, and so on. Because the port still appears available to other ports, the LAG can neither detect the port's abnormal behavior when sending and receiving data packets nor switch the communication paths associated with the port. As a result, data transmitted through the faulty ("sub-health") port continues to be damaged, which increases the risk of data transmission.
  • The embodiments of the present invention provide a method, a device and a system for detecting a communication failure, which solve the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoid the risk of transmitting data through the faulty port.
  • an embodiment of the present invention provides a method for detecting a communication failure, including:
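  • the detecting device obtains the detection results of N ports in X servers, where the detection result includes the error packet data and the packet loss data of other ports, as determined by each port according to the received probe messages sent by the other ports, N>2, X>2;
  • the detecting device determines the state of a first port according to the error packet data and the packet loss data of the other ports determined by each port, where the state of the first port indicates whether the first port is faulty, and the first port is one of the N ports;
  • the detecting device generates a failure notification of the first port according to the state of the first port.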
  • the detecting device determines, according to the error packet data and the packet loss data of the other port determined by each port, whether the first port is faulty, including:
  • the detecting device calculates, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;
  • the detecting device determines whether the first port is faulty according to a packet loss rate of the detection messages sent between the N ports.
  • the detecting device calculating, according to the detection result, the packet loss rate of the detection messages sent by the N ports to each other includes:
  • the detecting device converts the error packet data in the detection result into relative packet loss data according to the first preset function;
  • the detecting device calculates, according to the second preset function and based on the relative packet loss data and the packet loss data in the detection result, the packet loss rate of the detection messages sent by the N ports to each other.
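
The two-step calculation can be sketched as below. The text does not define either preset function, so both are assumptions made purely for illustration: the first counts each error packet as `weight` lost packets, and the second divides the combined loss by the number of probes sent. The function names are hypothetical.

```python
def errors_to_relative_loss(error_packets: int, weight: float = 1.0) -> float:
    """Assumed 'first preset function': treat each error packet as `weight` lost packets."""
    return weight * error_packets


def loss_rate(lost_packets: int, relative_loss: float, sent: int) -> float:
    """Assumed 'second preset function': combined loss over probes sent, capped at 1.0."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return min(1.0, (lost_packets + relative_loss) / sent)


# Port A sent 100 probes to port B; B missed 5 of them and found 3 corrupted.
rate = loss_rate(lost_packets=5, relative_loss=errors_to_relative_loss(3), sent=100)
print(rate)
```

A weight below 1.0 would treat corruption as less severe than outright loss; the choice is a policy decision that the text leaves open.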
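  • the detecting device determining, according to the packet loss rate of the detection messages sent by the N ports to each other, whether the first port is faulty includes:
  • if, among the N ports, the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent among the at least N/2 ports is less than a second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.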
  • the fault notification includes a first failure notification, where the first failure notification is used to indicate that the first port is faulty,
  • the generating a failure notification of the first port includes:
  • the detecting device generates the first failure notification of the first port, so that after the server acquires the first failure notification, the first port is removed from the link aggregation group LAG.
  • the fault notification includes a second fault notification, where the second fault notification is used to indicate that all the N ports in the X servers are faulty,
  • the generating a failure notification of the first port includes:
  • the detecting device generates the second failure notification of the first port, so that the server acquires the second failure notification and invokes a DRS (Distributed Resource Scheduler) to perform virtual machine hot migration on the virtual machines running in the server.
  • the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
  • an embodiment of the present invention provides a method for detecting a communication failure, including:
  • the server receives the probe message from the N-1 ports in the other server through the first port, where the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2;
  • the server generates a detection result according to the probe message, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port;
  • the server acquires a fault notification sent by the detecting device according to the detection result, where the fault notification is used to indicate whether the first port is faulty.
  • the first port is a physical port in the server, or a virtual port in a virtual machine running in the server,
  • the method further includes:
  • if the first port is a physical port in the server and the first port is faulty, the server removes the first port from the link aggregation group (LAG) according to the failure notification;
  • if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, the server performs virtual machine hot migration on the virtual machine corresponding to the first port according to the failure notification.
  • the method further includes:
  • if the first port is not faulty, the server queries whether the first port is in the LAG;
  • if the first port is not in the LAG, the server adds the first port to the LAG to perform data transmission and reception through the first port.
  • the server generates the detection result according to the detection message, including:
  • the server calculates packet loss data of the N-1 ports to the first port according to the number of the probe messages received in the preset time;
  • the server analyzes, according to the probe messages received in the preset time, whether each probe message is an error packet, to collect the error packet data of the N-1 ports to the first port;
  • the server generates the detection result according to the packet loss data and the error packet data.
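
A minimal sketch of how a server might build such a detection result for one preset time window is given below. The probe representation (sender identifier, payload, CRC32 carried with the probe), the fixed number of probes expected from each sender, and the function name `build_detection_result` are all assumptions; the text does not say how an error packet is recognized.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Probe = Tuple[str, bytes, int]  # (sender id, payload, CRC32 carried in the probe): assumed layout


def build_detection_result(received: Iterable[Probe], senders: List[str],
                           expected_per_sender: int) -> Dict[str, Dict[str, int]]:
    """Count lost and corrupted probes per sending port for one time window."""
    counts = {s: {"received": 0, "errors": 0} for s in senders}
    for sender, payload, crc in received:
        if sender not in counts:
            continue  # ignore probes from ports outside the monitored set
        counts[sender]["received"] += 1
        if zlib.crc32(payload) != crc:
            counts[sender]["errors"] += 1  # payload does not match its checksum
    return {
        s: {"lost": max(0, expected_per_sender - c["received"]), "errors": c["errors"]}
        for s, c in counts.items()
    }


good = (b"probe", zlib.crc32(b"probe"))
# p2 delivered both probes intact, p3 delivered one corrupted probe, p4 delivered none.
window = [("p2", *good), ("p2", *good), ("p3", b"probX", good[1])]
report = build_detection_result(window, ["p2", "p3", "p4"], expected_per_sender=2)
print(report)
```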
  • the method further includes:
  • the server separately acquires the media access control (MAC) addresses of the N-1 ports;
  • the server constructs the probe message according to the MAC addresses;
  • the server sends the probe message to the N-1 ports by using the first port according to the MAC addresses of the N-1 ports.
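
Constructing one probe per peer MAC address could look like the sketch below. The frame layout is an assumption, since the text does not specify a probe format: a standard Ethernet header (destination MAC, source MAC, EtherType) followed by a sequence number and its CRC32. The EtherType 0x88B5 is one of the IEEE 802 local experimental values and is used here only as a placeholder. Padding to the minimum Ethernet frame size and the actual transmission are omitted.

```python
import struct
import zlib
from typing import List

EXPERIMENTAL_ETHERTYPE = 0x88B5  # placeholder; the source names no EtherType


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_probe_frame(dst_mac: str, src_mac: str, seq: int) -> bytes:
    """Hypothetical probe: Ethernet header + 4-byte sequence number + CRC32 of it."""
    payload = struct.pack("!I", seq)
    body = payload + struct.pack("!I", zlib.crc32(payload))
    header = (mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
              + struct.pack("!H", EXPERIMENTAL_ETHERTYPE))
    return header + body


peers: List[str] = ["02:00:00:00:00:02", "02:00:00:00:00:03"]
frames = [build_probe_frame(peer, "02:00:00:00:00:01", seq=7) for peer in peers]
print([frame.hex() for frame in frames])
```

Carrying a checksum inside the probe lets the receiving port separate corrupted probes from missing ones, which matches the split between error packet data and packet loss data.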
  • an embodiment of the present invention provides a detecting apparatus, including:
  • an obtaining unit configured to obtain the detection results of N ports in X servers, where the detection result includes the error packet data and the packet loss data of other ports, as determined by each port according to the received probe messages sent by the other ports, N>2, X>2;
  • a determining unit configured to determine a state of the first port according to the error packet data and the packet loss data of the other ports determined by each port in the obtaining unit, where the state of the first port is used to indicate whether the first port is faulty, and the first port is one of the N ports;
  • a processing unit configured to generate a failure notification of the first port according to a state of the first port in the determining unit.
  • the determining unit includes a computing subunit, where
  • the calculating subunit is configured to calculate, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;
  • the determining unit is specifically configured to determine whether the first port is faulty according to a packet loss rate of the detection messages sent between the N ports.
  • the calculating subunit is specifically configured to: convert the error packet data in the detection result into relative packet loss data according to a first preset function; and calculate, according to a second preset function and based on the relative packet loss data and the packet loss data in the detection result, the packet loss rate of the detection messages sent by the N ports to each other.
  • the determining unit is specifically configured to: if, among the N ports, the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than a first preset value, and the packet loss rate of the detection messages sent among the at least N/2 ports is less than a second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.
  • the processing unit is specifically configured to generate the first fault notification of the first port, so that after the server acquires the first fault notification, the first port is removed from the LAG;
  • the fault notification includes a first fault notification, where the first fault notification is used to indicate that the first port is faulty.
  • the processing unit is specifically configured to generate the second failure notification of the first port, so that the server acquires the second failure notification and invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machines running in the server;
  • the fault notification includes a second fault notification, where the second fault notification is used to indicate that all the N ports in the X servers are faulty.
  • the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
  • an embodiment of the present invention provides a server, including:
  • a receiving unit configured to receive, by using the first port, a probe message from the N-1 ports in the other server, where the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2;
  • a processing unit configured to generate a detection result according to the detection message, where the detection result includes packet loss data and error packet data of the N-1 ports sending the probe message to the first port;
  • an obtaining unit configured to acquire, according to the detection result, a fault notification sent by the detecting device, where the fault notification is used to indicate whether the first port is faulty.
  • the first port is a physical port in the server, or a virtual port in a virtual machine running in the server, and the server further includes a removing unit and a migrating unit,
  • the removing unit is configured to: if the first port in the acquiring unit is a physical port in the server and the first port is faulty, remove the first port from the LAG according to the fault notification in the acquiring unit;
  • the migrating unit is configured to: if the first port in the acquiring unit is a virtual port in a virtual machine running in the server and the first port is faulty, perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification in the acquiring unit.
  • the processing unit is further configured to: if the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG for data transceiving through the first port.
  • the processing unit is specifically configured to: calculate, according to the number of probe messages received by the receiving unit in the preset time, the packet loss data of the N-1 ports to the first port;
  • analyze, according to the probe messages received by the receiving unit in the preset time, whether each probe message is an error packet, to collect the error packet data of the N-1 ports to the first port;
  • and generate the detection result according to the packet loss data and the error packet data.
  • the server further includes a sending unit,
  • the obtaining unit is further configured to separately obtain the media access control (MAC) addresses of the N-1 ports;
  • the processing unit is further configured to construct the probe message according to a MAC address in the acquiring unit;
  • the sending unit is configured to send, by using the first port, a probe message in the processing unit to the N-1 ports according to a MAC address of the N-1 ports in the acquiring unit.
  • an embodiment of the present invention provides a communication failure detection system, where the detection system includes the third aspect and any one of the first to sixth possible implementation manners of the third aspect.
  • An embodiment of the present invention provides a method, an apparatus, and a system for detecting a communication failure.
  • The detecting device obtains the detection results of N ports in the servers; the detection results are generated by the servers according to the probe messages respectively received by the N ports.
  • Because the detection result includes the error packet data and the packet loss data of other ports, as determined by each port according to the probe messages sent by the other ports, the detecting device can determine, according to the error packet data and the packet loss data determined by each port, whether one of the N ports is a faulty port. This detects whether a port in the "sub-health" state is affecting the efficiency of data transmission through the port, thereby improving the reliability of data transmission.
  • FIG. 1 is a schematic structural diagram of a communication failure detection system in the prior art.
  • FIG. 2 is a schematic structural diagram of a communication fault detection system according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of hardware of a detecting device according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of hardware of a server according to an embodiment of the present invention.
  • FIG. 5 is a first flowchart of a method for detecting a communication fault according to an embodiment of the present invention.
  • FIG. 6 is a second flowchart of a method for detecting a communication fault according to an embodiment of the present invention.
  • FIG. 7 is a third flowchart of a method for detecting a communication fault according to an embodiment of the present invention.
  • FIG. 8 is a first schematic structural diagram of a detecting device according to an embodiment of the present invention.
  • FIG. 9 is a second schematic structural diagram of a detecting device according to an embodiment of the present invention.
  • FIG. 10 is a first schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 11 is a second schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 12 is a third schematic structural diagram of a server according to an embodiment of the present invention.
  • The terms "system" and "network" are used interchangeably herein.
  • To facilitate understanding of the method, device and system for detecting communication failures provided by the embodiments of the present invention, some concepts related to the present invention are first introduced.
  • Port aggregation, also known as Ethernet channel, is mainly used for connections between switches or servers. If port aggregation is used, the switch combines a group of physical ports (such as ports 1, 2, 3, and 4 shown in FIG. 1) into one logical channel, that is, a channel-group, and treats this logical channel as one port. With port aggregation, communication between two switches can continue as long as not all ports in the group are down. Port aggregation thus lets multiple switches transmit data simultaneously over multiple ports connected in parallel, providing higher bandwidth, greater throughput and recoverability, and increasing system reliability.
  • Switch stacking refers to combining more than one switch to work together to provide as many ports as possible in a limited space. After multiple switches are stacked, they have sufficient system bandwidth and increase system reliability.
  • LACP is a protocol for implementing dynamic aggregation and de-aggregation of links. After LACP is enabled on a port, the port advertises its system priority, system MAC address, port priority, and port number to the peer end by sending LACPDUs (LACP data units). After receiving this information, the peer compares it with the information saved for other ports to select the ports that can be aggregated, so that the two parties can agree on whether a port joins or leaves the dynamic aggregation group.
  • Link aggregation refers to binding multiple physical ports into one logical port to load-balance inbound and outbound traffic across the member ports.
  • The switch determines, based on the port load-balancing policy configured by the user, from which member port a packet is sent to the peer switch.
  • When the switch or the server detects that the link of one of the member ports is faulty, it stops sending data on that port and recalculates, according to the load-sharing policy, which of the remaining links sends each packet. After the faulty port recovers, the sending port is recalculated again. Link aggregation is therefore an important technology for increasing link bandwidth and achieving link transmission flexibility and redundancy.
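
The recalculation after a member-port failure can be illustrated with a hash-based load-sharing policy. The policy shown here (CRC32 of the source and destination MAC addresses, modulo the number of healthy ports) is an assumption chosen for simplicity; real devices offer several configurable policies, and `pick_member_port` is a hypothetical name.

```python
import zlib
from typing import List


def pick_member_port(src_mac: str, dst_mac: str, healthy_ports: List[int]) -> int:
    """Map a flow onto one of the currently healthy member ports."""
    if not healthy_ports:
        raise RuntimeError("no healthy member port left in the aggregation group")
    key = f"{src_mac}->{dst_mac}".encode()
    return healthy_ports[zlib.crc32(key) % len(healthy_ports)]


flow = ("02:00:00:00:00:01", "02:00:00:00:00:09")
before = pick_member_port(*flow, healthy_ports=[1, 2, 3, 4])
after = pick_member_port(*flow, healthy_ports=[2, 3, 4])  # port 1 has been removed
print(before, after)
```

Because the hash is deterministic, packets of one flow stay on one port until the set of healthy ports changes, at which point the flow is mapped again over the remaining links.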
  • the server involved in the present invention may be various types of servers, such as a blade server, and at least one virtual machine may be running in the server, and the virtual machine includes a virtual port.
  • The switch involved in the present invention is a network device for forwarding electrical signals that meets at least Layer 2 switching requirements; that is, it can identify the MAC address information in a data packet, forward the packet according to the MAC address, and record the MAC addresses and their corresponding ports in an internal address table.
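
The address-table behavior of such a Layer 2 switch can be sketched as follows. The class and method names are hypothetical, and flooding a frame whose destination is unknown is standard Layer 2 behavior assumed here rather than something stated in the text.

```python
from typing import Dict, List, Optional


class LearningSwitch:
    """Toy Layer 2 switch: learns source MACs and forwards by destination MAC."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, int] = {}  # MAC address -> port it was last seen on

    def handle_frame(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Record the sender's port, then return the port(s) to forward the frame to."""
        self.mac_table[src_mac] = in_port
        out: Optional[int] = self.mac_table.get(dst_mac)
        if out is None:
            # Destination not learned yet: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [out]


sw = LearningSwitch(ports=[1, 2, 3])
first = sw.handle_frame(1, "aa", "bb")   # "bb" is unknown, so the frame is flooded
second = sw.handle_frame(2, "bb", "aa")  # "aa" was learned on port 1
print(first, second)
```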
  • However, each port in the server may enter a "sub-health" state.
  • In this state the port can still send and receive data packets with other ports (that is, the port is still available), but it may perform abnormal operations such as dropping packets when sending or tampering with packet contents.
  • The LAG cannot detect a port in the "sub-health" state, so data transmitted through that port continues to be damaged. The embodiments of the present invention therefore provide a method, a device and a system for detecting a communication failure, which solve the prior-art problem that the LAG cannot detect the "sub-health" state of a port, and improve the reliability of data transmission.
  • An embodiment of the present invention provides a communication failure detection system, as shown in FIG. 2, including X servers 01 after link aggregation, Y switches 02 after switch stacking, and a detecting device 03, wherein
  • the server 01 includes at least one port
  • the switch 02 includes at least one port
  • the server 01 and the switch 02 are connected through corresponding ports.
  • the server 01 runs at least one virtual machine, and the virtual machine includes a virtual port.
  • the detecting device 03 may be deployed on any one of the X servers 01, or may be separately deployed in the detecting system of the communication failure independently of the X servers 01.
  • The detecting device 03 obtains the detection results of the N ports in the X servers 01, where the detection result includes the error packet data and the packet loss data of other ports, as determined by each port according to the received probe messages sent by the other ports, N>2, X>2; the detecting device 03 determines whether a first port is faulty according to the error packet data and the packet loss data of the other ports determined by each port, where the first port is one of the N ports.
  • The detecting device 03 determining whether the first port is faulty according to the error packet data and the packet loss data of the other ports determined by each port may specifically include the following steps: the detecting device 03 calculates, according to the detection result, the packet loss rate of the detection messages sent by the N ports to each other; and the detecting device 03 determines, according to that packet loss rate, whether the first port is faulty.
  • The detecting device 03 calculating, according to the detection result, the packet loss rate of the detection messages sent by the N ports to each other may specifically include the following steps: the detecting device 03 converts the error packet data in the detection result into relative packet loss data according to the first preset function; the detecting device 03 then calculates, according to the second preset function and based on the relative packet loss data and the packet loss data in the detection result, the packet loss rate of the detection messages sent by the N ports to each other.
  • The detecting device 03 determining whether the first port is faulty according to the packet loss rate of the detection messages sent by the N ports to each other may specifically include the following steps: if, among the N ports, the packet loss rate of the detection messages sent by at least N/2 ports to the first port is greater than the first preset value, and the packet loss rate of the detection messages sent among the at least N/2 ports is less than the second preset value, the detecting device 03 determines that the first port is faulty; otherwise, the detecting device 03 determines that the first port is not faulty.
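
The decision rule can be sketched over a full mesh of measured loss rates. The dictionary layout (`loss[(sender, receiver)]`), the threshold values, and the function name `is_faulty` are assumptions. As a simplification, the sketch requires every port that reports high loss toward the first port to communicate well with every other such port, rather than searching for a well-behaved subset of at least N/2 of them.

```python
from itertools import permutations
from typing import Dict, List, Tuple

LossMatrix = Dict[Tuple[int, int], float]  # (sender, receiver) -> packet loss rate


def is_faulty(first: int, ports: List[int], loss: LossMatrix,
              t1: float, t2: float) -> bool:
    """Apply the N/2 rule: many peers see loss toward `first`, yet are fine among themselves."""
    witnesses = [p for p in ports if p != first and loss[(p, first)] > t1]
    if len(witnesses) < len(ports) / 2:
        return False  # too few ports observe high loss toward the first port
    # The witnesses must exchange probes cleanly, so the fault is not theirs.
    return all(loss[(a, b)] < t2 for a, b in permutations(witnesses, 2))


ports = [1, 2, 3, 4]
# Assumed measurements: every path touching port 1 loses 40% of probes, all others lose none.
loss = {(a, b): (0.4 if 1 in (a, b) else 0.0) for a, b in permutations(ports, 2)}
verdicts = {p: is_faulty(p, ports, loss, t1=0.1, t2=0.05) for p in ports}
print(verdicts)
```

The second condition is what separates a faulty first port from a wider network problem: if the reporting ports also lost packets among themselves, the loss could not be attributed to the first port alone.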
  • The method may further include the following step: if the first port is faulty, the detecting device 03 sends a first failure notification to the server 01, so that the server 01 removes the first port from the LAG according to the first failure notification.
  • If all the N ports in the X servers 01 are faulty, the method may further include the following steps:
  • the detecting device 03 invokes the DRS to perform virtual machine hot migration on the virtual machines running in the server 01, or
  • the detecting device 03 sends a second failure notification to the server 01, so that the server 01 invokes the DRS, according to the second failure notification, to perform virtual machine hot migration on the virtual machines running in the server 01.
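
How the detecting device might choose between the two notifications can be sketched as below. The dictionary-based notification format, the action strings and the function name are hypothetical. The rule that a second failure notification is used when all N ports are faulty follows the definition of the second fault notification given elsewhere in this document.

```python
from typing import Dict, List


def build_notifications(verdicts: Dict[str, bool]) -> List[dict]:
    """Turn per-port fault verdicts into notifications (hypothetical message format)."""
    faulty = sorted(port for port, bad in verdicts.items() if bad)
    if verdicts and len(faulty) == len(verdicts):
        # Every monitored port is faulty: removing ports from the LAG cannot help,
        # so ask for the virtual machines to be migrated instead.
        return [{"type": "second", "ports": faulty, "action": "migrate_vms"}]
    return [{"type": "first", "port": port, "action": "remove_from_lag"} for port in faulty]


one_bad = build_notifications({"p1": True, "p2": False, "p3": False})
all_bad = build_notifications({"p1": True, "p2": True, "p3": True})
print(one_bad, all_bad)
```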
  • The server 01 receives, through the first port, the probe messages from the other N-1 ports in the other servers 01, where the probe messages are used to determine the error packet data and the packet loss data of the N-1 ports, N>2; the server 01 generates a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port; and the server 01 acquires a fault notification sent by the detecting device 03 according to the detection result, where the fault notification is used to indicate whether the first port is faulty.
  • The first port is a physical port in the server 01, or a virtual port in a virtual machine running in the server 01. After the server 01 acquires the fault notification sent by the detecting device 03 according to the detection result, the following steps may also be included:
  • if the first port is a physical port in the server 01 and the first port is faulty, the server 01 removes the first port from the LAG according to the fault notification;
  • if the first port is a virtual port in a virtual machine running in the server 01 and the first port is faulty, the server 01 performs virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification.
  • The method may further include the following steps: if the first port is not faulty, the server 01 queries whether the first port is in the LAG; if the first port is not in the LAG, the server 01 adds the first port to the LAG to perform data transmission and reception through the first port.
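
The server-side handling of a fault notification, covering the faulty physical port, the faulty virtual port and the recovered port, can be sketched as one function. The function name, the returned action strings and the use of a plain set for LAG membership are assumptions for illustration; no real LAG or DRS interface is called.

```python
from typing import Set


def handle_notification(port: str, faulty: bool, is_virtual: bool, lag: Set[str]) -> str:
    """Decide what the server does with one port after a fault notification."""
    if faulty:
        if is_virtual:
            return "migrate_vm"       # hot-migrate the VM that owns the virtual port
        lag.discard(port)             # physical port: take it out of the LAG
        return "removed_from_lag"
    if port not in lag:
        lag.add(port)                 # healthy again: put it back into the LAG
        return "added_to_lag"
    return "no_change"


lag_members = {"eth0", "eth1"}
step1 = handle_notification("eth0", faulty=True, is_virtual=False, lag=lag_members)
step2 = handle_notification("eth0", faulty=False, is_virtual=False, lag=lag_members)
step3 = handle_notification("vnic0", faulty=True, is_virtual=True, lag=lag_members)
print(step1, step2, step3, sorted(lag_members))
```

Re-adding a healthy port that is missing from the LAG means a port removed during a transient fault returns to service without manual intervention.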
  • The server 01 generating the detection result according to the detection message may specifically include the following steps: the server 01 calculates, according to the number of detection messages received within the preset time, the packet loss data of the N-1 ports to the first port; the server 01 analyzes, according to the probe messages received within the preset time, whether each probe message is an error packet, to count the error packet data of the N-1 ports to the first port; and the server 01 generates the detection result according to the packet loss data and the error packet data.
  • the method for detecting the communication failure may further include the following steps: the server 01 acquires media access control MAC addresses of the N-1 ports respectively; and the server 01 constructs the probe message according to the MAC address. The server 01 sends the probe message to the N-1 ports through the first port according to the MAC address of the N-1 ports.
  • the N ports are physical ports in the server 01 or virtual ports in the virtual machines running in the server 01.
  • The method, device, and system for detecting a communication failure provided by the embodiments of the present invention can be applied to an IAAS (Infrastructure-as-a-Service) scenario and also to a PAAS (Platform-as-a-Service) scenario, to implement automatic switching of the communication plane in cloud scenarios.
  • the method for detecting communication faults in the IAAS and PAAS scenarios will be described in detail in subsequent embodiments, so it will not be described here.
  • IAAS and PAAS are different forms of service in cloud computing.
  • Cloud computing is a model for adding, using and delivering Internet-based services, and generally involves providing dynamic, easily scalable and often virtualized resources over the Internet.
  • cloud computing can include the following levels of services: Infrastructure as a Service (IAAS), Platform as a Service (PAAS) and Software as a Service (SAAS, Software-as-a-Service).
  • IAAS means that consumers can obtain services from a complete computer infrastructure through the Internet, such as: hardware server rental;
  • PAAS refers to the software development platform as a service, such as: personalized customization of software.
  • The method for detecting a communication fault provided by the embodiments of the present invention can be applied to the IAAS scenario: communication fault detection is performed among the fully interconnected physical ports of the servers in the IAAS, and the paths of ports with communication faults are switched.
  • The method for detecting a communication fault provided by the present invention can also be applied to the PAAS scenario: communication fault detection is performed on the virtual ports of the virtual machines running in the servers in the PAAS and combined with the detection results of the physical ports in the IAAS scenario, to achieve automatic path switching for ports with communication failures.
  • The server can detect through the LAG only whether each of its ports is available, that is, whether the port can transmit data; it cannot detect the "sub-health" situation in which a fault occurs while the port is transmitting data (for example, when sending a data packet).
  • the detection method of the communication failure provided by the present invention can detect the port of the “sub-health” state, and then remove the port of the “sub-health” state from the LAG in time, thereby improving the reliability of data transmission.
  • An embodiment of the present invention provides a communication failure detection system, in which a server receives, through a first port, probe messages from N-1 ports in other servers and generates a detection result according to the probe messages; after the detecting device acquires the detection results of the N ports in the X servers, it determines whether the first port is faulty according to the detection results.
  • The detecting device obtains the detection results of the N ports in the X servers, each generated by a server according to the probe messages respectively received by the N ports. Because the detection result includes the error packet data and the packet loss data of other ports, as determined by each port according to the received probe messages sent by the other ports, the detecting device can determine, according to that data, whether one of the N ports is a faulty port.
  • Detecting whether a port in the "sub-health" state is affecting data transmission through the port improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoids the risk of transmitting data through the faulty port.
  • FIG. 3 is a schematic diagram showing the hardware of the detecting device provided by the embodiment of the present invention:
  • the detecting device may be a server or a blade, and the detecting device may be deployed in a server that reports the detection result in the detecting system of the communication fault, or may introduce a new server as a detecting device in the detecting system of the communication fault, specifically:
  • the detecting device includes a processor 11, a transceiver module 12, and a memory 13, wherein
  • The processor 11 is the control center of the detecting device. It performs the functions of the detecting device and processes its data by running or executing the software programs and/or modules stored in the memory 13 and by calling the data stored in the memory 13.
  • the transceiver module 12 can be used for receiving and transmitting signals during the process of transmitting and receiving information.
  • the transceiver module can communicate with the network and other devices through wireless communication.
  • the wireless communication can use any communication standard or protocol.
  • the transceiver module can perform data transmission and reception based on the LACP protocol or the ARP (Address Resolution Protocol).
  • the memory 13 can be used to store software programs and modules, and the processor executes various functional applications and data processing of the detecting device by running software programs and modules stored in the memory.
  • The transceiver module 12 obtains the detection results of the N ports in the X servers. The detection results include the error packet data and the packet loss data that each port determines for the other ports according to the probe messages received from them, where N>2 and X>2. The processor 11 then determines, according to the error packet data and the packet loss data determined by each port for the other ports, whether the first port is faulty. The first port is one of the N ports.
  • When the processor 11 determines whether the first port is faulty according to the packet loss data and the error packet data of the probe messages sent between the N ports, the following steps may be included: the processor 11 calculates, according to the detection results, the packet loss rate of the probe messages sent between the N ports and saves it to the memory 13; the processor 11 then determines whether the first port is faulty according to that packet loss rate.
  • When the processor 11 calculates, according to the detection results, the packet loss rate of the probe messages sent between the N ports, the following steps may be included: the processor 11 converts the error packet data into relative packet loss data according to a first preset function; the processor 11 then calculates, according to a second preset function, the packet loss rate of the probe messages sent between the N ports from the relative packet loss data and the packet loss data in the detection results.
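  • As an illustration only, the two-step calculation can be sketched in Python. The text does not define either preset function, so the linear weighting in `relative_loss` (a hypothetical first preset function), the ratio in `packet_loss_rate` (a hypothetical second preset function), and the `weight` and `expected` parameters are all assumptions.

```python
def relative_loss(error_packets: int, weight: float = 0.5) -> float:
    """Hypothetical first preset function: count each error packet
    as a fraction (weight) of a lost packet."""
    return weight * error_packets


def packet_loss_rate(lost: int, error_packets: int, expected: int,
                     weight: float = 0.5) -> float:
    """Hypothetical second preset function: combine lost packets and
    relative loss, then normalise by the number of probes expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    combined = lost + relative_loss(error_packets, weight)
    return min(1.0, combined / expected)
```

  • With these assumed functions, 10 lost probes and 4 error packets out of 60 expected probes give a rate of (10 + 2) / 60.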
  • When the processor 11 determines whether the first port is faulty according to the packet loss rate of the probe messages sent between the N ports, the following steps may be included: if, among the N ports, at least N/2 ports each see a packet loss rate greater than a first preset value when sending probe messages to the first port, and the packet loss rate of the probe messages sent among those at least N/2 ports is less than a second preset value, the processor 11 determines that the first port is faulty; otherwise, the processor 11 determines that the first port is not faulty.
  • The following step may further be included: if the processor 11 determines that the first port is faulty, the processor 11 sends a first failure notification through the transceiver module 12 to the server corresponding to the first port, so that the server removes the first port from the LAG according to the first failure notification.
  • The following step may further be included: if the processor 11 determines that every port in the X servers is faulty, the processor 11 calls the DRS in the memory 13 to perform virtual machine hot migration on the virtual machines running in the X servers; or, if the processor 11 determines that every port in the X servers is faulty, the processor 11 sends a second failure notification to the X servers through the transceiver module 12, so that the X servers invoke the DRS according to the second failure notification to perform virtual machine hot migration on the virtual machines running in the X servers.
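  • The choice between removing a single port from the LAG and migrating virtual machines can be sketched as a small decision function. This is a minimal sketch under assumptions: `plan_recovery` and the action names are hypothetical, and the DRS call itself is not modelled.

```python
def plan_recovery(port_faulty: dict) -> dict:
    """Decide what to do from a mapping {port_name: is_faulty}.

    - every port faulty -> hot-migrate the virtual machines
    - some ports faulty -> remove only those ports from the LAG
    - no port faulty    -> nothing to do
    """
    faulty = sorted(port for port, bad in port_faulty.items() if bad)
    if faulty and len(faulty) == len(port_faulty):
        return {"action": "migrate_vms", "ports": faulty}
    if faulty:
        return {"action": "remove_from_lag", "ports": faulty}
    return {"action": "none", "ports": []}
```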
  • the N ports are physical ports in the server, or are virtual ports in a virtual machine running in the server.
  • When the N ports are physical ports, fully interconnected communication fault detection is performed on the physical ports of the server, and path switching is performed for the port with the communication failure.
  • When the N ports are virtual ports, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the server and, combined with the detection results of the physical ports in the IAAS scenario, automatic path switching is implemented for the port with the communication failure.
  • FIG. 4 is a schematic diagram of hardware of a server provided by an embodiment of the present invention:
  • The server may be any of various types of server (such as a blade server). Specifically:
  • the server includes a processor 21, a transceiver module 22, and a memory 23, where
  • The processor 21 is the control center of the server. It performs the functions of the server and processes its data by running or executing the software programs and/or modules stored in the memory 23 and by calling the data stored in the memory 23.
  • the transceiver module 22 can be used for receiving and transmitting signals during the process of transmitting and receiving information.
  • the transceiver module can communicate with the network and other devices through wireless communication.
  • the wireless communication can use any communication standard or protocol.
  • the transceiver module can perform data transmission and reception based on the LACP protocol or the ARP protocol.
  • the memory 23 can be used to store software programs and modules, and the processor executes various functional applications and data processing of the server by running software programs and modules stored in the memory.
  • the transceiver module 22 receives the probe message from the N-1 ports in the other server through the first port, where the probe message is used to determine the error packet data and the packet loss data of the N-1 ports.
  • The processor 21 generates a detection result according to the probe messages and sends the detection result to the transceiver module 22.
  • The detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port.
  • the transceiver module 22 acquires the fault notification sent by the detecting device according to the detection result, and sends the fault notification to the processor 21, where the fault notification is used to indicate whether the first port is faulty.
  • the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server.
  • The following steps may further be included: if the first port is a physical port in the server and the first port is faulty, the processor 21 removes the first port from the LAG stored in the memory 23 according to the failure notification;
  • if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, the processor 21 notifies the virtual machine corresponding to the first port, according to the failure notification, to perform virtual machine hot migration.
  • The following steps may further be included: if the first port is not faulty, the processor 21 queries whether the first port is in the LAG stored in the memory 23; if the first port is not in the LAG, the processor 21 adds the first port to the LAG and saves the updated LAG to the memory 23, so that the transceiver module 22 can send and receive data through the first port.
  • When the processor 21 generates a detection result according to the probe messages and sends it to the transceiver module 22, the following steps may be included: the processor 21 calculates the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time and saves it to the memory 23; the processor 21 analyzes whether each probe message received within the preset time is an error packet, so as to collect the error packet data from the N-1 ports to the first port, and saves it to the memory 23; the processor 21 generates the detection result according to the packet loss data and the error packet data in the memory 23.
  • The method for detecting a communication failure may further include the following steps: the processor 21 acquires the MAC addresses of the N-1 ports; the processor 21 constructs the probe messages according to the MAC addresses; the transceiver module 22 sends the probe messages to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.
  • In the prior art, the server can only detect through the LAG whether each of its own ports is available, that is, whether the port can transmit data. It cannot detect the "sub-health" situation in which a faulty port still transmits data (for example, a large number of packets are lost during transmission, or the content of the packets is altered). Data transmitted through a "sub-health" port therefore continues to be damaged, which lowers the reliability of data transmission.
  • The communication failure detection method provided by the present invention can detect a port in the "sub-health" state and remove it from the LAG in time, thereby improving the reliability of data transmission.
  • An embodiment of the present invention provides a device for detecting a communication failure. Each server receives probe messages from the other N-1 ports through a first port and generates a detection result according to those probe messages. After the detecting device acquires the detection results of the N ports in the X servers, it determines from the detection results whether the first port is faulty.
  • Each detection result is generated by a server according to the probe messages received by its ports, and it contains the error packet data and the packet loss data that each port has determined for the other ports from the probe messages they sent. The detecting device uses these data to determine whether any of the N ports is faulty, that is, whether a port in the "sub-health" state is degrading data transmission. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a port that still works but works abnormally, and avoids the risk of transferring data over the faulty port.
  • An embodiment of the present invention provides a method for detecting a communication failure, as shown in FIG. 5, including:
  • The detecting device obtains the detection results of the N ports in the X servers. The detection results include the error packet data and the packet loss data that each port determines for the other ports according to the probe messages sent by those ports, where N>2 and X>2. The N ports are the ports of each server in the communication fault detection system after port aggregation (for example, ports 1, 2, 3, and 4 of server 1 in FIG. 2).
  • the detection result is generated by each server according to the received detection message and reported to the detection device.
  • the detection result includes packet loss data and error packet data of the detection messages sent by the N ports, as shown in Table 1.
  • The detection result that the server sends to the detecting device through port 1 includes the error packet data and the packet loss data from port 1 to the remaining N-1 ports in the communication failure detection system. These data reflect the communication quality of the communication paths from port 1 to the other N-1 ports.
  • After the detecting device obtains the detection results of all N ports, it knows the communication quality of every communication path in the communication failure detection system, so it can evaluate which port is faulty according to the communication quality of all the paths.
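  • One way to picture this collection step is a table keyed by reporting port and peer port. The sketch below is an assumption about the data layout, not something the text specifies: `assemble_quality_matrix`, the tuple format, and the port names are hypothetical.

```python
def assemble_quality_matrix(reports, ports):
    """Merge per-port detection results into one path-quality table.

    reports: iterable of (reporting_port, peer_port, lost, errors) tuples.
    Returns (matrix, missing), where matrix[reporter][peer] holds the
    counts and missing lists the ports that have not reported yet.
    """
    matrix = {}
    for reporter, peer, lost, errors in reports:
        matrix.setdefault(reporter, {})[peer] = {"lost": lost, "errors": errors}
    missing = sorted(set(ports) - set(matrix))
    return matrix, missing
```

  • The detecting device would evaluate faults only once `missing` is empty, that is, once every port has reported.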
  • the detecting device determines, according to the error packet data and the packet loss data of the other ports determined by each port, the state of the first port, where the state of the first port is used to indicate whether the first port is faulty.
  • the detecting device may determine whether the first port is faulty according to the detection result, and the first port is one of the N ports.
  • The detecting device may calculate, according to the detection results, the packet loss rate of the probe messages sent between the N ports, and then determine whether the first port is faulty according to that packet loss rate.
  • The detecting device may convert the error packet data in the detection results into relative packet loss data according to a first preset function, and then calculate, according to a second preset function, the packet loss rate of the probe messages sent between the N ports from the relative packet loss data and the packet loss data in the detection results.
  • the packet loss rate between the ports when the detection messages are sent and received between the ports is reflected.
  • the packet loss rate of port 1 to port 3 is 0.2%.
  • the data in Table 2 is the percentage data.
  • After the detecting device calculates the packet loss rate of the probe messages between the N ports, it determines whether the first port is faulty.
  • The detecting device performs statistics according to Table 2: if at least N/2 ports each see a packet loss rate greater than the first preset value when sending probe messages to the first port, and the packet loss rate of the probe messages sent among those at least N/2 ports is less than the second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
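  • The decision rule can be sketched as follows. This is a simplified reading, not the patented procedure itself: it treats every port whose loss rate towards the candidate exceeds the first preset value as a "suspecting" port and requires all of them to communicate cleanly with one another. The function name, the matrix layout `rates[src][dst]`, and the threshold values used in the example are assumptions.

```python
from itertools import permutations


def is_port_faulty(rates, port, first_preset, second_preset):
    """rates[src][dst] is the packet loss rate of probes sent src -> dst.

    The port is judged faulty when at least N/2 other ports lose more than
    first_preset towards it while losing less than second_preset among
    themselves (so the loss is attributable to the port, not to them).
    """
    n = len(rates)
    suspecting = [p for p in range(n)
                  if p != port and rates[p][port] > first_preset]
    if len(suspecting) < n / 2:
        return False
    return all(rates[a][b] < second_preset
               for a, b in permutations(suspecting, 2))
```

  • For example, with four ports where only the paths touching port 0 show a 30% loss rate, `is_port_faulty(rates, 0, 0.1, 0.05)` returns `True` and the same call for port 1 returns `False`.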
  • Alternatively, thresholds for the packet loss data and the error packet data may be preset in the detecting device. When the packet loss data and the error packet data of the probe messages exchanged between a port and the other ports reach the preset thresholds, that port is determined to be a faulty port, because using it to send and receive data would affect the reliability of the data.
  • The detecting device may further calculate, from the packet loss data and the error packet data of the probe messages sent between the N ports, the proportion of lost packets and error packets between each port and the other ports, and so identify the port with relatively few lost packets and error packets. When all N ports are faulty, that port is selected to send and receive data, to keep the server operating normally.
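  • A minimal sketch of this fallback selection, assuming the score of a port is the average loss rate over all paths to and from it. The scoring formula and the name `least_degraded_port` are assumptions; the text only says that the port with relatively few lost and error packets is chosen.

```python
def least_degraded_port(rates):
    """rates[src][dst] is the loss rate of probes sent src -> dst,
    given as a dict of dicts keyed by port name.

    Returns the port with the lowest average loss over its inbound
    and outbound paths.
    """
    def score(port):
        others = [q for q in rates if q != port]
        inbound = sum(rates[q][port] for q in others)
        outbound = sum(rates[port][q] for q in others)
        return (inbound + outbound) / (2 * len(others))

    return min(rates, key=score)
```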
  • the detecting device determines whether the first port is faulty according to the detection result.
  • the detecting device generates a fault notification of the first port according to the state of the first port.
  • After the detecting device determines, according to the packet loss data and the error packet data of the probe messages sent between the N ports, whether the first port is faulty, it may generate a failure notification for the first port if the first port is faulty. Further, the detecting device may send a first failure notification to the server, so that the server removes the first port from the LAG according to the first failure notification. That is, the server stops sending data on this port and recalculates, according to the load sharing policy, which of the remaining links sends the data. After the faulty port recovers, the sending port is recalculated again, so that automatic switching of the communication path between the N ports is implemented.
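  • The effect of removing and restoring a LAG member can be illustrated with a toy model. This is a sketch under assumptions: the `Lag` class and the hash-modulo load sharing policy are hypothetical stand-ins, since the text does not say which load sharing policy is used.

```python
import zlib


class Lag:
    """Toy link aggregation group: a list of active member ports."""

    def __init__(self, ports):
        self.active = list(ports)

    def remove(self, port):
        """Stop using a port, e.g. after a failure notification."""
        if port in self.active:
            self.active.remove(port)

    def restore(self, port):
        """Put a recovered port back into the group."""
        if port not in self.active:
            self.active.append(port)

    def pick(self, flow_id: str):
        """Assumed load sharing policy: hash the flow onto an active port."""
        if not self.active:
            raise RuntimeError("no active port left in the LAG")
        index = zlib.crc32(flow_id.encode()) % len(self.active)
        return self.active[index]
```

  • Because `pick` only ever indexes into the active list, traffic is redistributed over the remaining links as soon as a port is removed, and again when it is restored.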
  • If every port in the server is faulty, the detecting device may invoke the DRS to perform virtual machine hot migration on the virtual machines running in the server, or the detecting device may send a second failure notification to the server, so that the server invokes the DRS according to the second failure notification to perform the hot migration. The virtual machines on the server with the faulty ports are migrated to other servers that have no faulty port, so that data transmission is not impaired when those virtual machines perform service interaction.
  • The communication failure detection method of the present invention can effectively detect a port in the "sub-health" state, that is, a port that can still transmit data but whose packet loss rate during transmission is so high that data passing through it continues to be damaged. After such a port is detected, the first port is removed from the LAG in time, or the virtual machines running in the server are hot-migrated, so that the communication path between the N ports is switched automatically and data transmission is not damaged.
  • the N ports are physical ports in the server, or are virtual ports in a virtual machine running in the server.
  • When the N ports are physical ports, fully interconnected communication fault detection is performed on the physical ports of the server, and path switching is performed for the port with the communication failure.
  • When the N ports are virtual ports, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the server and, combined with the detection results of the physical ports in the IAAS scenario, automatic path switching is implemented for the port with the communication failure.
  • An embodiment of the present invention provides a method for detecting a communication failure, as shown in FIG. 6, including:
  • the server receives the probe message from the N-1 ports in the other server through the first port.
  • the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2.
  • the server may periodically receive the probe message from the N-1 ports through the first port.
  • For example, the first port receives the probe messages from the N-1 ports over one minute. According to the original communication protocol in the server, the number of probe messages that the first port should receive from each port in a fixed period is predetermined.
  • the predetermined number reflects the ability of the port to send and receive data.
  • port 1 should receive 60 probe messages sent by port 3 within one minute.
  • The probe messages may be used to reflect the QoS (Quality of Service) from the N-1 ports to the first port, where QoS refers to a set of quality requirements on the collective behavior of one or more objects.
  • Because each port periodically sends a specified number of probe messages, the server can determine the error packet data and the packet loss data of the N-1 ports, and these data reflect the QoS from the N-1 ports to the first port.
  • For example, the predetermined number of messages is 100. The number of probe messages from the N-1 ports that the first port actually receives thus reflects the communication capability between the first port and the N-1 ports.
  • the server generates a detection result according to the detection message, where the detection result includes packet loss data and error packet data that the N-1 ports send the probe message to the first port.
  • the server may generate a probe result according to the probe message.
  • First, the server may calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time; next, the server analyzes whether each probe message received within the preset time is an error packet, to collect the error packet data from the N-1 ports to the first port; finally, the server generates the detection result according to the packet loss data and the error packet data.
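  • These three steps can be sketched as follows. The record type, the function name, and the result layout are hypothetical; the sketch assumes that each received probe has already been tagged with its source port and the outcome of its integrity check.

```python
from dataclasses import dataclass


@dataclass
class ProbeRecord:
    src_port: str   # port that sent the probe
    crc_ok: bool    # False if the probe failed its integrity check


def build_detection_result(records, peers, expected):
    """Summarise one preset period for the first port.

    records:  probes received by the first port during the period
    peers:    names of the other N-1 ports
    expected: number of probes each peer should have delivered
    """
    result = {}
    for peer in peers:
        received = [r for r in records if r.src_port == peer]
        lost = max(0, expected - len(received))           # packet loss data
        errors = sum(1 for r in received if not r.crc_ok)  # error packet data
        result[peer] = {"lost": lost, "errors": errors}
    return result
```

  • A peer that delivers 50 of 60 expected probes, 3 of them corrupted, is reported as `{"lost": 10, "errors": 3}`.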
  • The server generates the detection result of the first port for the N-1 ports according to the probe messages and reports it to the detecting device, so that the detecting device determines whether the first port is faulty according to the detection results of the N ports.
  • the error packet data of the first port in Table 4 is calculated according to a CRC (Cyclic Redundancy Check) of each probe message received by the first port.
  • the server receives the probe message from the N-1 ports through the first port, and generates a probe result according to the probe message, so that the detecting device determines the faulty port according to the detection result of the respective ports.
  • the server may also periodically send the probe message to the other N-1 ports.
  • The other N-1 ports likewise generate detection results according to the probe messages and report them to the detecting device.
  • The server obtains the MAC addresses of the other N-1 ports. The MAC (Media Access Control) address, also called the hardware address, defines the location of a network device and identifies each station on the network.
  • the server can obtain the MAC addresses of the ports in other servers according to the ARP protocol or the LACP protocol.
  • the server constructs probe messages based on the MAC addresses of the other N-1 ports.
  • the probe message may be a Layer 2 packet.
  • The Layer 3 network layer is responsible for the IP address, and the Layer 2 data link layer is responsible for the MAC address, so each network location has its own MAC address.
  • The first port in the server identifies the MAC address information in a Layer 2 data packet, forwards the packet according to the MAC address, and records the MAC address and the corresponding port in an internal address table.
  • the server sends the probe message to the N-1 ports by using the first port according to the MAC address of the N-1 ports.
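  • To make the Layer 2 probe concrete, the sketch below builds a raw Ethernet frame addressed by MAC. The text does not define the probe format, so the payload (a 4-byte sequence number padded to the 46-byte minimum Ethernet payload) is an assumption, as is the use of EtherType 0x88B5, which IEEE 802 reserves for local experimental use. The function names are hypothetical, and actually transmitting the frame is not shown.

```python
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_probe_frame(dst_mac: str, src_mac: str, seq: int,
                      ethertype: int = 0x88B5) -> bytes:
    """Build a Layer 2 probe: destination MAC, source MAC, EtherType,
    then an assumed payload carrying a sequence number."""
    header = (mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
              + struct.pack("!H", ethertype))
    payload = struct.pack("!I", seq).ljust(46, b"\x00")
    return header + payload
```

  • A sequence number in the payload lets the receiver count which probes went missing within a period.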
  • the server periodically sends the probe message to the other N-1 ports, so that the other N-1 ports also report their own detection results to the detection device according to the probe message.
  • the server acquires a fault notification sent by the detecting device according to the detection result, where the fault notification is used to indicate whether the first port is faulty.
  • After the server generates the detection result according to the probe messages, the detecting device determines whether the first port is faulty according to the detection results of all the ports, and the server can then obtain the failure notification that the detecting device sends according to the detection results.
  • the server may remove the first port from the LAG according to the fault notification.
  • the server may perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification.
  • If the first port is not faulty, the server queries whether the first port is in the LAG, that is, determines whether the first port was previously removed from the LAG because of a fault. If the first port is not in the LAG, meaning it has been removed, the server may add the first port back to the LAG so that data is again sent and received through the first port.
  • The first port may be removed from the LAG by the detecting device, or the detecting device may send a message notifying the server that the first port is faulty so that the server removes the first port from the LAG itself. The present invention does not limit this.
  • the N ports are physical ports in the server, or are virtual ports in a virtual machine running in the server.
  • When the N ports are physical ports, fully interconnected communication fault detection is performed on the physical ports of the server, and path switching is performed for the port with the communication failure.
  • When the N ports are virtual ports, fully interconnected communication fault detection is performed on the virtual ports of the virtual machines running in the server and, combined with the detection results of the physical ports in the IAAS scenario, automatic path switching is implemented for the port with the communication failure.
  • The server receives and sends probe messages through each port, forming a fully interconnected path detection system, and generates detection results that measure the quality of service between the ports. The detecting device analyzes the reported detection results to find any port in the "sub-health" state, and that port is removed from the LAG in time. This prevents the server from sending and receiving data through a "sub-health" port and the data from being continuously damaged.
  • In the prior art, the server can only detect through the LAG whether each of its ports is available, that is, whether the port can transmit data. It cannot detect the abnormal situation in which a faulty port still transmits data (for example, packets are lost during transmission, or the content of the packets is altered). Data transmitted through such a "sub-health" port therefore continues to be damaged, which reduces the reliability of data transmission.
  • the detection method of the communication failure provided by the present invention can detect the port of the “sub-health” state, and then remove the port of the “sub-health” state from the LAG in time, thereby improving the reliability of data transmission.
  • An embodiment of the present invention provides a method for detecting a communication failure. Each server receives probe messages from the other N-1 ports through a first port and generates a detection result according to those probe messages. After the detecting device acquires the detection results of the N ports in the X servers, it determines from the detection results whether the first port is faulty.
  • Each detection result is generated by a server according to the probe messages received by its ports, and it contains the error packet data and the packet loss data that each port has determined for the other ports from the probe messages they sent. The detecting device uses these data to determine whether any of the N ports is faulty, that is, whether a port in the "sub-health" state is degrading data transmission. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a port that still works but works abnormally, and avoids the risk of transferring data over the faulty port.
  • An embodiment of the present invention provides a method for detecting a communication failure, as shown in FIG. 7, including:
  • the server receives the probe message from the N-1 ports in the other server through the first port.
  • the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2.
  • the probe message may be a layer 2 data packet, the length of the layer 2 data packet may be changed, and the content of the layer 2 data packet may be randomly variable.
  • The predetermined number reflects the capability of a port to send and receive data. The server can therefore receive, through the first port, the specified number of probe messages that each port in the other servers and in the server itself sends periodically, and use them to determine the error packet data and the packet loss data of the N-1 ports. For example, port 1 should receive 60 probe messages from port 2 every minute; if port 1 actually receives only 50 probe messages from port 2 every minute, packet loss is occurring on port 1 or port 2.
  • The server may also periodically send probe messages to the N-1 ports in the other servers, so that the other N-1 ports likewise generate detection results according to the probe messages and report them to the detecting device.
  • the server can obtain the MAC addresses of the ports in other servers according to the ARP protocol or the LACP protocol.
  • the probe message is further constructed according to the MAC addresses of the other N-1 ports.
  • the server sends the probe message to the N-1 ports by using the first port according to the MAC address of the N-1 ports.
  • the server generates a detection result according to the detection message, where the detection result includes packet loss data and error packet data that the N-1 ports send the probe message to the first port.
  • the server may generate a probe result according to the probe message.
  • First, the server may calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time; next, the server analyzes whether each probe message received within the preset time is an error packet, to collect the error packet data from the N-1 ports to the first port; finally, the server generates the detection result according to the packet loss data and the error packet data.
  • packet loss data = number of probe messages that should be received during the period - number of probe messages actually received during the period;
  • CRC is the most commonly used error check code in the field of data communication, and the feature is that the length of the information field and the check field can be arbitrarily selected.
  • CRC is a data transmission error detection function. Polynomial calculation is performed on the data, and the obtained result is attached to the frame. The receiving device also performs a similar algorithm to ensure the correctness and integrity of the data transmission.
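  • The append-and-verify principle described here can be demonstrated with the CRC-32 function in Python's standard `zlib` module. This is a minimal sketch: the 4-byte big-endian trailer and the helper names are assumptions chosen for illustration and do not reproduce the on-wire Ethernet frame check sequence.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Sender side: compute CRC-32 over the payload and attach it."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the trailer."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer
```

  • A probe for which `crc_ok` returns `False` would be counted as an error packet rather than as a lost packet.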
  • the server receives the probe message from the N-1 ports through the first port, and generates a probe result according to the probe message, so that the detecting device determines the faulty port according to the detection result of the respective ports.
  • the detecting device acquires a detection result of the N ports in each server.
  • the detection device may be configured with a path detection system to periodically receive detection results of N ports in the server, and the path detection system analyzes the faulty port according to the detection results of the N ports.
  • the detecting device obtains the detection result of the N ports in the server, and the detection result includes packet loss data and error packet data in which the detection messages are sent between the N ports.
  • each of the ports in the server repeats the above steps 301 and 302 until the path detection system of the detecting device acquires the detection results of all N ports, as shown in Table 5. After the path detection system of the detecting device acquires the detection results of all the N ports, the communication quality of all the communication paths in the detection system of the current communication failure is obtained, so that the detecting device evaluates the fault according to the communication quality of all the communication paths. port.
  • the detecting device calculates, according to the detection result, a packet loss rate of sending detection messages between the N ports.
  • the detecting device may determine whether the first port is faulty according to the detection result, and the first port is one of the N ports.
  • the detecting device may convert the error packet data in the detection result into relative packet loss data according to the first preset function.
  • the packet loss rate of the detection messages sent by the N ports is calculated according to the second preset function.
  • the packet loss rate may be recorded as a relative packet loss rate.
  • judging by the absolute packet loss rate may lead the detecting device to conclude that all ports are faulty. Therefore, the detecting device determines whether the first port is faulty according to the relative packet loss rate between the N ports.
  • for example, if the packet loss rate of port 1 to port 2 is 0.13, the packet loss rate of port 1 to port 3 is 0.15, and the packet loss rate of port 1 to port 4 is 0.05, then the minimum packet loss rate is 0.05.
  • the detecting device calculates the relative packet loss rate of the detection messages sent between the N ports according to the detection result.
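The text does not define the first and second preset functions, so the following is only a sketch of one possible choice. The weighting of error packets (`weight`) and the division by the expected probe count are assumptions made for illustration; both function names are hypothetical.

```python
def relative_loss_from_errors(error_packets: int, weight: float = 1.0) -> float:
    """Hypothetical 'first preset function': count each error packet as
    `weight` lost packets (weight is an assumed tuning parameter)."""
    return weight * error_packets


def packet_loss_rate(expected: int, lost: int, error_packets: int,
                     weight: float = 1.0) -> float:
    """Hypothetical 'second preset function': combine real and relative loss
    and normalise by the number of probes expected in the period."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    combined = lost + relative_loss_from_errors(error_packets, weight)
    return min(1.0, combined / expected)


# Example: 100 probes expected, 10 lost, 6 error packets weighted at 0.5 each.
rate = packet_loss_rate(100, 10, 6, weight=0.5)
```

With these assumed functions, 10 lost probes plus 6 error packets at weight 0.5 out of 100 expected probes yield a rate of about 0.13.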
  • the detecting device determines, according to a packet loss rate of sending detection messages between the N ports, whether the first port is faulty.
  • the first port may be any one of N ports.
  • among the N ports, if at least N/2 ports send the detection message to the first port with a packet loss rate greater than the first preset value, and the packet loss rate of the detection messages sent between those at least N/2 ports is smaller than the second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
  • the detecting device determines that port 1 is faulty.
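The decision rule above can be sketched as below. This is a simplified reading, not the patent's definitive procedure: it assumes the rates are held in a dictionary keyed by `(sender, receiver)`, and it requires every pair among the suspect senders to be below the second preset value. The function and parameter names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LossRates = Dict[Tuple[int, int], float]  # (sender, receiver) -> packet loss rate


def is_port_faulty(first_port: int, ports: Sequence[int], rates: LossRates,
                   first_preset: float, second_preset: float) -> bool:
    """Return True if at least N/2 ports see high loss towards first_port
    while the paths among those same ports are healthy."""
    senders = [p for p in ports
               if p != first_port and rates[(p, first_port)] > first_preset]
    if len(senders) < len(ports) / 2:
        return False
    # Assumption: all mutual paths among the suspect senders must be healthy.
    return all(rates[(a, b)] < second_preset
               for a in senders for b in senders if a != b)


# Example with 4 ports: every path touching port 1 is lossy, all others are clean.
ports = [1, 2, 3, 4]
rates = {(a, b): 0.01 for a in ports for b in ports if a != b}
for p in (2, 3, 4):
    rates[(p, 1)] = 0.13
    rates[(1, p)] = 0.13
port1_faulty = is_port_faulty(1, ports, rates, first_preset=0.1, second_preset=0.05)
port2_faulty = is_port_faulty(2, ports, rates, first_preset=0.1, second_preset=0.05)
```

In this example only port 1 is reported as faulty: ports 2 to 4 each see high loss on a single path, which is below the N/2 requirement.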
  • the detecting device may determine whether each of the N ports is faulty according to the foregoing method, that is, detect whether any port of the server is in a "sub-health" state that affects the data transmission efficiency of the port.
  • if the first port is faulty, the detecting device generates a first fault notification.
  • the first failure notification is used to instruct the server to remove the first port from the LAG.
  • the detecting device may generate a first failure notification and send the first failure notification to the server, so that the server removes the first port from the LAG according to the first failure notification, that is, stops sending data on the port, and recalculates the port for data transmission among the remaining links according to the load sharing policy.
  • because the data sending port is recalculated, automatic switching of the communication paths among the N ports can be implemented.
  • the detecting device invokes DRS to perform virtual machine hot migration on the virtual machine running in the server.
  • the detecting device may invoke DRS to perform virtual machine hot migration on the virtual machine running in the server, or the detecting device may send a second failure notification to the server, so that the server invokes the DRS according to the second failure notification to perform virtual machine hot migration on the virtual machine running in the server. The virtual machines on the server with the faulty port are migrated to other servers that do not have a faulty port, to ensure that data transmission is not impaired when the virtual machine corresponding to the faulty port performs service interaction.
  • VM Live Migration also known as dynamic migration, live migration
  • virtual machine save / restore Save / Restore
  • if each port in the server is faulty, the detecting device generates a second fault notification.
  • the second fault notification is used to instruct the server to invoke the DRS to perform virtual machine hot migration on the virtual machine running in the server.
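The choice between the first and second fault notification could be sketched as follows. The dictionary-based notification format and the name `build_fault_notifications` are hypothetical; the patent does not specify a message layout.

```python
from typing import Dict, List


def build_fault_notifications(port_faulty: Dict[str, bool]) -> List[dict]:
    """Map per-port fault states of one server to notifications.

    All ports faulty -> one 'second' notification (live-migrate the VMs).
    Otherwise        -> one 'first' notification per faulty port (remove from LAG).
    """
    if port_faulty and all(port_faulty.values()):
        return [{"type": "second", "action": "live_migrate_vms"}]
    return [{"type": "first", "port": port, "action": "remove_from_lag"}
            for port, faulty in sorted(port_faulty.items()) if faulty]


# Example: one faulty port out of three, then every port faulty.
partial = build_fault_notifications({"p1": True, "p2": False, "p3": False})
total = build_fault_notifications({"p1": True, "p2": True})
```

The `partial` case yields a single first notification for `p1`, and the `total` case yields a single second notification.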
  • the server adds the first port to the LAG, so that data is sent and received through the first port.
  • the server queries whether the first port is in the LAG, that is, determines whether the first port was previously removed from the LAG because of a fault; if the first port is not in the LAG, that is, the first port has been removed from the LAG, the server may re-add the first port to the LAG at this time to perform data transmission and reception through the first port.
  • the work of removing the first port from the LAG may be performed by the detecting device; alternatively, the detecting device may send a message informing the server that the first port is faulty, and the server itself removes the first port from the LAG, which is not limited by the present invention.
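The LAG handling described above (removal on fault, re-adding after recovery, and re-selecting the sending port among the remaining links) can be sketched as follows. The class name and the CRC-based hash used as the load sharing policy are assumptions for illustration; the patent does not specify the policy.

```python
import zlib
from typing import Iterable, List


class LinkAggregationGroup:
    """Minimal model of a LAG as an ordered list of usable port numbers."""

    def __init__(self, ports: Iterable[int]) -> None:
        self.ports: List[int] = sorted(ports)

    def remove(self, port: int) -> None:
        """Stop using a faulty port; removing an absent port is a no-op."""
        if port in self.ports:
            self.ports.remove(port)

    def add(self, port: int) -> None:
        """Re-add a recovered port only if it is not already a member."""
        if port not in self.ports:
            self.ports.append(port)
            self.ports.sort()

    def select_port(self, flow_key: bytes) -> int:
        """Assumed load sharing policy: hash the flow key over remaining ports."""
        if not self.ports:
            raise RuntimeError("no usable port left in the LAG")
        return self.ports[zlib.crc32(flow_key) % len(self.ports)]


# Example: port 1 is reported faulty, traffic moves to ports 2-4, then port 1 recovers.
lag = LinkAggregationGroup([1, 2, 3, 4])
lag.remove(1)
chosen = lag.select_port(b"flow-a")
lag.add(1)
lag.add(1)  # idempotent
```

Because the sending port is always chosen from the current member list, removing a port switches its flows to the remaining links without further action.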
  • steps 306 to 309 are four possible situations after the step 308, so the steps 306 to 309 are in a side-by-side relationship, and the embodiment of the present invention does not limit the logical relationship between the steps 306 to 309.
  • the N ports are physical ports in the server, or are virtual ports in a virtual machine running in the server.
  • full-interconnection communication fault detection is performed on the physical ports of the server, and path switching is performed for the port with the communication failure.
  • full-interconnection detection is performed on the virtual ports of the virtual machines running in the server and, in combination with the detection result of the physical ports in the IAAS scenario, automatic path switching is implemented for the port with the communication failure.
  • the following provides a method for detecting a communication failure in the PAAS:
  • At least one virtual machine is running in each server, and the virtual machine has a virtual port.
  • the method for detecting a communication failure provided by the present invention is used to detect whether the virtual port is faulty.
  • a virtual machine refers to a complete computer system that is simulated by software, has complete hardware system functions, and runs in a completely isolated environment.
  • the method for detecting a communication failure in the PAAS may include the following steps:
  • the virtual machine receives, by using the first virtual port, a virtual probe message from the M-1 virtual ports, where the virtual probe message is used to determine the error packet data and the packet loss data of the M-1 ports, where M>2.
  • the method for receiving the virtual probe message from the M-1 virtual ports may refer to step 301.
  • the virtual machine generates a virtual probe result according to the virtual probe message, where the virtual probe result includes the packet loss data and the error packet data that the M-1 virtual ports send the virtual probe message to the first virtual port.
  • the method for generating a virtual probe result according to the virtual probe message may refer to step 302.
  • the virtual machine acquires the detection result from the M virtual ports.
  • the virtual path detection system may be deployed in the virtual machine to periodically receive the detection results from the M virtual ports according to steps 401 and 402; the virtual path detection system then analyzes the faulty virtual port based on the virtual probe results of the M virtual ports.
  • the virtual path detection system determines, according to the virtual detection result, whether the first virtual port is faulty, and the first virtual port is one of the M virtual ports.
  • the virtual path detection system may calculate the packet loss rate of the virtual detection messages sent by the M virtual ports according to the virtual detection result, and the method for calculating the packet loss rate may refer to step 304.
  • the virtual path detection system determines whether the first virtual port is faulty according to the packet loss rate of the virtual detection messages sent by the M virtual ports; for the method for determining whether the first virtual port is faulty, refer to step 305.
  • if the first virtual port is faulty, the virtual path detection system generates virtual fault information and reports it to the VNFM, so that the VNFM sends the virtual fault information to the detecting device in the IAAS.
  • VNFM Virtual Net Function Manager
  • NFV Network Function Virtualization
  • if the virtual path detection system determines that the first virtual port is faulty, the virtual path detection system generates virtual fault information, where the virtual fault information may carry the ID of the first virtual port, the ID of the virtual machine corresponding to the first virtual port, and the ID of the server hosting the virtual machine corresponding to the first virtual port; the virtual path detection system reports the virtual fault information to the VNFM, which then forwards it to the detecting device in the IAAS.
  • the detecting device in the IAAS performs communication path switching according to the virtual fault information.
  • the detecting device in the IAAS queries, according to the ID of the server in the virtual fault information, whether the physical port on the server hosting the virtual machine corresponding to the first virtual port is faulty. If the physical port on the server is not faulty, the detecting device performs virtual machine hot migration on the virtual machine indicated by the ID of the virtual machine corresponding to the first virtual port.
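The virtual fault information and the check performed by the detecting device in the IAAS can be sketched as below. The field names, the string-valued action, and the function `plan_path_switch` are hypothetical. The text does not say what happens when the physical port is itself faulty, so the sketch returns `None` in that case rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class VirtualFaultInfo:
    """The three IDs the virtual fault information may carry."""
    virtual_port_id: str
    vm_id: str
    server_id: str


def plan_path_switch(info: VirtualFaultInfo,
                     servers_with_faulty_physical_port: Set[str]) -> Optional[str]:
    """Decide the action for a reported virtual port fault.

    If the hosting server's physical port is healthy, live-migrate the VM.
    Otherwise return None: that case is not specified in the text.
    """
    if info.server_id in servers_with_faulty_physical_port:
        return None
    return "live-migrate:" + info.vm_id


# Example: a fault on vport-1 of vm-7, hosted on srv-2 with healthy physical ports.
info = VirtualFaultInfo("vport-1", "vm-7", "srv-2")
action = plan_path_switch(info, servers_with_faulty_physical_port=set())
```

Keeping the three IDs in one immutable record mirrors how the information is reported once to the VNFM and forwarded unchanged to the detecting device.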
  • an embodiment of the present invention provides a method for detecting whether a virtual port is faulty in a PAAS and, in combination with the detection result of the detecting device in the IAAS, performs communication path switching for the faulty virtual port in a timely manner, thereby implementing path switching in a cloud scenario in which IAAS and PAAS are effectively combined.
  • the server receives and sends probe messages through the virtual ports or physical ports to form a path detection system that is fully interconnected in the IAAS and PAAS scenarios, and generates probe results to detect the quality of service between the ports.
  • the detection device analyzes the detection results reported by each port, detects the port in the “sub-health” state, and then removes the port in the “sub-health” state from the LAG in time, to prevent the server from continuing to send and receive data through the port in the “sub-health” state and the data from continuing to be damaged.
  • An embodiment of the present invention provides a method for detecting a communication failure.
  • the server obtains probe messages from N-1 ports in each server through a first port and generates a detection result according to the probe messages; after the detecting device obtains the detection results of the N ports in the X servers, it determines whether the first port is faulty according to the detection results.
  • the detecting device obtains the detection results of the N ports in the X servers respectively, and each detection result is generated by a server according to the probe messages received by its N ports. Because the detection result includes the error packet data and the packet loss data of the other ports, as determined by each port from the received probe messages sent by those other ports, the detecting device can determine from this data whether one of the N ports is a faulty port, that is, detect whether a port is in a "sub-health" state that affects the data transmission efficiency of the port. This improves the reliability of data transmission, solves the problem in the prior art that the LAG cannot detect a faulty port performing abnormal operations, and avoids the risk of using the faulty port to transfer data.
  • An embodiment of the present invention provides a detecting device, as shown in FIG. 8, including:
  • the obtaining unit 31 is configured to respectively obtain the detection results of the N ports in the X servers, where the detection result includes the error packet data and the packet loss data of the other ports determined by each port according to the received probe messages sent by other ports, N>2;
  • a determining unit 32 configured to determine a state of the first port according to the error packet data and the packet loss data of the other port determined by each port in the obtaining unit 31, where the state of the first port is used to indicate the Whether the first port is faulty, and the first port is one of the N ports;
  • the processing unit 33 is configured to generate a failure notification of the first port according to the state of the first port in the determining unit 32.
  • the determining unit 32 includes a calculating subunit 321, wherein
  • the calculating sub-unit 321 is configured to separately calculate, according to the detection result, a packet loss rate of the detection messages sent by the N ports to each other;
  • the determining unit 32 is configured to determine whether the first port is faulty according to a packet loss rate of the detection messages sent by the N ports in the computing subunit 321 .
  • the calculating sub-unit 321 is specifically configured to convert the error packet data in the detection result into relative packet loss data according to a first preset function; and to calculate, according to the relative packet loss data and the packet loss data in the detection result, the packet loss rate of the detection messages sent by the N ports to each other according to a second preset function.
  • the determining unit 32 is specifically configured to: if, among the N ports, at least N/2 ports send the detection message to the first port with a packet loss rate greater than the first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than a second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.
  • the processing unit 33 is specifically configured to generate the first fault notification of the first port, so that after the server acquires the first fault notification, the first port is removed from the LAG;
  • the fault notification includes a first fault notification, where the first fault notification is used to indicate that the first port is faulty.
  • the processing unit 33 is specifically configured to generate the second failure notification of the first port, so that the server acquires the second failure notification and invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machines running in the server;
  • the fault notification includes a second fault notification, where the second fault notification is used to indicate that all the N ports in the X servers are faulty.
  • the N ports are physical ports in the X servers, or are virtual ports in virtual machines running in the X servers.
  • An embodiment of the present invention provides a server, as shown in FIG. 10, including:
  • the receiving unit 41 is configured to receive, by using the first port, a probe message from the N-1 ports in the other server, where the probe message is used to determine the error packet data and the packet loss data of the N-1 ports, where N>2 ;
  • the processing unit 42 is configured to generate a detection result according to the detection message of the receiving unit 41, where the detection result includes the packet loss data and the error packet data of the N-1 ports sending the probe message to the first port ;
  • the obtaining unit 43 is configured to acquire, according to the detection result of the processing unit 42 , a fault notification sent by the detecting device, where the fault notification is used to indicate whether the first port is faulty.
  • the first port is a physical port in the server, or is a virtual port in a virtual machine running in the server, wherein, as shown in FIG. 11, the server further includes a removing unit 44 and a migrating unit 45,
  • the removing unit 44 is configured to: if the first port in the obtaining unit 43 is a physical port in the server, and the first port is faulty, according to the fault notification in the acquiring unit 43 Removing the first port from the LAG;
  • the migrating unit 45 is configured to: if the first port in the acquiring unit 43 is a virtual port in a virtual machine running in the server, and the first port is faulty, perform virtual machine hot migration on the virtual machine corresponding to the first port according to the fault notification in the acquiring unit 43.
  • the processing unit 42 is further configured to: if the first port in the obtaining unit 43 is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG to perform data transmission and reception through the first port.
  • the processing unit 42 is specifically configured to: calculate, according to the number of the probe messages in the receiving unit 41 received within a preset time, the packet loss data of the N-1 ports to the first port; analyze, according to the probe messages in the receiving unit 41 received within the preset time, whether each probe message is an error packet, to collect the error packet data of the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.
  • the server further includes a sending unit 46,
  • the obtaining unit 43 is further configured to separately obtain media access control MAC addresses of the N-1 ports;
  • the processing unit 42 is further configured to construct the probe message according to the MAC address in the acquiring unit 43;
  • the sending unit 46 is configured to send the probe message in the processing unit 42 to the N-1 ports by using the first port according to the MAC address of the N-1 ports in the acquiring unit 43.
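Constructing probe messages from the peers' MAC addresses might look like the sketch below. It is deliberately simplified and not wire-accurate Ethernet: the EtherType value (taken here from the IEEE local experimental range), the 4-byte sequence-number payload, and the big-endian CRC-32 trailer are all assumptions, and the function names are hypothetical.

```python
import zlib
from typing import Dict, Iterable

PROBE_ETHERTYPE = 0x88B5  # assumption: a local experimental EtherType


def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or '-' separated) into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes: " + mac)
    return raw


def build_probe_frame(dst_mac: str, src_mac: str, seq: int) -> bytes:
    """Destination MAC + source MAC + EtherType + sequence number + CRC-32."""
    body = (parse_mac(dst_mac) + parse_mac(src_mac)
            + PROBE_ETHERTYPE.to_bytes(2, "big") + seq.to_bytes(4, "big"))
    return body + zlib.crc32(body).to_bytes(4, "big")


def build_probes(src_mac: str, peer_macs: Iterable[str], seq: int) -> Dict[str, bytes]:
    """One probe frame per peer port, keyed by the peer's MAC address."""
    return {mac: build_probe_frame(mac, src_mac, seq) for mac in peer_macs}


# Example: the first port probes its two peers with sequence number 7.
frames = build_probes("02:00:00:00:00:01",
                      ["02:00:00:00:00:02", "02:00:00:00:00:03"], seq=7)
```

A receiver can recompute the CRC over everything except the last four bytes to classify a frame as an error packet, and can use gaps in the sequence numbers to count lost probes.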
  • the server can only detect through the LAG whether each of its own ports is available, that is, whether the port can transmit data, and cannot detect the "sub-health" situation in which a port transmits data while faulty (for example, a large number of packets are lost or the contents of data packets are tampered with when data packets are sent), which causes the data transmitted through the "sub-health" port to continue to be damaged, thereby reducing the reliability of data transmission.
  • the detection method of the communication failure provided by the present invention can detect the port of the “sub-health” state, and then remove the port of the “sub-health” state from the LAG in time, thereby improving the reliability of data transmission.
  • An embodiment of the present invention provides a device for detecting a communication failure, in which a server obtains probe messages from N-1 ports in each server through a first port and generates a detection result according to the probe messages; after the detecting device obtains the detection results of the N ports in the X servers, it determines whether the first port is faulty according to the detection results.
  • the detecting device obtains the detection results of the N ports in the X servers respectively, and each detection result is generated by a server according to the probe messages received by its N ports. Because the detection result includes the error packet data and the packet loss data of the other ports, as determined by each port from the received probe messages sent by those other ports, the detecting device can determine from this data whether one of the N ports is a faulty port, that is, detect whether a port is in a "sub-health" state that affects the data transmission efficiency of the port. This improves the reliability of data transmission, solves the problem in the prior art that the LAG cannot detect a faulty port performing abnormal operations, and avoids the risk of using the faulty port to transfer data.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes a number of instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or a processor to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Small-Scale Networks (AREA)

Abstract

The present invention relates to the field of communications. Provided in an embodiment of the present invention are a communication failure detection method, device and system for solving the problem in the prior art that LAG cannot detect a malfunctioning port operating abnormally, thus avoiding the risk of transmitting data by using the malfunctioning port. The solution comprises: a detection device respectively acquires detection results of N ports in X servers, the detection results comprising wrong package data and lost package data of other ports determined by each port according to received detection messages transmitted by other ports; the detection device determines a state of a first port according to the wrong package data and the lost package data of other ports determined by each port, the state of the first port being used to indicate whether the first port has a fault; and the detection device generates a fault notification of the first port according to the state of the first port.

Description

Method, device and system for detecting a communication failure
Technical Field
The present invention relates to the field of communications, and in particular, to a method, device and system for detecting a communication failure.
Background
In network construction, port aggregation in servers of various types and switch stacking are commonly used to improve network plane reliability. However, after port aggregation and switch stacking are performed in such servers, individual ports in a server may become unavailable because of faults, which in turn makes the communication paths between the ports unavailable.
In the prior art, the LAG (Link Aggregation Group) in a server can periodically detect the status of its own ports. When a port is unavailable, the server removes the unavailable port from the LAG according to LACP (Link Aggregation Control Protocol) to switch the communication path. As shown in FIG. 1, when port 1 of server 1 is unavailable and ports 2, 3, and 4 are running normally, port 1 is removed from the LAG, and the LAG automatically selects ports 2, 3, and 4 to forward data packets.
However, when sending and receiving data packets, a port may enter a "sub-health" state because of some fault (for convenience of description, the present invention uniformly refers to a port in the "sub-health" state as a faulty port). In this state the port can still exchange data packets with other ports (that is, the port is still available), but it may perform abnormal operations when sending data packets, such as dropping packets or tampering with packet contents. Because the port still appears available to other ports, the LAG cannot detect the port's abnormal behavior when sending and receiving data packets, and cannot switch the communication paths associated with the port. As a result, data transmitted through the faulty port (the "sub-health" port) continues to be damaged, which increases the risk of data transmission.
Summary of the Invention
Embodiments of the present invention provide a method, device and system for detecting a communication failure, which solve the problem in the prior art that the LAG cannot detect a faulty port performing abnormal operations, and avoid the risk of transmitting data through the faulty port.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for detecting a communication failure, including:
a detecting device separately obtains detection results of N ports in X servers, where the detection results include the error packet data and packet loss data of other ports that each port determines according to the received probe messages sent by those other ports, N>2, X>2;
the detecting device determines a state of a first port according to the error packet data and packet loss data of the other ports determined by each port, where the state of the first port is used to indicate whether the first port is faulty;
the detecting device generates a fault notification of the first port according to the state of the first port.
In a first possible implementation manner of the first aspect, the detecting device determining, according to the error packet data and packet loss data of the other ports determined by each port, whether the first port is faulty includes:
the detecting device separately calculates, according to the detection results, the packet loss rate of the detection messages sent between the N ports;
the detecting device determines whether the first port is faulty according to the packet loss rate of the detection messages sent between the N ports.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the detecting device separately calculating, according to the detection results, the packet loss rate of the detection messages sent between the N ports includes:
the detecting device converts the error packet data in the detection results into relative packet loss data according to a first preset function;
the detecting device separately calculates, according to the relative packet loss data and the packet loss data in the detection results, the packet loss rate of the detection messages sent between the N ports according to a second preset function.
With reference to the first possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the detecting device determining whether the first port is faulty according to the packet loss rate of the detection messages sent between the N ports includes:
among the N ports, if at least N/2 ports send the detection message to the first port with a packet loss rate greater than a first preset value, and the packet loss rate of the detection messages sent between the at least N/2 ports is less than a second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
结合前述的第一方面或第一方面的第一至第三种可能的实现方式中的任一种可能的实现方式,在第一方面的第四种可能的实现方式中,所述故障通知包含第一故障通知,所述第一故障通知用于指示所述第一端口有故障,With reference to the foregoing first aspect, or any one of the first to the third possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, the fault notification includes a first failure notification, where the first failure notification is used to indicate that the first port is faulty,
其中,生成所述第一端口的故障通知,包括:The generating a failure notification of the first port includes:
所述检测设备生成所述第一端口的所述第一故障通知,以使得服务器获取所述第一故障通知后,将所述第一端口从链路聚合组LAG中移除。The detecting device generates the first failure notification of the first port, so that after the server acquires the first failure notification, the first port is removed from the link aggregation group LAG.
结合第一方面的第四种可能的实现方式,在第一方面的第五种可能的实现方式中,所述故障通知包含第二故障通知,所述第二故障通知用于指示所述X个服务器内的N个端口均有故障,In conjunction with the fourth possible implementation of the first aspect, in a fifth possible implementation manner of the first aspect, the fault notification includes a second fault notification, where the second fault notification is used to indicate the X N ports in the server are faulty.
其中,生成所述第一端口的故障通知,包括:The generating a failure notification of the first port includes:
所述检测设备生成所述第一端口的所述第二故障通知,以使得所述服务器获取所述第二故障通知,并调用DRS(Distributed Resource Scheduler,分布式资源调度程序)对所述服务器内运行的虚拟机进行虚拟机热迁移。The detecting device generates the second failure notification of the first port, so that the server acquires the second failure notification, and invokes a DRS (Distributed Resource Scheduler) to the server. The running virtual machine performs virtual machine hot migration.
With reference to the first aspect, or any one of the first to fifth possible implementation manners of the first aspect, in a sixth possible implementation manner of the first aspect, the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
In a second aspect, an embodiment of the present invention provides a method for detecting a communication failure, including:
a server receives, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine error packet data and packet loss data of the N-1 ports, N>2;
the server generates a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port;
the server acquires, according to the detection result, a fault notification sent by a detecting device, where the fault notification is used to indicate whether the first port is faulty.
In a first possible implementation manner of the second aspect, the first port is a physical port in the server, or a virtual port in a virtual machine running in the server,
where, after the server acquires, according to the detection result, the fault notification sent by the detecting device, the method further includes:
if the first port is a physical port in the server and the first port is faulty, the server removes the first port from the link aggregation group (LAG) according to the fault notification;
if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, the server performs virtual machine live migration on the virtual machine corresponding to the first port according to the fault notification.
In a second possible implementation manner of the second aspect, after the server acquires, according to the detection result, the fault notification sent by the detecting device, the method further includes:
if the first port is not faulty, the server queries whether the first port is in the LAG;
if the first port is not in the LAG, the server adds the first port to the LAG, so that data can be sent and received through the first port.
In a third possible implementation manner of the second aspect, the server generating the detection result according to the probe messages includes:
the server calculates the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time;
the server analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port;
the server generates the detection result according to the packet loss data and the error packet data.
In a fourth possible implementation manner of the second aspect, the method further includes:
the server separately acquires the MAC (Media Access Control) addresses of the N-1 ports;
the server constructs the probe messages according to the MAC addresses;
the server sends the probe messages to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.
In a third aspect, an embodiment of the present invention provides a detecting device, including:
an acquiring unit, configured to separately acquire detection results of N ports in X servers, where the detection results include the error packet data and packet loss data of the other ports, as determined by each port from the probe messages it received from those other ports, N>2, X>2;
a determining unit, configured to determine a state of a first port according to the error packet data and packet loss data of the other ports determined by each port, as acquired by the acquiring unit, where the state of the first port indicates whether the first port is faulty, and the first port is one of the N ports;
a processing unit, configured to generate a fault notification of the first port according to the state of the first port determined by the determining unit.
In a first possible implementation manner of the third aspect, the determining unit includes a calculating subunit, where
the calculating subunit is configured to separately calculate, according to the detection results, the packet loss rates of the detection messages that the N ports send to one another;
the determining unit is specifically configured to determine whether the first port is faulty according to the packet loss rates of the detection messages that the N ports send to one another.
With reference to the first possible implementation manner of the third aspect, in a second possible implementation manner of the third aspect,
the calculating subunit is specifically configured to convert the error packet data in the detection results into relative packet loss data according to a first preset function, and to separately calculate, according to a second preset function, the packet loss rates of the detection messages that the N ports send to one another from the relative packet loss data and the packet loss data in the detection results.
With reference to the first possible implementation manner of the third aspect, in a third possible implementation manner of the third aspect,
the determining unit is specifically configured to: among the N ports, if at least N/2 ports each have a packet loss rate greater than a first preset value when sending the detection message to the first port, and the packet loss rate of the detection messages sent among those at least N/2 ports is less than a second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.
With reference to the third aspect, or any one of the first to third possible implementation manners of the third aspect, in a fourth possible implementation manner of the third aspect,
the processing unit is specifically configured to generate the first fault notification of the first port, so that the server, after acquiring the first fault notification, removes the first port from the LAG;
where the fault notification includes the first fault notification, and the first fault notification is used to indicate that the first port is faulty.
With reference to the fourth possible implementation manner of the third aspect, in a fifth possible implementation manner of the third aspect,
the processing unit is specifically configured to generate the second fault notification of the first port, so that the server acquires the second fault notification and invokes the Distributed Resource Scheduler (DRS) to perform virtual machine live migration on the virtual machines running in the server;
where the fault notification includes the second fault notification, and the second fault notification is used to indicate that all N ports in the X servers are faulty.
With reference to the third aspect, or any one of the first to fifth possible implementation manners of the third aspect, in a sixth possible implementation manner of the third aspect, the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
In a fourth aspect, an embodiment of the present invention provides a server, including:
a receiving unit, configured to receive, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine error packet data and packet loss data of the N-1 ports, N>2;
a processing unit, configured to generate a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port;
an acquiring unit, configured to acquire, according to the detection result, a fault notification sent by a detecting device, where the fault notification is used to indicate whether the first port is faulty.
In a first possible implementation manner of the fourth aspect, the first port is a physical port in the server, or a virtual port in a virtual machine running in the server, where the server further includes a removing unit and a migrating unit,
the removing unit is configured to: if the first port is a physical port in the server and the first port is faulty, remove the first port from the LAG according to the fault notification acquired by the acquiring unit;
the migrating unit is configured to: if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, perform virtual machine live migration on the virtual machine corresponding to the first port according to the fault notification acquired by the acquiring unit.
In a second possible implementation manner of the fourth aspect,
the processing unit is further configured to: if the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG, so that data can be sent and received through the first port.
In a third possible implementation manner of the fourth aspect,
the processing unit is specifically configured to: calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received by the receiving unit within a preset time; analyze whether each probe message received by the receiving unit within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.
In a fourth possible implementation manner of the fourth aspect, the server further includes a sending unit,
the acquiring unit is further configured to separately acquire the Media Access Control (MAC) addresses of the N-1 ports;
the processing unit is further configured to construct the probe messages according to the MAC addresses acquired by the acquiring unit;
the sending unit is configured to send the probe messages constructed by the processing unit to the N-1 ports through the first port, according to the MAC addresses of the N-1 ports acquired by the acquiring unit.
In a fifth aspect, an embodiment of the present invention provides a communication failure detection system. The detection system includes the detecting device according to the third aspect or any one of the first to sixth possible implementation manners of the third aspect, and, connected to the detecting device, the server according to the fourth aspect or any one of the first to fourth possible implementation manners of the fourth aspect.
Embodiments of the present invention provide a method, an apparatus, and a system for detecting a communication failure. The detecting device acquires the detection results of N ports in the servers, where the detection results are generated by the servers according to the detection messages received by each of the N ports. Because the detection results include the error packet data and packet loss data of the other ports, as determined by each port from the probe messages it received from those other ports, the detecting device can use this data to determine whether a given port among the N ports is faulty. This makes it possible to detect a port in a "sub-health" state that is degrading the efficiency of data transmission through that port, which improves the reliability of data transmission.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below.
FIG. 1 is a schematic architectural diagram of a communication failure detection system in the prior art;
FIG. 2 is a schematic architectural diagram of a communication failure detection system according to an embodiment of the present invention;
FIG. 3 is a schematic hardware diagram of a detecting device according to an embodiment of the present invention;
FIG. 4 is a schematic hardware diagram of a server according to an embodiment of the present invention;
FIG. 5 is a first flowchart of a method for detecting a communication failure according to an embodiment of the present invention;
FIG. 6 is a second flowchart of a method for detecting a communication failure according to an embodiment of the present invention;
FIG. 7 is a third flowchart of a method for detecting a communication failure according to an embodiment of the present invention;
FIG. 8 is a first schematic structural diagram of a detecting device according to an embodiment of the present invention;
FIG. 9 is a second schematic structural diagram of a detecting device according to an embodiment of the present invention;
FIG. 10 is a first schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 11 is a second schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 12 is a third schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation rather than limitation, specific details such as particular system structures, interfaces, and techniques are set forth to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The terms "system" and "network" are often used interchangeably herein. To make the method, apparatus, and system for detecting a communication failure provided by the embodiments of the present invention easier to understand, some concepts related to the present invention are introduced first.
Port aggregation, also called an Ethernet channel, is mainly used for connections between switches or servers. With port aggregation, a switch combines a group of physical ports into one logical channel (such as ports 1, 2, 3, and 4 shown in FIG. 1), that is, a channel-group, and treats this logical channel as a single port. As long as not every port in the group is down, the two switches can continue to communicate. Port aggregation therefore allows multiple switches to be connected in parallel through multiple ports and to transmit data simultaneously, which provides higher bandwidth, greater throughput, and recoverability, and increases the reliability of the system.
Switch stacking means combining two or more switches so that they work together, in order to provide as many ports as possible in a limited space. Stacked switches have sufficient system bandwidth, which increases the reliability of the system.
LACP is a protocol for dynamically aggregating and de-aggregating links. After LACP is enabled on a port, the port advertises its system priority, system MAC address, port priority, and port number to the peer by sending LACPDUs. After receiving this information, the peer compares it with the information saved for its other ports to select the ports that can be aggregated, so that the two sides can agree on whether a port joins or leaves a dynamic aggregation group.
Link aggregation means bundling multiple physical ports together into one logical port, so that outgoing and incoming traffic is shared among the member ports. The switch decides which member port sends a packet to the peer switch according to the port load-sharing policy configured by the user. When the switch or server detects that the link of one member port has failed, it stops sending data on that port and recalculates the sending port among the remaining links according to the load-sharing policy; after the failed port recovers, the sending port is recalculated again. Link aggregation is therefore an important technology for increasing link bandwidth and for achieving link transmission resilience and redundancy.
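The recalculation described above can be illustrated with a minimal sketch. The source does not specify a load-sharing policy; the hash-modulo policy, the port names, and the `pick_member` helper below are assumptions made only for illustration.

```python
import zlib

def pick_member(flow_key: bytes, active_ports: list[str]) -> str:
    """Choose an egress member port for a flow by hashing its key.

    Hypothetical policy: CRC32 of the flow key modulo the number of
    active member ports. Real switches offer several configurable policies.
    """
    if not active_ports:
        raise RuntimeError("no active member ports in the aggregation group")
    return active_ports[zlib.crc32(flow_key) % len(active_ports)]

members = ["port1", "port2", "port3", "port4"]
flow = b"aa:bb:cc:00:00:01->aa:bb:cc:00:00:02"

before = pick_member(flow, members)
# Simulate a detected link failure on the chosen port:
# drop it from the group and recompute the sending port.
survivors = [p for p in members if p != before]
after = pick_member(flow, survivors)
```

The same function is called again once the failed port is restored to the member list, which corresponds to the second recalculation mentioned in the text.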
In addition, the server involved in the present invention may be any of various types of servers, for example a blade server. At least one virtual machine may run in the server, and the virtual machine contains virtual ports. The switch involved in the present invention is a network device used for forwarding electrical signals. It can at least meet Layer 2 switching requirements: it can identify the MAC address information in a data packet, forward the packet according to the MAC address, and record these MAC addresses together with the corresponding ports in an internal address table.
Specifically, after port aggregation and switch stacking are applied in various types of servers, individual ports in a server may enter a "sub-health" state. In this state the port can still send and receive data packets with other ports (that is, the port is still available), but it may behave abnormally when sending packets, for example by dropping packets or by corrupting packet contents. In the prior art, the LAG cannot detect a port in the "sub-health" state, so data transmitted through that port continues to be damaged. The embodiments of the present invention therefore provide a method, an apparatus, and a system for detecting a communication failure, which solve the problem that the LAG in the prior art cannot detect a possible "sub-health" state of a port, and which improve the reliability of data transmission.
Embodiment 1
An embodiment of the present invention provides a communication failure detection system. As shown in FIG. 2, the system includes X servers 01 after link aggregation, Y switches 02 after switch stacking, and a detecting device 03, where
each server 01 contains at least one port, each switch 02 contains at least one port, and the server 01 is connected to the switch 02 through the corresponding ports.
At least one virtual machine runs in the server 01, and the virtual machine contains a virtual port.
The detecting device 03 may be deployed on any one of the X servers 01, or may be deployed in the communication failure detection system separately, independent of the X servers 01.
On the one hand, in this embodiment of the present invention, the detecting device 03 separately acquires the detection results of N ports in the X servers 01, where the detection results include the error packet data and packet loss data of the other ports, as determined by each port from the probe messages it received from those other ports, N>2, X>2. The detecting device 03 determines whether a first port is faulty according to the error packet data and packet loss data of the other ports determined by each port, where the first port is one of the N ports.
Further, the detecting device 03 determining whether the first port is faulty according to the error packet data and packet loss data of the other ports determined by each port may specifically include the following steps: the detecting device 03 separately calculates, according to the detection results, the packet loss rates of the detection messages that the N ports send to one another; the detecting device 03 determines whether the first port is faulty according to those packet loss rates.
Further, the detecting device 03 separately calculating, according to the detection results, the packet loss rates of the detection messages that the N ports send to one another may specifically include the following steps: the detecting device 03 converts the error packet data in the detection results into relative packet loss data according to a first preset function; the detecting device 03 then separately calculates, according to a second preset function, the packet loss rates of the detection messages that the N ports send to one another from the relative packet loss data and the packet loss data in the detection results.
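The two-step calculation can be sketched as follows. This passage does not define the first and second preset functions, so the weighting in `relative_loss` and the ratio in `loss_rate` are hypothetical choices used only to show the shape of the computation.

```python
def relative_loss(error_packets: int, weight: float = 0.5) -> float:
    """Hypothetical 'first preset function'.

    Assumption: each corrupted packet counts as a fixed fraction of a lost packet.
    """
    return weight * error_packets

def loss_rate(sent: int, lost: int, error_packets: int, weight: float = 0.5) -> float:
    """Hypothetical 'second preset function'.

    Assumption: the loss rate is (lost + relative loss) / sent, capped at 1.0.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    return min(1.0, (lost + relative_loss(error_packets, weight)) / sent)

# 100 probes sent from one port to another: 10 never arrived, 20 arrived corrupted.
rate = loss_rate(sent=100, lost=10, error_packets=20)
```

Running this for every ordered pair of ports would yield the N-by-N table of loss rates that the fault decision uses.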
Further, the detecting device 03 determining whether the first port is faulty according to the packet loss rates of the detection messages that the N ports send to one another may specifically include the following steps: among the N ports, if at least N/2 ports each have a packet loss rate greater than a first preset value when sending the detection message to the first port, and the packet loss rate of the detection messages sent among those at least N/2 ports is less than a second preset value, the detecting device 03 determines that the first port is faulty; otherwise, the detecting device 03 determines that the first port is not faulty.
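A minimal sketch of this decision rule follows. It assumes the loss rates are held in a matrix `rate[i][j]` (loss rate from port i to port j), takes "the at least N/2 ports" to be all ports whose loss rate toward the target exceeds the first preset value, and uses placeholder thresholds; none of these details are fixed by the source.

```python
from itertools import permutations

def is_faulty(rate: list[list[float]], target: int,
              thr1: float = 0.2, thr2: float = 0.05) -> bool:
    """Return True if port `target` is judged faulty.

    thr1 and thr2 stand in for the first and second preset values (assumed).
    """
    n = len(rate)
    # Ports that see a high loss rate when sending to the target.
    suspects = [i for i in range(n) if i != target and rate[i][target] > thr1]
    if len(suspects) < n / 2:
        return False
    # Those same ports must communicate well with one another; otherwise
    # the loss may be caused by the senders rather than by the target.
    return all(rate[a][b] < thr2 for a, b in permutations(suspects, 2))

# Four ports; every other port loses half of its probes toward port 3.
rates = [
    [0.00, 0.01, 0.01, 0.50],
    [0.01, 0.00, 0.01, 0.50],
    [0.01, 0.01, 0.00, 0.50],
    [0.01, 0.01, 0.01, 0.00],
]
port3_faulty = is_faulty(rates, 3)
port0_faulty = is_faulty(rates, 0)
```

The second condition is what separates a degraded receiving port from a general network problem: if the complaining ports also lose packets among themselves, the target is not singled out.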
Further, after the detecting device 03 determines whether the first port is faulty according to the packet loss data and error packet data of the detection messages that the N ports send to one another, the following step may also be included: if the first port is faulty, the detecting device 03 sends a first fault notification to the server 01, so that the server 01 removes the first port from the LAG according to the first fault notification.
Further, after the detecting device 03 determines whether the first port is faulty according to the packet loss data and error packet data of the detection messages that the N ports send to one another, the following steps may also be included:
if every port in the server 01 is faulty, the detecting device 03 invokes the DRS to perform virtual machine live migration on the virtual machines running in the server 01; or,
if every port in the server 01 is faulty, the detecting device 03 sends a second fault notification to the server 01, so that the server 01 invokes the DRS, according to the second fault notification, to perform virtual machine live migration on the virtual machines running in the server 01.
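The choice between the first and second fault notifications might be organised as in the sketch below. The dictionary layout, the `build_notifications` helper, and the action names are hypothetical; the source only states which notification is sent in which situation.

```python
def build_notifications(port_status: dict[str, bool]) -> list[dict]:
    """Build fault notifications for one server.

    port_status maps a port name on the server to True if the detecting
    device judged that port faulty (hypothetical data layout).
    """
    if port_status and all(port_status.values()):
        # Every port is faulty: removing ports from the LAG cannot help,
        # so a second fault notification asks for live migration via DRS.
        return [{"type": "second", "action": "live_migrate_vms"}]
    # Otherwise send one first fault notification per faulty port.
    return [
        {"type": "first", "port": port, "action": "remove_from_lag"}
        for port, faulty in port_status.items()
        if faulty
    ]

partial = build_notifications({"eth0": True, "eth1": False})
total = build_notifications({"eth0": True, "eth1": True})
```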
On the other hand, in this embodiment of the present invention, the server 01 receives, through the first port, probe messages from the other N-1 ports in the other servers 01, where the probe messages are used to determine the error packet data and packet loss data of the N-1 ports, N>2. The server 01 generates a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port. The server 01 acquires, according to the detection result, a fault notification sent by the detecting device 03, where the fault notification is used to indicate whether the first port is faulty.
Further, the first port is a physical port in the server 01, or a virtual port in a virtual machine running in the server 01, where, after the server 01 acquires, according to the detection result, the fault notification sent by the detecting device 03, the following steps may also be included:
if the first port is a physical port in the server 01 and the first port is faulty, the server 01 removes the first port from the LAG according to the fault notification;
if the first port is a virtual port in a virtual machine running in the server 01 and the first port is faulty, the server 01 performs virtual machine live migration on the virtual machine corresponding to the first port according to the fault notification.
Further, after the server 01 acquires, according to the detection result, the fault notification sent by the detecting device 03, the following steps may also be included: if the first port is not faulty, the server 01 queries whether the first port is in the LAG; if the first port is not in the LAG, the server 01 adds the first port to the LAG, so that data can be sent and received through the first port.
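The server-side handling described in the preceding paragraphs (remove a faulty physical port, migrate for a faulty virtual port, re-add a recovered port) can be summarised in a short sketch. Modelling the LAG as a Python set and returning a string for the chosen action are assumptions made for illustration only.

```python
def handle_notification(lag: set[str], port: str, faulty: bool,
                        is_virtual: bool = False) -> str:
    """Apply a fault notification for `port` and report the action taken.

    `lag` is a hypothetical in-memory model of the link aggregation group.
    """
    if faulty:
        if is_virtual:
            # A faulty virtual port triggers live migration of its VM.
            return "migrate_vm"
        lag.discard(port)          # faulty physical port leaves the LAG
        return "removed"
    if port not in lag:
        lag.add(port)              # healthy port outside the LAG rejoins it
        return "re-added"
    return "unchanged"

lag = {"eth0", "eth1"}
first = handle_notification(lag, "eth1", faulty=True)
second = handle_notification(lag, "eth1", faulty=False)
```

Because a healthy port is re-added whenever it is found outside the LAG, a port that recovers from the "sub-health" state returns to service without manual intervention.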
Further, the server 01 generating the detection result according to the probe messages may specifically include the following steps: the server 01 calculates the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time; the server 01 analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port; the server 01 generates the detection result according to the packet loss data and the error packet data.
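As a rough sketch of these three steps for a single sending port, assume that the receiver knows how many probes the sender emits per window and that each probe carries a CRC32 of its payload. Both assumptions, together with the 8-byte probe layout, are hypothetical and not taken from the source.

```python
import zlib

def make_probe(seq: int) -> bytes:
    """Hypothetical probe: 4-byte sequence number followed by its CRC32."""
    payload = seq.to_bytes(4, "big")
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def is_error_packet(probe: bytes) -> bool:
    """A probe is an error packet if its length or checksum is wrong."""
    return len(probe) != 8 or zlib.crc32(probe[:4]).to_bytes(4, "big") != probe[4:]

def summarize(received: list[bytes], expected: int) -> dict:
    """Build the per-sender detection result for one preset time window."""
    lost = max(0, expected - len(received))      # probes that never arrived
    errors = sum(1 for p in received if is_error_packet(p))
    return {"expected": expected, "lost": lost, "errors": errors}

# The sender emitted 10 probes; 2 were dropped and 1 arrived corrupted.
received = [make_probe(i) for i in range(8)]
received[0] = bytes([received[0][0] ^ 0xFF]) + received[0][1:]
result = summarize(received, expected=10)
```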
Further, the method for detecting a communication failure may also include the following steps: the server 01 separately acquires the Media Access Control (MAC) addresses of the N-1 ports; the server 01 constructs the probe messages according to the MAC addresses; the server 01 sends the probe messages to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.
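The source does not give the layout of a probe message. The sketch below assumes a plain Ethernet II frame addressed by MAC, using EtherType 0x88B5 (one of the IEEE 802 local experimental EtherTypes) and a 4-byte sequence number padded to the 46-byte minimum payload; the helper names and the example MAC addresses are illustrative.

```python
import struct

PROBE_ETHERTYPE = 0x88B5  # IEEE 802 local experimental EtherType (assumed choice)

def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:00:00:01' into its 6-byte form."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    return raw

def build_probe_frame(dst_mac: str, src_mac: str, seq: int) -> bytes:
    """Build a hypothetical probe frame (without the trailing FCS)."""
    header = mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", PROBE_ETHERTYPE)
    payload = struct.pack("!I", seq).ljust(46, b"\x00")  # pad to minimum payload size
    return header + payload

# One frame per peer port, all sent out through the first port.
peers = ["aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03"]
frames = [build_probe_frame(mac, "aa:bb:cc:00:00:01", seq=1) for mac in peers]
```

Addressing the probes at Layer 2 means every pair of ports is exercised directly, which is what allows the detecting device to compare loss rates between all N ports.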
It should be noted that the N ports are physical ports in the servers 01, or virtual ports in virtual machines running in the servers 01. The method, apparatus, and system for detecting a communication failure provided by the embodiments of the present invention can therefore be applied both to an IAAS (Infrastructure as a Service) scenario and to a PAAS (Platform-as-a-Service) scenario, achieving automatic switching of the communication plane in a cloud scenario. How the detection method is implemented in the IAAS and PAAS scenarios is described in detail in subsequent embodiments and is not repeated here.
In addition, IAAS and PAAS are service forms at different layers of cloud computing. Cloud computing is a model for adding, using and delivering Internet-based services, and usually involves providing dynamic, easily scalable and often virtualized resources over the Internet. Cloud computing may include the following service layers: Infrastructure as a Service (IAAS), Platform as a Service (PAAS) and Software as a Service (SAAS, Software-as-a-Service). IAAS means that consumers obtain services from a complete computer infrastructure over the Internet, for example, hardware server rental; PAAS means offering a software development platform as a service, for example, customized software development.
The method for detecting a communication failure provided by the embodiments of the present invention can be applied in an IAAS scenario, that is, fully interconnected communication failure detection is performed on the physical ports of the servers in IAAS, and path switching is performed for ports with communication failures. The method can also be applied in a PAAS scenario, that is, fully interconnected communication failure detection is performed on the virtual ports of the virtual machines running in the servers in PAAS, and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is implemented for ports with communication failures.
In the prior art, a server can only use the LAG to detect whether each of its own ports is available, that is, whether the port can transmit data. It cannot detect the "sub-health" condition in which a faulty port still transmits data but does so unreliably (for example, a large number of packets are lost when data packets are sent, or the content of data packets is altered). As a result, data transmitted through the "sub-health" port is continuously damaged, which reduces the reliability of data transmission. The method for detecting a communication failure provided by the present invention can detect ports in the "sub-health" state and promptly remove them from the LAG, thereby improving the reliability of data transmission.
An embodiment of the present invention provides a system for detecting a communication failure. A server receives, through a first port, probe messages from N-1 ports in the servers and generates a detection result according to the probe messages; after the detecting device obtains the detection results of the N ports in the X servers, it determines according to the detection results whether the first port is faulty. In this solution, the detecting device obtains the detection results of the N ports in the X servers, the detection results being generated by the servers according to the probe messages received by each of the N ports. Because the detection results include the error packet data and packet loss data that each port determines for the other ports according to the probe messages received from those ports, the detecting device can determine from this data whether a given port among the N ports is faulty, and can therefore detect whether a port in the "sub-health" state is degrading data transmission on that port. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoids the risk of transmitting data through a faulty port.
Embodiment 2
FIG. 3 is a schematic hardware diagram of the detecting device provided by an embodiment of the present invention:
The detecting device may be a server, a blade or the like. It may be deployed in one of the servers that report detection results in the communication failure detection system, or a new server may be introduced into the system as the detecting device. Specifically:
As shown in FIG. 3, the detecting device includes a processor 11, a transceiver module 12 and a memory 13, where:
The processor 11 is the control center of the detecting device. By running or executing software programs and/or modules stored in the memory, and by calling data stored in the memory, it performs the various functions of the detecting device and processes data.
The transceiver module 12 may be used to receive and send signals in the course of sending and receiving information. In particular, the transceiver module may communicate with the network and other devices through wireless communication, and the wireless communication may use any communication standard or protocol. In the present invention, the transceiver module may send and receive data based on the LACP protocol or ARP (Address Resolution Protocol).
The memory 13 may be used to store software programs and modules. The processor performs the various functional applications and data processing of the detecting device by running the software programs and modules stored in the memory.
In this embodiment of the present invention, the transceiver module 12 obtains the detection results of the N ports in the X servers, where the detection results include the error packet data and packet loss data that each port determines for the other ports according to the probe messages received from those ports, N>2, X>2; the processor 11 determines whether a first port is faulty according to the error packet data and packet loss data of the other ports determined by each port, the first port being one of the N ports.
Further, the processor 11 determining whether the first port is faulty according to the packet loss data and error packet data of the probe messages sent among the N ports may include the following steps: the processor 11 calculates, according to the detection results, the packet loss rate of the probe messages sent between each pair of the N ports and saves it to the memory 13; and the processor 11 determines whether the first port is faulty according to the packet loss rates of the probe messages sent among the N ports.
Further, the processor 11 calculating, according to the detection results, the packet loss rate of the probe messages sent between each pair of the N ports may include the following steps: the processor 11 converts the error packet data in the detection results into relative packet loss data according to a first preset function; and the processor 11 calculates, according to a second preset function, the packet loss rate of the probe messages sent between each pair of the N ports from the relative packet loss data and the packet loss data in the detection results.
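The source does not define the two preset functions. The sketch below shows one plausible shape under stated assumptions: the first preset function is taken to be a linear weighting of error packets (`ERROR_WEIGHT` is hypothetical), and the second preset function is taken to be the combined loss expressed as a percentage of the expected number of probes (`expected_count` is also an assumption).

```python
ERROR_WEIGHT = 0.5  # assumed: one error packet counts as half a lost packet


def first_preset_function(error_packets):
    """Assumed form: convert error packets into relative packet loss data."""
    return ERROR_WEIGHT * error_packets


def second_preset_function(lost_packets, relative_loss, expected_count):
    """Assumed form: packet loss rate in percent for one port pair."""
    return 100.0 * (lost_packets + relative_loss) / expected_count


# Illustrative numbers: 5 error packets and 3 lost packets out of 100 expected probes.
relative = first_preset_function(5)
loss_rate = second_preset_function(3, relative, 100)
```

Other weightings or nonlinear conversions would fit the description equally well; only the two-stage structure comes from the text.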
Further, the processor 11 determining whether the first port is faulty according to the packet loss rates of the probe messages sent among the N ports may include the following steps: among the N ports, if the packet loss rate of the probe messages sent to the first port by at least N/2 ports is greater than a first preset value, and the packet loss rate of the probe messages sent among those at least N/2 ports is less than a second preset value, the processor 11 determines that the first port is faulty; otherwise, the processor 11 determines that the first port is not faulty.
Further, after the processor 11 determines whether the first port is faulty according to the packet loss data and error packet data of the probe messages sent among the N ports, the following step may also be included: if the processor 11 determines that the first port is faulty, the processor 11 sends, through the transceiver module 12, a first fault notification to the server corresponding to the first port, so that the server removes the first port from the LAG according to the first fault notification.
Further, after the processor 11 determines whether the first port is faulty according to the packet loss data and error packet data of the probe messages sent among the N ports, the following step may also be included: if the processor 11 determines that every port in the X servers is faulty, the processor 11 invokes the DRS in the memory 13 to perform virtual machine live migration on the virtual machines running in the X servers; or,
if the processor 11 determines that every port in the X servers is faulty, the processor 11 sends, through the transceiver module 12, a second fault notification to the X servers, so that the X servers invoke the DRS according to the second fault notification to perform virtual machine live migration on the virtual machines running in the X servers.
Further, the N ports are physical ports in the servers, or virtual ports in virtual machines running in the servers. Specifically, in IAAS, fully interconnected communication failure detection is performed on the physical ports of the servers, and path switching is performed for ports with communication failures; in PAAS, fully interconnected communication failure detection is performed on the virtual ports of the virtual machines running in the servers, and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is implemented for ports with communication failures.
FIG. 4 is a schematic hardware diagram of the server provided by an embodiment of the present invention:
The server may be any of various types of server (for example, a blade server). Specifically:
As shown in FIG. 4, the server includes a processor 21, a transceiver module 22 and a memory 23, where:
The processor 21 is the control center of the server. By running or executing software programs and/or modules stored in the memory, and by calling data stored in the memory, it performs the various functions of the server and processes data.
The transceiver module 22 may be used to receive and send signals in the course of sending and receiving information. In particular, the transceiver module may communicate with the network and other devices through wireless communication, and the wireless communication may use any communication standard or protocol. In the present invention, the transceiver module may send and receive data based on the LACP protocol or the ARP protocol.
The memory 23 may be used to store software programs and modules. The processor performs the various functional applications and data processing of the server by running the software programs and modules stored in the memory.
In this embodiment of the present invention, the transceiver module 22 receives, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine the error packet data and packet loss data of the N-1 ports, N>2; the processor 21 generates a detection result according to the probe messages and sends it to the transceiver module 22, where the detection result includes the packet loss data and error packet data of the probe messages sent by the N-1 ports to the first port; and the transceiver module 22 obtains, based on the detection result, a fault notification sent by the detecting device and forwards it to the processor 21, where the fault notification indicates whether the first port is faulty.
Further, the first port is a physical port in the server, or a virtual port in a virtual machine running in the server.
Further, after the transceiver module 22 obtains, based on the detection result, the fault notification sent by the detecting device and forwards it to the processor 21, the following steps may also be included: if the first port is a physical port in the server and the first port is faulty, the processor 21 removes the first port from the LAG in the memory 23 according to the fault notification;
if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, the processor 21 performs virtual machine live migration on the virtual machine corresponding to the first port according to the fault notification.
Further, after the transceiver module 22 obtains, based on the detection result, the fault notification sent by the detecting device and forwards it to the processor 21, the following steps may also be included: if the first port is not faulty, the processor 21 checks whether the first port is in the LAG in the memory 23; if the first port is not in the LAG, the processor 21 adds the first port to the LAG and saves the updated LAG to the memory 23, so that the transceiver module 22 can send and receive data through the first port.
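The server-side handling of a fault notification described above can be sketched as follows. The LAG is modeled as a plain set of port identifiers, and the function name, the `is_virtual` flag and the returned action strings are all hypothetical; the actual live migration is left out.

```python
def handle_fault_notification(lag, port, faulty, is_virtual=False):
    """Update the LAG membership (a set of port ids) for one notification.

    Returns a short action label; the labels are illustrative only.
    """
    if faulty:
        if is_virtual:
            # A faulty virtual port triggers live migration of its virtual machine.
            return "migrate_vm"
        # A faulty physical port is taken out of the aggregation group.
        lag.discard(port)
        return "removed"
    if port not in lag:
        # A healthy port that was previously removed is added back.
        lag.add(port)
        return "added"
    return "unchanged"


lag = {1, 2, 3, 4}
first_action = handle_fault_notification(lag, 2, faulty=True)
second_action = handle_fault_notification(lag, 2, faulty=False)
```

Re-adding a recovered port in the same handler keeps removal and recovery symmetric, matching the two cases in the text.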
Further, the processor 21 generating the detection result according to the probe messages and sending it to the transceiver module 22 may include the following steps: the processor 21 calculates the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time and saves it to the memory 23; the processor 21 analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port, and saves it to the memory 23; and the processor 21 generates the detection result according to the packet loss data and the error packet data in the memory 23.
Further, the method for detecting a communication failure may also include the following steps: the processor 21 obtains the MAC address of each of the N-1 ports; the processor 21 constructs the probe messages according to the MAC addresses; and the transceiver module 22 sends the probe messages to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.
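The source does not specify the format of a probe message. As a rough sketch, a probe could be an Ethernet frame addressed to the peer port's MAC address. The payload layout (a 4-byte sequence number), the helper names and the choice of EtherType 0x88B5 (one of the IEEE 802 local experimental EtherTypes) are all assumptions made for illustration.

```python
import struct

PROBE_ETHERTYPE = 0x88B5  # assumed: IEEE 802 local experimental EtherType


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_probe_frame(dst_mac, src_mac, sequence):
    """Build a hypothetical probe frame: Ethernet header plus a sequence number."""
    header = (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", PROBE_ETHERTYPE)
    )
    return header + struct.pack("!I", sequence)


# One probe from the first port to each of the N-1 peer ports (example addresses).
peer_macs = ["02:00:00:00:00:02", "02:00:00:00:00:03"]
frames = [build_probe_frame(mac, "02:00:00:00:00:01", 7) for mac in peer_macs]
```

A sequence number lets the receiver tell a missing probe from a late one, although the text only requires counting probes per period.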
It can be seen that, in the prior art, a server can only use the LAG to detect whether each of its own ports is available, that is, whether the port can transmit data. It cannot detect the "sub-health" condition in which a faulty port still transmits data but does so unreliably (for example, a large number of packets are lost when data packets are sent, or the content of data packets is altered). As a result, data transmitted through the "sub-health" port is continuously damaged, which reduces the reliability of data transmission. The method for detecting a communication failure provided by the present invention can detect ports in the "sub-health" state and promptly remove them from the LAG, thereby improving the reliability of data transmission.
An embodiment of the present invention provides a device for detecting a communication failure. A server receives, through a first port, probe messages from N-1 ports in the servers and generates a detection result according to the probe messages; after the detecting device obtains the detection results of the N ports in the X servers, it determines according to the detection results whether the first port is faulty. In this solution, the detecting device obtains the detection results of the N ports in the X servers, the detection results being generated by the servers according to the probe messages received by each of the N ports. Because the detection results include the error packet data and packet loss data that each port determines for the other ports according to the probe messages received from those ports, the detecting device can determine from this data whether a given port among the N ports is faulty, and can therefore detect whether a port in the "sub-health" state is degrading data transmission on that port. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoids the risk of transmitting data through a faulty port.
Embodiment 3
An embodiment of the present invention provides a method for detecting a communication failure. As shown in FIG. 5, the method includes:
101. The detecting device obtains the detection results of the N ports in the X servers, where the detection results include the error packet data and packet loss data that each port determines for the other ports according to the probe messages received from those ports.
Here, N>2 and X>2, and the N ports are the ports of the servers in the communication failure detection system after port aggregation (for example, ports 1, 2, 3 and 4 of server 1 in FIG. 2).
The detection results are generated by the servers according to the received probe messages and reported to the detecting device. Specifically, the detection results include the packet loss data and error packet data of the probe messages sent among the N ports. Table 1 shows the detection result that a server sends to the detecting device through port 1; it includes the error packet data and packet loss data between port 1 and the remaining N-1 ports in the communication failure detection system, and this data reflects the communication quality of the communication paths between port 1 and the other N-1 ports.
Table 1

                     Error packet data   Packet loss data
Port 1 to port 2     5                   3
Port 1 to port 3     0                   0
Port 1 to port 4     3                   5
Port 1 to port 5     1                   0
Accordingly, once the detecting device has obtained the detection results of all N ports, it knows the communication quality of every communication path in the communication failure detection system, so that it can identify faulty ports by evaluating the communication quality of all the communication paths.
It should be noted that the method for calculating the error packet data and the packet loss data is described in detail in subsequent embodiments and is not repeated here.
102. The detecting device determines the state of the first port according to the error packet data and packet loss data of the other ports determined by each port, where the state of the first port indicates whether the first port is faulty.
After obtaining the detection results of the N ports in the X servers, the detecting device can determine according to the detection results whether the first port is faulty, the first port being one of the N ports.
Optionally, after obtaining the detection results of the N ports in the servers, the detecting device may calculate, according to the detection results, the packet loss rate of the probe messages sent between each pair of the N ports, and then determine whether the first port is faulty according to those packet loss rates.
For example, the detecting device may convert the error packet data in the detection results into relative packet loss data according to a first preset function, and then calculate, according to a second preset function, the packet loss rate of the probe messages sent between each pair of the N ports from the relative packet loss data and the packet loss data in the detection results. The outcome, shown in Table 2, is the packet loss rate between each pair of ports when probe messages are sent and received; for example, the packet loss rate from port 1 to port 3 is 0.2%. The values in Table 2 are percentages.
Table 2

           Port 1   Port 2   Port 3   Port 4
Port 1     N/A      1        0.2      0
Port 2     0        N/A      0.3      0.3
Port 3     0.1      1        N/A      0.2
Port 4     0.1      0.9      0        N/A
Then, after calculating the packet loss rate of the probe messages sent between each pair of the N ports, the detecting device determines whether the first port is faulty. For example, the detecting device evaluates Table 2 as follows: among the N ports, if the packet loss rate of the probe messages sent to the first port by at least N/2 ports is greater than a first preset value, and the packet loss rate of the probe messages sent among those at least N/2 ports is less than a second preset value, the detecting device determines that the first port is faulty; otherwise, the detecting device determines that the first port is not faulty.
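The decision rule above can be tried out on the Table 2 values with a short sketch. The threshold values (0.5 for both preset values) and the function name are assumptions, as is the reading of each table row as the sending port and each column as the receiving port; "at least N/2 ports" is taken as the ceiling of N/2.

```python
from itertools import combinations, permutations
from math import ceil

# Packet loss rates in percent from Table 2: LOSS[sender][receiver].
LOSS = {
    1: {2: 1.0, 3: 0.2, 4: 0.0},
    2: {1: 0.0, 3: 0.3, 4: 0.3},
    3: {1: 0.1, 2: 1.0, 4: 0.2},
    4: {1: 0.1, 2: 0.9, 3: 0.0},
}


def is_port_faulty(loss, target, first_preset, second_preset):
    """Apply the N/2 rule to one candidate port."""
    needed = ceil(len(loss) / 2)
    # Ports whose probes toward the target are lost too often.
    witnesses = [p for p in loss if p != target and loss[p][target] > first_preset]
    # The witnesses must communicate well among themselves, otherwise the
    # problem may lie with them rather than with the target port.
    for group in combinations(witnesses, needed):
        if all(loss[a][b] < second_preset for a, b in permutations(group, 2)):
            return True
    return False


faulty_ports = [p for p in LOSS if is_port_faulty(LOSS, p, 0.5, 0.5)]
```

With these assumed thresholds only port 2 is flagged: ports 1, 3 and 4 all lose most probes sent to port 2 while exchanging probes with one another at low loss.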
Optionally, thresholds for the packet loss data and error packet data may also be preset in the detecting device. When the packet loss data and error packet data of the probe messages exchanged between a given port and the other ports, as received by the detecting device, meet the preset thresholds, the port is determined to be faulty, and using it to send and receive data would affect data reliability.
Optionally, the detecting device may also calculate, from the packet loss data and error packet data of the probe messages sent among the N ports, the ratio of lost packets and error packets between each port and the other ports, and thereby identify the port with relatively little packet loss and few error packets. When all N ports are faulty, that port is selected to send and receive data, so that the server keeps working normally as far as possible.
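The fallback selection described above could look like the sketch below. The source does not give the ratio formula, so the one used here (lost plus error packets divided by the number of expected probes) is an assumption, as are the function name and the input layout.

```python
def least_impaired_port(stats, expected_count):
    """Pick the port with the lowest share of lost and error packets.

    stats: dict mapping a port id to {peer id: (lost, errors)}.
    expected_count: assumed number of probes expected per peer.
    """
    def impairment_ratio(port):
        peers = stats[port]
        bad = sum(lost + errors for lost, errors in peers.values())
        return bad / (expected_count * len(peers))

    return min(stats, key=impairment_ratio)


# Hypothetical measurements for three ports that have all been judged faulty.
stats = {
    1: {2: (3, 5), 3: (0, 0)},
    2: {1: (10, 10), 3: (20, 5)},
    3: {1: (1, 0), 2: (0, 1)},
}
fallback_port = least_impaired_port(stats, 100)
```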
At this point, the detecting device has determined, according to the detection results, whether the first port is faulty.
103. The detecting device generates a fault notification for the first port according to the state of the first port.
After the detecting device determines whether the first port is faulty according to the packet loss data and error packet data of the probe messages sent among the N ports, if the first port is faulty, the detecting device may generate a fault notification for the first port. Further, the detecting device may send a first fault notification to the server, so that the server removes the first port from the LAG according to the first fault notification, that is, stops sending data on this port and recalculates the ports used for sending data among the remaining links according to the load sharing policy. When the faulty port recovers, the ports used for sending data are recalculated again. In this way, automatic switching of the communication paths among the N ports can be implemented.
Further, if the detecting device determines according to the detection results that every port in the server is faulty, the detecting device may invoke the DRS to perform virtual machine live migration on the virtual machines running in the server; alternatively, the detecting device may send a second fault notification to the server, so that the server invokes the DRS according to the second fault notification to perform virtual machine live migration on the virtual machines running in the server. The virtual machines in the server with the faulty ports are migrated to other servers without faulty ports, which ensures that data transmission is not impaired when the virtual machines corresponding to the faulty ports carry out service interactions.
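The choice between the first and second fault notifications can be summarized in a small sketch. The notification labels and the function name are hypothetical, and the sketch only plans which notification goes to which server; it does not call any real DRS interface.

```python
def plan_notifications(port_faulty):
    """Decide which notification each server should receive.

    port_faulty: dict mapping a server id to {port id: True if faulty}.
    """
    plan = {}
    for server, ports in port_faulty.items():
        failed = sorted(p for p, bad in ports.items() if bad)
        if not failed:
            continue  # nothing to report for a fully healthy server
        if len(failed) == len(ports):
            # Every port is faulty: the virtual machines must be migrated away.
            plan[server] = ("second_fault_notification", failed)
        else:
            # Only some ports are faulty: remove them from the LAG.
            plan[server] = ("first_fault_notification", failed)
    return plan


status = {
    "server1": {1: True, 2: False},
    "server2": {1: True, 2: True},
    "server3": {1: False},
}
notification_plan = plan_notifications(status)
```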
It can thus be seen that the method for detecting a communication failure provided by the present invention can effectively detect ports in the "sub-health" state, that is, ports that can still transmit data but whose packet loss rate during transmission is so high that data passing through them is continuously damaged. After a port in the "sub-health" state is detected, the first port is promptly removed from the LAG, or virtual machine live migration is performed on the virtual machines running in the server, so that the communication paths among the N ports are switched automatically and data transmission is not impaired.
It should be noted that the N ports are physical ports in the servers, or virtual ports in virtual machines running in the servers. Specifically, in IAAS, fully interconnected communication failure detection is performed on the physical ports of the servers, and path switching is performed for ports with communication failures; in PAAS, fully interconnected communication failure detection is performed on the virtual ports of the virtual machines running in the servers, and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is implemented for ports with communication failures.
An embodiment of the present invention provides a method for detecting a communication failure. As shown in FIG. 6, the method includes:
201. The server receives, through the first port, probe messages from N-1 ports in other servers.
Here, the probe messages are used to determine the error packet data and packet loss data of the N-1 ports, and N>2.
The server may periodically receive probe messages from the N-1 ports through the first port; for example, the first port receives probe messages from the N-1 ports over a period of one minute. According to the existing communication protocol in the server, the number of probe messages that the first port should receive from each port within a fixed period is predetermined, and this predetermined number reflects the port's capability to send and receive data. For example, port 1 should receive 60 probe messages from port 3 within one minute. The probe messages can be used to reflect the QoS (Quality of Service) from the N-1 ports to the first port, where QoS refers to a set of quality requirements on the collective behavior of one or more objects. Because the paths between the first port and the other N-1 ports may be faulty, the server can have each port periodically send a specified number of probe messages in order to determine the error packet data and packet loss data of the N-1 ports, which reflect the quality of service from the N-1 ports to the first port.
For example, Table 3 shows the number of probe messages that the first port receives from the N-1 ports within one minute, where the predetermined number of probe messages that the first port receives from the N-1 ports within one minute is 100. The number of probe messages that the first port receives from the N-1 ports reflects the communication capability between the first port and the N-1 ports.
Table 3
Figure PCTCN2015084002-appb-000001
202. The server generates a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port.
After receiving the probe messages from the N-1 ports through the first port, the server may generate the detection result according to the probe messages.
Specifically, the server may calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time. In addition, the server analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port. Finally, the server generates the detection result according to the packet loss data and the error packet data.
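The counting described in step 202 can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_detection_result`, the `(sender, crc_ok)` record shape and the choice to count an error packet as "received" (so that it contributes to the error packet data but not to the packet loss data) are not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_detection_result(
    received: Iterable[Tuple[str, bool]],
    senders: List[str],
    expected_per_sender: int,
) -> Dict[str, Dict[str, int]]:
    """Summarise one preset period of probe reception on the first port.

    `received` holds one (sender_id, crc_ok) record per probe that arrived.
    Assumption: a probe that arrives with a bad CRC is an error packet,
    not a lost packet.
    """
    arrived = Counter()
    errors = Counter()
    for sender, crc_ok in received:
        arrived[sender] += 1
        if not crc_ok:
            errors[sender] += 1

    result = {}
    for sender in senders:
        # Packet loss data = expected probes - probes actually received.
        lost = max(expected_per_sender - arrived[sender], 0)
        result[sender] = {"loss": lost, "errors": errors[sender]}
    return result
```

With 100 expected probes, a sender whose 97 probes arrive, two of them corrupted, yields a packet loss data of 3 and an error packet data of 2.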
For example, as shown in Table 4, on the basis of Table 3 the server generates, according to the probe messages, the detection result from the first port to the N-1 ports and reports it to the detection device, so that the detection device determines whether the first port is faulty according to the detection results of the N ports. The error packet data of the first port in Table 4 is calculated from the CRC (Cyclic Redundancy Check) of each probe message received by the first port.
Table 4
Figure PCTCN2015084002-appb-000002
At this point, the server has received the probe messages from the N-1 ports through the first port and generated the detection result according to the probe messages, so that the detection device can determine the faulty port according to the detection results of the ports.
Further, while the server receives the probe messages from the N-1 ports through the first port and generates the detection result, the server may also periodically send probe messages to the other N-1 ports, so that the other N-1 ports likewise generate their own detection results according to the probe messages and report them to the detection device.
First, the server obtains the MAC addresses of the other N-1 ports. A MAC address, also called a hardware address, defines the location of a network device and identifies each station on the network.
Specifically, the server may obtain the MAC addresses of the ports in the other servers according to the ARP protocol or the LACP protocol.
Second, the server constructs probe messages according to the MAC addresses of the other N-1 ports.
The probe message may be a Layer 2 data packet. In the OSI model, Layer 3 (the network layer) is responsible for IP addresses and Layer 2 (the data link layer) is responsible for MAC addresses, so each network location has its own dedicated MAC address. The first port in the server identifies the MAC address information in the Layer 2 data packet, forwards the packet according to the MAC address, and records the MAC addresses and their corresponding ports in an internal address table.
Finally, the server sends the probe messages to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.
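As a rough illustration of constructing one Layer 2 probe per peer MAC address, the sketch below packs a 14-byte Ethernet header followed by a random payload. The frame layout, the helper names and the EtherType 0x88B5 (an IEEE 802 local experimental value) are assumptions made for this example; the source does not specify the probe format, and the code only builds the frames without sending them.

```python
import os
import struct
from typing import Dict, List

# Assumption: a local experimental EtherType marks probe frames.
PROBE_ETHERTYPE = 0x88B5


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6-byte binary form."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw


def build_probe_frame(dst_mac: str, src_mac: str, payload_len: int = 46) -> bytes:
    """Build destination MAC + source MAC + EtherType + random payload."""
    header = struct.pack(
        "!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac), PROBE_ETHERTYPE
    )
    return header + os.urandom(payload_len)


def build_probes(src_mac: str, peer_macs: List[str]) -> Dict[str, bytes]:
    """One probe frame for each of the other N-1 ports, keyed by peer MAC."""
    return {peer: build_probe_frame(peer, src_mac) for peer in peer_macs}
```

The peer MAC addresses would come from whatever ARP or LACP lookup the server already performs; that lookup is outside this sketch.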
In this way, the server periodically sends probe messages to the other N-1 ports, so that the other N-1 ports likewise generate their own detection results according to the probe messages and report them to the detection device.
203. The server obtains, according to the detection result, a fault notification sent by the detection device, where the fault notification indicates whether the first port is faulty.
After the server generates the detection result according to the probe messages, the detection device determines whether the first port is faulty according to the detection results of the ports, and the server may then obtain the fault notification sent by the detection device.
Specifically, if the first port is a physical port in the server and the first port is faulty, the server may remove the first port from the LAG according to the fault notification.
If the first port is a virtual port in a virtual machine running in the server and the first port is faulty, the server may perform virtual machine live migration on the virtual machine corresponding to the first port according to the fault notification.
If the first port is not faulty, the server queries whether the first port is in the LAG, that is, determines whether the first port was previously removed from the LAG because of a fault. If the first port is not in the LAG, that is, it has already been removed from the LAG, the server may add the first port back to the LAG so that data can be sent and received through the first port.
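For a physical port, the handling of the fault notification described above amounts to a small membership update on the LAG. The sketch below models the LAG as a plain Python set of port names; the function name and the returned action strings are hypothetical, and the virtual-port case (live migration) is deliberately left out.

```python
from typing import Set


def apply_fault_notification(lag: Set[str], port: str, faulty: bool) -> str:
    """Update LAG membership for a physical port and report what was done."""
    if faulty:
        # A faulty port must stop carrying traffic.
        lag.discard(port)
        return "removed_from_lag"
    if port not in lag:
        # The port was removed earlier because of a fault and has recovered.
        lag.add(port)
        return "re_added_to_lag"
    return "no_change"
```

Calling it first with `faulty=True` and later with `faulty=False` for the same port removes the port and then restores it, which matches the recovery path in the text.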
It should be noted that, after the detection device determines according to the detection result that the first port is faulty, the first port may be removed from the LAG either by the detection device itself or by the server after the detection device sends a fault message informing the server that the first port is faulty. The present invention does not limit this.
In addition, the N ports are physical ports in the server, or virtual ports in a virtual machine running in the server. Specifically, in IAAS, full-mesh communication failure detection is performed on the physical ports of the servers, and paths are switched away from ports with communication failures. In PAAS, full-mesh communication failure detection is performed on the virtual ports of the virtual machines running in the servers and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is implemented for ports with communication failures.
At this point, the servers receive and send probe messages through their ports, forming a full-mesh path detection system. Detection results are generated to probe the quality of service between the ports, and the detection device analyzes the detection results reported by the ports to detect ports in a "sub-health" state and remove them from the LAG in time. This prevents the server from sending and receiving data through a "sub-health" port and thereby causing continuous data damage.
In the prior art, by contrast, a server can only use the LAG to detect whether each of its own ports is available, that is, whether the port can transmit data. It cannot detect abnormal data transmission on a faulty port (for example, heavy packet loss when sending data packets, or tampering with the content of data packets). As a result, data transmitted through the "sub-health" port is continuously damaged, which reduces the reliability of data transmission. The method for detecting a communication failure provided by the present invention can detect ports in the "sub-health" state and remove them from the LAG in time, which improves the reliability of data transmission.
An embodiment of the present invention provides a method for detecting a communication failure. A server obtains, through a first port, probe messages from N-1 ports in the servers and generates a detection result according to the probe messages. After the detection device obtains the detection results of the N ports in X servers, it determines whether the first port is faulty according to the detection results. In this solution, the detection results are generated by each server according to the detection messages received by each of the N ports. Because each detection result includes the error packet data and the packet loss data of the other ports, as determined by each port from the probe messages it receives from those ports, the detection device can determine from this data whether any one of the N ports is a faulty port, that is, whether a port in a "sub-health" state is impairing data transmission efficiency. This improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoids the risk of transmitting data through a faulty port.
Embodiment 4
An embodiment of the present invention provides a method for detecting a communication failure. As shown in FIG. 7, the method includes:
301. A server receives, through a first port, probe messages from N-1 ports in other servers.
The probe messages are used to determine the error packet data and the packet loss data of the N-1 ports, where N>2. The probe message may be a Layer 2 data packet whose length may vary and whose content may be randomly variable.
Because the number of probe messages that the first port receives from each port within a fixed period is predetermined, and this predetermined number reflects the port's capability to send and receive data, the server can receive, through the first port, the specified number of probe messages periodically sent by the ports in the other servers and in the server itself, so as to determine the error packet data and the packet loss data of the N-1 ports. For example, port 1 should receive 60 probe messages from port 2 per minute; if port 1 actually receives only 50 probe messages from port 2 per minute, packet loss has occurred at port 1 or port 2.
In addition, while the server receives the probe messages from the N-1 ports through the first port and generates the detection result according to the probe messages, the server may also periodically send probe messages to the N-1 ports in the other servers, so that the other N-1 ports likewise generate their own detection results according to the probe messages and report them to the detection device.
Specifically, the server may obtain the MAC addresses of the ports in the other servers according to the ARP protocol or the LACP protocol, and then construct probe messages according to the MAC addresses of the other N-1 ports. Finally, the server sends the probe messages to the N-1 ports through the first port according to the MAC addresses of the N-1 ports.
302. The server generates a detection result according to the probe messages, where the detection result includes the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port.
After receiving the probe messages from the N-1 ports through the first port, the server may generate the detection result according to the probe messages.
Specifically, the server may calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time. In addition, the server analyzes whether each probe message received within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port. Finally, the server generates the detection result according to the packet loss data and the error packet data.
For example, packet loss data = number of probe messages that should be received within the period - number of probe messages actually received within the period.
When calculating the error packet data, the server first calculates the CRC value of each received probe message. If the calculated CRC value does not match the CRC value carried in the received probe message, the received probe message is counted as one error packet. CRC is the most commonly used error-checking code in data communication, and its characteristic is that the lengths of the information field and the check field can be chosen arbitrarily. CRC provides error detection for data transmission: the sender performs a polynomial calculation on the data and appends the result to the frame, and the receiving device performs the same kind of calculation to ensure the correctness and integrity of the transmitted data.
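The CRC comparison can be illustrated with the CRC-32 routine from Python's standard `zlib` module. The message layout used here (payload followed by a 4-byte big-endian CRC-32) and the function names are assumptions for the example; the source does not say which CRC variant or field position the probe messages use.

```python
import struct
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Sender side: append the CRC-32 of the payload as a 4-byte trailer."""
    return payload + struct.pack("!I", zlib.crc32(payload))


def is_error_packet(message: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the carried value."""
    if len(message) < 4:
        return True  # too short to even carry a CRC trailer
    payload, trailer = message[:-4], message[-4:]
    (carried_crc,) = struct.unpack("!I", trailer)
    return zlib.crc32(payload) != carried_crc
```

Each probe for which `is_error_packet` returns `True` would add one to the error packet data of its sending port.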
At this point, the server has received the probe messages from the N-1 ports through the first port and generated the detection result according to the probe messages, so that the detection device can determine the faulty port according to the detection results of the ports.
303. The detection device obtains the detection results of the N ports in each server.
A path detection system may be deployed in the detection device to periodically receive the detection results of the N ports in the servers and then analyze which ports are faulty according to those detection results. The detection results obtained by the detection device include the packet loss data and the error packet data of the detection messages that the N ports send to one another.
Specifically, each port in the servers repeats steps 301 and 302 until the path detection system of the detection device has obtained the detection results of all N ports, as shown in Table 5. Once the path detection system has obtained the detection results of all N ports, it knows the communication quality of every communication path in the current communication failure detection system, so that the detection device can identify faulty ports according to the communication quality of all communication paths.
Table 5
              Packet loss data    Error packet data
First port    A                   B
...           ...                 ...
Nth port      C                   D
304. The detection device calculates, according to the detection results, the packet loss rates of the detection messages that the N ports send to one another.
After obtaining the detection results of the N ports in the servers, the detection device may determine according to the detection results whether the first port is faulty, where the first port is one of the N ports.
Specifically, the detection device may first convert the error packet data in the detection result into relative packet loss data according to a first preset function.
For example, the first preset function F1 is: relative packet loss data = error packet data * 5, that is, the error packet data is converted into relative packet loss data at a ratio of 1:5. Assuming that the error packet data from port 1 to port 2 is 2, the relative packet loss data from port 1 to port 2 = error packet data * 5 = 2 * 5 = 10.
Second, according to the relative packet loss data and the packet loss data in the detection result, the packet loss rates of the detection messages that the N ports send to one another are calculated according to a second preset function. To evaluate the path communication quality between the N ports accurately, the packet loss rate may be expressed as a relative packet loss rate. A large packet loss rate may appear between all ports of the servers; in that case, if the detection device used the absolute packet loss rate, all ports might be judged faulty. The detection device therefore determines whether the first port is faulty according to the relative packet loss rates between the N ports.
For example, the second preset function F2 is: packet loss rate = (relative packet loss data + packet loss data) / number of probe messages that should be received. Assuming that the relative packet loss data from port 1 to port 2 is 10, the packet loss data is 3, and the number of probe messages that should be received within the period is 100, the packet loss rate from port 1 to port 2 = (10 + 3) / 100 = 0.13.
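The two preset functions from the example can be written out directly. The weight of 5 and the figures used below come from the example in the text; the function and constant names are illustrative only, and a real deployment could choose different preset functions.

```python
ERROR_WEIGHT = 5  # F1 in the example: one error packet counts as 5 lost packets


def relative_loss_data(error_packets: int) -> int:
    """F1: convert error packet data into relative packet loss data."""
    return error_packets * ERROR_WEIGHT


def packet_loss_rate(error_packets: int, lost_packets: int, expected: int) -> float:
    """F2: (relative packet loss data + packet loss data) / expected probes."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (relative_loss_data(error_packets) + lost_packets) / expected
```

For the numbers in the text, `packet_loss_rate(2, 3, 100)` evaluates (2 * 5 + 3) / 100, which is the 0.13 given in the example.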
Further, if the packet loss rate from port 1 to port 2 is 0.13, from port 1 to port 3 is 0.15, and from port 1 to port 4 is 0.05, the minimum packet loss rate (0.05) is taken as the baseline for calculating the relative packet loss rates from port 1 to ports 2, 3 and 4. The relative packet loss rate from port 1 to port 2 is then 0.13 - 0.05 = 0.08, from port 1 to port 3 is 0.15 - 0.05 = 0.1, and from port 1 to port 4 is 0.05 - 0.05 = 0.
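Taking the minimum as the baseline is a simple subtraction, sketched below. The function name and the dictionary keyed by peer port are assumptions for the example.

```python
from typing import Dict


def relative_loss_rates(rates: Dict[str, float]) -> Dict[str, float]:
    """Subtract the smallest packet loss rate from every rate of one port."""
    if not rates:
        return {}
    baseline = min(rates.values())
    return {peer: rate - baseline for peer, rate in rates.items()}
```

Applied to the rates 0.13, 0.15 and 0.05 from the text, the results are approximately 0.08, 0.1 and 0, up to floating-point rounding.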
At this point, the detection device has calculated, according to the detection results, the relative packet loss rates of the detection messages that the N ports send to one another.
305. The detection device determines whether the first port is faulty according to the packet loss rates of the detection messages that the N ports send to one another.
The first port may be any one of the N ports. Among the N ports, if at least N/2 ports have a packet loss rate greater than a first preset value when sending detection messages to the first port, and the packet loss rates of detection messages sent among at least N/2 ports are less than a second preset value, the detection device determines that the first port is faulty; otherwise, the detection device determines that the first port is not faulty.
For example, taking Table 6, whether port 1 is faulty is determined according to the relative packet loss rates of the detection messages that the four ports send to one another. The data in Table 6 are percentages.
Table 6
          Port 1    Port 2    Port 3    Port 4
Port 1    none      1.2       2.2       2.5
Port 2    3         none      0.03      0.03
Port 3    4         0.08      none      0.02
Port 4    2.3       0.9       0         none
Specifically, according to Table 6, among the four ports, the relative packet loss rates of the detection messages sent by ports 2, 3 and 4 to port 1 are all greater than the first preset value (1%), and the relative packet loss rates of the detection messages sent among ports 2, 3 and 4 are less than the second preset value (0.2%). The detection device therefore determines that port 1 is faulty.
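One possible reading of the decision rule in step 305 is sketched below. Several points are assumptions rather than statements from the source: each row of the table is treated as the sending port and each column as the receiving port, "at least N/2 ports" is rounded up, and the second condition is interpreted as requiring some group of that size, excluding the port under test, whose mutual rates are all below the second preset value. Under this reading the 0.9 entry between ports 4 and 2 does not prevent the rule from firing, because ports 2 and 3 already form a healthy group.

```python
from itertools import combinations
from math import ceil
from typing import Dict

RateTable = Dict[int, Dict[int, float]]  # rates[sender][receiver], in percent


def is_port_faulty(port: int, rates: RateTable,
                   first_preset: float, second_preset: float) -> bool:
    ports = list(rates)
    others = [p for p in ports if p != port]
    needed = ceil(len(ports) / 2)

    # Condition 1: enough peers see a high loss rate towards `port`.
    lossy_senders = [p for p in others if rates[p][port] > first_preset]
    if len(lossy_senders) < needed:
        return False

    # Condition 2: some group of `needed` peers is healthy among themselves.
    for group in combinations(others, needed):
        if all(rates[a][b] < second_preset
               for a in group for b in group if a != b):
            return True
    return False


# Values copied from the four-port example table (percent).
EXAMPLE_RATES: RateTable = {
    1: {2: 1.2, 3: 2.2, 4: 2.5},
    2: {1: 3.0, 3: 0.03, 4: 0.03},
    3: {1: 4.0, 2: 0.08, 4: 0.02},
    4: {1: 2.3, 2: 0.9, 3: 0.0},
}
```

With a first preset value of 1.0 and a second preset value of 0.2, this sketch flags port 1 and leaves ports 2, 3 and 4 unflagged.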
The detection device may determine in this way whether each of the N ports is faulty, that is, detect whether any port of the servers is in a "sub-health" state that impairs the port's data transmission efficiency.
306. If the first port is faulty, the detection device generates a first fault notification.
The first fault notification instructs the server to remove the first port from the LAG.
Specifically, if the first port is faulty, the first port is a port in a "sub-health" state that impairs data transmission efficiency. The detection device may therefore generate a first fault notification and send it to the server, so that the server removes the first port from the LAG according to the first fault notification, that is, stops sending data on this port and recalculates the ports used for sending data among the remaining links according to the load sharing policy. When the faulty port recovers, the ports used for sending data are recalculated again. In this way, the communication paths between the N ports are switched automatically.
307. If all ports in the server are faulty, the detection device invokes DRS to perform virtual machine live migration on the virtual machines running in the server.
Specifically, if the detection device determines according to the detection results that all ports in the server are faulty, the detection device may invoke DRS to perform virtual machine live migration on the virtual machines running in the server. Alternatively, the detection device may send a second fault notification to the server, so that the server invokes DRS according to the second fault notification to perform virtual machine live migration on the virtual machines running in the server. The virtual machines in the server with faulty ports are migrated to other servers without faulty ports, to ensure that data transmission is not damaged when the virtual machines corresponding to the faulty ports perform service interaction.
Virtual machine live migration (VM Live Migration, also called dynamic migration or real-time migration), that is, virtual machine save/restore, means saving the complete running state of a virtual machine and quickly restoring it on the original hardware platform or even on a different hardware platform. After restoration the virtual machine continues to run smoothly, and the user does not notice any difference.
308. If all ports in the server are faulty, the detection device generates a second fault notification.
The second fault notification instructs the server to invoke DRS to perform virtual machine live migration on the virtual machines running in the server.
309. If the first port is not faulty and the first port is not in the LAG, the server adds the first port to the LAG so that data can be sent and received through the first port.
If the first port is not faulty, the server queries whether the first port is in the LAG, that is, determines whether the first port was previously removed from the LAG because of a fault. If the first port is not in the LAG, that is, it has already been removed from the LAG, the server may add the first port back to the LAG so that data can be sent and received through the first port.
It should be noted that, after the detection device determines according to the detection result that the first port is faulty, the first port may be removed from the LAG either by the detection device itself or by the server after the detection device sends a fault message informing the server that the first port is faulty. The present invention does not limit this.
Clearly, steps 306 to 309 are four possible cases that follow step 305, so steps 306 to 309 are parallel to one another, and this embodiment of the present invention does not limit the logical relationship among steps 306 to 309.
In addition, the N ports are physical ports in the server, or virtual ports in a virtual machine running in the server. Specifically, in IAAS, full-mesh communication failure detection is performed on the physical ports of the servers, and paths are switched away from ports with communication failures. In PAAS, full-mesh communication failure detection is performed on the virtual ports of the virtual machines running in the servers and, combined with the detection results for the physical ports in the IAAS scenario, automatic path switching is implemented for ports with communication failures.
Optionally, a method for detecting a communication failure in PAAS is provided below.
In PAAS, at least one virtual machine runs in each server and each virtual machine has virtual ports. The method for detecting a communication failure provided by the present invention is used to detect whether a virtual port is faulty.
A virtual machine (Virtual Machine) is a complete computer system that is simulated by software, has the functions of a complete hardware system, and runs in a fully isolated environment.
Specifically, the method for detecting a communication failure in PAAS may include the following steps:
401. A virtual machine receives, through a first virtual port, virtual probe messages from M-1 virtual ports, where the virtual probe messages are used to determine the error packet data and the packet loss data of the M-1 virtual ports, and M>2.
For the method of receiving the virtual probe messages from the M-1 virtual ports, refer to step 301.
402. The virtual machine generates a virtual detection result according to the virtual probe messages, where the virtual detection result includes the packet loss data and the error packet data of the virtual probe messages sent by the M-1 virtual ports to the first virtual port.
For the method of generating the virtual detection result according to the virtual probe messages, refer to step 302.
403. The virtual machine obtains the detection results from the M virtual ports.
A virtual path detection system may be deployed in the virtual machine. It periodically receives the detection results from the M virtual ports according to steps 401 and 402, and then analyzes which virtual ports are faulty according to the virtual detection results of the M virtual ports.
404. The virtual path detection system determines, according to the virtual detection results, whether the first virtual port is faulty, where the first virtual port is one of the M virtual ports.
Specifically, the virtual path detection system may calculate, according to the virtual detection results, the packet loss rates of the virtual detection messages that the M virtual ports send to one another; for the calculation of the packet loss rate, refer to step 304. The virtual path detection system then determines whether the first virtual port is faulty according to the packet loss rates of the virtual detection messages that the M virtual ports send to one another; for the method of determining whether the first virtual port is faulty, refer to step 305.
405. If the first virtual port is faulty, the virtual path detection system generates virtual fault information and reports it to the VNFM, so that the VNFM sends the virtual fault information to the detecting device in the IAAS.
The VNFM (Virtual Network Function Manager) is the management software for virtual machines in NFV (Network Function Virtualization). It can be used for the initial deployment of application network elements, life cycle management, elastic scaling management, and the reporting of key alarms from the virtualization layer and the hardware layer and of KPIs (Key Performance Indicators). It plays an important role in scheduling and allocating virtual resources.
Specifically, if the virtual path detection system determines that the first virtual port is faulty, it generates virtual fault information. The virtual fault information may carry the ID of the first virtual port, the ID of the virtual machine corresponding to the first virtual port, and the ID of the server hosting that virtual machine. The virtual path detection system reports the virtual fault information to the VNFM, which forwards it to the detecting device in the IAAS.
406. The detecting device in the IAAS performs communication path switching according to the virtual fault information.
Specifically, the detecting device in the IAAS uses the server ID in the virtual fault information to query whether the physical port on the server hosting the virtual machine corresponding to the first virtual port is faulty. If the physical port on the server is not faulty, the detecting device performs virtual machine hot migration on the virtual machine indicated by the virtual machine ID corresponding to the first virtual port.
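Steps 405 and 406 can be illustrated with a minimal Python sketch. It models the virtual fault information as a record carrying the three IDs named above, and the decision made by the detecting device in the IAAS. The names `VirtualFaultInfo`, `handle_virtual_fault`, `physical_port_faulty` and `live_migrate` are hypothetical and do not come from the source. The source does not say what happens when the physical port is also faulty, so the sketch only reports that case and takes no action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VirtualFaultInfo:
    """Fields the virtual fault information may carry (step 405)."""
    virtual_port_id: str
    vm_id: str
    server_id: str


def handle_virtual_fault(
    info: VirtualFaultInfo,
    physical_port_faulty: Callable[[str], bool],  # hypothetical lookup by server ID
    live_migrate: Callable[[str], None],          # hypothetical migration hook by VM ID
) -> str:
    """Sketch of step 406 as performed by the detecting device in the IAAS."""
    if physical_port_faulty(info.server_id):
        # Not specified in the source: leave this case to physical-port handling.
        return "physical-port-faulty"
    # Physical port is healthy, so the fault is on the virtual side: hot-migrate the VM.
    live_migrate(info.vm_id)
    return "vm-migrated"
```

The two callbacks keep the sketch independent of any particular cloud platform; a real deployment would replace them with whatever query and migration interfaces the platform provides.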
Thus, the embodiment of the present invention provides a method for detecting, in the PAAS, whether a virtual port is faulty. In combination with the detection result of the detecting device in the IAAS, it promptly switches the communication path of a faulty virtual port, thereby implementing path switching in a cloud scenario in which IAAS and PAAS are effectively combined.
It can be seen that the servers receive and send probe messages through their virtual ports or physical ports, forming a fully interconnected path detection system for IAAS and PAAS scenarios. Detection results are generated to probe the quality of service between the ports, and the detecting device analyzes the detection results reported for each port to identify ports in a “sub-health” state. Such ports are then promptly removed from the LAG, which prevents the server from sending and receiving data through a “sub-health” port and thereby causing continued data damage.
The embodiment of the present invention provides a method for detecting a communication failure. A server obtains, through a first port, probe messages from N-1 ports in the servers and generates a detection result according to the probe messages. After the detecting device obtains the detection results of the N ports in the X servers, it determines, according to the detection results, whether the first port is faulty. In this solution, the detection results obtained by the detecting device are generated by the servers according to the detection messages received by each of the N ports. Because the detection results include the error packet data and packet loss data of the other ports, as determined by each port from the probe messages it received from those ports, the detecting device can determine from this data whether any one of the N ports is a faulty port. This detects ports whose “sub-health” state impairs data transmission efficiency, which improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoids the risk of transmitting data over a faulty port.
Embodiment 5
An embodiment of the present invention provides a detecting device which, as shown in FIG. 8, includes:
an obtaining unit 31, configured to obtain the detection results of N ports in X servers, where the detection results include the error packet data and packet loss data of the other ports, as determined by each port according to the probe messages it received from those other ports, N>2;
a determining unit 32, configured to determine the state of a first port according to the error packet data and packet loss data of the other ports, as determined by each port and obtained by the obtaining unit 31, where the state of the first port indicates whether the first port is faulty, and the first port is one of the N ports; and
a processing unit 33, configured to generate a failure notification for the first port according to the state of the first port determined by the determining unit 32.
Further, as shown in FIG. 9, the determining unit 32 includes a calculating subunit 321, where
the calculating subunit 321 is configured to calculate, according to the detection results, the packet loss rates of the detection messages that the N ports send to one another; and
the determining unit 32 is specifically configured to determine whether the first port is faulty according to the packet loss rates, calculated by the calculating subunit 321, of the detection messages that the N ports send to one another.
Further, the calculating subunit 321 is specifically configured to convert the error packet data in the detection results into relative packet loss data according to a first preset function, and to calculate, according to the relative packet loss data and the packet loss data in the detection results and using a second preset function, the packet loss rates of the detection messages that the N ports send to one another.
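The two-stage calculation can be sketched in Python as follows. This section does not define the first or second preset function, so both forms used here are assumptions chosen only for illustration: the first weights each error packet as a fraction of a lost packet (`weight`, assumed 0.5), and the second divides the combined loss by the number of messages sent. The names `relative_loss` and `loss_rate` are hypothetical.

```python
def relative_loss(error_packets: int, weight: float = 0.5) -> float:
    """Assumed first preset function: count each error packet as `weight` lost packets."""
    return weight * error_packets


def loss_rate(sent: int, lost: int, error_packets: int, weight: float = 0.5) -> float:
    """Assumed second preset function: combined loss divided by messages sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    combined = lost + relative_loss(error_packets, weight)
    # Clamp to 1.0 so a heavily weighted error count cannot exceed total loss.
    return min(1.0, combined / sent)
```

For example, with 100 messages sent, 10 lost and 20 received in error, the assumed functions give a rate of (10 + 0.5 × 20) / 100 = 0.2.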
Further, the determining unit 32 is specifically configured to: if, among the N ports, the packet loss rate of the detection messages sent to the first port by at least N/2 ports is greater than a first preset value, and the packet loss rate of the detection messages sent among those at least N/2 ports is less than a second preset value, determine that the first port is faulty; otherwise, determine that the first port is not faulty.
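The decision rule can be sketched in Python as below. The sketch assumes the packet loss rates are held in a square matrix where `loss_rate[i][j]` is the rate for messages sent from port `i` to port `j`; this layout and the name `is_port_faulty` are assumptions. As a simplification, it tests the full set of ports that observe high loss toward the first port, rather than searching every subset of at least N/2 ports.

```python
from typing import Sequence


def is_port_faulty(
    loss_rate: Sequence[Sequence[float]],
    port: int,
    first_preset: float,
    second_preset: float,
) -> bool:
    """Return True if `port` is judged faulty under the N/2 rule."""
    n = len(loss_rate)
    # Ports whose messages toward `port` suffer loss above the first preset value.
    observers = [
        i for i in range(n)
        if i != port and loss_rate[i][port] > first_preset
    ]
    if len(observers) < n / 2:
        return False
    # The observers must communicate well with one another, which shows that
    # the loss is specific to `port` rather than caused by the observers.
    return all(
        loss_rate[i][j] < second_preset
        for i in observers
        for j in observers
        if i != j
    )
```

The second condition is what separates a faulty receiver from a set of faulty senders: if the observing ports also lose packets among themselves, the loss toward the first port is not attributed to it.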
Further,
the processing unit 33 is specifically configured to generate the first failure notification for the first port, so that the server, after obtaining the first failure notification, removes the first port from the LAG;
where the failure notification includes the first failure notification, and the first failure notification indicates that the first port is faulty.
Further,
the processing unit 33 is specifically configured to generate the second failure notification for the first port, so that the server obtains the second failure notification and invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machines running in the server;
where the failure notification includes the second failure notification, and the second failure notification indicates that the N ports in the X servers are all faulty.
Further, the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
An embodiment of the present invention provides a server which, as shown in FIG. 10, includes:
a receiving unit 41, configured to receive, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine the error packet data and packet loss data of the N-1 ports, N>2;
a processing unit 42, configured to generate a detection result according to the probe messages received by the receiving unit 41, where the detection result includes the packet loss data and error packet data of the probe messages sent by the N-1 ports to the first port; and
an obtaining unit 43, configured to obtain, according to the detection result generated by the processing unit 42, a failure notification sent by the detecting device, where the failure notification indicates whether the first port is faulty.
Further, the first port is a physical port in the server, or a virtual port in a virtual machine running in the server. As shown in FIG. 11, the server further includes a removing unit 44 and a migration unit 45, where
the removing unit 44 is configured to: if the first port is a physical port in the server and the first port is faulty, remove the first port from the LAG according to the failure notification obtained by the obtaining unit 43; and
the migration unit 45 is configured to: if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, perform virtual machine hot migration on the virtual machine corresponding to the first port according to the failure notification obtained by the obtaining unit 43.
Further, the processing unit 42 is further configured to: if the failure notification obtained by the obtaining unit 43 indicates that the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG so that data can be sent and received through the first port.
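The server-side handling of a failure notification, as divided among the removing unit 44, the migration unit 45 and the processing unit 42, can be sketched in Python as follows. The LAG is modelled as a plain set of port names and the migration as a callback; both modelling choices, and the names `apply_failure_notification` and `migrate_vm_for_port`, are assumptions for illustration and not part of the source.

```python
from typing import Callable, Set


def apply_failure_notification(
    port: str,
    faulty: bool,
    is_physical: bool,
    lag: Set[str],
    migrate_vm_for_port: Callable[[str], None],  # hypothetical migration hook
) -> str:
    """Apply a failure notification for `port` and report the action taken."""
    if faulty:
        if is_physical:
            # Removing unit 44: take the faulty physical port out of the LAG.
            lag.discard(port)
            return "removed-from-lag"
        # Migration unit 45: hot-migrate the VM that owns the faulty virtual port.
        migrate_vm_for_port(port)
        return "vm-migrated"
    # Processing unit 42: a healthy port that is not in the LAG is added back.
    if port not in lag:
        lag.add(port)
        return "added-to-lag"
    return "unchanged"
```

Re-adding a healthy port means that a port removed during a transient fault returns to service once a later notification reports it as not faulty.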
Further, the processing unit 42 is specifically configured to: calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received by the receiving unit 41 within a preset time; analyze whether each probe message received by the receiving unit 41 within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.
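A minimal Python sketch of how a detection result could be produced for one preset time window follows. It assumes that the number of probe messages expected from each peer port in the window is known, and that an error packet is recognized by a CRC-32 mismatch; this section specifies neither, so both are assumptions, as is the name `summarize_window`.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One received probe: (source port, payload bytes, CRC-32 carried in the message).
Probe = Tuple[str, bytes, int]


def summarize_window(
    received: Iterable[Probe],
    peers: List[str],
    expected_per_peer: int,
) -> Dict[str, Dict[str, int]]:
    """Count lost and error packets per peer port for one preset time window."""
    arrived: Counter = Counter()
    errors: Counter = Counter()
    for src, payload, crc in received:
        arrived[src] += 1
        # Assumed error check: the payload no longer matches its checksum.
        if zlib.crc32(payload) != crc:
            errors[src] += 1
    return {
        peer: {
            "lost": max(0, expected_per_peer - arrived[peer]),
            "errors": errors[peer],
        }
        for peer in peers
    }
```

A corrupted probe counts as received, not lost, so the two figures stay separate and can later be combined by the detecting device.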
Further, as shown in FIG. 12, the server further includes a sending unit 46, where
the obtaining unit 43 is further configured to obtain the media access control (MAC) addresses of the N-1 ports;
the processing unit 42 is further configured to construct the probe messages according to the MAC addresses obtained by the obtaining unit 43; and
the sending unit 46 is configured to send, through the first port and according to the MAC addresses of the N-1 ports obtained by the obtaining unit 43, the probe messages constructed by the processing unit 42 to the N-1 ports.
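Constructing probe messages from the peer MAC addresses can be sketched in Python as below. The frame layout is an assumption: a standard 14-byte Ethernet header, the IEEE 802 local experimental EtherType 0x88B5, and a payload made of a 4-byte sequence number followed by its CRC-32. The source does not define the probe message format, and the names `build_probe_frame` and `build_probes` are hypothetical. The sketch only builds the bytes; it neither pads the frame to the Ethernet minimum size nor sends it.

```python
import struct
import zlib
from typing import Dict, Iterable

PROBE_ETHERTYPE = 0x88B5  # IEEE 802 local experimental EtherType, assumed for probes


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or with dashes) to 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    return raw


def build_probe_frame(dst_mac: str, src_mac: str, seq: int) -> bytes:
    """Build one probe frame: Ethernet header, sequence number, CRC-32 of the sequence."""
    body = struct.pack("!I", seq)
    payload = body + struct.pack("!I", zlib.crc32(body))
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", PROBE_ETHERTYPE)
    return header + payload


def build_probes(src_mac: str, peer_macs: Iterable[str], seq: int) -> Dict[str, bytes]:
    """Build one probe frame per peer port, keyed by the peer's MAC address."""
    return {mac: build_probe_frame(mac, src_mac, seq) for mac in peer_macs}
```

Carrying a sequence number lets the receiver count missing probes, and the checksum lets it recognize error packets, which are the two quantities the detection result reports.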
In the prior art, a server can only use the LAG to detect whether each of its ports is available, that is, whether the port can transmit data. It cannot detect a “sub-health” condition in which a faulty port still transmits data (for example, a large number of packets are lost when data packets are sent, or the contents of data packets are tampered with). As a result, data transmitted through the “sub-health” port is continuously damaged and the reliability of data transmission is reduced. The method for detecting a communication failure provided by the present invention can detect ports in the “sub-health” state and promptly remove them from the LAG, which improves the reliability of data transmission.
The embodiment of the present invention provides an apparatus for detecting a communication failure. A server obtains, through a first port, probe messages from N-1 ports in the servers and generates a detection result according to the probe messages. After the detecting device obtains the detection results of the N ports in the X servers, it determines, according to the detection results, whether the first port is faulty. In this solution, the detection results obtained by the detecting device are generated by the servers according to the detection messages received by each of the N ports. Because the detection results include the error packet data and packet loss data of the other ports, as determined by each port from the probe messages it received from those ports, the detecting device can determine from this data whether any one of the N ports is a faulty port. This detects ports whose “sub-health” state impairs data transmission efficiency, which improves the reliability of data transmission, solves the prior-art problem that the LAG cannot detect a faulty port that operates abnormally, and avoids the risk of transmitting data over a faulty port.
It will be clear to a person skilled in the art that, for convenience and brevity of description, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. For the specific working processes of the system, device and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (25)

  1. A method for detecting a communication failure, comprising:
    obtaining, by a detecting device, detection results of N ports in X servers, where the detection results include the error packet data and packet loss data of the other ports, as determined by each port according to the probe messages it received from those other ports, N>2, X>2;
    determining, by the detecting device, a state of a first port according to the error packet data and packet loss data of the other ports determined by each port, where the state of the first port indicates whether the first port is faulty; and
    generating, by the detecting device, a failure notification for the first port according to the state of the first port.
  2. The method according to claim 1, wherein determining, by the detecting device, the state of the first port according to the error packet data and packet loss data of the other ports determined by each port comprises:
    calculating, by the detecting device according to the detection results, the packet loss rates of the detection messages that the N ports send to one another; and
    determining, by the detecting device, whether the first port is faulty according to the packet loss rates of the detection messages that the N ports send to one another.
  3. The method according to claim 2, wherein calculating, by the detecting device according to the detection results, the packet loss rates of the detection messages that the N ports send to one another comprises:
    converting, by the detecting device, the error packet data in the detection results into relative packet loss data according to a first preset function; and
    calculating, by the detecting device according to the relative packet loss data and the packet loss data in the detection results and using a second preset function, the packet loss rates of the detection messages that the N ports send to one another.
  4. The method according to claim 2, wherein determining, by the detecting device, whether the first port is faulty according to the packet loss rates of the detection messages that the N ports send to one another comprises:
    if, among the N ports, the packet loss rate of the detection messages sent to the first port by at least N/2 ports is greater than a first preset value, and the packet loss rate of the detection messages sent among those at least N/2 ports is less than a second preset value, determining, by the detecting device, that the first port is faulty; otherwise, determining, by the detecting device, that the first port is not faulty.
  5. The method according to any one of claims 1 to 4, wherein the failure notification includes a first failure notification, and the first failure notification indicates that the first port is faulty,
    wherein generating the failure notification for the first port comprises:
    generating, by the detecting device, the first failure notification for the first port, so that a server, after obtaining the first failure notification, removes the first port from a link aggregation group (LAG).
  6. The method according to claim 5, wherein the failure notification includes a second failure notification, and the second failure notification indicates that the N ports in the X servers are all faulty,
    wherein generating the failure notification for the first port comprises:
    generating, by the detecting device, the second failure notification for the first port, so that the server obtains the second failure notification and invokes a distributed resource scheduler (DRS) to perform virtual machine hot migration on the virtual machines running in the server.
  7. The method according to any one of claims 1 to 6, wherein the N ports are physical ports in the X servers, or virtual ports in virtual machines running in the X servers.
  8. A method for detecting a communication failure, comprising:
    receiving, by a server through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine the error packet data and packet loss data of the N-1 ports, N>2;
    generating, by the server, a detection result according to the probe messages, where the detection result includes the packet loss data and error packet data of the probe messages sent by the N-1 ports to the first port; and
    obtaining, by the server according to the detection result, a failure notification sent by a detecting device, where the failure notification indicates whether the first port is faulty.
  9. The method according to claim 8, wherein the first port is a physical port in the server, or a virtual port in a virtual machine running in the server,
    wherein, after the server obtains, according to the detection result, the failure notification sent by the detecting device, the method further comprises:
    if the first port is a physical port in the server and the first port is faulty, removing, by the server, the first port from a link aggregation group (LAG) according to the failure notification; and
    if the first port is a virtual port in a virtual machine running in the server and the first port is faulty, performing, by the server, virtual machine hot migration on the virtual machine corresponding to the first port according to the failure notification.
  10. The method according to claim 8, wherein, after the server obtains, according to the detection result, the failure notification sent by the detecting device, the method further comprises:
    if the first port is not faulty, querying, by the server, whether the first port is in the LAG; and
    if the first port is not in the LAG, adding, by the server, the first port to the LAG so that data can be sent and received through the first port.
  11. The method according to claim 8, wherein generating, by the server, the detection result according to the probe messages comprises:
    calculating, by the server, the packet loss data from the N-1 ports to the first port according to the number of probe messages received within a preset time;
    analyzing, by the server, whether each probe message received within the preset time is an error packet, so as to count the error packet data from the N-1 ports to the first port; and
    generating, by the server, the detection result according to the packet loss data and the error packet data.
  12. The method according to claim 8, further comprising:
    obtaining, by the server, the media access control (MAC) addresses of the N-1 ports;
    constructing, by the server, the probe messages according to the MAC addresses; and
    sending, by the server through the first port and according to the MAC addresses of the N-1 ports, the probe messages to the N-1 ports.
  13. A detecting device, comprising:
    an obtaining unit, configured to obtain detection results of N ports in X servers, where the detection results include the error packet data and packet loss data of the other ports, as determined by each port according to the probe messages it received from those other ports, N>2, X>2;
    a determining unit, configured to determine a state of a first port according to the error packet data and packet loss data of the other ports, as determined by each port and obtained by the obtaining unit, where the state of the first port indicates whether the first port is faulty, and the first port is one of the N ports; and
    a processing unit, configured to generate a failure notification for the first port according to the state of the first port determined by the determining unit.
  14. The detecting device according to claim 13, wherein the determining unit comprises a calculating subunit, wherein
    the calculating subunit is configured to calculate, according to the detection results, the packet loss rates of the detection messages that the N ports send to one another; and
    the determining unit is specifically configured to determine whether the first port is faulty according to the packet loss rates, calculated by the calculating subunit, of the detection messages that the N ports send to one another.
  15. 根据权利要求14所述的检测设备,其特征在于,The detecting device according to claim 14, wherein
    所述计算子单元,具体用于将所述探测结果中的错包数据按照第一预置函数折算为相对丢包数据;以及根据所述相对丢包数据和所述探测结果中的丢包数据,按照第二预置函数分别计算所述N个端口之间互相发送所述检测消息的丢包率。The calculating subunit is specifically configured to convert the error packet data in the detection result into relative packet loss data according to a first preset function; and according to the relative packet loss data and the packet loss data in the detection result. And calculating, according to the second preset function, a packet loss rate of the detection messages sent by the N ports to each other.
  16. 根据权利要求14所述的检测设备,其特征在于,The detecting device according to claim 14, wherein
    所述确定单元,具体用于在所述N个端口中,若有至少N/2个端口发送所述检测消息到所述第一端口的丢包率大于第一预设值,且所述至少N/2个端口之间发送所述检测消息的丢包率小于第二预设值,则确定所述第一端口有故障;否则,则确定所述第一端口没有故障。The determining unit is configured to: if the N/2 ports of the N ports send the detection message to the first port, the packet loss rate is greater than a first preset value, and the determining If the packet loss rate of the detection message sent between the N/2 ports is less than the second preset value, it is determined that the first port is faulty; otherwise, the first port is determined to be faultless.
  17. The detection device according to any one of claims 13 to 16, wherein:
    the processing unit is specifically configured to generate a first fault notification for the first port, so that the server, after obtaining the first fault notification, removes the first port from a link aggregation group (LAG);
    wherein the fault notification comprises the first fault notification, and the first fault notification is used to indicate that the first port is faulty.
  18. The detection device according to claim 17, wherein:
    the processing unit is specifically configured to generate a second fault notification for the first port, so that the server obtains the second fault notification and invokes a distributed resource scheduler (DRS) to perform live migration of the virtual machines running in the server;
    wherein the fault notification comprises the second fault notification, and the second fault notification is used to indicate that all of the N ports in the X servers are faulty.
  19. The detection device according to any one of claims 13 to 18, wherein the N ports are physical ports in the X servers, or virtual ports of virtual machines running in the X servers.
  20. A server, comprising:
    a receiving unit, configured to receive, through a first port, probe messages from N-1 ports in other servers, where the probe messages are used to determine error packet data and packet loss data of the N-1 ports, and N>2;
    a processing unit, configured to generate a detection result according to the probe messages received by the receiving unit, where the detection result comprises the packet loss data and the error packet data of the probe messages sent by the N-1 ports to the first port;
    an obtaining unit, configured to obtain, according to the detection result of the processing unit, a fault notification sent by a detection device, where the fault notification is used to indicate whether the first port is faulty.
  21. The server according to claim 20, wherein the first port is a physical port in the server, or a virtual port of a virtual machine running in the server, and the server further comprises a removing unit and a migrating unit, wherein:
    the removing unit is configured to, if the first port is a physical port in the server and the first port is faulty, remove the first port from a link aggregation group (LAG) according to the fault notification obtained by the obtaining unit;
    the migrating unit is configured to, if the first port is a virtual port of a virtual machine running in the server and the first port is faulty, perform live migration of the virtual machine corresponding to the first port according to the fault notification obtained by the obtaining unit.
  22. The server according to claim 20, wherein:
    the processing unit is further configured to: if the first port is not faulty, query whether the first port is in the LAG; and if the first port is not in the LAG, add the first port to the LAG, so that data is sent and received through the first port.
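The LAG handling for a physical port (removal on a fault, re-adding once the port is reported healthy) can be sketched as follows. Modelling the LAG as a set of port names is an assumption made for illustration; a real server would call its bonding or link aggregation driver instead.

```python
def apply_fault_notification(lag: set, port: str, faulty: bool) -> set:
    """Return the LAG membership after handling a fault notification.

    lag: current member ports of the link aggregation group (hypothetical
         representation as a set of port names).
    port: the first port named in the notification.
    faulty: whether the notification reports the port as faulty.
    """
    updated = set(lag)  # do not mutate the caller's set
    if faulty:
        # Faulty port: take it out of the LAG so traffic avoids it.
        updated.discard(port)
    elif port not in updated:
        # Healthy port that is not yet a member: add it back.
        updated.add(port)
    return updated
```

A port that recovers is therefore restored to the LAG on the next notification that reports it as healthy, without any separate recovery step.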
  23. The server according to claim 20, wherein:
    the processing unit is specifically configured to: calculate the packet loss data from the N-1 ports to the first port according to the number of probe messages received by the receiving unit within a preset time; analyze, for each probe message received by the receiving unit within the preset time, whether the probe message is an error packet, so as to count the error packet data from the N-1 ports to the first port; and generate the detection result according to the packet loss data and the error packet data.
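The counting described in claim 23 can be sketched as follows. The claim does not state how an error packet is recognised or how the expected number of probes is known, so this sketch assumes each probe carries a CRC32 of its payload and that every sender is expected to deliver a fixed number of probes per window. The tuple layout and all names are hypothetical.

```python
import zlib
from collections import Counter


def build_detection_result(probes, senders, expected_per_sender):
    """Summarise one preset time window of received probe messages.

    probes: iterable of (sender, payload_bytes, crc32) tuples received
            on the first port during the window (assumed layout).
    senders: identifiers of the N-1 ports expected to send probes.
    expected_per_sender: probes each sender should deliver per window
                         (an assumed, pre-agreed value).
    """
    received = Counter()
    errors = Counter()
    for sender, payload, crc in probes:
        received[sender] += 1
        # A probe whose checksum does not match is counted as an error packet.
        if zlib.crc32(payload) != crc:
            errors[sender] += 1
    return {
        s: {
            "lost": max(0, expected_per_sender - received[s]),
            "error": errors[s],
        }
        for s in senders
    }
```

A sender that delivers nothing in the window shows up with `lost` equal to the expected count, which is how a completely silent port is distinguished from one that merely corrupts packets.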
  24. The server according to claim 20, wherein the server further comprises a sending unit, wherein:
    the obtaining unit is further configured to obtain the media access control (MAC) addresses of the N-1 ports respectively;
    the processing unit is further configured to construct the probe messages according to the MAC addresses obtained by the obtaining unit;
    the sending unit is configured to send, through the first port and according to the MAC addresses of the N-1 ports obtained by the obtaining unit, the probe messages constructed by the processing unit to the N-1 ports.
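Constructing one probe message per peer MAC address can be sketched as follows. The claim does not define the frame layout, so this sketch assumes a plain Ethernet II header followed by a 4-byte sequence number, and uses the IEEE local experimental EtherType 0x88B5 as a placeholder. The function names and the payload format are hypothetical.

```python
import struct

# IEEE 802 local experimental EtherType, used here only as a placeholder.
ETHERTYPE_EXPERIMENTAL = 0x88B5


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6-byte form."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def build_probe_frame(dst_mac: str, src_mac: str, seq: int) -> bytes:
    """Build one probe: destination MAC, source MAC, EtherType, sequence.

    Padding to the Ethernet minimum frame size is left to the sender.
    """
    header = (mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
              + struct.pack("!H", ETHERTYPE_EXPERIMENTAL))
    return header + struct.pack("!I", seq)


def build_probes(src_mac: str, peer_macs, seq: int) -> dict:
    """Build one probe frame for each of the N-1 peer ports."""
    return {mac: build_probe_frame(mac, src_mac, seq) for mac in peer_macs}
```

Addressing each probe by destination MAC keeps the exchange at layer 2, so a probe exercises the same port-to-port path that aggregated traffic would use.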
  25. A communication failure detection system, comprising the detection device according to any one of claims 13 to 19, and the server according to any one of claims 20 to 24 connected to the detection device.
PCT/CN2015/084002 2014-08-26 2015-07-14 Communication failure detection method, device and system WO2016029749A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410425003.X 2014-08-26
CN201410425003.XA CN104219107B (en) 2014-08-26 2014-08-26 A kind of detection method of communication failure, apparatus and system

Publications (1)

Publication Number Publication Date
WO2016029749A1 true WO2016029749A1 (en) 2016-03-03

Family

ID=52100263

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/084002 WO2016029749A1 (en) 2014-08-26 2015-07-14 Communication failure detection method, device and system

Country Status (2)

Country Link
CN (1) CN104219107B (en)
WO (1) WO2016029749A1 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104219107B (en) * 2014-08-26 2018-08-14 华为技术有限公司 A kind of detection method of communication failure, apparatus and system
CN104869023B (en) * 2015-05-29 2019-02-26 华为技术有限公司 A kind of time-correcting method, apparatus and system
CN106330650B (en) * 2015-06-25 2019-12-03 中兴通讯股份有限公司 A kind of IP moving method and device, virtualization network system
CN107409063B (en) 2015-08-25 2019-12-24 华为技术有限公司 Method, device and system for acquiring VNF information
CN106533964A (en) * 2015-09-09 2017-03-22 中兴通讯股份有限公司 Method and device for managing packet loss of link aggregation member ports
CN105656715B (en) * 2015-12-30 2019-06-18 中国银联股份有限公司 Method and apparatus for monitoring the state of cloud computing environment lower network equipment
CN106685695B (en) * 2016-11-28 2020-02-14 上海华为技术有限公司 Fault detection method and equipment thereof
CN108337102B (en) * 2017-01-19 2020-07-24 华为技术有限公司 Method and device for deploying and generating parameters and files in virtual network
CN106791823B (en) * 2017-02-07 2018-09-28 浙江大华技术股份有限公司 A kind of equipment zero code stream fault handling method, device and electronic equipment
CN108881011B (en) * 2017-05-08 2022-03-29 中兴通讯股份有限公司 LACP (Link aggregation control protocol) switching method and device applied to cross-device
CN107690139A (en) * 2017-08-28 2018-02-13 苏州思创源博电子科技有限公司 A kind of communication means for photovoltaic generation
CN107566222A (en) * 2017-10-18 2018-01-09 中国联合网络通信集团有限公司 A kind of method and device for calculating packet loss
CN107888457B (en) * 2017-12-08 2020-08-14 新华三技术有限公司 Port packet loss detection method and device and communication equipment
CN108390780B (en) * 2018-02-11 2021-04-20 北京百度网讯科技有限公司 Method and apparatus for processing information
CN108683542A (en) * 2018-05-22 2018-10-19 郑州云海信息技术有限公司 A kind of fault self-diagnosis method of distributed memory system, system and device
CN109039887B (en) * 2018-09-10 2021-06-29 迈普通信技术股份有限公司 Stacking system fault processing method and equipment
CN110213128B (en) * 2019-05-28 2020-06-05 掌阅科技股份有限公司 Service port detection method, electronic device and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101478492A (en) * 2009-02-10 2009-07-08 杭州华三通信技术有限公司 Method and apparatus for stacking member port detection
CN102014022A (en) * 2010-12-02 2011-04-13 福建星网锐捷网络有限公司 Equipment port fault processing method and device, and network equipment
CN102164056A (en) * 2011-03-17 2011-08-24 杭州华三通信技术有限公司 Stacked link aggregation fault detection method and stacked devices
CN104219107A (en) * 2014-08-26 2014-12-17 华为技术有限公司 Communication fault detecting method, communication fault detecting device and communication fault detecting system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101340456B (en) * 2008-08-15 2012-04-18 杭州华三通信技术有限公司 A converging method of distributed aggregated link failure and a stacking apparatus
CN101610212B (en) * 2009-07-27 2012-12-12 迈普通信技术股份有限公司 Method and card for realizing reliable data plane communication
US9629012B2 (en) * 2010-09-20 2017-04-18 Empire Technology Development Llc Dynamic mobile application quality-of-service monitor

Also Published As

Publication number Publication date
CN104219107A (en) 2014-12-17
CN104219107B (en) 2018-08-14

Similar Documents

Publication Publication Date Title
WO2016029749A1 (en) Communication failure detection method, device and system
US10917322B2 (en) Network traffic tracking using encapsulation protocol
US10601643B2 (en) Troubleshooting method and apparatus using key performance indicator information
US8891516B2 (en) Extended link aggregation (LAG) for use in multiple switches
CN109450666B (en) Distributed system network management method and device
EP2798800B1 (en) Expanding member ports of a link aggregation group between clusters
US9838245B2 (en) Systems and methods for improved fault tolerance in solicited information handling systems
CN108632099B (en) Fault detection method and device for link aggregation
US8341231B2 (en) Systems and methods for processing heartbeat messages
US9197516B2 (en) In-service throughput testing in distributed router/switch architectures
CN111988191B (en) Fault detection method and device for distributed communication network
EP3624401B1 (en) Systems and methods for non-intrusive network performance monitoring
WO2017032223A1 (en) Virtual machine deployment method and apparatus
US8675498B2 (en) System and method to provide aggregated alarm indication signals
CN113472646B (en) Data transmission method, node, network manager and system
CN105743816A (en) Link aggregation method and device
CN103220189A (en) Multi-active detection (MAD) backup method and equipment
CN107332793B (en) Message forwarding method, related equipment and system
US10996971B2 (en) Service OAM for virtual systems and services
US11539728B1 (en) Detecting connectivity disruptions by observing traffic flow patterns
CN113364678A (en) Data transmission system, method, device, electronic equipment and computer readable medium
CN108141374B (en) Network sub-health diagnosis method and device
US10917504B1 (en) Identifying the source of CRC errors in a computing network
CN107104837B (en) Method and control device for path detection
CN105721234A (en) Port aggregation method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15836849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15836849

Country of ref document: EP

Kind code of ref document: A1