WO2018107882A1 - Fault location method and network device - Google Patents

Fault location method and network device

Info

Publication number
WO2018107882A1
Authority
WO
WIPO (PCT)
Prior art keywords
network device
network
kpi
statistical
fault information
Prior art date
Application number
PCT/CN2017/105881
Other languages
English (en)
French (fr)
Inventor
薛莉
谢于明
张亮
吴俊�
丁律
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17881289.7A (published as EP3547612A1)
Publication of WO2018107882A1
Priority to US16/437,442 (published as US11411810B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity

Definitions

  • Embodiments of the present invention relate to the field of communications, and more specifically, to a fault location method and a network device.
  • The network often fails.
  • When a fault occurs, if the faulty network device or link is not located and handled in time, the fault may spread to the entire network; that is, network flapping occurs.
  • Network flapping can cause network interruption and service interruption. For example, if a routing device has a clock fault, the system time of the routing device runs several hundred times faster than the system time of the other routing devices in the network. This causes the other routing devices across the network to repeatedly delete and regenerate the corresponding routing entries, which severely depletes their resources. When the resources of the other routing devices are exhausted, the entire network and its services are interrupted.
  • First, the state information of each router's CPU has to be collected on each router separately, and the Telnet protocol can only log in to different network devices in groups and serially to collect information. Multiple network devices therefore have to be accessed remotely, and collecting information for fault location is inefficient.
  • Second, the massive information of many network devices has to be checked manually when faults are analyzed.
  • This requires operation and maintenance personnel to have rich experience in equipment operation and maintenance, and the analysis efficiency is low, which results in long fault durations.
  • As a result, services are affected over a large area.
  • Embodiments of the present invention provide a fault location method and a network device, which can quickly and accurately locate a failed network device in a network.
  • According to a first aspect, a fault location method is provided. A first network device receives, in a flooding manner, fault information packets sent by the network devices in the network other than the first network device, where each fault information packet includes statistical information, about interior gateway protocol packets, of the network device that sends the fault information packet.
  • The statistical information of each network device includes the statistical results of the network device for one or more key performance indicators (KPIs). The first network device determines the failed network device in the network according to the statistical information of the first network device and the statistical information of the other network devices.
  • Because the information required for fault location, that is, the KPIs of the network devices, is obtained in this way, the fault location process is accelerated and the fault location time is shortened.
  • The first network device acquires the statistical information of the first network device, where the statistical information includes the statistical results of one or more KPIs of the first network device; the first network device sends the fault information packet of the first network device to the other network devices in a flooding manner, where the fault information packet of the first network device includes the statistical information of the first network device.
  • That the first network device sends its fault information packet to the other network devices in a flooding manner includes: the first network device sends the fault information packet of the first network device to the other network devices in a flooding manner according to a first preset period; or, if the statistical result of a KPI of the first network device meets a preset condition, the first network device sends the fault information packet of the first network device to the other network devices in a flooding manner.
  • Sending the fault information packet of the first network device in a flooding manner only when a KPI of the first network device meets the preset condition reduces the network bandwidth occupied by fault information packets.
  • That the first network device determines the failed network device in the network according to the statistical information of the first network device and the statistical information of the other network devices includes: the first network device determines a target KPI; the first network device calculates, according to the statistical information of the first network device and the statistical information of the other network devices, the KPI change rate of the target KPI on each network device in the network; and the first network device selects, according to these KPI change rates, the failed network device from the first network device and the other network devices, where the KPI change rate of the target KPI of the failed network device is greater than or equal to a preset KPI change rate threshold.
  • Alternatively, that the first network device determines the failed network device in the network according to the statistical information of the first network device and the statistical information of the other network devices includes: the first network device determines a target KPI; the first network device obtains, according to the statistical information of the first network device and the statistical information of the other network devices, the statistical results of the interior gateway protocol packets, corresponding to the target KPI, that are transmitted between any two network devices in the network; generates an adjacency matrix according to these statistical results; determines the centrality of each network device in the network according to the adjacency matrix; and determines the failed network device in the network according to the centrality.
  • Generating the adjacency matrix from the statistical results of the interior gateway protocol packets, corresponding to the target KPI, that are transmitted between any two network devices, and calculating the centrality from the generated adjacency matrix, allows the failed network device to be located accurately.
  • The fault information packet is a packet dedicated to carrying the KPIs of a network device.
  • Alternatively, the fault information packet is a packet based on an interior gateway protocol (IGP).
  • The other network devices include a second network device, and the fault information packet sent by the second network device carries the statistical results of one or more KPIs of a third network device adjacent to the second network device, where the third network device is a network device that does not support the transmission of fault information packets.
  • According to a second aspect, a network device is provided, comprising one or more modules for performing the method of the first aspect.
  • According to a third aspect, a network device is provided, comprising a memory and a processor, where the memory is configured to store program code and the processor is configured to call the program code to implement the method in the first aspect and the implementations of the first aspect.
  • According to a fourth aspect, a computer-readable medium is provided for storing program code executable by a network device, where the program code comprises instructions for performing the method in the first aspect and the implementations of the first aspect.
  • FIG. 1 is a schematic architectural diagram of an application scenario according to an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a fault location method according to an embodiment of the present invention.
  • FIG. 3 is another schematic flowchart of a fault location method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a network that can be applied to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of another network that can be applied to an embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of a network device according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a network device according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of an application scenario of an embodiment of the present invention.
  • The network 100 includes network devices A, B, C, D, E, F, G, and H, and these network devices are connected to each other.
  • The network devices in the network 100 may be routers, switches, hubs, bridges, gateways, or other types of network devices, and their types may be the same or different.
  • A network device in the network 100 can receive, in a flooding manner, the fault information packets sent by the other network devices, and each fault information packet carries the key performance indicators (KPIs) of the network device that sends it.
  • An embodiment of the present invention provides a fault location method that can quickly collect the information used for fault location and improve fault location efficiency. The method is described in detail below with reference to FIG. 2.
  • FIG. 2 is a schematic flowchart of a fault location method 200. The method may be performed by a first network device, which may be any network device in the network; for example, the first network device may be network device A or another network device in the network 100.
  • Method 200 as shown in FIG. 2 includes:
  • The first network device receives, in a flooding manner, the fault information packets sent by the network devices in the network other than the first network device. Each fault information packet includes the statistical information, about interior gateway protocol packets, of the network device that sends the fault information packet, and the statistical information of each network device includes the statistical results of the network device for one or more key performance indicators (KPIs).
  • The first network device determines the failed network device in the network according to the statistical information of the first network device and the statistical information of the other network devices.
  • In this embodiment, any network device in the network receives, in a flooding manner, the fault information packets sent by the other network devices, and each fault information packet includes the statistical information, about interior gateway protocol packets, of the network device that sends it. The network device then determines the failed network device in the network according to its own statistical information and the statistical information of the other network devices. This solves the problem that fault information is difficult to collect when a network device fails, and allows the network fault to be located quickly and accurately.
  • Flooding is a packet delivery technique.
  • When a network device sends a fault information packet in a flooding manner, it sends the packet out through all of its interfaces.
  • After the network devices send fault information packets to each other in a flooding manner, each network device has received the fault information packets of the other network devices and has thereby collected their KPIs.
  • When a network device fails, operation and maintenance personnel can log in to any network device and locate the fault based on the KPIs of that network device and the KPIs of the other network devices that it has collected.
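  • The flooding behavior described above can be illustrated with a minimal Python sketch. The topology, the device names, and the KPI name `isis_hello_received` are hypothetical. The sketch also assumes that a device does not send a copy back out of the interface on which it arrived and that it drops copies it has already seen; the text itself only states that packets are sent through all interfaces.

```python
from collections import deque

# Hypothetical topology: each device lists its directly connected neighbors.
TOPOLOGY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def flood(topology, origin, kpis):
    """Flood one fault information packet from `origin`.

    Each device forwards the packet to every neighbor except the one it came
    from (an assumption of this sketch) and ignores duplicate copies.
    Returns, per device, the KPIs it has collected keyed by originating device.
    """
    collected = {device: {} for device in topology}
    seen = {origin}
    queue = deque((origin, neighbor) for neighbor in topology[origin])
    while queue:
        sender, receiver = queue.popleft()
        if receiver in seen:
            continue  # duplicate copy, drop it
        seen.add(receiver)
        collected[receiver][origin] = kpis
        for neighbor in topology[receiver]:
            if neighbor != sender:
                queue.append((receiver, neighbor))
    return collected


stores = flood(TOPOLOGY, "A", {"isis_hello_received": 150})
```

  • After the call, every device other than A holds A's KPIs, so an operator could inspect any single device to see them.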
  • When the first network device determines the failed network device in the network according to the statistical information of the first network device and the statistical information of the other network devices, it may identify the failed network device according to the KPI change rates of the network devices.
  • The KPI change rate of a KPI is the rate of change of the statistical result of the KPI at a first time relative to the statistical result of the KPI at a second time.
  • For example, the KPI is the number of received IS-IS hello packets; the statistical result obtained at the first time T0 is 100, and the statistical result obtained at the second time T1 is 150.
  • Specifically, the first network device determines a target KPI, which is the KPI to be analyzed at the current time. After determining the target KPI, the first network device calculates, according to the statistical information of the first network device and the statistical information of the other network devices in the network, the KPI change rate of the target KPI on each network device in the network. When the KPI change rate of the target KPI on a network device is greater than or equal to the preset KPI change rate threshold, that network device is determined to be a failed network device. There may be one or more failed network devices.
  • The target KPI determined here may be one or more KPIs.
  • There may also be one or more target KPIs whose KPI change rate is greater than or equal to the preset KPI change rate threshold.
  • For example, the rate of change of the number of IS-IS routing protocol packets received by each network device in the network is calculated.
  • When the rate of change of the number of IS-IS routing protocol packets received by one or more network devices is greater than or equal to a first preset threshold, that network device or those network devices are determined to be failed network devices.
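  • The change-rate check can be sketched as follows. The exact formula is not given in the text, so this sketch assumes a relative change, (later result - earlier result) / earlier result; the function names, the sample values for devices B and C, and the threshold of 0.5 are hypothetical.

```python
def kpi_change_rate(previous, current):
    """Relative change of a KPI between two statistical results (assumed formula)."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def find_failed_devices(samples, threshold):
    """Return the devices whose KPI change rate is >= threshold.

    `samples` maps a device name to (result at T0, result at T1).
    """
    return sorted(
        device
        for device, (at_t0, at_t1) in samples.items()
        if kpi_change_rate(at_t0, at_t1) >= threshold
    )


# Device A uses the 100 -> 150 values from the text; B and C are made up.
samples = {"A": (100, 150), "B": (100, 104), "C": (200, 190)}
failed = find_failed_devices(samples, threshold=0.5)
```

  • Under the assumed formula, the 100 to 150 example gives a change rate of 0.5, so with a threshold of 0.5 only device A is flagged.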
  • Alternatively, when determining the failed network device in the network according to the statistical information of the first network device and the statistical information of the other network devices, the first network device may obtain the statistical results of the interior gateway protocol packets, corresponding to the target KPI, that are transmitted between any two network devices in the network, generate an adjacency matrix according to the statistical results, and determine the failed network device from the centrality of each network device derived from the adjacency matrix.
  • The adjacency matrix is a matrix that represents the connection relationship between any two network devices in the network. For example, when the statistical result of the interior gateway protocol packets corresponding to the target KPI transmitted between network device A and network device B can be obtained, network device A and network device B can be considered to be connected, and the statistical result of the target KPI between network device A and network device B is used as an element of the adjacency matrix.
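  • A minimal sketch of assembling such an adjacency matrix is shown below. The input format (a dictionary keyed by device pairs), the function name, and the counts are hypothetical; pairs with no statistical result are treated as unconnected and left at zero, as is the diagonal.

```python
def build_adjacency_matrix(devices, pair_counts):
    """Build Array[i][j] from per-pair statistical results of the target KPI.

    `pair_counts` maps (i, j) to the result recorded on device i for packets
    exchanged with device j. Missing pairs and the diagonal stay at 0.
    """
    index = {name: position for position, name in enumerate(devices)}
    matrix = [[0] * len(devices) for _ in devices]
    for (i, j), count in pair_counts.items():
        if i != j:
            matrix[index[i]][index[j]] = count
    return matrix


devices = ["A", "B", "C"]
pair_counts = {("A", "B"): 40, ("B", "A"): 900, ("B", "C"): 850, ("C", "B"): 35}
matrix = build_adjacency_matrix(devices, pair_counts)
```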
  • The centrality of a network device is the extent to which the network device is at the core of the network, and reflects the importance of the network device in the network. In a network, the closer and more frequent the contact between a network device and its neighboring network devices, the more central and important the network device is, that is, the higher its centrality. In the embodiments of the present invention, the failed network device in the network is determined by the centrality of the network devices.
  • Specifically, the first network device determines a target KPI, which is the KPI to be analyzed at the current time.
  • According to the statistical information of the first network device and of the other network devices, the first network device obtains the statistical results of the interior gateway protocol packets, corresponding to the target KPI, that are transmitted between any two network devices in the network, and generates the adjacency matrix according to these statistical results. The centrality of each network device can be determined according to the adjacency matrix, and the failed network device in the network is determined according to the centrality value of each network device.
  • For example, the network device corresponding to the largest centrality value among the centralities of all the network devices in the network may be determined as the failed network device; as another example, the network device corresponding to the smallest centrality value may be determined as the failed network device.
  • The target KPI determined here is one particular target KPI at the current time; that is, one adjacency matrix is obtained from the statistics for one target KPI at the current time.
  • Table 1 shows, for the case where the target KPI is the number of received IS-IS routing protocol packets, the statistical results of the number of IS-IS routing protocol packets transmitted between any two of network devices A, B, C, D, E, F, G, and H, from which the adjacency matrix Array ij is generated.
  • Array ij in Table 1 represents the statistical result, on network device i, of the target KPI transmitted between network device i and network device j, where the statistical result of the target KPI transmitted between a network device and itself is zero.
  • The data of the third row and the first column shown in Table 1 indicates the statistical result, on network device B, of the number of IS-IS routing protocol packets sent by network device A.
  • The data in the third row and in the third column of Table 1 is clearly much larger than the data in the other rows and columns, which indicates that network device B, corresponding to the third row and the third column, is closely connected with the other network devices in the network; that is, network device B may be faulty.
  • The centrality of each network device is then calculated from the adjacency matrix shown in Table 1. The centrality of network device B is found to be the largest, so network device B can be determined to be a failed network device.
  • The centrality of the i-th network device may be calculated from the adjacency matrix as follows: first calculate the eigenvalues and eigenvectors of the adjacency matrix, and then take the i-th component of the eigenvector corresponding to the largest eigenvalue; this component is the centrality of the i-th network device in the network.
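  • The eigenvector-based centrality above can be approximated without a linear algebra library by power iteration, as in the following sketch. The matrix values are hypothetical and are not the data of Table 1. Power iteration is a substitute for computing all eigenvalues directly, and the sketch iterates on (matrix + I), which has the same eigenvectors, to avoid oscillation on some topologies; both choices are assumptions of this example rather than part of the described method.

```python
def eigenvector_centrality(matrix, iterations=500, tol=1e-12):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    Assumes a non-negative adjacency matrix. Component i of the returned
    vector (normalized to sum to 1) is the centrality of device i.
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (matrix + I); the shift leaves the eigenvectors unchanged.
        nxt = [
            vector[i] + sum(matrix[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(nxt)
        nxt = [value / norm for value in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, vector)) < tol
        vector = nxt
        if converged:
            break
    return vector


devices = ["A", "B", "C", "D"]
# Hypothetical packet counts; the row and column of B are much larger.
matrix = [
    [0, 90, 30, 20],
    [90, 0, 80, 85],
    [30, 80, 0, 25],
    [20, 85, 25, 0],
]
scores = dict(zip(devices, eigenvector_centrality(matrix)))
suspect = max(scores, key=scores.get)
```

  • With these sample counts, device B has the largest centrality and would be selected under the "largest centrality value" rule.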
  • In this embodiment, any network device in the network receives, in a flooding manner, the fault information packets sent by the other network devices in the network. Any network device in the network can therefore obtain the information required for fault location, that is, the KPIs of each network device in the network, and can then determine the failed network device in the network according to these KPIs. This solves the problem that fault information is difficult to collect after the network fails, and allows the fault to be located quickly and accurately.
  • The protocol involved in a flapping fault is usually an Interior Gateway Protocol (IGP).
  • Common IGPs include the Intermediate System to Intermediate System (IS-IS) routing protocol, the Open Shortest Path First (OSPF) protocol, the Routing Information Protocol (RIP), and the Enhanced Interior Gateway Routing Protocol (EIGRP).
  • The embodiments of the present invention use the KPIs of IS-IS routing protocol packets as an example to describe the KPIs of each network device, but the embodiments of the present invention are not limited thereto.
  • Table 2 shows the KPIs related to IS-IS routing protocol packets.
  • The KPIs of a network device in the embodiments of the present invention are not limited to KPIs related to the IS-IS routing protocol; they may also be KPIs related to packets of other IGPs, for example, OSPF protocol packets, RIP protocol packets, or EIGRP protocol packets.
  • A fault information packet that carries the KPIs of a network device is defined. The fault information packet may be a packet dedicated to carrying the KPIs of a network device, or may be an extended packet, based on an existing IGP, that carries the KPIs of a network device.
  • The packet is defined by a type-length-value ("TLV") field.
  • The packet carries the KPI identifier of the network device, the KPI value of the network device, a system identifier, a KPI source system identifier, and a KPI destination system identifier.
  • The packet type indicates that the fault information packet is a packet used to carry the KPIs of a network device.
  • The packet length is the length of the content of the packet.
  • The KPI identifier distinguishes the different KPIs of the same network device.
  • The KPI value is the value of the KPI.
  • The system identifier indicates the network device that sends the fault information packet.
  • The KPI source system identifier indicates the network device to which the KPI carried in the fault information packet belongs, and the KPI destination system identifier indicates the network device that receives the KPI.
  • The system identifiers of the multiple fault information packets are the same and indicate that the packets are sent by network device A.
  • The KPI source system identifier in the fault information packet carrying the KPI of network device A indicates that the packet carries the KPI of network device A, and the KPI source system identifier in the fault information packet carrying the KPI of network device C indicates that the packet carries the KPI of network device C.
  • The KPI destination system identifier indicates that the network device that receives the KPI is network device B. In this way, when the KPIs are received, the fault information packet carrying the KPI of network device A can easily be distinguished from the other fault information packets by its KPI source system identifier.
  • As an example, the packet type is 1 byte, the packet length is 1 byte, the KPI identifier is 1 byte, the KPI value is 16 bytes, the system identifier is 6 bytes, the KPI source system identifier is 6 bytes, and the KPI destination system identifier is 6 bytes.
  • The byte lengths of the KPI identifier and the KPI value in the embodiments of the present invention are not limited thereto; for example, the KPI identifier may also be 2 bytes, and the KPI value may also be 20 bytes.
  • Table 3 shows the packet used to carry the KPIs of a network device.
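  • The example field sizes quoted above (1 + 1 + 1 + 16 + 6 + 6 + 6 bytes) can be encoded with Python's standard `struct` module, as in the sketch below. The type code 250, the function names, the sample identifiers, and the choice to store the KPI value as a big-endian unsigned integer are all hypothetical. The sketch also assumes that the length field counts only the bytes that follow the type and length fields, which the text does not specify.

```python
import struct

# type(1) length(1) KPI id(1) KPI value(16) system id(6)
# KPI source system id(6) KPI destination system id(6), network byte order.
KPI_TLV = struct.Struct("!BBB16s6s6s6s")
KPI_TLV_TYPE = 250  # hypothetical type code, not taken from the source


def pack_kpi_tlv(kpi_id, kpi_value, system_id, source_id, destination_id):
    """Encode one KPI as a fixed-size TLV (37 bytes in total)."""
    value_length = KPI_TLV.size - 2  # assumed: excludes the type and length bytes
    return KPI_TLV.pack(
        KPI_TLV_TYPE,
        value_length,
        kpi_id,
        kpi_value.to_bytes(16, "big"),
        system_id,
        source_id,
        destination_id,
    )


def unpack_kpi_tlv(data):
    """Decode a TLV produced by pack_kpi_tlv into a dictionary."""
    tlv_type, length, kpi_id, raw, system_id, source_id, destination_id = (
        KPI_TLV.unpack(data)
    )
    return {
        "type": tlv_type,
        "length": length,
        "kpi_id": kpi_id,
        "kpi_value": int.from_bytes(raw, "big"),
        "system_id": system_id,
        "source_id": source_id,
        "destination_id": destination_id,
    }


# Hypothetical 6-byte system identifiers for devices A and B.
DEVICE_A = bytes.fromhex("00000000000a")
DEVICE_B = bytes.fromhex("00000000000b")
packet = pack_kpi_tlv(1, 150, DEVICE_A, DEVICE_A, DEVICE_B)
decoded = unpack_kpi_tlv(packet)
```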
  • Alternatively, the TLV field of an IGP packet may be extended.
  • For example, the TLV field of the LSP packet of the IS-IS routing protocol is extended.
  • The extended TLV field carries the KPI identifier of the network device, the KPI value of the network device, the system identifier, the KPI source system identifier, and the KPI destination system identifier.
  • The first network device may further send its own fault information packet to the other network devices in the network in a flooding manner, where the fault information packet includes the statistical information of the first network device.
  • The statistical information of the first network device includes the statistical results of the first network device for one or more KPIs.
  • The first network device may send its fault information packet to the other network devices in the network in a flooding manner according to a first preset period.
  • For example, the first network device sends its fault information packet to the other network devices in the network every 120 seconds.
  • The first preset period of 120 seconds is only an example. The embodiments of the present invention are not limited thereto, and the first preset period may be set differently for different network devices.
  • Alternatively, the first network device may send its fault information packet to the other network devices in the network in a flooding manner if the statistical result of a first KPI of the first network device meets a preset condition.
  • The first KPI is one or more KPIs of the first network device.
  • FIG. 3 is another schematic flowchart of a fault location method according to an embodiment of the present invention.
  • The first network device configures a KPI threshold for the first KPI of the first network device, and collects statistics on the first KPI according to a preset period to obtain the statistical result of the first KPI.
  • The statistical result of the first KPI is compared with the KPI threshold set by the first network device for the first KPI.
  • When the statistical result of the first KPI is greater than or equal to the KPI threshold, the first network device sends its fault information packet to the other network devices in the network in a flooding manner, so that the other network devices can determine, according to the statistical result of the first KPI included in the fault information packet, whether the first network device is faulty.
  • For example, when the first KPI is the number of received IS-IS routing protocol packets, the KPI threshold of the first KPI is the threshold of the number of received IS-IS routing protocol packets, which may be a first threshold.
  • When the counted number of received IS-IS routing protocol packets is greater than or equal to the first threshold, the first network device sends its fault information packet to the other network devices in the network;
  • otherwise, the first network device enters the next preset period and continues to count the number of received IS-IS routing protocol packets.
  • When the statistical results of only some of the KPIs counted by the first network device are greater than or equal to the corresponding KPI thresholds, the fault information packet that the first network device sends to the other network devices in the network may include only the statistical results of those KPIs, or may include the statistical results of all the KPIs of the first network device counted in the preset period.
  • The preset period may be the same as or different from the first preset period, which is not limited in the embodiments of the present invention.
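  • The threshold-triggered sending described above can be sketched as a loop over per-period statistical results. The function name, the `send` callback, the sample counts, and the threshold of 1000 are hypothetical; a real device would flood a fault information packet where the callback is invoked.

```python
def run_periods(results, threshold, send):
    """Process one KPI statistical result per preset period.

    Calls `send(period, value)` only when the result is >= threshold;
    otherwise the device simply moves on to the next period.
    Returns the number of packets sent.
    """
    sent = 0
    for period, value in enumerate(results):
        if value >= threshold:
            send(period, value)
            sent += 1
    return sent


flooded = []
count = run_periods(
    [120, 480, 1500, 300, 2000],
    threshold=1000,
    send=lambda period, value: flooded.append((period, value)),
)
```

  • Only the third and fifth periods exceed the threshold here, so two packets would be flooded instead of five, which is the bandwidth saving the conditional mode aims for.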
  • The KPI threshold set by the first network device for each KPI may be adjusted manually, or may be adjusted dynamically according to the statistical results of each KPI of the first network device that were collected when the first network device previously failed.
  • The acquired statistical results of one or more KPIs of the first network device are fed back to the first network device, and the first network device may dynamically adjust the KPI thresholds of the one or more KPIs according to the statistical results of the one or more KPIs collected when the first network device failed.
  • Dynamically adjusting the KPI thresholds of the first network device keeps the thresholds reasonable. It avoids a KPI threshold that is set too low, which would cause the first network device to send fault information packets to the other network devices in the network too frequently and waste resources. It also avoids a KPI threshold that is set too high, in which case the KPI statistical results of the first network device that are below the threshold would not be sent to the other network devices in time, so that the other network devices could not obtain these statistical results and could not accurately determine whether the first network device is faulty.
  • Optionally, in some embodiments, some network devices in the network do not support fault information packet transmission.
  • In this case, the fault information packet sent by a network device adjacent to such a device carries the statistical results of one or more KPIs of the device that does not support fault information packet transmission.
  • For example, if a third network device in the network does not support fault information packet transmission, the fault information packet sent by a second network device in the network may carry the statistical results of one or more KPIs of the third network device,
  • where the second network device is adjacent to the third network device.
  • FIG. 4 is a schematic diagram of a network to which an embodiment of the present invention can be applied.
  • As shown in FIG. 4, network device C in the network does not support fault information packet transmission.
  • In this case, network device B and/or network device E, which are adjacent to network device C, can obtain the statistical results of one or more KPIs of network device C through pre-configuration,
  • and network device B or network device E then sends those statistical results to every network device in the network.
  • When some network devices in the network fail, logging in to any network device other than network device C is enough to locate the faulty network device in the network.
  • This fault locating method is the same as the network device fault locating method shown in FIG. 2. To avoid repetition, details are not described here again.
  • It should be understood that more than one network device may lack support for fault information packet transmission. The embodiments of the present invention use a single such device only as an example and are not limited thereto.
  • Optionally, in some embodiments, the network locations of the network devices that do not support fault information packet transmission separate the network into parts. In this case, the network may be one of a pre-divided first subnet and second subnet, where the two subnets are divided based on the network locations of the devices that do not support fault information packet transmission.
  • FIG. 5 is a schematic diagram of another network to which an embodiment of the present invention can be applied.
  • As shown in FIG. 5, network device C and network device D do not support fault information packet transmission, and their network locations divide the network into a first subnet and a second subnet.
  • The network may be either the first subnet or the second subnet, so the method for locating network device faults within the first subnet and the second subnet is the same as the network device fault locating method shown in FIG. 2.
  • To avoid repetition, details are not described here again.
  • FIG. 6 is a schematic block diagram of a network device 600 according to an embodiment of the present invention. As shown in FIG. 6, the network device 600 includes:
  • a receiving module 610, configured to receive fault information packets flooded by the network devices in the network other than the network device 600, where each fault information packet includes statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet,
  • and the statistical information of each network device includes that device's statistical results for one or more KPIs; and
  • a determining module 620, configured to determine the faulty network device in the network according to the statistical information of the network device 600 and the statistical information of the other network devices.
  • In this embodiment of the present invention, the network device uses the receiving module 610 to receive the fault information packets flooded by the other network devices, and can therefore quickly collect the information required for fault locating,
  • namely the statistical results of one or more KPIs of the network devices, which speeds up fault locating and shortens the fault locating time.
  • Optionally, the network device 600 further includes:
  • an obtaining module 630, configured to obtain the statistical information of the network device 600, where the statistical information includes statistical results of one or more KPIs of the first network device; and
  • a sending module 640, configured to flood the fault information packet of the network device 600 to the other network devices in the network, where the fault information packet of the network device 600 includes the statistical information of the network device 600.
  • In this embodiment of the present invention, the sending module 640 is specifically configured to flood the fault information packet of the network device 600 to the other network devices in the network at a first preset period.
  • Optionally, in some embodiments, the sending module 640 is further configured to flood the fault information packet of the network device 600 to the other network devices in the network when the statistical result of a first KPI of the network device 600 meets a preset condition.
  • Optionally, in some embodiments, the preset condition includes that the statistical result of the first KPI is greater than or equal to a KPI threshold set by the network device for the first KPI.
  • Optionally, in some embodiments, the preset condition may also be a threshold on a cosine angle.
  • The cosine angle may be the cosine angle between the vector of all KPIs of the network device 600 counted in a preset period and the vector of all KPI thresholds of the network device 600, or the cosine angle between the vector of some of the KPIs counted in the preset period and the vector of the corresponding KPI thresholds.
  • Optionally, in some embodiments, the determining module 620 is specifically configured to: determine a target KPI; calculate, according to the statistical information of the network device and the statistical information of the other network devices, the KPI change rate of the target KPI on each network device in the network; and select a faulty network device from the network device and the other network devices according to the KPI change rate of the target KPI on each network device in the network,
  • where the KPI change rate of the target KPI of the faulty network device is greater than or equal to a preset KPI change rate threshold.
  • Optionally, in some embodiments, the determining module 620 is specifically configured to: determine a target KPI; obtain, according to the statistical information of the network device and the statistical information of the other network devices, the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network; generate an adjacency matrix from those statistical results; determine the centrality of each network device in the network according to the adjacency matrix; and determine the faulty network device in the network according to the centrality of each network device.
  • Optionally, in some embodiments, the fault information packet is a packet dedicated to carrying KPIs of network devices, and this dedicated packet is defined based on TLV fields.
  • Optionally, in some embodiments, the fault information packet may also be an IGP-based packet, in which case it is an extension of the TLV fields of an IGP packet.
  • Optionally, in some embodiments, the fault information packet sent by a second network device of the at least one network device carries the statistical results of one or more KPIs of a third network device adjacent to the second network device,
  • where the third network device is a network device that does not support fault information packet transmission.
  • Optionally, in some embodiments, the network is one of a pre-divided first subnet and second subnet, and the first subnet and the second subnet are divided based on
  • the network locations of the network devices that do not support fault information packet transmission.
  • It should be understood that the network device 600 according to this embodiment may correspond to the network device in the method embodiments of the present invention, and that the foregoing and other operations and/or functions of the modules in the network device 600 implement
  • the corresponding procedures of the methods in FIG. 2 to FIG. 5. For brevity, details are not described here again.
  • FIG. 7 is a schematic structural diagram of a network device 700 according to an embodiment of the present invention.
  • As shown in FIG. 7, the network device 700 includes a memory 710 and a processor 720 that communicate with each other through an internal connection path to transfer control and/or data signals.
  • The memory 710 is configured to store program code.
  • The processor 720 is configured to invoke the program code to implement the methods in the foregoing embodiments of the present invention.
  • In this embodiment of the present invention, the processor 720 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • The processor may further include a hardware chip.
  • The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • An embodiment of the present invention provides a computer-readable medium for storing computer program code, where the computer program includes instructions for performing the fault locating method of the embodiments of the present invention described above with reference to FIG. 2 to FIG. 5.
  • The readable medium may be a read-only memory (ROM) or a random access memory (RAM); this is not limited in the embodiments of the present invention.
  • In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners.
  • For example, the apparatus embodiments described above are merely illustrative.
  • For example, the division into units is only a division by logical function.
  • In an actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Environmental & Geological Engineering (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

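Embodiments of the present invention provide a fault locating method and a network device. The method includes: receiving, by a first network device, fault information packets flooded by the network devices in a network other than the first network device, where each fault information packet includes statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet, and the statistical information of each network device includes that device's statistical results for one or more key performance indicators (KPIs); and determining a faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices. By receiving fault information packets that network devices send by flooding, the embodiments can quickly collect the information required for fault locating, namely the KPIs of the network devices, which speeds up fault locating and shortens the fault locating time.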

Description

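Fault Locating Method and Network Device
Technical Field
Embodiments of the present invention relate to the communications field, and more specifically, to a fault locating method and a network device.
Background
Networks frequently fail. When a fault occurs, if the faulty network device or link is not located and handled in time, the fault may spread to the entire network, that is, network oscillation occurs. Network oscillation can paralyze the entire network and interrupt services. For example, suppose that a clock fault occurs on a routing device, causing its system time to run hundreds of times faster than that of the other routing devices in the network. The other routing devices throughout the network then repeatedly delete and regenerate the entries corresponding to that routing device, which severely consumes their resources. When the resources of the other routing devices are exhausted, the entire network is paralyzed and all services are interrupted.
When a network fails, locating the faulty network device is a difficult problem. Taking oscillation-type faults as an example, locating them involves the following two problems:
First, when an oscillation-type fault occurs, it is difficult to collect fault information from the network devices across the entire network, which seriously hampers fault analysis. The network contains many network devices, some of which do not support network management and maintenance and some of which are outside the deployment scope of the network management system. As a result, when an oscillation-type fault occurs, it is usually necessary to log in to different network devices through Telnet, the standard protocol for remote login services, and collect fault information from each device separately. For example, to collect the status information of the central processing unit (CPU) of every router in the network, one has to log in to each router and collect its CPU status information individually, and Telnet only allows logging in to different network devices in groups and serially to collect information. In this case, multiple network devices need remote access to the network, and fault information collection and fault locating are inefficient.
Second, analyzing an oscillation-type fault requires manual troubleshooting across massive amounts of information from numerous network devices. This approach requires operation and maintenance personnel with extensive device operation and maintenance experience, and the analysis is inefficient, so the fault lasts a long time and affects a large number of services.
Summary
Embodiments of the present invention provide a fault locating method and a network device that can quickly and accurately locate a faulty network device in a network.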
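According to a first aspect, a fault locating method is provided. A first network device receives fault information packets flooded by network devices other than the first network device, where each fault information packet includes statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet, and the statistical information of each network device includes that device's statistical results for one or more key performance indicators (KPIs). The first network device determines a faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices.
Receiving fault information packets that network devices send by flooding makes it possible to quickly collect the information required for fault locating, namely the KPIs of the network devices, which speeds up fault locating and shortens the fault locating time.
With reference to the first aspect, in a first implementation of the first aspect, the first network device obtains its own statistical information, which includes statistical results of one or more KPIs of the first network device; and the first network device floods its fault information packet to the other network devices, where the fault information packet of the first network device includes the statistical information of the first network device.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the flooding, by the first network device, of its fault information packet to the other network devices includes: flooding, by the first network device, its fault information packet to the other network devices at a first preset period; or flooding, by the first network device, its fault information packet to the other network devices when a statistical result of a first KPI of the first network device meets a preset condition.
With reference to the second implementation of the first aspect, in a third implementation of the first aspect, meeting the preset condition includes that the statistical result of the first KPI of the first network device is greater than or equal to a KPI threshold set by the first network device for the first KPI.
Flooding the fault information packet of the first network device to the other network devices only when a KPI of the first network device meets the preset condition reduces the network bandwidth consumed by fault information packets.
With reference to the first aspect or any one of the first to third implementations of the first aspect, in a fourth implementation of the first aspect, the determining, by the first network device, of the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices includes: determining, by the first network device, a target KPI; calculating, by the first network device according to its own statistical information and the statistical information of the other network devices, the KPI change rate of the target KPI on each network device in the network; and selecting, by the first network device according to the KPI change rate of the target KPI on each network device in the network, a faulty network device from the first network device and the other network devices, where the KPI change rate of the target KPI of the faulty network device is greater than or equal to a preset KPI change rate threshold.
Analyzing the KPI change rates of the network devices makes it possible to quickly locate the faulty network device.
With reference to the first aspect or any one of the first to third implementations of the first aspect, in a fifth implementation of the first aspect, the determining, by the first network device, of the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices includes: determining, by the first network device, a target KPI; obtaining, by the first network device according to its own statistical information and the statistical information of the other network devices, the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network; generating an adjacency matrix from those statistical results; determining the centrality of each network device in the network according to the adjacency matrix; and determining the faulty network device in the network according to the centrality of each network device.
Generating an adjacency matrix from the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices, and calculating centrality from that adjacency matrix, makes it possible to accurately locate the faulty network device.
With reference to the first aspect or any one of the first to fifth implementations of the first aspect, in a sixth implementation of the first aspect, the fault information packet is a packet dedicated to carrying KPIs of network devices.
With reference to the first aspect or any one of the first to fifth implementations of the first aspect, in a seventh implementation of the first aspect, the fault information packet is a packet based on an Interior Gateway Protocol (IGP).
With reference to the first aspect or any one of the first to seventh implementations of the first aspect, in an eighth implementation of the first aspect, the other network devices include a second network device, the fault information packet sent by the second network device carries statistical results of one or more KPIs of a third network device adjacent to the second network device, and the third network device is a network device that does not support fault information packet transmission.
According to a second aspect, a network device is provided, including one or more modules configured to perform the method in the first aspect.
According to a third aspect, a network device is provided, including a memory and a processor, where the memory is configured to store program code and the processor is configured to invoke the program code to implement the method in the first aspect and the implementations of the first aspect.
According to a fourth aspect, a computer-readable medium is provided. The computer-readable medium is configured to store program code executable by the network device, and the program code includes instructions for performing the method in the first aspect and the implementations of the first aspect.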
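Brief Description of Drawings
FIG. 1 is a schematic architecture diagram of an application scenario according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of a fault locating method according to an embodiment of the present invention.
FIG. 3 is another schematic flowchart of a fault locating method according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a network to which an embodiment of the present invention can be applied.
FIG. 5 is a schematic diagram of another network to which an embodiment of the present invention can be applied.
FIG. 6 is a schematic block diagram of a network device according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of a network device according to an embodiment of the present invention.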
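Description of Embodiments
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings.
FIG. 1 is a schematic architecture diagram of an application scenario according to an embodiment of the present invention. As shown in FIG. 1, a network 100 includes network devices A, B, C, D, E, F, G, and H, which are interconnected. A network device in the network 100 may be a router, a switch, a hub, a bridge, a gateway, or another type of network device, and the network devices may be of the same type or of different types.
When a network device in the network 100 fails, a network device in the network 100 can receive the fault information packets flooded by the other network devices in the network. Each fault information packet includes the key performance indicators (KPIs) of the network device that sent it, and any network device in the network can quickly and accurately locate the faulty network device according to the KPIs of all network devices in the network.
Based on the application scenario shown in FIG. 1, an embodiment of the present invention provides a fault locating method that can quickly collect the information used for fault locating and improve fault locating efficiency. The method is described in detail below with reference to FIG. 2.
FIG. 2 is a schematic flowchart of a fault locating method 200 according to an embodiment of the present invention. The method 200 may be performed by a first network device, which may be any network device in a network; for example, the first network device may be network device A or another network device in the network 100. The method 200 shown in FIG. 2 includes:
210. The first network device receives fault information packets flooded by the network devices in the network other than the first network device, where each fault information packet includes statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet, and the statistical information of each network device includes that device's statistical results for one or more key performance indicators (KPIs).
220. The first network device determines the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices.
Specifically, any network device in the network receives the fault information packets flooded by the other network devices in the network, where each fault information packet includes the statistical information, about Interior Gateway Protocol packets, of the network device that sent it. That network device then determines the faulty network device in the network according to its own statistical information and the statistical information of the other network devices. This solves the problem that fault information is difficult to collect when a network device fails, and allows network faults to be located quickly and accurately.
It should be understood that flooding is a packet delivery technique. A network device sending a fault information packet by flooding means that the network device sends the packet out through all of its interfaces.
After the network devices flood fault information packets to one another, each network device receives the fault information packets of the other network devices and thereby collects their KPIs. When a network device fails, operation and maintenance personnel can log in to any network device and locate the fault according to the KPIs of that device and the KPIs of the other network devices that it has collected.
In step 220, when determining the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices, the first network device may do so according to the KPI change rates of the network devices.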
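It should be understood that the KPI change rate of a KPI is the rate of change between the statistical result of the KPI at a first time and its statistical result at a second time. Suppose that the KPI is the number of received IS-IS hello packets, the statistical result obtained at the first time T0 is 100, and the statistical result obtained at the second time T1 is 150. The KPI change rate is then (150 - 100)/100 = 50%.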
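The arithmetic of this example can be reproduced with a short Python sketch. It is only an illustration of the formula; the function name `kpi_change_rate` is an assumption and does not come from the patent.

```python
def kpi_change_rate(earlier: float, later: float) -> float:
    """Relative change of a KPI statistic between two sampling times."""
    if earlier == 0:
        raise ValueError("change rate is undefined when the earlier statistic is 0")
    return (later - earlier) / earlier


# 100 IS-IS hello packets counted at T0, 150 counted at T1
rate = kpi_change_rate(100, 150)
print(f"{rate:.0%}")  # prints 50%
```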
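Specifically, the first network device determines a target KPI, which is the KPI to be analyzed at the current moment. After determining the target KPI, the first network device calculates, according to its own statistical information and the statistical information of the other network devices in the network, the KPI change rate of the target KPI on each network device in the network. When the KPI change rate of the target KPI on a network device is greater than or equal to a preset KPI change rate threshold, that network device is determined to be a faulty network device. There may be one or more faulty network devices.
One or more target KPIs may be determined here, and the number of target KPIs whose KPI change rate is greater than or equal to the preset KPI change rate threshold may likewise be one or more.
For example, when the first network device determines that the target KPI is the number of received Intermediate System to Intermediate System (IS-IS) routing protocol packets, it calculates the change rate of that number on each network device in the network. When the change rate on one or several network devices is greater than or equal to a first preset threshold, those network devices are determined to be faulty.
Optionally, different target KPIs may have different preset KPI change rate thresholds.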
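A minimal sketch of this selection step, under stated assumptions, might look as follows. The per-device counts, the 50% threshold, and the function name `find_faulty_devices` are all hypothetical values chosen for illustration; the patent does not prescribe them.

```python
def find_faulty_devices(previous, current, rate_threshold):
    """Return devices whose target-KPI change rate is >= rate_threshold."""
    faulty = []
    for device, earlier in previous.items():
        if earlier == 0:
            continue  # change rate undefined; skipped in this sketch
        rate = (current[device] - earlier) / earlier
        if rate >= rate_threshold:
            faulty.append(device)
    return faulty


# Hypothetical counts of received IS-IS packets in two consecutive periods
previous = {"A": 100, "B": 120, "C": 110}
current = {"A": 105, "B": 600, "C": 112}
print(find_faulty_devices(previous, current, rate_threshold=0.5))  # prints ['B']
```

A different threshold could be passed for each target KPI, in line with the option described above.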
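Optionally, in some embodiments, when determining the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices, the first network device may obtain the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network, generate an adjacency matrix from those statistical results, determine the centrality of each network device according to the adjacency matrix, and determine the faulty network device in the network according to the centrality of each network device.
The adjacency matrix is a matrix that represents the connection relationship between any two network devices in the network. For example, if the statistical result of the Interior Gateway Protocol packets corresponding to the target KPI transmitted between network device A and network device B can be obtained, network devices A and B can be regarded as connected, and the statistical result of the target KPI of network devices A and B is used as an element of the adjacency matrix.
The centrality of a network device is the degree to which the device occupies a core position in the network, and reflects the importance of the device in the network. The more closely a network device is connected to the surrounding network devices and the more frequently it interacts with them, the more central and the more important it is, that is, the higher its centrality. In this embodiment of the present invention, the faulty network device is determined by how high or low the centrality of each network device is.
Specifically, the first network device determines a target KPI, which is the KPI to be analyzed at the current moment. After determining the target KPI, the first network device obtains, according to its own statistical information and the statistical information of the other network devices, the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network, generates an adjacency matrix from those statistical results, determines the centrality of each network device according to the adjacency matrix, and determines the faulty network device in the network according to the centrality values of the network devices.
For example, the network device with the largest centrality value among all network devices in the network is determined to be the faulty network device; alternatively, the network device with the smallest centrality value may be determined to be the faulty network device.
It should be understood that the target KPI determined here is a single, specific target KPI at the current moment, that is, collecting statistics on one target KPI at the current moment yields one adjacency matrix.
Table 1 shows the adjacency matrix Array_ij that is generated, when the target KPI of the network devices is the number of received IS-IS routing protocol packets, from the statistical results of the number of received IS-IS routing protocol packets transmitted between any two of network devices A, B, C, D, E, F, G, and H. In Table 1, Array_ij represents the statistical result, on network device i, of the target KPI transmitted between network device i and network device j, and the statistical result of the target KPI transmitted between a network device and itself is 0. For example, the data in the third row and first column of Table 1 (row B, column A) represents the statistical result, on network device B, of the number of IS-IS routing protocol packets that network device B receives from network device A.
As Table 1 shows, the data in the third row and the third column (the row and the column for network device B) is clearly much larger than the data in the other rows and columns. This indicates that network device B is closely connected to the other network devices in the network, that is, network device B may have failed. The centrality of each network device is then calculated from the adjacency matrix in Table 1. Network device B turns out to have the largest centrality, so network device B can be determined to be the faulty network device.
For example, the centrality of the i-th network device may be calculated as follows: first compute the eigenvalues and eigenvectors of the adjacency matrix, and then take the i-th component of the eigenvector corresponding to the largest eigenvalue, which gives the centrality of the i-th network device in the network.
Table 1. Adjacency matrix of network devices A, B, C, D, E, F, G, and H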
Arrayij A B C D E F G H
A 0 23305 5 5 5 5 4 5
B 29109 0 81120 82505 123443 81847 81236 82268
C 7 73020 0 92 141 76 215 68
D 7 74395 1 0 1 278 1 175
E 7 108445 163 74 0 64 150 68
F 7 73660 1 214 1 0 1 136
G 7 73116 216 91 170 74 0 63
H 7 74111 1 200 1 151 1 0
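The centrality calculation described for Table 1 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes that centrality means the dominant eigenvector of the adjacency matrix, and it approximates that eigenvector with a shifted power iteration instead of a full eigendecomposition. The names `TABLE_1` and `eigenvector_centrality` are hypothetical.

```python
DEVICES = ["A", "B", "C", "D", "E", "F", "G", "H"]
TABLE_1 = [
    [0, 23305, 5, 5, 5, 5, 4, 5],
    [29109, 0, 81120, 82505, 123443, 81847, 81236, 82268],
    [7, 73020, 0, 92, 141, 76, 215, 68],
    [7, 74395, 1, 0, 1, 278, 1, 175],
    [7, 108445, 163, 74, 0, 64, 150, 68],
    [7, 73660, 1, 214, 1, 0, 1, 136],
    [7, 73116, 216, 91, 170, 74, 0, 63],
    [7, 74111, 1, 200, 1, 151, 1, 0],
]


def eigenvector_centrality(matrix, iterations=500):
    """Approximate the eigenvector of the largest eigenvalue by power iteration."""
    n = len(matrix)
    # Iterate on A + shift*I, which has the same eigenvectors as A. The shift
    # keeps the iteration from oscillating on star-like matrices such as this one.
    shift = max(sum(row) for row in matrix)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        largest = max(y)
        x = [value / largest for value in y]  # normalise so the maximum is 1
    return x


centrality = dict(zip(DEVICES, eigenvector_centrality(TABLE_1)))
suspect = max(centrality, key=centrality.get)
print(suspect)  # prints B
```

The sketch picks the device with the largest centrality, which matches the Table 1 discussion; an embodiment that uses the smallest centrality instead would replace `max` with `min`.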
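In this embodiment of the present invention, any network device in the network receives the fault information packets flooded by the other network devices in the network. Therefore, logging in to any network device in the network is enough to obtain the information required for fault locating, namely the KPIs of every network device in the network, and the faulty network device can then be determined from those KPIs. This solves the problem that fault information is difficult to collect after a network fault and allows the fault to be located quickly and accurately.
When a network fault causes network oscillation, the protocol involved is generally an Interior Gateway Protocol (IGP). Common IGPs include the IS-IS routing protocol, Open Shortest Path First (OSPF), the Routing Information Protocol (RIP), and the Enhanced Interior Gateway Routing Protocol (EIGRP). The embodiments of the present invention use KPIs related to IS-IS routing protocol packets as an example to describe the KPIs of the network devices, but are not limited thereto.
Table 2 lists the KPIs related to IS-IS routing protocol packets.
Table 2. KPIs related to IS-IS routing protocol packets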
Figure PCTCN2017105881-appb-000001
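It should be understood that the embodiments of the present invention use the foregoing KPIs related to IS-IS routing protocol packets only as an example and are not limited thereto. It should also be understood that the KPIs of the network devices are not limited to KPIs related to the IS-IS protocol; they may also be KPIs related to packets of other IGPs, for example, OSPF packets, RIP packets, or EIGRP packets.
In some embodiments, a fault information packet that carries KPIs of network devices is defined. The fault information packet may be a packet dedicated to carrying KPIs of network devices, or may be a packet that extends an existing IGP packet to carry KPIs of network devices.
When the fault information packet is a packet dedicated to carrying KPIs of network devices, the packet is defined based on type-length-value (TLV) fields and carries a KPI identifier, a KPI value, a system identifier, a KPI source system identifier, and a KPI destination system identifier. For example, Table 3 shows such a dedicated fault information packet: the packet type indicates that the packet carries KPIs of network devices; the packet length is the length of the packet content; the KPI identifier distinguishes different KPIs of the same network device; the KPI value is the magnitude of the KPI; the system identifier indicates the network device that sent the fault information packet; the KPI source system identifier indicates the network device to which the carried KPI belongs; and the KPI destination system identifier indicates the network device that receives the KPI.
For example, when network device A sends multiple fault information packets to network device B, the system identifiers of those packets are the same and all indicate that the packets were sent by network device A. The KPI source system identifier in a fault information packet that carries a KPI of network device A indicates that the packet carries a KPI of network device A, and the KPI source system identifier in a fault information packet that carries a KPI of network device C indicates that the packet carries a KPI of network device C. The KPI destination system identifier indicates that the network device receiving the KPI is network device B. In this way, when the KPIs of network device A are counted, the fault information packets that carry KPIs of network device A can easily be distinguished from the many fault information packets by their KPI source system identifier.
Table 3 uses a 1-byte packet type, a 1-byte packet length, a 1-byte KPI identifier, a 16-byte KPI value, a 6-byte system identifier, a 6-byte KPI source system identifier, and a 6-byte KPI destination system identifier as an example. The byte lengths of the KPI identifier and the KPI value are not limited to these values in the embodiments of the present invention; for example, the KPI identifier may be 2 bytes and the KPI value may be 20 bytes.
Table 3. Packet dedicated to carrying KPIs of network devices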
Figure PCTCN2017105881-appb-000002
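The example field widths given for Table 3 can be illustrated with a small encoder and decoder. This is a hedged sketch: the field order, the type code `0x01`, the big-endian encoding of the KPI value, and the 6-byte identifiers used in the example are assumptions made for illustration, because the layout itself is only available as an image.

```python
import struct

# Widths from the description: type 1, length 1, KPI ID 1, KPI value 16,
# system ID 6, KPI source system ID 6, KPI destination system ID 6 (bytes).
_HEADER = struct.Struct("!BB")
_VALUE = struct.Struct("!B16s6s6s6s")
KPI_TLV_TYPE = 0x01  # hypothetical type code


def encode_kpi_tlv(kpi_id, kpi_value, system_id, source_id, dest_id):
    """Pack one KPI into a type-length-value record."""
    value = _VALUE.pack(kpi_id, kpi_value.to_bytes(16, "big"),
                        system_id, source_id, dest_id)
    return _HEADER.pack(KPI_TLV_TYPE, len(value)) + value


def decode_kpi_tlv(data):
    """Unpack a record produced by encode_kpi_tlv."""
    tlv_type, length = _HEADER.unpack_from(data, 0)
    if tlv_type != KPI_TLV_TYPE or length != _VALUE.size:
        raise ValueError("not a KPI TLV")
    kpi_id, raw, system_id, source_id, dest_id = _VALUE.unpack_from(data, _HEADER.size)
    return kpi_id, int.from_bytes(raw, "big"), system_id, source_id, dest_id


# Device B reports a KPI of its neighbour C to device A (made-up identifiers).
dev_a = bytes.fromhex("00000000000a")
dev_b = bytes.fromhex("00000000000b")
dev_c = bytes.fromhex("00000000000c")
packet = encode_kpi_tlv(1, 81120, dev_b, dev_c, dev_a)
print(len(packet))  # prints 37
```

Keeping the system identifier separate from the KPI source system identifier is what lets a neighbour report on behalf of another device, as in the example where one device forwards the KPI of network device C.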
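When the fault information packet is a packet that extends an existing IGP packet, the TLV fields of the IGP packet may be extended. For example, the TLV fields of an LSP packet of the IS-IS routing protocol are extended, and the extended TLV fields carry the KPI identifier, the KPI value, the system identifier, the KPI source system identifier, and the KPI destination system identifier.
Optionally, in some embodiments, the first network device may also flood its own fault information packet to the other network devices in the network. The fault information packet includes the statistical information of the first network device, which includes the first network device's statistical results for one or more KPIs.
Optionally, in some embodiments, the first network device may flood its fault information packet to the other network devices in the network at a first preset period. For example, when the period is 120 seconds, the first network device sends its fault information packet to the other network devices in the network every 120 seconds. The embodiments of the present invention use 120 seconds only as an example of the first preset period and are not limited thereto; the first preset period may be set differently for different network devices.
Optionally, in some embodiments, the first network device may also flood its fault information packet to the other network devices in the network when the statistical result of a first KPI of the first network device meets a preset condition.
Optionally, meeting the preset condition includes that the statistical result of the first KPI is greater than or equal to a KPI threshold set by the first network device for the first KPI.
The first KPI is one or more KPIs of the first network device.
FIG. 3 is another schematic flowchart of a fault locating method according to an embodiment of the present invention. As shown in FIG. 3, the first network device configures a KPI threshold for its first KPI, collects statistics on the first KPI at a preset period to obtain a statistical result of the first KPI, and compares the statistical result with the KPI threshold that it set for the first KPI. When the statistical result of the first KPI is greater than or equal to the KPI threshold, the first network device floods its fault information packet to the other network devices in the network, so that the other network devices can determine, according to the statistical result of the first KPI included in that packet, whether the first network device is faulty.
For example, when the first KPI is the number of received IS-IS routing protocol packets, the KPI threshold of the first KPI is a threshold on the number of received IS-IS routing protocol packets, and this threshold may be a first threshold. When the number of IS-IS routing protocol packets received by the first network device is greater than or equal to the first threshold, the first network device sends its fault information packet to the other network devices in the network. When that number is less than the first threshold, the first network device enters the next preset period and continues counting the received IS-IS routing protocol packets.
Optionally, in some embodiments, when the statistical results of some of the KPIs counted by the first network device are greater than or equal to the KPI thresholds corresponding to those KPIs, the fault information packet that the first network device sends to the other network devices in the network may include only the statistical results of those KPIs, or may include the statistical results of all KPIs of the first network device counted in the preset period.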
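The threshold-triggered reporting decision of FIG. 3 can be sketched as a single function. The KPI names, the threshold values, and the function name `kpis_to_report` are hypothetical, and the sketch covers only the per-period decision, not the flooding itself.

```python
def kpis_to_report(stats, thresholds, report_all=False):
    """Return the KPI statistics to flood, or {} if no threshold was reached."""
    exceeded = {name: value for name, value in stats.items()
                if name in thresholds and value >= thresholds[name]}
    if not exceeded:
        return {}  # nothing to send; keep counting in the next preset period
    # Either report only the KPIs that reached their thresholds, or all of them.
    return dict(stats) if report_all else exceeded


# Hypothetical statistics for one preset period
stats = {"isis_rx_packets": 1200, "isis_lsp_updates": 30}
thresholds = {"isis_rx_packets": 1000, "isis_lsp_updates": 50}
print(kpis_to_report(stats, thresholds))  # prints {'isis_rx_packets': 1200}
```

Passing `report_all=True` corresponds to the variant in which the packet carries every KPI counted in the period once any threshold is reached.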
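It can be understood that the preset period used for KPI statistics on the first network device may be the same as or different from the first preset period; this is not limited in the embodiments of the present invention.
The KPI threshold that the first network device sets for each KPI may be set manually based on experience, or may be dynamically adjusted according to the statistical results of each KPI collected when the first network device previously failed.
For example, as shown in FIG. 3, after the first network device is located as the faulty device, the collected statistical results of one or more KPIs of the first network device are returned to the first network device. The next time the first network device configures the KPI thresholds of those KPIs, it may dynamically adjust them according to the statistical results collected when it failed.
Dynamically adjusting the KPI thresholds of the first network device keeps the thresholds reasonable. It avoids a threshold that is set too low, which would cause the first network device to send fault information packets to the network devices in the network too frequently and waste resources. It also avoids a threshold that is set too high, in which case statistical results below the threshold are not sent to the other network devices in time, so that the other network devices cannot obtain the KPI statistics of the first network device and therefore cannot accurately determine whether the first network device is faulty.
Optionally, in some embodiments, some network devices in the network do not support fault information packet transmission. In this case, the fault information packet sent by a network device adjacent to such a device carries the statistical results of one or more KPIs of the device that does not support fault information packet transmission. For example, if a third network device in the network does not support fault information packet transmission, the fault information packet sent by a second network device in the network may carry the statistical results of one or more KPIs of the third network device, where the second network device is adjacent to the third network device.
FIG. 4 is a schematic diagram of a network to which an embodiment of the present invention can be applied. As shown in FIG. 4, network device C in the network does not support fault information packet transmission. In this case, network device B and/or network device E, which are adjacent to network device C, can obtain the statistical results of one or more KPIs of network device C through pre-configuration, and network device B or network device E then sends those statistical results to every network device in the network. When some network devices in the network fail, logging in to any network device other than network device C is enough to locate the faulty network device in the network. This fault locating method is the same as the network device fault locating method shown in FIG. 2. To avoid repetition, details are not described here again.
It should be understood that more than one network device may lack support for fault information packet transmission. The embodiments of the present invention use a single such device only as an example and are not limited thereto.
Optionally, in some embodiments, the network locations of the network devices that do not support fault information packet transmission separate the network into parts. In this case, the network may be one of a pre-divided first subnet and second subnet, where the two subnets are divided based on the network locations of the devices that do not support fault information packet transmission.
FIG. 5 is a schematic diagram of another network to which an embodiment of the present invention can be applied. As shown in FIG. 5, network device C and network device D do not support fault information packet transmission, and their network locations divide the network into a first subnet and a second subnet. It can be understood that the network may be either the first subnet or the second subnet, so the method for locating network device faults within the first subnet and the second subnet is the same as the network device fault locating method shown in FIG. 2. To avoid repetition, details are not described here again.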
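One way to picture this division is to remove the devices that cannot forward fault information packets and look at which devices can still reach one another. The sketch below does this with a connected-components search. The link list is a made-up topology, not the one in FIG. 5, and the function name `split_into_subnets` is hypothetical; the description itself states only that the subnets are divided in advance.

```python
def split_into_subnets(links, unsupported):
    """Group supporting devices into subnets that flooding can reach."""
    neighbors = {}
    for a, b in links:
        for node in (a, b):
            if node not in unsupported:
                neighbors.setdefault(node, set())
        if a not in unsupported and b not in unsupported:
            neighbors[a].add(b)
            neighbors[b].add(a)

    subnets, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search over supporting devices only
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        subnets.append(sorted(component))
    return subnets


# Hypothetical topology in which C and D cannot forward fault information packets
links = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"),
         ("D", "F"), ("E", "F"), ("E", "G"), ("F", "H")]
print(split_into_subnets(links, unsupported={"C", "D"}))
# prints [['A', 'B'], ['E', 'F', 'G', 'H']]
```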
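The method embodiments of the present invention are described in detail above with reference to FIG. 2 to FIG. 5. The network device embodiments are described in detail below with reference to FIG. 6 and FIG. 7. It should be understood that the network device embodiments correspond to the method embodiments, and similar descriptions can be found in the method embodiments.
FIG. 6 is a schematic block diagram of a network device 600 according to an embodiment of the present invention. As shown in FIG. 6, the network device 600 includes:
a receiving module 610, configured to receive fault information packets flooded by the network devices in the network other than the network device 600, where each fault information packet includes statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet, and the statistical information of each network device includes that device's statistical results for one or more KPIs; and
a determining module 620, configured to determine the faulty network device in the network according to the statistical information of the network device 600 and the statistical information of the other network devices.
In this embodiment of the present invention, the network device uses the receiving module 610 to receive the fault information packets flooded by the other network devices, and can therefore quickly collect the information required for fault locating, namely the statistical results of one or more KPIs of the network devices, which speeds up fault locating and shortens the fault locating time.
Optionally, the network device 600 further includes:
an obtaining module 630, configured to obtain the statistical information of the network device 600, where the statistical information includes statistical results of one or more KPIs of the first network device; and
a sending module 640, configured to flood the fault information packet of the network device 600 to the other network devices in the network, where the fault information packet of the network device 600 includes the statistical information of the network device 600.
In this embodiment of the present invention, the sending module 640 is specifically configured to flood the fault information packet of the network device 600 to the other network devices in the network at a first preset period.
Optionally, in some embodiments, the sending module 640 is further configured to flood the fault information packet of the network device 600 to the other network devices in the network when the statistical result of a first KPI of the network device 600 meets a preset condition.
Optionally, in some embodiments, the preset condition includes that the statistical result of the first KPI is greater than or equal to a KPI threshold set by the network device for the first KPI.
Optionally, in some embodiments, the preset condition may also be a threshold on a cosine angle. The cosine angle may be the cosine angle between the vector of all KPIs of the network device 600 counted in a preset period and the vector of all KPI thresholds of the network device 600, or the cosine angle between the vector of some of the KPIs of the network device 600 counted in the preset period and the vector of the corresponding KPI thresholds.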
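The cosine between a KPI vector and a threshold vector can be computed as in the following sketch. The vectors are made-up numbers and the function name `cosine_similarity` is an assumption. The description does not say how the resulting value is compared with the cosine-angle threshold, so the sketch stops at computing the cosine.

```python
import math


def cosine_similarity(kpi_vector, threshold_vector):
    """Cosine of the angle between the KPI vector and the threshold vector."""
    dot = sum(k * t for k, t in zip(kpi_vector, threshold_vector))
    norm = (math.sqrt(sum(k * k for k in kpi_vector))
            * math.sqrt(sum(t * t for t in threshold_vector)))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


# Hypothetical two-KPI example: counted statistics versus configured thresholds
cosine = cosine_similarity([3, 4], [4, 3])
print(round(cosine, 2))  # prints 0.96
```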
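Optionally, in some embodiments, the determining module 620 is specifically configured to: determine a target KPI; calculate, according to the statistical information of the network device and the statistical information of the other network devices, the KPI change rate of the target KPI on each network device in the network; and select a faulty network device from the network device and the other network devices according to the KPI change rate of the target KPI on each network device in the network, where the KPI change rate of the target KPI of the faulty network device is greater than or equal to a preset KPI change rate threshold.
Optionally, in some embodiments, the determining module 620 is specifically configured to: determine a target KPI; obtain, according to the statistical information of the network device and the statistical information of the other network devices, the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network; generate an adjacency matrix from those statistical results; determine the centrality of each network device in the network according to the adjacency matrix; and determine the faulty network device in the network according to the centrality of each network device.
Optionally, in some embodiments, the fault information packet is a packet dedicated to carrying KPIs of network devices, and this dedicated packet is defined based on TLV fields.
Optionally, in some embodiments, the fault information packet may also be an IGP-based packet, in which case it is an extension of the TLV fields of an IGP packet.
Optionally, in some embodiments, the fault information packet sent by a second network device of the at least one network device carries the statistical results of one or more KPIs of a third network device adjacent to the second network device, where the third network device is a network device that does not support fault information packet transmission.
Optionally, in some embodiments, the network is one of a pre-divided first subnet and second subnet, and the first subnet and the second subnet are divided based on the network locations of the network devices that do not support fault information packet transmission.
It should be understood that the network device 600 according to this embodiment may correspond to the network device in the method embodiments of the present invention, and that the foregoing and other operations and/or functions of the modules in the network device 600 implement the corresponding procedures of the methods in FIG. 2 to FIG. 5. For brevity, details are not described here again.
FIG. 7 is a schematic structural diagram of a network device 700 according to an embodiment of the present invention. As shown in FIG. 7, the network device 700 includes a memory 710 and a processor 720 that communicate with each other through an internal connection path to transfer control and/or data signals.
The memory 710 is configured to store program code.
The processor 720 is configured to invoke the program code to implement the methods in the foregoing embodiments of the present invention.
In this embodiment of the present invention, the processor 720 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. An embodiment of the present invention provides a computer-readable medium for storing computer program code, where the computer program includes instructions for performing the fault locating method of the embodiments of the present invention described above with reference to FIG. 2 to FIG. 5. The readable medium may be a read-only memory (ROM) or a random access memory (RAM); this is not limited in the embodiments of the present invention.
It should be understood that the term "and/or" in this document describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in the embodiments disclosed in this document can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and in an actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art can readily figure out within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.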

Claims (14)

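  1. A fault locating method, wherein the method comprises:
    receiving, by a first network device, fault information packets flooded by the network devices in a network other than the first network device, wherein each fault information packet comprises statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet, and the statistical information of each network device comprises that network device's statistical results for one or more key performance indicators (KPIs); and
    determining, by the first network device, a faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices.
  2. The method according to claim 1, wherein the method further comprises:
    obtaining, by the first network device, the statistical information of the first network device, wherein the statistical information comprises statistical results of one or more KPIs of the first network device; and
    flooding, by the first network device, the fault information packet of the first network device to the other network devices, wherein the fault information packet of the first network device comprises the statistical information of the first network device.
  3. The method according to claim 2, wherein the flooding, by the first network device, of the fault information packet of the first network device to the other network devices comprises:
    flooding, by the first network device, the fault information packet of the first network device to the other network devices at a first preset period; or
    flooding, by the first network device, the fault information packet of the first network device to the other network devices when a statistical result of a first KPI of the first network device meets a preset condition.
  4. The method according to claim 3, wherein meeting the preset condition comprises that the statistical result of the first KPI is greater than or equal to a KPI threshold set by the first network device for the first KPI.
  5. The method according to any one of claims 1 to 4, wherein the determining, by the first network device, of the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices comprises:
    determining, by the first network device, a target KPI;
    calculating, by the first network device according to the statistical information of the first network device and the statistical information of the other network devices, a KPI change rate of the target KPI on each network device in the network; and
    selecting, by the first network device according to the KPI change rate of the target KPI on each network device in the network, a faulty network device from the first network device and the other network devices, wherein the KPI change rate of the target KPI of the faulty network device is greater than or equal to a preset KPI change rate threshold.
  6. The method according to any one of claims 1 to 4, wherein the determining, by the first network device, of the faulty network device in the network according to the statistical information of the first network device and the statistical information of the other network devices comprises:
    determining, by the first network device, a target KPI;
    obtaining, by the first network device according to the statistical information of the first network device and the statistical information of the other network devices, statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network;
    generating an adjacency matrix according to the statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between the any two network devices;
    determining a centrality of each network device in the network according to the adjacency matrix; and
    determining the faulty network device in the network according to the centrality of each network device.
  7. The method according to any one of claims 1 to 6, wherein the other network devices comprise a second network device, the fault information packet sent by the second network device carries statistical results of one or more KPIs of a third network device adjacent to the second network device, and the third network device is a network device that does not support transmission of the fault information packet.
  8. A network device, wherein the network device comprises:
    a receiving module, configured to receive fault information packets flooded by the network devices in a network other than the network device, wherein each fault information packet comprises statistical information, about Interior Gateway Protocol packets, of the network device that sent the fault information packet, and the statistical information of each network device comprises that network device's statistical results for one or more key performance indicators (KPIs); and
    a determining module, configured to determine a faulty network device in the network according to the statistical information of the network device and the statistical information of the other network devices.
  9. The network device according to claim 8, wherein the network device further comprises:
    an obtaining module, configured to obtain the statistical information of the network device, wherein the statistical information comprises statistical results of one or more KPIs of the network device; and
    a sending module, configured to flood the fault information packet of the network device to the other network devices, wherein the fault information packet of the network device comprises the statistical information of the network device.
  10. The network device according to claim 9, wherein the sending module is specifically configured to:
    flood the fault information packet of the network device to the other network devices at a first preset period; or flood the fault information packet of the network device to the other network devices when a statistical result of a first KPI of the network device meets a preset condition.
  11. The network device according to claim 10, wherein meeting the preset condition comprises that the statistical result of the first KPI is greater than or equal to a KPI threshold set by the network device for the first KPI.
  12. The network device according to any one of claims 8 to 11, wherein the determining module is specifically configured to: determine a target KPI; calculate, according to the statistical information of the network device and the statistical information of the other network devices, a KPI change rate of the target KPI on each network device in the network; and select, according to the KPI change rate of the target KPI on each network device in the network, a faulty network device from the network device and the other network devices, wherein the KPI change rate of the target KPI of the faulty network device is greater than or equal to a preset KPI change rate threshold.
  13. The network device according to any one of claims 8 to 11, wherein the determining module is specifically configured to: determine a target KPI; obtain, according to the statistical information of the network device and the statistical information of the other network devices, statistical results of the Interior Gateway Protocol packets corresponding to the target KPI that are transmitted between any two network devices in the network; generate an adjacency matrix according to those statistical results; determine a centrality of each network device in the network according to the adjacency matrix; and determine the faulty network device in the network according to the centrality of each network device.
  14. The network device according to any one of claims 8 to 13, wherein the other network devices comprise a second network device, the fault information packet sent by the second network device carries statistical results of one or more KPIs of a third network device adjacent to the second network device, and the third network device is a network device that does not support transmission of the fault information packet.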
PCT/CN2017/105881 2016-12-12 2017-10-12 Fault locating method and network device WO2018107882A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17881289.7A EP3547612A1 (en) 2016-12-12 2017-10-12 Fault positioning method and network device
US16/437,442 US11411810B2 (en) 2016-12-12 2019-06-11 Fault locating method and network device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611138835.9A CN108616367B (zh) 2016-12-12 2016-12-12 Fault locating method and network device
CN201611138835.9 2016-12-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/437,442 Continuation US11411810B2 (en) 2016-12-12 2019-06-11 Fault locating method and network device

Publications (1)

Publication Number Publication Date
WO2018107882A1 true WO2018107882A1 (zh) 2018-06-21

Family

ID=62557871

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/105881 WO2018107882A1 (zh) 2016-12-12 2017-10-12 Fault locating method and network device

Country Status (4)

Country Link
US (1) US11411810B2 (zh)
EP (1) EP3547612A1 (zh)
CN (1) CN108616367B (zh)
WO (1) WO2018107882A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3846389A4 (en) * 2018-09-19 2021-10-27 Huawei Technologies Co., Ltd. ROUTING OSCILLATION INFORMATION DETERMINATION PROCESS AND ASSOCIATED DEVICE

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10977574B2 (en) * 2017-02-14 2021-04-13 Cisco Technology, Inc. Prediction of network device control plane instabilities
CN113179171B (zh) * 2020-01-24 2023-04-18 华为技术有限公司 故障检测方法、装置及系统
CN111766788A (zh) * 2020-06-01 2020-10-13 珠海格力电器股份有限公司 智能家居控制方法及装置
CN113810238A (zh) * 2020-06-12 2021-12-17 中兴通讯股份有限公司 网络监测方法、电子设备及存储介质
CN112749370B (zh) * 2021-04-06 2021-07-02 广东际洲科技股份有限公司 一种基于物联网的故障跟踪定位方法和系统
CN114598609A (zh) * 2022-03-11 2022-06-07 杭州网银互联科技股份有限公司 一种网络拓扑连接结构信息存储方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101431448A (zh) * 2008-10-22 2009-05-13 华为技术有限公司 Method, device, and system for locating faults in an IP bearer network
WO2012042440A2 (en) * 2010-09-29 2012-04-05 Telefonaktiebolaget L M Ericsson (Publ) Fast flooding based fast convergence to recover from network failures
US20130290230A1 (en) * 2012-04-27 2013-10-31 Nokia Siemens Networks Oy Method for heterogeneous network policy based management
CN103945442A (zh) * 2014-05-07 2014-07-23 东南大学 System anomaly detection method based on the linear prediction principle in a mobile communication system
CN103973496A (zh) * 2014-05-21 2014-08-06 华为技术有限公司 Fault diagnosis method and apparatus
CN105071968A (zh) * 2015-08-18 2015-11-18 大唐移动通信设备有限公司 Method and apparatus for repairing hidden faults in the service plane and control plane of a communication device

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5901214A (en) * 1996-06-10 1999-05-04 Murex Securities, Ltd. One number intelligent call processing system
US7254114B1 (en) * 2002-08-26 2007-08-07 Juniper Networks, Inc. Network router having integrated flow accounting and packet interception
US7225258B2 (en) * 2002-09-09 2007-05-29 General Dynamics Corporation System and method for connecting dynamic networks with limited resources
JP4376601B2 (ja) * 2003-12-09 2009-12-02 富士通株式会社 Network fault detection method and apparatus therefor
EP1964330B1 (en) * 2005-12-23 2010-08-11 Telecom Italia S.p.A. Method for reducing fault detection time in a telecommunication network
US7543190B2 (en) * 2006-06-28 2009-06-02 Walker Don H System and method for detecting false positive information handling system device connection errors
US7889679B2 (en) * 2008-02-01 2011-02-15 Telenor Asa Arrangements for networks
US8243587B2 (en) * 2009-10-22 2012-08-14 Verizon Business Global Llc Label distribution protocol synchronization in multi-protocol label switching environments
CN101877659B (zh) * 2010-06-30 2014-07-16 中兴通讯股份有限公司 Method, device, and system for packet loss monitoring
DE102011004064A1 (de) * 2011-02-14 2012-08-16 Siemens Aktiengesellschaft Intermediate network in ring topology and method for establishing a network connection between two network domains
ES2632475T3 (es) * 2012-11-13 2017-09-13 Telefonaktiebolaget Lm Ericsson (Publ) Method for modifying parameter values for long-range extension mode and corresponding node
CN103051956A (zh) * 2012-12-24 2013-04-17 乐视致新电子科技(天津)有限公司 Set-top box implementing log reporting and fault diagnosis, and method thereof
US9660860B1 (en) * 2014-12-30 2017-05-23 Juniper Networks, Inc. Path computation delay timer in multi-protocol label switched networks
US9769065B2 (en) * 2015-05-06 2017-09-19 Telefonaktiebolaget Lm Ericsson (Publ) Packet marking for L4-7 advanced counting and monitoring
US11089517B2 (en) * 2016-03-09 2021-08-10 Telefonaktiebolaget Lm Ericsson (Publ) Traffic availability in a cellular communication network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3547612A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3846389A4 (en) * 2018-09-19 2021-10-27 Huawei Technologies Co., Ltd. ROUTING OSCILLATION INFORMATION DETERMINATION PROCESS AND ASSOCIATED DEVICE
US11489759B2 (en) 2018-09-19 2022-11-01 Huawei Technologies Co., Ltd. Method for determining route flapping information and related device

Also Published As

Publication number Publication date
EP3547612A4 (en) 2019-10-02
US20190296968A1 (en) 2019-09-26
CN108616367B (zh) 2021-01-05
US11411810B2 (en) 2022-08-09
EP3547612A1 (en) 2019-10-02
CN108616367A (zh) 2018-10-02

Similar Documents

Publication Publication Date Title
WO2018107882A1 (zh) Fault locating method and network device
US10868730B2 (en) Methods, systems, and computer readable media for testing network elements of an in-band network telemetry capable network
US10142203B2 (en) Ethernet fault management systems and methods
EP2486706B1 (en) Network path discovery and analysis
EP3817298A1 (en) Data message detection method, device and system
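WO2016184245A1 (zh) Tunnel packet loss detection method and apparatus, and network communication device
WO2021093465A1 (zh) Method, apparatus, and system for sending and receiving packets to perform OAM
CN104753828A (zh) SDN controller, data center system, and routing connection method
CN109428782B (zh) Network monitoring method and device
CN108566336A (zh) Network path acquisition method and device
CN110557342A (zh) Device for analyzing and mitigating dropped packets
CN109558727A (zh) Routing security detection method and system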
US9641355B2 (en) Communication device, communication method, and program
US9893979B2 (en) Network topology discovery by resolving loops
CN108040007A (zh) Backup route link quality monitoring method and system
US20160248652A1 (en) System and method for classifying and managing applications over compressed or encrypted traffic
WO2021027420A1 (zh) Method and apparatus for data transmission
US10148515B2 (en) Determining connections of non-external network facing ports
US9667439B2 (en) Determining connections between disconnected partial trees
US20190372852A1 (en) Event-aware dynamic network control
CN107070673B (zh) Path state reporting method based on a centralized control plane
Manzanares-Lopez et al. Host Discovery Solution: An Enhancement of Topology Discovery in OpenFlow based SDN Networks.
KR101707073B1 (ko) SDN-based error discovery network system
JP5829183B2 (ja) Method, node device, and program for detecting a faulty node device or faulty link in real time based on a routing protocol
EP4319089A1 (en) Path determination method and apparatus, device, system, and computer readable storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17881289; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2017881289; Country of ref document: EP; Effective date: 20190625